    Dremio - The Data Lake Engine

    Dremio’s Data Lake Engine delivers lightning-fast query speed and a self-service semantic layer that operates directly against your data lake storage.

    • Lightning-fast queries
    • Self-service semantic layer
    • Flexibility and open source technology
    • Powerful JOIN capability

    Lightning-Fast Queries

    Queries operate directly on data lake storage: connect to S3, ADLS, Hadoop, or wherever else your data lives.

    Dremio technologies like Data Reflections, Columnar Cloud Cache (C3) and Predictive Pipelining work alongside Apache Arrow to make queries on your data lake storage very, very fast.

    Accelerate reads with Predictive Pipelining and Columnar Cloud Cache

    Dremio’s Predictive Pipelining technology fetches data just before the execution engine needs it, dramatically reducing the time the engine spends waiting for data. And our real-time Columnar Cloud Cache (C3) automatically caches data on local NVMe as it’s being accessed, enabling NVMe-level performance on your data lake storage.
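
    The interplay of read-ahead and local caching can be sketched in a few lines. This is an illustrative model only, not Dremio’s implementation: `CachingPrefetchReader`, its block granularity, and its LRU eviction policy are all invented for the example.

```python
from collections import OrderedDict

class CachingPrefetchReader:
    """Toy sketch: read-ahead plus a local block cache over slow storage."""

    def __init__(self, fetch_block, cache_blocks=64, readahead=2):
        self.fetch_block = fetch_block      # callable: block_id -> bytes
        self.cache = OrderedDict()          # block_id -> data, in LRU order
        self.cache_blocks = cache_blocks
        self.readahead = readahead          # blocks fetched ahead of need

    def _cache_put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, block_id):
        # Serve from the local cache when possible (the C3 idea) ...
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        # ... otherwise fetch this block plus the next few, so the
        # execution engine rarely waits on storage (the pipelining idea).
        for b in range(block_id, block_id + 1 + self.readahead):
            if b not in self.cache:
                self._cache_put(b, self.fetch_block(b))
        return self.cache[block_id]
```

    Reading block 0 also pulls blocks 1 and 2, so the next sequential read is served locally instead of waiting on the remote store.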

    A modern execution engine, built for the cloud

    Dremio’s execution engine is built on Apache Arrow, the standard for columnar, in-memory analytics, and leverages Gandiva to compile queries to vectorized code that’s optimized for modern CPUs. A single Dremio cluster can scale elastically to meet any data volume or workload, and you can even have multiple clusters with automatic query routing.
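
    The columnar layout that Arrow standardizes can be pictured in plain Python. This is a conceptual sketch of the memory layout only; the real engine runs LLVM-compiled vectorized kernels via Gandiva, not interpreted loops.

```python
# Row-at-a-time: each record passes through the whole expression logic.
rows = [{"price": p, "qty": q} for p, q in [(9.5, 3), (2.0, 10), (4.25, 4)]]
row_totals = [r["price"] * r["qty"] for r in rows]

# Column-at-a-time (the Arrow model): the same expression runs over
# contiguous per-column arrays, which modern CPUs can process with
# tight vectorized loops.
columns = {"price": [9.5, 2.0, 4.25], "qty": [3, 10, 4]}
col_totals = [p * q for p, q in zip(columns["price"], columns["qty"])]

assert row_totals == col_totals  # same answer, different memory layout
```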

    Data Reflections – the ON switch for extreme speed

    With a few clicks, Dremio lets you create a Data Reflection, a physically optimized data structure that can accelerate a variety of query patterns. Create as many or as few as you want; Dremio invisibly and automatically incorporates Reflections in query plans and keeps their data up to date.
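
    Conceptually, a Reflection behaves like a materialized, physically optimized copy (for example, a precomputed aggregate) that the planner substitutes for a raw scan. A toy sketch of that substitution, with dataset and totals invented for the example:

```python
# Raw data: individual sale records (made up for illustration).
sales = [("east", 100), ("west", 250), ("east", 50)]

# "Reflection": totals per region, maintained alongside the raw data.
reflection = {}
for region, amount in sales:
    reflection[region] = reflection.get(region, 0) + amount

def total_for(region):
    # The planner answers from the reflection instead of rescanning
    # and re-aggregating every raw row.
    return reflection[region]

print(total_for("east"))
```

    The key property is transparency: queries are written against the original dataset, and the optimized structure is chosen behind the scenes.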

    Arrow Flight moves data 1,000x faster

    ODBC and JDBC were designed in the 1990s for small data, requiring all records to be serialized and deserialized. Arrow Flight replaces them with a high-speed, distributed protocol designed to handle big data, providing a 1,000x increase in throughput between client applications and Dremio. You can now populate a client-side Python or R data frame with millions of records in seconds.

    Self-Service Semantic Layer

    An abstraction layer enables IT to apply security and business meaning, while enabling analysts and data scientists to explore data and derive new virtual datasets.

    A semantic layer generated by your users

    Dremio’s semantic layer is an integrated, searchable catalog that indexes all of your metadata, so business users can easily make sense of your data. Virtual datasets and spaces make up the semantic layer, and are all indexed and searchable.

    Data curation, without copies

    By managing data curation in a virtual context, Dremio makes it fast, easy, and cost effective to filter, transform, join, and aggregate data from one or more sources. And virtual datasets are defined with standard SQL, so you can take advantage of your existing skills and tools.
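
    The idea is the same as a SQL view: a dataset defined by a query rather than by copied data. A sketch using SQLite views (the table and view names are made up for illustration):

```python
import sqlite3

# A source table standing in for raw data lake files.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "east", 120.0), (2, "west", 80.0), (3, "east", 45.5)])

# The 'virtual dataset': a filtered, aggregated view defined in plain
# SQL -- no rows are copied anywhere.
con.execute("""
    CREATE VIEW east_totals AS
    SELECT region, SUM(amount) AS total
    FROM orders WHERE region = 'east' GROUP BY region
""")

print(con.execute("SELECT total FROM east_totals").fetchone()[0])
```

    Because the definition is just SQL, changing the curation logic means redefining the view, not rebuilding a pipeline.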

    Use your existing BI and data science tools

    Dremio appears just like a relational database, and exposes ODBC, JDBC, REST and Arrow Flight interfaces. So you can connect any BI or data science tool – Tableau, Power BI, Looker and Jupyter Notebooks to name a few.

    Fine-grained access control

    Dremio provides row and column-level permissions, and lets you mask sensitive data. Role-based access control makes sure that everyone has access to exactly what they need, and SSO enables a seamless authentication experience.
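
    Column masking and row-level filtering can be pictured as a view that exposes only a redacted slice of a sensitive table. A sketch in SQLite (Dremio enforces this through its permission model rather than hand-built views; all names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT, ssn TEXT, region TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [("Ada", "123-45-6789", "east"),
                 ("Grace", "987-65-4321", "west")])

# Mask all but the last four SSN digits (column-level control) and
# restrict rows to one region (row-level control).
con.execute("""
    CREATE VIEW customers_east_masked AS
    SELECT name, 'XXX-XX-' || substr(ssn, -4) AS ssn
    FROM customers WHERE region = 'east'
""")

for row in con.execute("SELECT * FROM customers_east_masked"):
    print(row)
```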

    Data lineage

    The relationships between your data sources, virtual datasets, and all your queries are maintained in Dremio’s data graph, telling you exactly where each dataset came from.
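
    A lineage graph is just datasets plus parent edges, and answering “where did this come from?” is a graph walk. A minimal sketch with invented dataset names:

```python
# Each dataset records the datasets it was derived from.
parents = {
    "sales_dashboard_view": ["sales_enriched"],
    "sales_enriched": ["s3.raw_sales", "postgres.customers"],
    "s3.raw_sales": [],
    "postgres.customers": [],
}

def lineage(dataset):
    """Return every upstream source of a dataset, depth-first."""
    seen = []
    for parent in parents.get(dataset, []):
        if parent not in seen:
            seen.append(parent)
            seen.extend(p for p in lineage(parent) if p not in seen)
    return seen

print(lineage("sales_dashboard_view"))
```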

    Flexible and Open

    Dremio works directly with your data lake storage. You don’t have to send your data to Dremio, or have it stored in proprietary formats that lock you in. Dremio is built on open source technologies such as Apache Arrow, and can run in any cloud or data center.

    Avoid vendor lock-in, query across clouds, and keep your data in storage that you control.

    No vendor lock-in

    Dremio works directly with your data lake storage, so you don’t have to load data into a proprietary data warehouse and watch costs skyrocket. Your data stays in its existing systems and formats, so you can always access it with any technology, with or without Dremio.

    Multi-cloud and hybrid cloud

    Run Dremio on AWS, Azure, or on-premise. You can even query data across disparate regions or clouds. And the abstraction provided by Dremio’s semantic layer enables you to migrate data from one location to another, without impacting your analysts or data scientists.

    Best-of-breed technology

    With Dremio, your data can stay in data lake storage that you control. You can use Dremio alongside hundreds of other technologies that also work with data lake storage, including ETL services, data science tools and compute engines.

    Apache Arrow inside

    Apache Arrow, which was originally Dremio’s internal memory format, has become the industry standard for in-memory columnar analytics with millions of monthly downloads. Arrow-enabled applications realize a dramatic increase in processing and data transport speeds. The Gandiva kernel, developed at Dremio, provides 80x speedups on top of Arrow’s other innovations, and Arrow Flight provides a modern, industry-standard way to share data across distributed systems and data science tools.

    Join with Anything

    Powerful joining abilities mean that your data is always accessible without ETL. Dremio ships with over a dozen connectors, and Dremio Hub includes many other community-developed connectors.

    While a lot of your data may already be in data lake storage, you probably have data in other places too. Dremio makes it easy to join your data lake storage with all the other places you’re keeping your data, without ETL.

    Connect to any database

    With many built-in and community-developed connectors, Dremio can easily and securely connect to your existing databases, and even join that data to your data lake storage or other places your data is stored.
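
    A federated join can be pictured as pulling rows from one source and hash-joining them with rows from another inside the engine. A toy sketch with SQLite standing in for an operational database and a list standing in for data lake files (all names invented):

```python
import sqlite3

# Source 1: an operational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Ada"), (2, "Grace")])

# Source 2: pretend these rows were scanned from data lake files.
lake_orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 45.5},
]

# Hash-join: build a lookup from the database side, probe with the
# lake side -- no ETL step copied either source anywhere.
names = dict(db.execute("SELECT id, name FROM customers"))
joined = [(names[o["customer_id"]], o["amount"]) for o in lake_orders]
print(joined)
```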

    Powerful query pushdowns

    Dremio has the most powerful query pushdowns in the industry, featuring the Advanced Relational Pushdown (ARP) engine. ARP gives Dremio a deep understanding of each database’s capabilities and query language, enabling partial and complete pushdowns for even the most complex query plans.
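
    The core of a pushdown is translating parts of the engine’s query plan into the source’s own SQL, so filtering and projection happen at the source instead of after shipping every row back. A deliberately tiny sketch of that translation (ARP itself is far more general; `push_down` is invented for illustration):

```python
def push_down(table, columns, predicate=None):
    """Build source SQL that ships projection and filter to the database."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if predicate:
        # Pushed down: the source database filters rows before any
        # data crosses the network.
        sql += f" WHERE {predicate}"
    return sql

print(push_down("orders", ["id", "amount"], "amount > 100"))
```

    A partial pushdown works the same way: whatever the source can evaluate is translated into its dialect, and the remainder of the plan runs in the engine.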

    Dremio Hub

    In addition to the native connectors built into Dremio, Dremio Hub provides a marketplace of community-provided connectors to download, making it easy to join your data lake storage with all the other places you keep your data, without ETL.

    Dremio Connector SDK

    Want to connect to a source we don’t support yet? Connectors are template-based and can be built for any data source with a JDBC driver, making it simple to define new connectors without complex coding.