Dremio is a platform that enables you to perform high-performance queries from a data lake. It helps you manage that data in a sophisticated way. The use cases are broad, but it allows you to make extremely good use of data in a data lake. It gives you data warehouse capabilities with data lake data.
Founder/ CEO at Morphogen Data Inc.
It enables you to manage changes more effectively than any other platform.
Pros and Cons
- "Dremio enables you to manage changes more effectively than any other data warehouse platform. There are two things that come into play. One is data lineage. If you are looking at data in Dremio, you may want to know the source and what happened to it along the way or how it may have been transformed in the data pipeline to get to the point where you're consuming it."
- "They have an automated tool for building SQL queries, so you don't need to know SQL. That interface works, but the SQL it generates could be more efficient. It's going through some growing pains. There is so much value in tools like these for people with no SQL experience. Over time, Dremio will make these capabilities more accessible to users who aren't database people."
What is our primary use case?
What is most valuable?
Dremio enables you to manage changes more effectively than any other data warehouse platform. There are two things that come into play. One is data lineage. If you are looking at data in Dremio, you may want to know the source and what happened to it along the way or how it may have been transformed in the data pipeline to get to the point where you're consuming it.
There's another thing called data provenance. The two are tied together. Data provenance allows you to go back and recreate the data at any particular point in time. It's extremely important for compliance and governance because data changes all the time. How did it change? What was it three days or three months ago? You may have made some decisions based on data that was three months old, so you might need to revisit those.
It's essential for things like machine learning and deep learning, where you are generating AI models off data. When the model stops working or doesn't work as expected, you need to figure out why. You have to go back and adjust the datasets used to train the model. We do that through an open-source project called Nessie, which is their basis for providing data lineage and data provenance capabilities. It's super powerful.
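To make the lineage and provenance idea concrete, here is a minimal sketch of a commit log that can answer "what did this data look like at time T?" and "what changed along the way?". It is a toy model under stated assumptions, not Nessie's API: `VersionedTable`, `commit`, `as_of`, and `lineage` are hypothetical names invented for illustration, and a real system would store references to data files rather than full copies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Commit:
    timestamp: datetime
    message: str
    snapshot: dict  # full table contents at this commit (toy model only)


@dataclass
class VersionedTable:
    """Hypothetical append-only commit log for a single table."""
    commits: list = field(default_factory=list)

    def commit(self, rows: dict, message: str, when: datetime) -> None:
        # Store a copy so later edits to `rows` cannot rewrite history.
        self.commits.append(Commit(when, message, dict(rows)))

    def as_of(self, when: datetime) -> dict:
        """Provenance: return the table as it looked at `when`."""
        candidates = [c for c in self.commits if c.timestamp <= when]
        if not candidates:
            raise LookupError("no commit at or before the requested time")
        latest = max(candidates, key=lambda c: c.timestamp)
        return dict(latest.snapshot)

    def lineage(self) -> list:
        """Lineage: return the ordered list of recorded changes."""
        ordered = sorted(self.commits, key=lambda c: c.timestamp)
        return [(c.timestamp.isoformat(), c.message) for c in ordered]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


table = VersionedTable()
table.commit({"acct-1": 100}, "initial load", utc(2024, 1, 1))
table.commit({"acct-1": 120, "acct-2": 50}, "adjust acct-1, add acct-2", utc(2024, 3, 1))

# Recreate the data a model was trained on in February.
february_view = table.as_of(utc(2024, 2, 1))
current_view = table.as_of(utc(2024, 4, 1))
```

The point of the sketch is that the February view stays reproducible even after the March change, which is what you need when revisiting a decision or a model trained on older data.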
Arrow is another open-source project for storing data in memory and performing data query operations. Data sits on a disk in one format. If you want to do anything with data, you have to load it into your computer and put it into memory so you can work with it. Arrow provides a format in memory that enables the whole library to perform various operations on that data. Every vendor has its own way of representing data in memory.
They've latched onto an industry standard and developed it so it's open. Now people can use the exact same format in memory to do operations and use the library set to perform functions on data. New developers can decide if they want to develop their own memory format or use one that's already there.
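As a rough illustration of why a shared in-memory format matters, the sketch below contrasts row records with typed, contiguous column buffers. It uses only Python's standard library (`array` and `memoryview`) and is not Arrow's API or its actual memory layout; the field names `user_id` and `amount` and the helper `to_columns` are made up for the example.

```python
from array import array

# Row-oriented records: the values of one field are scattered across objects.
rows = [
    {"user_id": 1, "amount": 9.5},
    {"user_id": 2, "amount": 20.0},
    {"user_id": 3, "amount": 4.25},
]


def to_columns(records):
    """Pivot row records into typed, contiguous column buffers."""
    return {
        "user_id": array("q", (r["user_id"] for r in records)),  # signed 64-bit ints
        "amount": array("d", (r["amount"] for r in records)),    # 64-bit floats
    }


columns = to_columns(rows)

# An aggregate only has to scan one contiguous buffer.
total = sum(columns["amount"])

# A second consumer can read the same buffer without copying it,
# as long as both sides agree on the layout (here: packed doubles).
view = memoryview(columns["amount"])
second_value = view[1]
```

When every tool agrees on the layout, the zero-copy `view` step is what replaces a serialize, transfer, and deserialize round trip between components.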
Data transfer is a massive problem when you're working with large datasets, doing advanced analytics, and trying to train machine learning or deep learning models. What happens often is companies downsample their data sets to do training on models because transferring and managing data on a deep learning or machine learning platform is too much.
What needs improvement?
They have an automated tool for building SQL queries, so you don't need to know SQL. That interface works, but the SQL it generates could be more efficient. It's going through some growing pains. There is so much value in tools like these for people with no SQL experience. Over time, Dremio will make these capabilities more accessible to users who aren't database people.
What do I think about the stability of the solution?
The core functionality is rock solid in my experience.
Buyer's Guide
Dremio
May 2026
Learn what your peers think about Dremio. Get advice and tips from experienced pros sharing their opinions. Updated: May 2026.
894,830 professionals have used our research since 2012.
What do I think about the scalability of the solution?
Dremio is designed for massive scale.
How are customer service and support?
I rate Dremio eight out of 10. Dremio focuses on enterprise customers, so they have a solid support organization. They have professional services and full support. They have a great customer-centered team, but there are some challenges. For example, they don't have enough staff. As more customers buy Dremio, they need more integrators to help them get started. They can't do it all themselves.
How was the initial setup?
The initial setup is pretty straightforward. I've set it up in two different ways. The first was a three-node configuration. I used virtual machines to set up three different nodes and cluster data. That worked fairly well. I also created containerized versions using Kubernetes. It was much easier to do that. Dremio also has cloud versions where you don't need to do anything. You can go to the cloud and consume it.
What other advice do I have?
I rate Dremio 10 out of 10. Data lake 2.0 or 3.0 is gradually replacing data warehouses, but it's not quite there yet. We can't get rid of data warehouses, but it's heading in that direction. Dremio is a phenomenal place to start with that mindset shift and architecture shift. Despite the shortcomings, Dremio has made better progress in moving the industry forward than anybody else out there.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Data Engineer at AppsFlyer
Stable self-service data tool used for service to service integration and to manage simple ad hoc queries
Pros and Cons
- "Dremio gives you the ability to create services which do not require additional resources and serialization."
- "The native Arrow interface has been most valuable."
- "Dremio doesn't support the Delta connector. Dremio writes the IT support for Delta, but the support isn't great. There is definitely room for improvement."
- "The community version lacks information or documentation."
What is our primary use case?
I have used this solution as an ETL tool to create data marts on data lakes for bridging. I have used it as a greater layer for ad-hoc queries and for some services which do not require sub-second latency to query data from very big data lakes. I have also used it to manage simple ad-hoc queries similar to Athena, Presto or BigQuery.
We do not have a large number of people using this solution because it's mainly set up as a service-to-service integration. We integrated a big workload when we started using Dremio and this was very expensive. The migration is still in progress. As soon as this migration is finished, we plan to migrate ad-hoc queries from our analytical team.
What is most valuable?
The native Arrow interface has been most valuable. With Apache Arrow Flight, which Dremio supports, you don't need to deserialize data. Dremio gives you the ability to create services which do not require additional resources and serialization.
What needs improvement?
Dremio doesn't support the Delta connector. Dremio writes the IT support for Delta, but the support isn't great. There is definitely room for improvement.
The community version lacks information or documentation. If you want to create a more dynamic scaling policy for your cluster, it's always a problem because you need additional custom code to be sure that Dremio shuts down gracefully and does not get interrupted.
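The kind of custom code described here might look like the drain-then-stop loop below. This is only a sketch under assumptions: `running_jobs` and `stop_node` are hypothetical callables you would have to implement against your own cluster tooling, and nothing here is a Dremio API. The sketch also assumes that forcing a stop after the timeout is acceptable for your workload.

```python
import time
from typing import Callable


def drain_then_stop(
    running_jobs: Callable[[], int],
    stop_node: Callable[[], None],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Stop a node once it reports no running jobs.

    Returns True if the node drained cleanly, or False if the timeout
    elapsed and the node was stopped with jobs still running.
    """
    waited = 0.0
    while running_jobs() > 0:
        if waited >= timeout_s:
            stop_node()  # forced stop: the caller should expect failed queries
            return False
        sleep(poll_s)
        waited += poll_s
    stop_node()
    return True


# Usage with stand-in callables: the fake node finishes its jobs after two polls.
remaining = [2, 1, 0]
stopped = []
drained = drain_then_stop(
    running_jobs=lambda: remaining.pop(0),
    stop_node=lambda: stopped.append("executor-1"),
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```

An autoscaler would call something like this before removing a node, so that a scale-down does not interrupt queries that are still in flight.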
The support of the Delta I/O connector could be improved because the current support is very basic and the performance isn't great.
For how long have I used the solution?
I have used this solution for three years in a non-production environment and for nine months in a production environment.
What do I think about the stability of the solution?
This is a stable solution.
How are customer service and support?
I have used the community forum to get most of my answers for Dremio. This is best suited for simple queries but does not work as well for advanced questions that need input from a developer.
How was the initial setup?
The complexity of the setup depends on the use case. If you are working with a cluster that you plan to shut down after a week, the setup is simple. For scenarios where you need to connect Parquet tables, it is more complicated, and the documentation to support this setup is not comprehensive.
Experienced DevOps teams need to spend at least three to five days to understand how availability works between multiple masters and how the scheduling works.
I tried to integrate my Dremio metrics and logs with Datadog and it wasn't simple.
What's my experience with pricing, setup cost, and licensing?
Right now the cluster costs approximately $200,000 per month and is based on the volume of data we have.
Which other solutions did I evaluate?
We considered Apache Drill, Apache Presto, Flink and Spark. The main reason we chose Dremio was cost and multitenancy.
What other advice do I have?
I would advise others not to try to use Dremio as an ETL framework from day one. I also recommend not using reflections, or any other feature that makes the Dremio cluster stateful.
The installation for a production environment is complex and I would not recommend this solution for small businesses.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Database Engineer at a tech services company with 201-500 employees
Beneficial in-memory computation, good support, and priced well
Pros and Cons
- "The most valuable feature of Dremio is it can sit on top of any other data storage, such as Amazon S3, Azure Data Factory, HDFS, or Hive. The in-memory computation is good. If you are running any kind of materialized view, you'd be running in memory."
- "The most valuable feature of Dremio is it can sit on top of any other data storage, such as Amazon S3, Azure Data Factory, HDFS, or Hive."
- "Dremio takes a long time to execute large queries, correlated queries, or nested queries. Additionally, the solution could improve if we could read data from the streaming pipelines or if it allowed us to create the ETL pipeline directly on top of it, similar to Snowflake."
- "Dremio takes a long time to execute large queries, correlated queries, or nested queries."
What is our primary use case?
We were using Dremio as a data lake query engine tool. We were creating our PDSs and VDSs on top of our S3 buckets, and our data lake and the data-scientist teams were using the data for further processing. We didn't use it for any ETL jobs. We were using it as a data-lake tool.
What is most valuable?
The most valuable feature of Dremio is it can sit on top of any other data storage, such as Amazon S3, Azure Data Factory, HDFS, or Hive. The in-memory computation is good. If you are running any kind of materialized view, you'd be running in memory.
What needs improvement?
Dremio takes a long time to execute large queries, correlated queries, or nested queries. Additionally, the solution could improve if we could read data from the streaming pipelines or if it allowed us to create the ETL pipeline directly on top of it, similar to Snowflake.
For how long have I used the solution?
I have been using Dremio for approximately two years.
What do I think about the stability of the solution?
Dremio is highly stable.
What do I think about the scalability of the solution?
Dremio is scalable, which is a benefit of the solution. You can scale up to the number of instances you want if you are feeling the load, your queries are running slow, or you are receiving extra traffic.
You can set the configuration while installing and setting up Dremio. In the configuration file, you can set up a lot of settings, such as what time.
We have approximately 18 people using the solution in my organization.
How are customer service and support?
We have been in touch with the support from Dremio when we had some internal issues. This happened approximately two times. The support is good.
Which solution did I use previously and why did I switch?
This is the first tool in this category that I have used.
How was the initial setup?
Dremio's initial setup took one or two days. One day is sufficient and typical.
What about the implementation team?
There was one DevOps person used for the deployment and maintenance of the solution.
What's my experience with pricing, setup cost, and licensing?
Dremio is less costly than Snowflake or other comparable tools.
What other advice do I have?
My advice to others is that if they are creating a data lake for a customer, Dremio would be useful for a data engineering team. If they want to create a data lake and use it with a cloud-agnostic tool, then it is a good choice. If this solution meets their requirements, they should try it out.
I rate Dremio an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.