

Amazon EMR and Dremio both provide robust solutions in the data processing space. However, Amazon EMR holds a notable advantage in cost efficiency and scalability for handling large-scale deployments, whereas Dremio offers a more user-friendly platform with strong data management capabilities.
Features: Amazon EMR is renowned for its scalability, enabling users to efficiently handle large-scale data workloads. It seamlessly integrates with various big data tools including Hive and Spark and offers a reliable managed environment with auto-scaling capabilities. Dremio shines with its Virtual DataSet (VDS) feature, which provides visualization capabilities without altering original data. It also offers comprehensive data lineage and provenance, making it ideal for compliance and governance tasks.
Room for Improvement: Amazon EMR users often encounter complexities in setup and face challenges with the learning curve and cost management. Improvement is needed in areas like stability, version management, and customer support. Dremio users cite the need for better connector support, enhanced handling of complex queries, and scalability improvements. There are also calls for a more streamlined interface and expanded documentation.
Ease of Deployment and Customer Service: Amazon EMR is predominantly utilized in public cloud settings, benefiting from AWS's extensive infrastructure, although experiences with support are mixed regarding response times and cross-platform integrations. Dremio provides deployment flexibility across cloud and on-premises setups, offering versatile options, though customer support experiences vary, with some users noting inconsistencies in service quality.
Pricing and ROI: Amazon EMR is deemed cost-effective, especially for extensive deployments, with a pay-as-you-go model that demands careful cost monitoring. Users often report a high ROI, particularly when transitioning from on-premise systems. Dremio, though competitively priced against solutions like Snowflake, sometimes faces criticism for higher licensing costs needed for scaling. Regardless, it is recognized for offering good savings and ROI, notably for users migrating from traditional systems.
Dremio surely saves time, reduces costs, and all those things because we don't have to worry so much about the infrastructure to make the different tools communicate.
We get call support, screen-sharing support, and immediate support, so there are no problems.
They help with billing, cost determination, IAM properties, security compliance, and deployment and migration activities.
I would rate the technical support from Amazon as ten out of ten.
We have had to reach out for customer support many times, and they respond, so they are pretty supportive about some long-term issues.
Scalability can be provisioned using the auto-scaling feature, EC2 instances, on-demand instances, and storage locations like block storage, S3, or file storage.
Dremio's scalability can handle growing data and user demands easily.
Internally, if it's on Docker or Kubernetes, scalability will be built into the system.
Regular updates, patch installations, monitoring, logging, alerting, and disaster recovery activities are crucial for maintaining stability.
I rate Dremio a nine in terms of stability.
The cost factor differs significantly. When you run a Spark application on EKS, you run at the pod level, so you can control the compute cost. But in Amazon EMR, when you have to run one application, you have to launch an entire EC2 instance.
There is room for improvement with respect to retries, handling the volume of data on S3 buckets, cluster provisioning, scaling, termination, security, and integration between services like S3, Glue, Lake Formation, and DynamoDB.
I have thoughts on what would be great to see in the product, such as AI/ML features or additional options.
Starburst comes with around 50 connectors now.
It should be easier to get Arctic or an open-source version of Arctic onto the software version so that development teams can experiment with it.
I see that many times the new versions of Dremio have not fixed old bugs, and in some new versions, old problems that were previously fixed come back again, so I think the upgrade part could use improvement.
Cost optimization can be achieved through instance usage, cluster sharing, and auto-scaling.
I would rate the price for Amazon EMR, where one is high and ten is low, as a good one.
Amazon EMR helps in scalability, real-time and batch processing of data, handling efficient data sources, and managing data lakes, data stores, and data marts on file systems and in S3 buckets.
Amazon EMR provides out-of-the-box solutions with Spark and Hive.
The Amazon EMR features that I have found most valuable are the fully customizable functions.
Having everything under one system and an easier-to-work-with interface, along with having API integrations, adds significant value to working with Dremio.
Dremio has positively impacted my organization as nowadays we are connected to multiple databases from multiple environments, multiple APIs, and applications, and Dremio organizes everything in an amazing way for me.
You just get the source, connect the data, get visualization, get connected, and do whatever you want.

| Product | Market Share (%) |
|---|---|
| Dremio | 7.1 |
| Amazon EMR | 3.4 |
| Other | 89.5 |

| Company Size | Count |
|---|---|
| Small Business | 6 |
| Midsize Enterprise | 5 |
| Large Enterprise | 12 |

| Company Size | Count |
|---|---|
| Small Business | 1 |
| Midsize Enterprise | 5 |
| Large Enterprise | 5 |
Dremio offers a comprehensive platform for data warehousing and data engineering, integrating seamlessly with data storage systems like Amazon S3 and Azure. Its main features include scalability, query federation, and data reflection.
Dremio's core strength lies in its ability to function as a robust data lake query engine and data warehousing solution. It facilitates the creation of complex queries with ease, thanks to its support for Apache Airflow and query federation across endpoints. Despite challenges with Delta connector support, complex query execution, and expensive licensing, users find it valuable for managing ad-hoc queries and financial data analytics. The platform aids in SQL table management and BI traffic visualization while reducing storage costs and resolving storage conflicts typical in traditional data warehouses.
Dremio is primarily implemented in industries requiring extensive data engineering and analytics, including finance and technology. Companies use it for constructing data frameworks, efficiently processing financial analytics, and visualizing BI traffic. It acts as a viable alternative to AWS Glue and Apache Hive, integrating seamlessly with multiple databases, including Oracle and MySQL, offering robust solutions for data-driven strategies. Despite some challenges, its ability to reduce data storage costs and manage complex queries makes it a favorable choice among enterprise users.