Amazon EMR Reviews

Name: Amazon EMR
Brand: Amazon Web Services (AWS)
Rating: 3.9 (25 reviews)

Vendor: Amazon Web Services (AWS)

83% willing to recommend

What is Amazon EMR?

Amazon EMR simplifies big data processing by offering integration with popular tools. It's scalable and cost-efficient, enabling fast processing while managing infrastructure effortlessly. It's designed for users aiming to streamline data workflows and leverage its batch processing capabilities effectively.

Amazon EMR is the #3 ranked solution in top Hadoop solutions and the #13 ranked solution in top Cloud Data Warehouse solutions. PeerSpot users give Amazon EMR an average rating of 7.8 out of 10. Amazon EMR is most commonly compared to Databricks. Amazon EMR is popular among the large enterprise segment, which accounts for 51% of users researching this solution on PeerSpot. The top industry researching this solution is financial services, accounting for 19% of all views.

Helped 900,228 peers since 2012

Featured Amazon EMR reviews

reviewer1343079

Senior Chief Engineer (Enterprise System Presales/Postsales) at a tech vendor with 10,001+ employees

I have used AWS Glue with S3 for making tables and databases, but regarding Amazon EMR, I do not remember much as we are currently using it very minimally. This is my observation: with EKS, we had to deploy everything ourselves, because EKS does not provide the Hadoop framework, Spark, Hive, and so on, whereas Amazon EMR provides all of these. The cost factor differs significantly. When you run a Spark application on EKS, you run it at the pod level, so you can control the compute cost. In Amazon EMR, to run even one application you have to launch entire EC2 instances. In Qubole, the interface was very good. I could see many details, whereas the Amazon EMR console shows very few. In Qubole, a single link gives you all the details of what is happening and how the processes are running, and our cost decreased by using Qubole. I found Qubole more user-friendly and cost-effective. From the security point of view, we had to open some access rights to Qubole, which might be a drawback in comparison to Amazon EMR, which is native to AWS.

reviewer2043696

Senior Technical Engineer at a transportation company with 5,001-10,000 employees

The features at Amazon EMR that I have found most valuable are fully customizable functions. I am using Amazon EMR for data transformation, where we load the data. The entire ETL process is encompassed within it. The first thing is the egress part, where we pull the data from various sources, which include our main core databases, the systems that include Oracle, MySQL, SQL Server, and Postgres. We bring data from all these sources and use Amazon EMR specifically to combine all this data, clean the data, and combine it at a particular level. We bring everything to a particular level to make it more production-ready, and we simplify the data because every system has its own problems. We are using it to clean the data and transform the data in such a way that the end-user can get the insights faster.

Mirza Mujtaba Baig

Lead AWS Data Engineer at Fission Labs

The cluster properties are the most recognized feature, and the integration with other tools in Big Data, Data Science, and AI areas is crucial. Amazon EMR helps in scalability, real-time and batch processing of data, handling efficient data sources, and managing data lakes, data stores, and data marts on file systems and in S3 buckets. It supports frameworks for handling structured and unstructured data.

Amazon EMR mindshare

Product category: Hadoop

As of June 2026, the mindshare of Amazon EMR in the Hadoop category stands at 9.9%, down from 13.8% the previous year, according to calculations based on PeerSpot user engagement data.

Hadoop Mindshare Distribution
Product	Mindshare (%)
Amazon EMR	9.9%
Cloudera Distribution for Hadoop	14.7%
Apache Spark	13.9%
Other	61.5%

PeerResearch reports based on Amazon EMR reviews

Type	Title	Date
Category	Hadoop	Jun 22, 2026
Product	Reviews, tips, and advice from real users	Jun 22, 2026
Comparison	Amazon EMR vs Apache Spark	Jun 22, 2026
Comparison	Amazon EMR vs Cloudera Distribution for Hadoop	Jun 22, 2026
Comparison	Amazon EMR vs HPE Data Fabric	Jun 22, 2026

Valuable Features

Amazon EMR offers scalability, auto-scaling capabilities, and cost-effectiveness. It integrates easily with Hadoop and HDFS, using Amazon EC2 and Amazon S3 for dynamic storage. The platform simplifies management tasks, supports data transformation with ETL processes, and provides robust security. Users appreciate the flexible cluster resizing, stability, affordability, and versatility in managing big data. With managed services and out-of-the-box functionality, EMR is accessible for diverse data tasks.

"It's made life very easy."
"One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish."
"This tool is simple to use and it's really accessible to many dev teams."

Room for Improvement

Amazon EMR could improve its user interface and simplify the initial learning curve. Cost management and pricing control can be challenging with scaling. Cluster configuration is complicated, and startup times are slow. Web support and feature integration need enhancements. Some versions showed compatibility issues. Customers would benefit from improved monitoring, debugging, and stability. Automation for deployment, cluster sizing, and better integration with other AWS services could be advantageous.

"The cost of Amazon EMR is very high."
"The most complicated thing is configuring to the cluster and to ensure it's running correctly."
"In general, I would rate the solution at a four out of ten."

ROI

Various viewpoints indicate that Amazon EMR's return on investment fluctuates depending on vendors and usage. Responses emphasize high returns, especially for those transitioning from on-premise systems, with expectations of at least two-to-one returns. Some report noticeable savings of up to 20%. While specific figures are often not calculated, and some can't provide precise return details, feedback suggests that it generally results in substantial cost efficiencies and savings for companies.

Pricing

Enterprise buyers evaluating Amazon EMR find its pricing reliant on usage and integration, with costs mainly driven by infrastructure resources, particularly EC2 fees. Users appreciate the lack of additional licensing costs, but some note the service can be expensive compared to competitors. Cost control is possible with strategic resource management and careful monitoring, as unexpected charges can arise. Monthly expenditures can vary, with large enterprises spending significantly more annually based on their needs.

"I rate the tool's pricing a five out of ten. It can be expensive since it's a managed service, and if you are not careful, you can run into unexpected charges. You can make a mistake that costs you tens of thousands of dollars. That's happened to us twice, so I'm sensitive to it. We're still trying to work on that. Our smallest client probably spends a hundred thousand dollars yearly on licensing, while our largest is well over a million."
"The product is not cheap, but it is not expensive."
"Amazon EMR is not very expensive."

Popular Use Cases

Amazon EMR is utilized for running Spark scripts, building data lakes, and handling tech processing. It supports complex algorithms, data pipelines, large databases, and big data frameworks. Organizations migrate processes to the cloud, leveraging EMR for cloud-based management, serverless architecture, and data analysis. EMR is also used in industry-specific applications, data transformation, and integration with AWS services like SageMaker, Airflow, and S3.

Service and Support

Amazon EMR's customer service has mixed ratings, with some finding it great, while others find it inconsistent. Technical support generally receives positive feedback for being knowledgeable and responsive, though some users report slower response times when integrating with open-source solutions. Many users appreciate having access to responsive support, particularly when encountering urgent issues, and benefit from 24/7 availability and comprehensive assistance with technical inquiries, billing, and deployment activities.

Deployment

Amazon EMR's initial setup is mostly perceived as straightforward, especially for users familiar with the tool and big data concepts. Some users find it complex due to configuration challenges and issues with multi-factor authentication. Deployment can be done quickly via AWS Console or Terraform. Initial setup duration varies, ranging from 30 minutes to several weeks, influenced by documentation use, team expertise, and specific project requirements.

Scalability

Amazon EMR is considered scalable by several organizations, with capabilities to tailor clusters using memory, storage, or GPU needs. Companies expand usage gradually, leveraging auto-scaling, EC2, and on-demand instances. Managed services ensure extensive scaling, although occasionally, resource allocation may lag. It supports enterprises and vast user bases, processing substantial data volumes. Despite minor performance issues, it suits data analytics and customization for specific cases, enabling seamless scaling for diverse organizational requirements.

Stability

Users find Amazon EMR stable, with high availability and reliability. Some recommend monitoring configurations as data changes, but issues are minimal. Users rate its steadiness highly, noting its support from availability zones, failover capabilities, and fault tolerance. Regular updates, monitoring, logging, and disaster recovery contribute positively. Despite occasional minor issues, users express confidence in its performance, suggesting it meets expectations in most scenarios. Stability ratings are generally eight to nine out of ten.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.

Review data by company size

By reviewers
Company Size	Count
Small Business	6
Midsize Enterprise	4
Large Enterprise	10

By visitors reading reviews
Company Size	Count
Small Business	75
Midsize Enterprise	27
Large Enterprise	106

Top industries

By visitors reading reviews
Industry	Share of views
Financial Services Firm	19%
Manufacturing Company	10%

Other industries represented (shares not shown): Construction Company, Healthcare Company, Comms Service Provider, Computer Software Company, Performing Arts, Hospitality Company, Media Company, Retailer, University, Government, Outsourcing Company, Educational Organization, Marketing Services Firm, Real Estate/Law Firm, Insurance Company, Logistics Company, Energy/Utilities Company, Consumer Goods Company, Transportation Company, Wholesaler/Distributor

Learn more about Amazon EMR

Amazon EMR is a managed service that provides robust features for big data processing. It integrates seamlessly with S3, EC2, Hive, and Spark to facilitate sophisticated data transformation tasks and infrastructure management. It allows organizations to run data lakes, Spark, and Hadoop clusters effortlessly, offering flexibility with on-demand execution and extensive scalability. The platform is valued for its strong processing speed and comprehensive security features, making it ideal for complex data engineering projects. It supports both batch processing and real-time workflows, designed to eliminate hardware management while maintaining cost efficiency and stability.

What are the key features of Amazon EMR?

Cluster Management: Offers intuitive control and configuration of clusters
Integration: Seamlessly integrates with S3, EC2, Spark, and more
Scalability: Provides flexible scaling to meet data demands
Batch Processing: Allows efficient handling of large data sets
Cost Efficiency: Minimizes costs with managed service offerings

What benefits and ROI should be considered?

Processing Speed: Fast performance for data processing tasks
Security: Built-in features ensure data protection
Infrastructure Simplification: Eliminates hardware management needs
Flexibility: Adapts to changing data loads with ease
Affordability: Offers economic processing power

Amazon EMR is implemented by industries such as healthcare and tech processing for complex data tasks like building data lakes or financial data processing. It supports AI-driven analytics and data engineering projects, integrating with SageMaker for predictions and maintaining workflows in public health applications, allowing professionals in different fields to manage data pipelines, resource utilization, and job execution efficiently.

Amazon EMR was previously known as Amazon Elastic MapReduce.

Amazon EMR customers

Yelp

Product Categories

Hadoop

Cloud Data Warehouse

Popular Comparisons

Databricks vs Amazon EMR

Teradata vs Amazon EMR

Snowflake vs Amazon EMR

Azure Data Factory vs Amazon EMR

OpenText Analytics Database (Vertica) vs Amazon EMR

Apache Spark vs Amazon EMR

Dremio vs Amazon EMR

Amazon Redshift vs Amazon EMR

Cloudera Distribution for Hadoop vs Amazon EMR

Microsoft Azure Synapse Analytics vs Amazon EMR

IBM Netezza Performance Server vs Amazon EMR

BigQuery vs Amazon EMR

Oracle Autonomous Data Warehouse vs Amazon EMR

Snowflake Analytics vs Amazon EMR

IBM Spectrum Computing vs Amazon EMR

Amazon EMR Reviews Summary
Author info	Rating	Review Summary
Senior Chief Engineer (Enterprise System Presales/Postsales) at a tech vendor with 10,001+ employees	3.0	I've used Amazon EMR mainly for Spark-based ETL, but we've shifted 95% to EKS due to better cost control. EMR is simple and integrates well but lacks detailed UI and is costlier compared to pod-level control on EKS.
Senior Technical Engineer at a transportation company with 5,001-10,000 employees	4.0	I use Amazon EMR for data processing and ETL, finding its customizable features valuable, though it could improve with AI/ML capabilities. It's stable, integrates well with AWS services, and support is excellent, but setup could be simpler.
Lead AWS Data Engineer at Fission Labs	5.0	I have used Amazon EMR for data engineering projects, focusing on job submission and resource management. Its cluster properties and integration with big data tools are valuable, though improvements are needed in retries, data handling, scalability, and integration.
Vice President - Product Management at Paytm	4.5	I use Amazon EMR to help American Express process and analyze data. Its multiple connectors and cost-effective processing are valuable, though Spark jobs are slower. The service integrates well with other storage solutions, potentially saving 20% in costs.
Director of Data Science at HealthWorks Analytics	3.5	I find Amazon EMR valuable for maintaining client workflows in the healthcare sector due to its security features as a managed service. It offers a high ROI but can be expensive if not carefully managed on AWS.
Data Governance Manager at VPBFC	4.5	We use Amazon EMR for data processing, reading data from S3 and writing it back there post-processing. Its affordable pricing and seamless integration with open-source platforms are valuable, though its stability could improve. We haven't considered alternate solutions.
AWS / Big Data Engineer at Waste Management, Inc.	4.0	I found Amazon EMR Serverless ideal for deploying online applications, as it simplifies resource management. While pricing could be improved, the solution offers better ROI and substantial savings. We migrated from a server-based setup to AWS for this service.
Senior Software Development Engineer at Yahoo!	3.0	I use EMR with Spark for large cloud jobs. While stable and scalable, its very slow start-up time is a significant problem for frequent, short tasks, impacting costs. Support is also slower than desired. I'd rate it a 6/10.
Hadoop Administrator at Capgemini	4.0	I find Amazon EMR stable and easy to set up for big data processing, praising its fault tolerance. However, resource scaling is slow, the UI needs improvement, and it's expensive. Support is responsive, and I ultimately rate it 8/10.
Lead Data Engineer at Seven Lakes Enterprises, Inc.	3.0	I use Amazon EMR to efficiently analyze data and process millions of records within hours. It offers diverse tools and support, simplifying development both on-prem and in the cloud. Improvement is needed in third-party integration and advance notification of module changes.

reviewer1343079

Senior Chief Engineer (Enterprise System Presales/Postsales) at a tech vendor with 10,001+ employees

Oct 6, 2025

Has simplified ETL workflows with on-demand processing but needs improved cost efficiency and visibility

What is our primary use case?

We are using Amazon EMR for Spark as our platform involves a lot of ETL. We are using Amazon EMR for that, but in the last year, we have moved to Kubernetes, AWS EKS, where we are running Spark. Our Amazon EMR usage has been reduced as we have moved to the EKS side.

We still have some of our pipelines running on Amazon EMR, but 95% of the pipelines are moved to the EKS.

What is most valuable?

Amazon EMR provides out-of-the-box functionality because we can deploy and get Spark functionality over Hadoop. Another feature is on-demand pipeline execution. Whenever Amazon EMR is required, we can provision it with no need to keep it open for long.
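The on-demand pattern described above can be sketched as a transient cluster that runs one Spark step and then shuts itself down. This is a minimal illustration, not the reviewer's actual setup: the release label, instance types, bucket paths, and default role names are assumptions, and the request is only sent if the hypothetical `launch` helper is called with valid AWS credentials.

```python
def build_transient_cluster_request(name, script_s3_uri, log_s3_uri):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All names, sizes, and the release label are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; use one available to you
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_s3_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Shut the cluster down once the last step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Send the request to EMR and return the new cluster ID."""
    import boto3  # imported lazily so building the request needs no AWS SDK
    client = boto3.client("emr", region_name=region)
    return client.run_job_flow(**request)["JobFlowId"]
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster transient: it exists only for the duration of the submitted steps.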

Amazon EMR itself provisions the EC2 instances, both worker nodes and master nodes. Integration with other services has been satisfactory for our use case, and we were able to integrate with S3 efficiently.

Amazon EMR provides out-of-the-box solutions with Spark and Hive. Regarding customization, I am not from the data team, so I do not have much information about this, but we have not faced many problems with customization. We moved to EKS because of cost: Amazon EMR provisions full EC2 instances. Our whole infrastructure, not just the pipelines, already runs on EKS, and although EKS also runs on EC2, we can allocate compute at the pod level. Our compute demand has decreased, which has reduced the cost significantly. Moving to EKS has helped us considerably.

Amazon EMR is simple.

What needs improvement?

I have used AWS Glue with S3 for making tables and databases, but regarding Amazon EMR, I do not remember much as we are currently using it very minimally.

This is my observation: with EKS, we had to deploy everything ourselves, because EKS does not provide the Hadoop framework, Spark, Hive, and so on, whereas Amazon EMR provides all of these. The cost factor differs significantly. When you run a Spark application on EKS, you run it at the pod level, so you can control the compute cost. In Amazon EMR, to run even one application you have to launch entire EC2 instances.
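The cost difference described here comes down to simple arithmetic, sketched below with made-up numbers. The hourly price, instance size, and job shape are all hypothetical; the sketch ignores the EMR service surcharge and assumes other workloads use the rest of the shared nodes, which is what makes pod-level accounting meaningful.

```python
import math


def whole_instance_cost(hourly_usd, vcpus_per_instance, vcpus_needed, hours):
    """Cost when the job must rent whole instances for its full duration."""
    instances = math.ceil(vcpus_needed / vcpus_per_instance)
    return hourly_usd * instances * hours


def pod_level_cost(hourly_usd, vcpus_per_instance, vcpus_needed, hours):
    """Cost share when the job reserves only the vCPUs it requests on shared nodes."""
    per_vcpu_hour = hourly_usd / vcpus_per_instance
    return per_vcpu_hour * vcpus_needed * hours


# Hypothetical job: 6 vCPUs for 2 hours on 4-vCPU instances at 0.25 USD/hour.
emr_like = whole_instance_cost(0.25, 4, 6, 2)
eks_like = pod_level_cost(0.25, 4, 6, 2)
print(f"whole instances: {emr_like:.2f} USD, pod level: {eks_like:.2f} USD")
```

With these numbers the job rounds up to two whole instances (1.00 USD) but is charged for only six vCPUs at pod level (0.75 USD). The gap disappears when a job fills its instances exactly.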

In Qubole, the interface was very good. I could see many details, whereas the Amazon EMR console shows very few. In Qubole, a single link gives you all the details of what is happening and how the processes are running, and our cost decreased by using Qubole. I found Qubole more user-friendly and cost-effective. From the security point of view, we had to open some access rights to Qubole, which might be a drawback in comparison to Amazon EMR, which is native to AWS.

For how long have I used the solution?

I have been working with Amazon EMR for the last four years, though not very extensively.

How are customer service and support?

We have enterprise tech support with very good technical assistance. We get call support, screen-sharing support, and immediate support, so there are no problems.

How would you rate customer service and support?

Neutral

What other advice do I have?

I am working on Amazon EMR but not extensively.

Basically, our work is data transformation. Our pipelines work on that exclusively. We have Spark applications, and earlier, we used Amazon EMR extensively which did all this ETL.

There is no problem in security because it is part of the AWS ecosystem. All the IAM and everything works properly. Security-wise, we have never faced any issues. Nobody has reported any inability to perform certain tasks in Amazon EMR.

I would rate Amazon EMR a 6 out of 10. The lower rating is due to potential difficulties for less technical users in accessing Spark commands and notebooks, and the cost factor.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

reviewer2043696

Senior Technical Engineer at a transportation company with 5,001-10,000 employees

Dec 26, 2025

Data pipelines have simplified complex transformations and provide faster insights for users

What is our primary use case?

I use Amazon EMR primarily for data processing.

What is most valuable?

The features at Amazon EMR that I have found most valuable are fully customizable functions.

I am using Amazon EMR for data transformation, where we load the data. The entire ETL process is encompassed within it. The first thing is the egress part, where we pull the data from various sources, which include our main core databases, the systems that include Oracle, MySQL, SQL Server, and Postgres. We bring data from all these sources and use Amazon EMR specifically to combine all this data, clean the data, and combine it at a particular level. We bring everything to a particular level to make it more production-ready, and we simplify the data because every system has its own problems. We are using it to clean the data and transform the data in such a way that the end-user can get the insights faster.

What needs improvement?

I feel some lack of functionality in Amazon EMR. I have thoughts on what would be great to see in the product, such as AI/ML features or additional options.

For how long have I used the solution?

I have been working with Amazon EMR for five years.

What do I think about the stability of the solution?

I would rate the stability of Amazon EMR as eight out of ten.

What do I think about the scalability of the solution?

Customizable cluster configurations have been utilized in my organization.

How are customer service and support?

I would rate the technical support from Amazon as ten out of ten. They are always there to help.

How would you rate customer service and support?

Positive

How was the initial setup?

The setup process for Amazon EMR may have some issues and could be more simplified for some basic users.

What's my experience with pricing, setup cost, and licensing?

I would rate the price for Amazon EMR, where one is high and ten is low, as a good one.

Which other solutions did I evaluate?

I could compare Amazon EMR with a product from another vendor, but I would need more context.

What other advice do I have?

I find it easy to integrate Amazon EMR with other AWS services like S3 or EC2 for data processing needs. I would rate Amazon EMR eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Mirza Mujtaba Baig

Lead AWS Data Engineer at Fission Labs

Apr 8, 2025

Utilize data engineering projects efficiently with seamless resource management and real-time processing capabilities

What is our primary use case?

I have been involved in using Amazon EMR for data engineering projects, specifically for the submission of the jobs to the data clusters. I also focus on resource utilization, monitoring the job status, and managing memory resources.

What is most valuable?

The cluster properties are the most recognized feature, and the integration with other tools in Big Data, Data Science, and AI areas is crucial. Amazon EMR helps in scalability, real-time and batch processing of data, handling efficient data sources, and managing data lakes, data stores, and data marts on file systems and in S3 buckets. It supports frameworks for handling structured and unstructured data.

What needs improvement?

There is room for improvement with respect to retries, handling the volume of data on S3 buckets, cluster provisioning, scaling, termination, security, and integration between services like S3, Glue, Lake Formation, and DynamoDB.

For how long have I used the solution?

I have used the solution for almost ten years.

What was my experience with deployment of the solution?

Deployment is mostly based on how the job or pipeline utilizes it. It can depend on the data sources and configurations in the JSON and XML files. Monitoring the management console, availability zones, IAM, permissions, roles, data encryption, and decryption are important for a smooth deployment.

What do I think about the stability of the solution?

Stability is supported by the availability zones, failover capabilities, and fault tolerance, which are really helpful. Regular updates, patch installations, monitoring, logging, alerting, and disaster recovery activities are crucial for maintaining stability.

What do I think about the scalability of the solution?

Scalability can be provisioned using the auto-scaling feature, EC2 instances, on-demand instances, and storage locations like block storage, S3, or file storage. Cluster size can be managed by adding nodes automatically or manually. It is important to monitor jobs, set up alerts, and have enough details to configure a proper cluster.
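One way to express automatic node scaling is an EMR managed scaling policy, sketched below. The review does not say which scaling mechanism was used, so managed scaling is an assumption here, as are the capacity limits and the hypothetical helper names. The policy is only applied if `apply_policy` is called against a real cluster.

```python
VALID_UNIT_TYPES = {"Instances", "InstanceFleetUnits", "VCPU"}


def managed_scaling_policy(min_units, max_units, max_on_demand_units=None,
                           unit_type="Instances"):
    """Build a ManagedScalingPolicy dict for boto3's put_managed_scaling_policy."""
    if unit_type not in VALID_UNIT_TYPES:
        raise ValueError(f"unsupported unit type: {unit_type}")
    if not 1 <= min_units <= max_units:
        raise ValueError("need 1 <= min_units <= max_units")
    limits = {
        "UnitType": unit_type,
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
    }
    if max_on_demand_units is not None:
        if max_on_demand_units > max_units:
            raise ValueError("on-demand cap cannot exceed max_units")
        # Capacity above this cap is expected to come from Spot instances.
        limits["MaximumOnDemandCapacityUnits"] = max_on_demand_units
    return {"ComputeLimits": limits}


def apply_policy(cluster_id, policy, region="us-east-1"):
    """Attach the policy to an existing cluster (requires AWS credentials)."""
    import boto3  # imported lazily so building the policy needs no AWS SDK
    client = boto3.client("emr", region_name=region)
    client.put_managed_scaling_policy(ClusterId=cluster_id,
                                      ManagedScalingPolicy=policy)
```

Keeping the minimum small and capping on-demand capacity is one way to combine automatic scaling with the cost monitoring the review recommends.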

How are customer service and support?

The technical support is available 24/7. We have to submit tickets, and they are usually proactive in addressing issues. They help with billing, cost determination, IAM properties, security compliance, and deployment and migration activities.

How would you rate customer service and support?

Neutral

What's my experience with pricing, setup cost, and licensing?

Compared to others, Amazon seems efficient and is considered good for Big Data workloads. Costs are involved based on cluster resources, data volumes, EC2 instances, instance sizes, Kubernetes, Docker services, storage, and data transfers. Cost optimization can be achieved through instance usage, cluster sharing, and auto-scaling.

What other advice do I have?

For usage, setup and planning are crucial. This involves configuring instances, auto-scaling, security groups, performance efficiency, labeling, storage, maintenance, disaster recovery, backups, patch installations, user permissions, roles, and logging. I rate Amazon EMR as ten out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Prashant Singh

Vice President - Product Management at Paytm

Nov 26, 2024

Seamless data integration enhances reporting efficiency and an easy setup

What is our primary use case?

I am a solution provider using Amazon EMR. Our main customer, American Express, pulls data from multiple sources, processes it, and provides analytics reporting. Amazon EMR supports AI-driven projects by allowing on-the-fly execution of algorithms to segregate data based on character properties.

What is most valuable?

Amazon EMR has multiple connectors that can connect to various data sources. The service charges are based on processing only, depending on the resources used, which can help save money. It is easy to integrate with other services for storage, allowing data to be shifted to cheaper storage based on usage.

What needs improvement?

Spark jobs take longer on Amazon EMR compared to previous experiences. This aspect could be improved to make them more efficient.

For how long have I used the solution?

I have been using Amazon EMR for five years.

What do I think about the stability of the solution?

Amazon EMR is quite stable now.

What do I think about the scalability of the solution?

Based on requirements, Amazon EMR scales easily and works well for enterprise clients.

How are customer service and support?

Customer support is quite helpful, with acceptable response times.

How would you rate customer service and support?

Positive

How was the initial setup?

The setup of Amazon EMR is straightforward, and I would rate it as eight out of ten for ease.

What was our ROI?

The tool is said to help save costs by up to 20%.

What's my experience with pricing, setup cost, and licensing?

Amazon EMR is a little expensive, especially considering the support package, which includes a gold package.

What other advice do I have?

Amazon EMR is easy to implement, offers features that help with scaling, and provides useful metrics through its console.

I recommend using it, and my overall rating is nine out of ten.

JeffCooper

Director of Data Science at HealthWorks Analytics

Jun 14, 2024

Manages workflows but can become expensive if the user is not careful

What is our primary use case?

We use the tool to maintain workflows on behalf of our customers. We use it in the medical field, particularly in public health and government healthcare. However, our primary role is maintaining and developing the solution for our clients rather than using it internally.

What is most valuable?

The security of the managed workflow and the managed services are the best features for us. Since we inherited their security model and it's all managed services, those are the key benefits for our clients.

What needs improvement?

The solution can become expensive if you are not careful.

For how long have I used the solution?

I have been using the product for three years.

What do I think about the stability of the solution?

I rate the solution's stability an eight point five out of ten.

What do I think about the scalability of the solution?

The tool is as scalable as you need it to be. We're not in the petabyte range yet, but we've scaled it up by over a hundred with a couple of customers. In terms of data, we're looking at hundreds of gigabytes to hundreds of terabytes. So, it's been highly scalable.

How was the initial setup?

Amazon EMR's deployment ease is five out of ten. I rate it so because of our lack of understanding and training. It was not difficult. The timeline depends on our client. We typically schedule about a six-week deployment. We've done it in as little as two weeks, but six weeks is probably the average for getting things up to speed.

You would need four to six resources to handle it. We have solution architects, security SMEs, and business process engineers. In the background, we also have developers. The main roles are solutions architect, security architect, and business process engineer.

We have a staff of about nine people, but that's spread across several clients. We probably have a couple of people working part-time for our smallest client. None of our staff are dedicated full-time to any single client. We maintain and develop the solutions with these nine staff members, occasionally calling in extra help as needed.

What was our ROI?

Based on customer feedback, the return on investment has been high, especially for those who have used on-premise systems previously. They're likely seeing an ROI of at least two to one on their investment, sometimes even more.

What's my experience with pricing, setup cost, and licensing?

I rate the tool's pricing a five out of ten. It can be expensive since it's a managed service, and if you are not careful, you can run into unexpected charges. You can make a mistake that costs you tens of thousands of dollars. That's happened to us twice, so I'm sensitive to it. We're still trying to work on that. Our smallest client probably spends a hundred thousand dollars yearly on licensing, while our largest is well over a million.

What other advice do I have?

I'm very impressed. It's certainly a contender. I don't keep up much with the current competition, but we've enjoyed the experience with Amazon, and they've been very supportive. So we get a lot of attention from them, which biases me toward a high recommendation for Amazon EMR. I rate it a seven out of ten.

We're focusing on AI integration now, especially on Amazon. It was complicated, but they're doing a great job, especially with SageMaker. We've been building large learning models, and the most significant thing for us has been applying these models to proprietary data on Amazon. They're focusing on that aspect.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Quan Vu

Data Governance Manager at VPBFC

Aug 22, 2023

Provides efficient data processing features and has good scalability

What is our primary use case?

We use Amazon EMR for data processing. We read the data from S3 and then write the processed data back to S3. It allows users to access the data through a web interface.

What is most valuable?

The product has affordable pricing. We can integrate it easily with multiple open-source platforms.

What needs improvement?

The product's stability could be even better.

What do I think about the stability of the solution?

The product’s stability is good. However, it could be improved a bit. I rate it nine out of ten.

What do I think about the scalability of the solution?

Amazon EMR can scale to a large cluster. In our case, there are 40 EC2 instances. We are processing a lot of tables with half a billion rows per day.

It is designed for end users and power users who need to detect data patterns and generate reports. EMR is not a fixed-scale service; it can be customized to meet the needs of a specific use case.

How was the initial setup?

We need to have a data pipeline tool to ensure consistent data processing for the initial setup. We create a framework, read the code, and execute it in a data catalog. The size of the maintenance team depends on the project and the use cases. Usually, one backup team of four or five DevOps executives takes care of the backend and database.

We need to separate our environments into production and development. We use GitHub for source control, Jenkins for the deployment pipeline, and a standard CI/CD tool to deploy code changes into production.

We need to develop a deployment framework so developers only need to provide the code for their projects. The underlying engine then deploys the code, reads it, addresses the EMR filter, executes it, and completes the data processing.

What's my experience with pricing, setup cost, and licensing?

You only pay for the EC2 fees for Amazon EMR. The number of clusters you launch and the time you use them determine the fee. There is no additional license fee, as Amazon EMR is a fully managed service. You can use any open-source software that is compatible with it. There is no need to pay extra for third-party software.

What other advice do I have?

EMR is a good engine for specific use cases. It is mostly used for data processing, but it does have some real-time streaming capabilities. If you want to stream hundreds of millions of events daily from mobile apps and web apps, then an in-memory technology would be better than EMR.

I rate Amazon EMR a nine out of ten.

ChetanGoyal

AWS / Big Data Engineer at Waste Management, Inc.

Aug 1, 2023

Suitable for online deployments with a serverless architecture

What is our primary use case?

EMR Serverless is useful for online deployments that require a serverless architecture. We were previously using a server-based architecture, but we have since switched to serverless.

How has it helped my organization?

The product is very well-designed. It is easy to use, and it is very cost-effective.

What is most valuable?

The solution eliminates the need to manage hardware resources. Everything is managed for you, so it is a very easy-to-use service.

What needs improvement?

There is room for improvement in pricing.

For how long have I used the solution?

I have been using Amazon EMR for one year.

What do I think about the stability of the solution?

It is stable right now. We are monitoring it closely.

What do I think about the scalability of the solution?

It is highly scalable. There are around 2,000 end users using this solution in our organization.

How are customer service and support?

The customer service and support are great.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup is straightforward.

What about the implementation team?

The deployment process took around two months. There's nothing much to do. It's a very easy process. We just need to configure all of them for the service, and it will be done.

For maintenance of the solution, we require managers, engineers, and developers.

What was our ROI?

The ROI is good. It's delivering a good amount of savings to the company.

What's my experience with pricing, setup cost, and licensing?

It is expensive.

What other advice do I have?

You should use it. It will really help the business, so I highly recommend it.

Overall, I would rate the solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

Aug 3, 2022

Great for big jobs, reliable, and doesn't require an elaborate setup

What is our primary use case?

We usually use EMR with Spark and bring in Airflow or a similar tool. For example, when we migrate big jobs from on-prem to the cloud, we run them in EMR with Spark.

What is most valuable?

In the main application, we use Hive, Spark, and Flink, and there are some additional small tools. We have to do this job inside EMR itself and run from Airflow. We also use a service from Amazon, and we manage jobs through Airflow. There's lots of scheduling, et cetera, that happens. Before, we used Jenkins. Now, we also integrate from Jenkins to Airflow and its processes. It's easier to have everything connected via Amazon.

What needs improvement?

The problem for us is that it starts very slowly. They need to improve the start time. If we use a long-running EMR cluster, it costs a lot of money. If a job runs for one hour, a start time of about ten minutes is acceptable. However, if we want to run a job every five minutes, a ten-minute start is a problem. It's a little bit odd that you cannot use the service for short, frequent runs.
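To put the start-time concern into numbers, here is a minimal sketch. It uses the figures mentioned above (a start of about ten minutes, a one-hour job, a five-minute schedule); the function names and the one-minute run time for the short job are assumptions made for illustration and are not part of any AWS API.

```python
def startup_overhead(startup_min: float, job_min: float) -> float:
    """Fraction of total wall-clock time spent waiting for the cluster to start."""
    return startup_min / (startup_min + job_min)


def fits_schedule(startup_min: float, job_min: float, interval_min: float) -> bool:
    """True if a freshly started cluster can finish before the next scheduled run."""
    return startup_min + job_min <= interval_min


# Figures from the review: roughly a 10-minute start and a one-hour job.
print(f"overhead for a one-hour job: {startup_overhead(10, 60):.0%}")

# A job scheduled every five minutes; job_min=1 is an assumed run time.
print("fits a five-minute schedule:", fits_schedule(10, 1, 5))
```

For the one-hour job the start accounts for 10 of 70 minutes, which is tolerable. For the five-minute schedule the start alone exceeds the interval, so the only options are a long-running cluster (and its cost) or a longer interval.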

The support could be better.

For how long have I used the solution?

We've been using the solution for about one year.

What do I think about the stability of the solution?

We've had some issues here and there; however, for the most part, the solution is stable.

What do I think about the scalability of the solution?

The solution can scale quite well. It's not a problem.

A lot of teams use the solution. We have two or three teams that use it.

How are customer service and support?

Technical support could be faster. They don't respond as quickly as we would like.

Which solution did I use previously and why did I switch?

We used EC2 directly, and our engineers managed it manually. We also used KNIME and looked into Kinesis.

How was the initial setup?

You don't need to deploy anything. You just start. You go to the computer and start using EMR.

Sometimes we found some strange things. We've had some exceptions. Various versions can be used, however.

What's my experience with pricing, setup cost, and licensing?

I'm not sure about the cost of the licensing. However, it does depend on the integration, for example, and many other factors.

What other advice do I have?

I'd recommend that others try the solution.

I'd rate it a six out of ten.

Which deployment model are you using for this solution?

Public Cloud

RahulJadhav

Hadoop Administrator at Capgemini

Oct 17, 2022

User-friendly, easy to deploy, and good fault tolerance

What is our primary use case?

We are using Amazon EMR for big data processing. We have a client with financial data, and we are using a big data framework to process it. Amazon EMR provides all the features we need, and we are using Amazon S3 for storage.

We are in the process of moving many aspects of our processes into clouds, such as AWS Managed Services for metrics. We want to migrate our graph and many other aspects to Amazon AWS.

What is most valuable?

With Amazon EMR, it is easy to rebuild anything and easy to upgrade, and it has good fault tolerance.

What needs improvement?

Amazon EMR can improve by adding some features, such as metastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache services provide across platforms.

For how long have I used the solution?

I have been using Amazon EMR for approximately seven years.

What do I think about the stability of the solution?

Amazon EMR has high availability for storage services or insurance processing engines. They have a guarantee of the data and availability. It is a highly stable solution.

What do I think about the scalability of the solution?

We have scaling policies for load balancing, so when our load increases the input rate, we increase the resources. However, at that point AWS does not provide the required resources quickly enough. It takes too much time to allocate resources, for example when scaling up the number of instances. The time overhead should be reduced. There is also a performance gap between the cloud and on-premises.

We have not pinpointed the problem yet, but the computing power is not up to standard.

We have more than 1,000 people using this solution in my organization.

How are customer service and support?

I have contacted the support on a regular basis. They have been responsive with a solution.

I rate the support from Amazon EMR a four out of five.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup of Amazon EMR is very easy. The time it takes for the full deployment of the solution depends on our Terraform script. However, it typically takes a maximum of 30 minutes to deploy any cluster.

What's my experience with pricing, setup cost, and licensing?

The price of the solution is expensive.

Which other solutions did I evaluate?

Compared with other solutions, this one has three to four features that make it more attractive. It's user-friendly and easy to deploy.

What other advice do I have?

My advice to others is to use the free version of the solution and make their decision to purchase it afterward.

I rate Amazon EMR an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

VinayKumar2

Lead Data Engineer at Seven Lakes Enterprises, Inc.

Aug 15, 2023

Simplifies running big data frameworks, but needs to improve its modules

What is our primary use case?

EMR is used to analyze data for projects where we wish to ingest multiple data sources into our analysts' data lakes. EMR is further used to process millions of rows within a fixed span of hours.

What is most valuable?

It gives us multiple options from Infra to the WES and different tools. It has a variety of options and support systems. Plus, it makes life easy for our developers, whether in the cloud or on-prem. Related DevOps installations are quick and easy to maintain.

What needs improvement?

Interdependencies with third-party or open-source solutions should be improved. Changes to modules and strategies should be handled better and communicated well in advance. Maybe if AWS starts releasing AWS-certified or AWS-verified installations, that will generate even more confidence just like OpenJet, it'll add a specific version.

For how long have I used the solution?

I have been using Amazon EMR for three years.

What do I think about the stability of the solution?

The solution is mostly stable with certain data and code related issues.

What do I think about the scalability of the solution?

Amazon EMR is a scalable solution. We have around twenty users, but they are engineering users. Beyond that, we have our analytics product, which is used by most of our clients. We plan to increase our usage of the solution.

How are customer service and support?

The technical support team is good. The team is knowledgeable and quick to respond.

How was the initial setup?

The initial setup of Amazon EMR is straightforward if you know the basics and have knowledge of big data and architecture. The AWS documentation also helps with the deployment. Initially, the deployment takes a few hours and then it becomes easy.

The maintenance depends on your application and specific requirements, but for deployment you don't need a big team; one cloud engineer should be enough.

What about the implementation team?

The deployment can be done in-house.

What's my experience with pricing, setup cost, and licensing?

There is a small fee for the EMR system, but the major cost components are the underlying infrastructure resources that we actually use.
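The split described above can be sketched roughly as follows. This is a minimal illustration only: the function name, the cluster size, and both hourly rates are hypothetical placeholders, not actual AWS prices, and real bills include other components such as storage and data transfer.

```python
def estimate_cluster_cost(instances: int, hours: float,
                          ec2_rate: float, emr_rate: float) -> dict:
    """Split a cluster bill into the infrastructure part and the EMR service fee.

    Rates are per instance-hour in USD and are hypothetical placeholders.
    """
    instance_hours = instances * hours
    infrastructure = instance_hours * ec2_rate
    service_fee = instance_hours * emr_rate
    return {
        "infrastructure": infrastructure,
        "emr_fee": service_fee,
        "total": infrastructure + service_fee,
    }


# Hypothetical example: 10 instances for 8 hours; both rates are made up.
cost = estimate_cluster_cost(instances=10, hours=8, ec2_rate=0.20, emr_rate=0.05)
print(cost)
```

With placeholder rates like these, the infrastructure share dominates the total, which matches the observation that the service fee is the smaller component.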

What other advice do I have?

My advice would be to do a dependency analysis to understand the limitations before planning a move to Amazon EMR. I would rate the overall solution a six out of ten.

Title	Rating	Mindshare	Recommending	Interviews
Databricks	4.1	N/A	96%	94
Teradata	4.1	N/A	88%	83
