Deputy CTO at a tech company with 51-200 employees
Easily accessible to many dev teams, simple to use and very flexible
Pros and Cons
- "This is the best tool for hosts and it's really flexible and scalable."
- "The most complicated thing is configuring to the cluster and ensure it's running correctly."
What is our primary use case?
We use the solution to run Spark scripts for the combination algorithms on our website. It's a Hadoop cluster that executes our Spark scripts. We have a cashback website that offers personalized recommendations to users, and EMR performs the calculations on user data. We also use this product to build a data lake from our numerous primary data sources, and we've used EMR to produce the latest version of the data in the data lake. All data is stored in an S3 bucket in Parquet format. I'm Deputy CTO of the company.
What is most valuable?
This tool is simple to use and it's really accessible to many dev teams. It's the best tool for hosts and it's really flexible and scalable which is necessary because we have a lot of data and some of our tasks take a lot of resources.
What needs improvement?
The most complicated thing is configuring the cluster and ensuring it's running correctly. You need to configure at least three Amazon policies to get authorization for all the instances, and if you're new to the system that's really complicated. It's something that could be simplified for users. For additional features, I'd like to see a better MLOps platform, but it's possible that it's already in production.
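The policies mentioned above plausibly correspond to the IAM roles an EMR cluster is launched with: a service role, an EC2 instance profile, and an auto scaling role. As a minimal sketch, assuming the AWS default role names and an illustrative release label, instance type, and node count (none of which come from the review), the arguments for boto3's `run_job_flow` call could be assembled like this:

```python
def build_cluster_request(name, release_label="emr-6.15.0"):
    """Return keyword arguments for boto3's EMR `run_job_flow` call.

    The release label, instance types, and counts are illustrative
    assumptions, not the reviewer's actual configuration.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Terminate the cluster once all submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # The three roles below are the AWS default names (assumed here).
        "ServiceRole": "EMR_DefaultRole",              # used by the EMR service
        "JobFlowRole": "EMR_EC2_DefaultRole",          # instance profile for the nodes
        "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
    }


def launch_cluster(request, region_name="us-east-1"):
    """Submit the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("emr", region_name=region_name)
    return client.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the call makes it possible to review the role configuration before anything is launched.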
For how long have I used the solution?
I've been using this solution for almost six years.
Buyer's Guide
Amazon EMR
June 2025

Learn what your peers think about Amazon EMR. Get advice and tips from experienced pros sharing their opinions. Updated: June 2025.
857,028 professionals have used our research since 2012.
What do I think about the stability of the solution?
We haven't had any problems with stability.
What do I think about the scalability of the solution?
The solution is scalable. You can choose a cluster with the instances you need, depending on whether you need memory, storage or GPU. There are about five users in the company.
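Choosing instances by workload, as described above, can be sketched as a small lookup. The profile names and the specific instance types below are illustrative assumptions, not the reviewer's setup; the resulting list has the shape boto3 expects for `Instances["InstanceGroups"]` in `run_job_flow`.

```python
# Hypothetical mapping from a workload profile to an EC2 instance type.
INSTANCE_BY_PROFILE = {
    "general": "m5.xlarge",    # balanced CPU and memory
    "memory": "r5.2xlarge",    # memory-optimized
    "storage": "i3.2xlarge",   # storage-optimized (local NVMe)
    "gpu": "g4dn.xlarge",      # GPU-equipped
}


def instance_groups_for(profile, core_count):
    """Build an InstanceGroups list with one primary node and `core_count` core nodes."""
    if profile not in INSTANCE_BY_PROFILE:
        raise ValueError(f"unknown workload profile: {profile!r}")
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return [
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": INSTANCE_BY_PROFILE[profile],
         "InstanceCount": core_count},
    ]
```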
How was the initial setup?
The initial setup is simple when you know the tool, but when you don't know the tool you need to look through the documentation. One of our team members carried out the deployment. We recently rebuilt our data lake and it took a day to get the right configuration.
Which other solutions did I evaluate?
We were on AWS for the web part so it was logical to take another AWS product, but today we are looking for an acceleration tool. The DevOps part takes a lot of time and we want something that covers the full scope of MLOps, so we're looking at Databricks.
What other advice do I have?
I would recommend this solution and I rate it an eight out of 10.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Head of Software Factory at BOX AFRICA
Stable but could offer better dashboard management and workarounds for multi-factor authentication
Pros and Cons
- "The initial setup is pretty straightforward."
- "The dashboard management could be better. Right now, it's lacking a bit."
What is most valuable?
We're working to try the solution on a small scale first. It's great that it's a solution that allows us to try it bit by bit.
The initial setup is pretty straightforward.
The stability of the product has been great overall.
Technical support has been knowledgeable and helpful.
What needs improvement?
The dashboard management could be better. Right now, it's lacking a bit.
I'd like more of a remote connection between my computer and the solution.
We have multi-factor authentication, and at one point it was an issue due to the fact that I lost my phone. It stopped me from accessing the system.
We have to replicate all the infrastructure and we need to ensure that we have the scalability and to do so in production. We are hoping that Amazon will allow us to scale easily. However, we have not attempted to scale just yet.
For how long have I used the solution?
I've been using the solution for a while; however, I can't specify the exact amount of time. It might have been approximately six months to a year ago. I may have started using it in March 2020.
What do I think about the stability of the solution?
We haven't had any issues with Amazon in terms of stability. It's been good so far.
What do I think about the scalability of the solution?
I can't exactly speak to the scalability of the solution just yet. All of the products we are using are in the development phase. We're testing everything out. Once we go into production we'll likely look more closely at scaling, however, at this point, it's not necessary. Therefore, I can't really discuss anything about it.
How are customer service and technical support?
Technical support has been very good so far. We've been quite happy with them. They are knowledgeable and responsive and we are satisfied with the level of support provided to us.
How was the initial setup?
The initial setup is pretty straightforward. It's not extremely complex.
However, it's difficult to reset the multi-factor authentication once it is set up. This was an issue for me when we lost the phone.
What's my experience with pricing, setup cost, and licensing?
The price of the solution may be a bit more than other competitors, such as Microsoft.
Which other solutions did I evaluate?
I did look at other solutions before choosing Amazon. We did look at Microsoft as well, among others.
What other advice do I have?
In general, I would rate the solution at a four out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Data Science Engineer
Ability to easily and quickly resize the cluster is what really makes it stand out
Pros and Cons
- "The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
- "There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
What is most valuable?
The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions. You can do it very easily and quickly. It is a managed service from AWS, so it removes a lot of the headaches of configuring the different environments for all the nodes in the cluster, and frees you up to do other things. You can set it up in minutes and it's very straightforward.
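As a rough illustration of the resize described above, the sketch below builds the arguments for boto3's `modify_instance_groups` call, which changes the node count of a running instance group. The cluster and instance group IDs are placeholders, and this is one possible way to resize, not necessarily how the reviewer did it.

```python
def build_resize_request(cluster_id, instance_group_id, new_count):
    """Return keyword arguments for boto3's EMR `modify_instance_groups` call."""
    if new_count < 0:
        raise ValueError("new_count cannot be negative")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count},
        ],
    }


def resize_cluster(request, region_name="us-east-1"):
    """Apply the resize; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("emr", region_name=region_name).modify_instance_groups(**request)


# Placeholder IDs for illustration only.
request = build_resize_request("j-EXAMPLE", "ig-EXAMPLE", 10)
```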
How has it helped my organization?
Well, I've been at two different companies and mostly I'll relate to my experience at HLI, Human Longevity, in San Diego. We used it for genomics. Genomics is a perfect use case for big data. We manage literally terabytes of data using some of the tools that are included with EMR like Spark and Hive. What we were able to do with these EMR tools - EMR is a collection of things - was to essentially set up a genomic data warehouse of people's samples and their sequenced DNA. And then we were able to quickly and easily pair that with annotation data which essentially just tells you what your genome means, like what that sequence, or what certain sections of those characters, means. That was just all very, very easy and it allowed everyone to know where, for instance, the most recent versions of certain data lived at all times, which is really important.
What needs improvement?
There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange. It could have been a red herring, it could have been that something else changed in our environment that we never found out. But all of a sudden one day we couldn't run our scripts to start up clusters, the things we could do the day before. It was because they'd released a new version and we had to change things around.
They have listened to the community quite a bit. One thing we had suggested to them: they sometimes have older versions of some of these tools, because the tools are open source and Amazon creates its own version of them. For instance, the version of Hive was pretty far behind for quite a while.
They've addressed that and I think it's partially because of customers like us telling them, "Hey, there are a lot of new features that should be available but aren't in your distribution."
For how long have I used the solution?
For close to two years now.
What do I think about the stability of the solution?
I haven't really had any stability issues. I can definitely count on it to do what it needs to do. In the last year, whenever something went wrong, the cause was the data we were feeding into it.
You have to configure it. You may have to configure your cluster with bigger nodes or with more nodes if the shape of your data changes. That's going to be the nature of the beast with any kind of solution like this, so that's not EMR's fault.
What do I think about the scalability of the solution?
We have not had any issues with scalability.
How are customer service and technical support?
I have not called them but we had a plan where, if we had an urgent case, we could email them. There were certain people in the organization who could actually call them for mission critical things in our department using EMR. We could basically either ask those people to do it or we could email them, and we could expect the response within a couple of hours.
We did have to do that when the new version came out and broke the old version. And then there was one time when the data turned out to be the problem. There were so many logs, and we were in a time crunch, searching through the logs and trying to figure out what was going on. So we emailed them, and both times they were very responsive and solved the problem very quickly.
Which solution did I use previously and why did I switch?
I didn't really use a different solution before. When I got to that company, EMR was what they were using. My boss was very big on using those managed services from Amazon because they give you an additional layer of insurance: if something goes wrong at the level of the operating system, for instance the patching of the operating system for the nodes in the cluster, that's on Amazon to take care of. We didn't have to focus on that, so we could focus on actually getting the work done.
How was the initial setup?
It was one of those things where once you figured it out, you've got it. With this big data stuff, you put in a lot of work, trying to set something up and then you sort of set it and forget it.
Amazon has made it much easier since I first started with it. Once you get the cluster set up, if you set it up in the graphical interface, just point and click, you can actually copy a script that you could run from the command line to create that cluster. That is extremely helpful and that's the way that most people do it in production. You have a script and you run and it comes up. So it's a one-button kind of thing.
They tried to make it easy. It was fairly simple once you got through the complexity of everything that was involved with it.
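A create-cluster script of the kind described above might look like the command assembled below. This is a hedged sketch: the cluster name, release label, applications, instance type, and node count are illustrative values, and the command exported by the console for a real cluster will contain more options.

```python
import shlex


def create_cluster_command(name, release_label, applications,
                           instance_type, instance_count):
    """Assemble an `aws emr create-cluster` command line as a shell-safe string."""
    args = [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        "--applications", *[f"Name={app}" for app in applications],
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",  # rely on the default EMR IAM roles
    ]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(args)


# Illustrative values only.
print(create_cluster_command("genomics-warehouse", "emr-6.15.0",
                             ["Spark", "Hive"], "m5.xlarge", 3))
```

Keeping such a command in version control gives the "one-button" start-up the reviewer mentions, and makes a change to the release label an explicit, reviewable edit.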
Which other solutions did I evaluate?
Every now and then we would evaluate another vendor like Cloudera or MapR, but at the end of the day, we ended up sticking with EMR because nothing made a compelling enough argument to change.
We did try Cloudera. We liked Cloudera quite a bit, but we already had such an investment in EMR, and Cloudera's cost, while competitive, just wasn't enough of a saving to justify switching. And then MapR came in and tried to sell us on them, and none of us ever saw any benefit in using MapR over any other solution.
Using Cloudera may have looked a little bit less expensive because Amazon EMR does charge extra fees per node based on the size of the node. When you're using EMR, it can be up to 16 to 32 times the actual original cost of the nodes. But we determined that that extra cost - for us, it was only about two to four times because of the size of the nodes we were using - the penalties weren't as great. And the benefit of not having to manage the infrastructure was enough that we said, "Well, if we want the Cloudera, we would have to do that to a certain extent, potentially." So, we said, "All right, well, it would be more work. So, let's just keep it with EMR."
What other advice do I have?
I would say take advantage of the documentation that exists. There are a lot of tutorials, and there's a really good community. The documentation is actually very thorough and very well-written, which is one of the greatest things about AWS. I don't know if this matters, but I'm a Certified Developer and Solutions Architect at the Associate level. That doesn't mean I wouldn't criticize them if I had anything to criticize.
I gave it a nine out of 10 because nothing is perfect. Everything can always improve but, overall, it's extremely well thought out. The cost is a bit prohibitive sometimes, but the whole world of big data and cluster computing can be very daunting, especially for someone new getting into it as a developer, or from a business perspective. Amazon makes it about as easy as it can be to dip a toe in those waters.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Big Data Architect at a tech services company with 1,001-5,000 employees
It's helped to automate processes, but it should offer automated deployment on multiple nodes.
What is most valuable?
There are several features that are most valuable for us:
- Hue
- Hive
- Spark
- S3
How has it helped my organization?
It's helped to automate processes. It's also provided quick deployment, and faster processing.
What needs improvement?
It should be quicker and offer automated deployment on multiple nodes.
For how long have I used the solution?
I've used it for one year.
What was my experience with deployment of the solution?
Sometimes there were issues.
What do I think about the stability of the solution?
Sometimes there were issues.
What do I think about the scalability of the solution?
Sometimes there were issues.
How are customer service and technical support?
Customer Service: It's good.
Technical Support: It's good.
Which solution did I use previously and why did I switch?
No solution had been used previously, but we are using it alongside Hortonworks.
How was the initial setup?
It was complex to configure.
What about the implementation team?
It was done in-house.
What other advice do I have?
We provide services for product implementation, so people looking for such products can contact me.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Big Data Specialist at a media company with 501-1,000 employees
It's less expensive than in-house hosting and much more scalable when needed to process larger volumes of data, but it needs better monitoring and debugging.
What is most valuable?
- Availability
- Scalability
- Zero maintenance
How has it helped my organization?
Less expensive than in-house hosting, much more scalable when needed to process larger volumes of data.
What needs improvement?
Better monitoring, debugging, and stability are all needed.
For how long have I used the solution?
I've used it for two years.
What was my experience with deployment of the solution?
No issues encountered.
What do I think about the stability of the solution?
No issues encountered.
What do I think about the scalability of the solution?
No issues encountered.
How are customer service and technical support?
Customer Service: Not consistent. Sometimes great, sometimes poor.
Technical Support: Not consistent. Sometimes great, sometimes poor.
Which solution did I use previously and why did I switch?
We used another vendor's cloud solution to host our own Hadoop cluster. It wasn't scalable or productive enough.
How was the initial setup?
It was straightforward.
What about the implementation team?
We did it in-house.
What was our ROI?
It varies from vendor to vendor and depends on usage.
What's my experience with pricing, setup cost, and licensing?
It varies from vendor to vendor and depends on usage.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Tech Support Staff at a tech company with 51-200 employees
Cost-effective IaaS cloud service with very good distributed data processing power.
Valuable Features:
Amazon Elastic MapReduce is an IaaS platform capable of processing vast amounts of data in the cloud. It can process data that sits on different machines distributed across the cloud. It is based on the underlying technology of Hadoop and the Hadoop Distributed File System for distributed data processing. It is comparatively cheaper than setting up your own distributed hardware platform. It is designed to scale up or down as the task requires, and all this is done automatically, without any manual intervention. No IT infrastructure management work is required, as everything is in the cloud and managed by the service provider.
Room for Improvement:
The web interface for managing all your cloud services is a bit patchy and needs improvement. The way the services and features are intermingled makes it quite difficult for a new user to get acquainted with them, and learning the initial basics requires a decent investment of time. Setting up MapReduce tasks for financial analysis or file analysis is very difficult, unlike other tasks such as data mining.
Other Advice:
Overall, it's a good cloud infrastructure that takes away the burden of managing the distributed hardware and can easily scale up and down as the task at hand requires. It is very efficient at number-crunching tasks using MapReduce, as the tasks are distributed across multiple machines spread across the network. The web interface for managing cloud instances is a bit difficult to understand, particularly for beginners, and needs further improvement with ease of use for end users in mind.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Developer at a tech company with 51-200 employees
A highly scalable, reliable, and cost-effective data processing and number crunching platform
Valuable Features:
- A highly scalable platform to run vast amounts of data processing with very high efficiency.
- Can be set to auto-scale up/down, as and when required.
- Highly reliable platform with almost no downtime
- Cost Effective
- Can be used through a web management console or a web services API.
- The web console makes it very easy to run simple jobs.
- Can be very easily integrated with Hadoop clusters and HDFS distributed file systems.
- Uses Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (Amazon S3) to provide a dynamic cloud storage facility.
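The auto-scaling point in the list above can be illustrated with EMR managed scaling, a feature that may be newer than this review. A minimal sketch, assuming a placeholder cluster ID and illustrative limits, builds the arguments for boto3's `put_managed_scaling_policy` call:

```python
def build_managed_scaling_request(cluster_id, min_units, max_units):
    """Return keyword arguments for boto3's EMR `put_managed_scaling_policy` call."""
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "ClusterId": cluster_id,
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",  # limits are counted in instances
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
            }
        },
    }


def apply_managed_scaling(request, region_name="us-east-1"):
    """Attach the policy; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("emr", region_name=region_name).put_managed_scaling_policy(**request)
```

With limits such as 2 and 10, the service would be free to add or remove nodes inside that range as the load changes.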
Room for Improvement:
- Setting up jobs for operations like data mining, web indexing, and machine learning is comparatively easier than for log file analysis, financial analysis, etc.
- For novice users, there is a bit of a steep learning curve, but things become much easier once you have the basics under your belt.
- One of the lacking features is good web support. Though the web interface looks pretty decent, some of the basic features are missing. For example, you will find it a bit difficult to customize a particular MapReduce task when a given web indexing task calls for a lot of customization. This involves extensive use of the underlying HDFS file system.
Other Advice:
I've been using Amazon Elastic MapReduce for more than a year and have found it to be a very useful tool. I was a bit hesitant to try this wonderful tool when I first started, but having previous expertise in a similar tool helped me grasp things at a faster pace.
The bottom line is that if you have used some similar tools in the past, you are good to go. And, if you are new to the concept of a distributed task structure, it would be wise to spend a couple of minutes to get yourself acquainted with the MapReduce technology. This is my personal experience.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
