Amazon EMR Overview

Amazon EMR is the #3 ranked solution in top Hadoop tools and the #10 ranked solution in top Cloud Data Warehouse tools. PeerSpot users give Amazon EMR an average rating of 6.8 out of 10. Amazon EMR is most commonly compared to Cloudera Distribution for Hadoop: Amazon EMR vs Cloudera Distribution for Hadoop. Amazon EMR is popular among the large enterprise segment, which accounts for 74% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, whose professionals account for 20% of all views.
Buyer's Guide

Download the Hadoop Buyer's Guide including reviews and more. Updated: September 2022

What is Amazon EMR?
Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data. Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances.

Amazon EMR was previously known as Amazon Elastic MapReduce.

Amazon EMR Customers
Yelp

Amazon EMR Pricing Advice

What users are saying about Amazon EMR pricing:
  • "You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances."
  • "The cost of Amazon EMR is very high."

Amazon EMR Reviews

    Engineering Manager/Solution architect at a computer software company with 201-500 employees
    Real User
    Top 5 Leaderboard
    Stable, scalable, and has all the necessary distributions
    Pros and Cons
    • "One of the valuable features of this solution is that it's a managed service, so it's pretty stable and scales as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also includes Apache Hudi out-of-the-box, and we are leveraging Hudi on EMR for change data capture."
    • "Amazon EMR is continuously improving, but it could add something like out-of-the-box CI/CD or integration with Prometheus and Grafana."

    What is our primary use case?

    A use case of this solution, for one of our clients with a large database of letters with addresses, is to predict if a person still lives at the listed address or if they have moved to another. We leverage EMR and SageMaker in AWS. 

    EMR is cloud-based and managed through the cloud. 

    What is most valuable?

    One of the valuable features of this solution is that it's a managed service, so it's pretty stable and scales as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also includes Apache Hudi out-of-the-box, and we are leveraging Hudi on EMR for change data capture.

    What needs improvement?

    Amazon EMR is continuously improving, but it could add something like out-of-the-box CI/CD or integration with Prometheus and Grafana.

    For how long have I used the solution?

    I have been working with this solution for three years. 

    What do I think about the stability of the solution?

    This solution is pretty stable. 

    What do I think about the scalability of the solution?

    It's managed services, so it's scalable as much as you wish. 

    There are something like 40 to 50 people using EMR in my organization. 

    How are customer service and support?

    We are an AWS Premier Partner, so we have all the necessary support and the ability to contact product teams. 

    Which solution did I use previously and why did I switch?

    We didn't use any other products before implementing EMR. Some of our clients have Cloudera distributions, but we prefer EMR. 

    How was the initial setup?

    The installation is straightforward because you can do it from the AWS Console or with Terraform. You can do it yourself. 

    What about the implementation team?

    We implement this solution ourselves. On our team, we have admins, data engineers, DevOps engineers, and MLOps engineers. We have 40 or 50 data engineers. 

    What's my experience with pricing, setup cost, and licensing?

    You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances. 

    What other advice do I have?

    We have a range of clients in addition to the client with the large database of addresses. Another client is a large blockchain company, and we do analytics for them using bare metal and Hadoop, but not EMR. We're also doing Spark Streaming, Spark SQL, and some queries with Impala. We also have a company that enriches data from mobile companies, in terms of geolocations of cell phones, with a variety of data from other sources to predict profitability.

    I rate Amazon EMR an eight out of ten. It's continuously improving, and now it's possible to manage EMR directly from SageMaker Notebook. It's continuously evolving. I would recommend EMR to others because it's pretty straightforward, so onboarding doesn't take much time. 

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
    Fabrice Rene Ngadjeu Tchouake - PeerSpot reviewer
    Software Factory Manager at BOX AFRICA
    Real User
    Top 10
    Stable but could offer better dashboard management and workarounds for multi-factor authentication
    Pros and Cons
    • "The initial setup is pretty straightforward."
    • "The dashboard management could be better. Right now, it's lacking a bit."

    What is most valuable?

    We're working to try the solution on a small-scale first. It's great that it's a solution that allows us to try it bit by bit.

    The initial setup is pretty straightforward.

    The stability of the product has been great overall.

    Technical support has been knowledgeable and helpful.

    What needs improvement?

    The dashboard management could be better. Right now, it's lacking a bit. 

    I'd like more of a remote connection between my computer and the solution.

    We have multi-factor authentication, and at one point it was an issue due to the fact that I lost my phone. It stopped me from accessing the system.

    We have to replicate all the infrastructure and we need to ensure that we have the scalability and to do so in production. We are hoping that Amazon will allow us to scale easily. However, we have not attempted to scale just yet.

    For how long have I used the solution?

    I've been using the solution for a while; however, I can't specify the exact amount of time. I started approximately six months to a year ago, possibly in March 2020.

    What do I think about the stability of the solution?

    We haven't had any issues with Amazon in terms of stability. It's been good so far.

    What do I think about the scalability of the solution?

    I can't exactly speak to the scalability of the solution just yet. All of the products we are using are in the development phase. We're testing everything out. Once we go into production we'll likely look more closely at scaling, however, at this point, it's not necessary. Therefore, I can't really discuss anything about it.

    How are customer service and technical support?

    Technical support has been very good so far. We've been quite happy with them. They are knowledgeable and responsive and we are satisfied with the level of support provided to us.

    How was the initial setup?

    The initial setup is pretty straightforward. It's not extremely complex.

    However, it's difficult to reset the multi-factor authentication once it is set up. This was an issue for me when we lost the phone.

    What's my experience with pricing, setup cost, and licensing?

    The price of the solution may be a bit more than other competitors, such as Microsoft.

    Which other solutions did I evaluate?

    I did look at other solutions before choosing Amazon. We did look at Microsoft as well, among others.

    What other advice do I have?

    In general, I would rate the solution at a four out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Prashant  Singh - PeerSpot reviewer
    Vice President -Product Management at a computer software company with 1,001-5,000 employees
    Real User
    Top 5 Leaderboard
    Easy to manage and reliable but the cost is hard to control
    Pros and Cons
    • "The solution is pretty simple to set up."
    • "We don't have much control. If multiple users want to scale up, the cost increases, and we don't know how to restrict it."

    What is our primary use case?

    We primarily use the solution for tech processing. 

    How has it helped my organization?

    It's made life very easy. Now, a lot of things are very automated. 

    What is most valuable?

    It is easy to manage. The applications are much easier as compared to others.

    The solution is pretty simple to set up. 

    It's stable and reliable.

    The product can scale. 

    What needs improvement?

    The cost is increasing. We are looking into how we can optimize the cost part of EMR. We're doing a comparison between Cloudera running on AWS and running AWS EMR.

    We don't have much control. If multiple users want to scale up, the cost increases, and we don't know how to restrict it. 
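
One possible way to bound scale-up cost, offered here as a hedged sketch and not as something the reviewer used, is an EMR managed scaling policy with a maximum capacity. The snippet below only builds the policy structure in the shape accepted by the boto3 `put_managed_scaling_policy` call; the limits of 2 and 10 instances and the cluster ID in the comment are hypothetical values.

```python
def build_managed_scaling_policy(min_units: int, max_units: int,
                                 unit_type: str = "Instances") -> dict:
    """Build a ManagedScalingPolicy dict that caps how far a cluster can grow."""
    if min_units < 1 or max_units < min_units:
        raise ValueError("need 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,               # count capacity in instances
            "MinimumCapacityUnits": min_units,   # floor for the cluster size
            "MaximumCapacityUnits": max_units,   # hard ceiling on scale-up
        }
    }


# Hypothetical limits: never fewer than 2 and never more than 10 instances.
policy = build_managed_scaling_policy(min_units=2, max_units=10)

# With AWS credentials configured, the policy could be applied like this:
#   import boto3
#   boto3.client("emr").put_managed_scaling_policy(
#       ClusterId="j-EXAMPLE",            # hypothetical cluster ID
#       ManagedScalingPolicy=policy,
#   )
```

A ceiling like this limits the number of billable instances per cluster; it does not by itself cap total spend across many clusters.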

    For how long have I used the solution?

    I've been using this solution for a while now. It's been maybe a year or more. 

    What do I think about the stability of the solution?

    It's quite stable. There are no bugs or glitches and it doesn't crash or freeze. It's reliable. 

    What do I think about the scalability of the solution?

    The product can scale. It's not a problem at all. 

    How are customer service and support?

    Technical support is okay. The only challenge we face is when we integrate with other open-source solutions or products. We have issues, for example, with integrating Ranger and EMR.

    Which solution did I use previously and why did I switch?

    We are also using Hortonworks, which is now a part of Cloudera.

    How was the initial setup?

    The initial setup was very easy. It's not overly complex or difficult. 

    We can deploy the solution in a single day. It's very fast to get up and running. 

    There's a team of ten people that can handle the setup and maintenance of the product.

    What about the implementation team?

    We have a team that handles the initial setup.

    What was our ROI?

    The ROI would depend on the business case. I can't speak to an exact ROI. 

    What's my experience with pricing, setup cost, and licensing?

    The price can get a bit high. We're looking for ways to reduce costs. 

    Right now it costs us between $40,000 and $50,000 a month.

    What other advice do I have?

    We are a customer and end-user.

    I'd advise potential new users to give it a try if they have the requisite use cases. If it fits their use case, they should definitely go for EMR.

    I'd rate the solution seven out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Ilya Afanasyev - PeerSpot reviewer
    Senior Software Development Engineer at Yahoo!
    Real User
    Top 5 Leaderboard
    Great for big jobs, reliable, and doesn't require an elaborate setup
    Pros and Cons
    • "When we migrate big jobs from on-prem to the cloud, we do it in EMR with Spark."
    • "The problem for us is it starts very slowly."

    What is our primary use case?

    We usually use EMR with Spark and bring in Airflow or something similar. For example, when we migrate big jobs from on-prem to the cloud, we do it in EMR with Spark. 

    What is most valuable?

    In the main application, we use Hive, Spark, and Flink, and there are some additional small tools. We have to do this job inside EMR itself and run from Airflow. We also use a service from Amazon, and we manage jobs through Airflow. There's lots of scheduling, et cetera, that happens. Before, we used Jenkins. Now, we also integrate from Jenkins to Airflow and its processes. It's easier to have everything connected via Amazon. 

    What needs improvement?

    The problem for us is that it starts very slowly. They need to improve the start time. If we use a long-running EMR cluster, it costs a lot of money. When a job runs for one hour, a start time of about ten minutes is acceptable. However, if we want to run a job every five minutes, a ten-minute start time is a problem. It's a little bit weird that you cannot use the service for short, frequent jobs. 
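
The trade-off described above can be put into numbers. This is a minimal sketch using the reviewer's figures (roughly ten minutes to start, a one-hour job, a five-minute schedule); the function names are hypothetical and the model ignores shutdown time.

```python
def startup_overhead(startup_min: float, job_min: float) -> float:
    """Fraction of a transient cluster's lifetime spent starting up."""
    return startup_min / (startup_min + job_min)


def fits_schedule(startup_min: float, job_min: float, interval_min: float) -> bool:
    """True if a fresh cluster can start and finish the job within one interval."""
    return startup_min + job_min <= interval_min


# One-hour job on a fresh cluster: about 14% of the time is startup.
hourly_overhead = startup_overhead(startup_min=10, job_min=60)

# Five-minute schedule: startup alone exceeds the interval, even for an instant job.
five_minute_ok = fits_schedule(startup_min=10, job_min=0, interval_min=5)
```

Under these assumptions, a transient cluster per run only works when the schedule interval is longer than startup plus job time; shorter intervals force a long-running cluster and its cost.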

    The support could be better.

    For how long have I used the solution?

    We've been using the solution for about one year. 

    What do I think about the stability of the solution?

    We've had some issues here and there, however, for the most part, the solution is stable. 

    What do I think about the scalability of the solution?

    The solution can scale quite well. It's not a problem.

    A lot of teams use the solution. We have two or three teams that use it. 

    How are customer service and support?

    Technical support could be faster. They don't respond as quickly as we would like. 

    Which solution did I use previously and why did I switch?

    We used an EC2 service. We did manually manage it via our engineers. We also used Knime and looked into Kinesis. 

    How was the initial setup?

    You don't need to deploy anything. You just start. You go to the computer and start using EMR.

    Sometimes we found some strange things. We've had some exceptions. Various versions can be used, however.

    What's my experience with pricing, setup cost, and licensing?

    I'm not sure about the cost of the licensing. However, it does depend on the integration, for example, and many other factors. 

    What other advice do I have?

    I'd recommend the solution to others to try.

    I'd rate it a six out of ten. 

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Deputy CTO at a tech company with 51-200 employees
    Real User
    Top 5 Leaderboard
    Easily accessible to many dev teams, simple to use and very flexible
    Pros and Cons
    • "This is the best tool for hosts and it's really flexible and scalable."
    • "The most complicated thing is configuring the cluster and ensuring it's running correctly."

    What is our primary use case?

    We use the solution to run Spark scripts on our system for the recommendation algorithms on our website. It's a Hadoop cluster that executes the Spark scripts and performs the calculations. We have a cashback website and offer personalized recommendations to users, and EMR is used to make the calculations by accessing user data. We also use this product for building a data lake from our numerous primary data sources. We've used EMR to make the latest version in the data lake. All data is stored in an S3 bucket in Parquet format. I'm Deputy CTO of the company. 

    What is most valuable?

    This tool is simple to use and it's really accessible to many dev teams. It's the best tool for hosts and it's really flexible and scalable which is necessary because we have a lot of data and some of our tasks take a lot of resources. 

    What needs improvement?

    The most complicated thing is configuring the cluster and ensuring it's running correctly. You need to configure at least three Amazon policies to get authorization for all the instances, and if you're new to the system it's really complicated. It's something that could be simplified for users. For additional features, I'd like to see a better MLOps platform, but it's possible that it's already in production. 
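
The reviewer does not name the three policies. As an assumption, the sketch below maps them to the three IAM roles an EMR cluster request can reference: the service role, the EC2 instance profile, and the auto scaling role. The parameter names follow the boto3 `run_job_flow` call and the role names are the AWS default role names; the cluster name, release label, and instance types are example values only.

```python
def build_cluster_request(name: str, release_label: str,
                          service_role: str = "EMR_DefaultRole",
                          job_flow_role: str = "EMR_EC2_DefaultRole",
                          autoscaling_role: str = "EMR_AutoScaling_DefaultRole") -> dict:
    """Build keyword arguments for an EMR cluster request, including its three roles."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Transient cluster: terminate once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "ServiceRole": service_role,          # role assumed by the EMR service
        "JobFlowRole": job_flow_role,         # instance profile for the EC2 nodes
        "AutoScalingRole": autoscaling_role,  # role used for automatic scaling
    }


# Example values; adjust to your account and region.
request = build_cluster_request("example-cluster", "emr-6.7.0")

# With credentials configured and the roles created, this could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the three roles as explicit parameters makes it visible which permissions a new user has to set up before the first cluster will start.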

    For how long have I used the solution?

    I've been using this solution for almost six years. 

    What do I think about the stability of the solution?

    We haven't had any problems with stability. 

    What do I think about the scalability of the solution?

    The solution is scalable, you can choose the cluster with the instance you need depending on memory or storage or GPU as you need. There are about five users in the company. 

    How was the initial setup?

    The initial setup is simple when you know the tool, but when you don't know the tool you need to look through the documentation. One of our team carried out deployment. We recently rebuilt our data lake and it took a day to get the right configuration.

    Which other solutions did I evaluate?

    We were on AWS for the web part so it was logical to take another AWS product, but today we are looking for an acceleration tool. The DevOps part takes a lot of time so we want to integrate with all the scope of MLOps and so we're looking at Databricks. 

    What other advice do I have?

    I would recommend this solution and I rate it an eight out of 10. 

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Senior Chief Engineer (Enterprise System Presales/Postsales) at a comms service provider with 10,001+ employees
    Real User
    Reliable, responsive support, and simple implementation

    What is our primary use case?

    We are using Amazon EMR for data pipelines. We are using it to put our data into it and then we are transforming it.

    What is most valuable?

    We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot.

    For how long have I used the solution?

    I have been using Amazon EMR for approximately one year.

    What do I think about the stability of the solution?

    Amazon EMR is reliable and stable.

    How are customer service and support?

    Whenever we have issues we contact Amazon EMR support and we receive the responses. We are satisfied with the support.

    How was the initial setup?

    The initial setup of Amazon EMR is easy.

    What's my experience with pricing, setup cost, and licensing?

    The cost of Amazon EMR is very high.

    What other advice do I have?

    My advice to others is that before implementing a solution they should look around. There are multiple solutions available and one might be a better fit for their use case.

    I rate Amazon EMR an eight out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: I am a real user, and this review is based on my own experience and opinions.