We performed a comparison between Amazon EMR and Apache Spark based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"Amazon EMR is a good solution that can be used to manage big data."
"It has a variety of options and support systems."
"One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR."
"The initial setup is pretty straightforward."
"When we grade big jobs from on-prem to the cloud, we do it in EMR with Spark."
"The project management is very streamlined."
"The solution is pretty simple to set up."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"The data processing framework is good."
"The solution has been very stable."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"I feel the streaming is its best feature."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"The initial setup was time-consuming."
"Amazon EMR is continuously improving, but maybe something like CI/CD out-of-the-box or integration with Prometheus Grafana."
"We don't have much control. If we have multiple users, if they want to scale up, the cost will go and increase and we don't know how we can restrict that price part."
"The legacy versions of the solution are not supported in the new versions."
"There is room for improvement in pricing."
"The dashboard management could be better. Right now, it's lacking a bit."
"Modules and strategies should be better handled and notified early in advance."
"As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
"Needs to provide an internal schedule to schedule spark jobs with monitoring capability."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"It should support more programming languages."
"The product could improve the user interface and make it easier for new users."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"The solution needs to optimize shuffling between workers."
"The solution’s integration with other platforms should be improved."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
Amazon EMR is ranked 3rd in Hadoop with 20 reviews, while Apache Spark is ranked 1st in Hadoop with 60 reviews. Amazon EMR is rated 7.8, while Apache Spark is rated 8.4. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability". On the other hand, the top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift, and Microsoft Azure Synapse Analytics, whereas Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA, and AWS Fargate. See our Amazon EMR vs. Apache Spark report.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.