We compared Amazon EMR, Apache Spark, and Hortonworks Data Platform based on real PeerSpot user reviews.
"The initial setup is pretty straightforward."
"The project management is very streamlined."
"One of the valuable features of this solution is that it's a managed service, so it's pretty stable and as scalable as you wish. It has all the necessary distributions. With some additional work, it's also possible to switch to a newer Spark version with the latest version of EMR. It also includes Apache Hudi out of the box, and we are leveraging Hudi on EMR for change data capture."
"The initial setup is straightforward."
"We are using applications such as Splunk, Livy, Hadoop, and Spark. We run all of these applications in Amazon EMR, and they're helping us a lot."
"The solution is scalable."
"Amazon EMR is a good solution that can be used to manage big data."
"It has a variety of options and support systems."
"The solution is very stable."
"DataFrame: Spark SQL makes it easier to create applications, with less coding effort."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"The product’s most valuable features are lazy evaluation and workload distribution."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"The data processing framework is good."
"The product offers a fairly easy setup process."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"Hortonworks should not be expensive at all for those looking into using it."
"The upgrades and patches must come from Hortonworks."
"Ambari Web UI: user-friendly."
"It is a scalable platform."
"Ranger for security: with Ranger, we can manage users' permissions and access controls very easily."
"We use it for data science activities."
"Amazon EMR could improve by adding some features, such as metastore services and HiveServer2. Additionally, the user interface could be better, similar to the cross-platform services that Apache provides."
"The legacy versions of the solution are not supported in the new versions."
"The dashboard management could be better. Right now, it's lacking a bit."
"The most complicated thing is configuring the cluster and ensuring it's running correctly."
"There is no need to pay extra for third-party software."
"The product must add some of the latest technologies to provide more flexibility to the users."
"There were times when they released new versions that seemed to break old versions, which is very strange."
"There is room for improvement in pricing."
"We use a big data manager, but we cannot use it as conditional data, so whenever we try to fetch the data, it takes a bit of time."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"Apache Spark should add some resource management improvements to the algorithms."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive."
"One limitation is that not all machine learning libraries and models support it."
"The solution must improve its performance."
"It should support more programming languages."
"Deleting any service requires a lot of cleanup, unlike Cloudera."
"The cost of the solution is high and there is room for improvement."
"I work a lot with banking, IT, and communications customers. Hortonworks must improve or upgrade its services for these sectors."
"It has reached end of life, and there will be no further improvements."
"Security and workload management need improvement."
"I would like to see more support for containers such as Docker and OpenShift."
"Since Cloudera acquired HDP, it's been bundled with CDH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
"The version control of the software is also an issue."