We performed a comparison between Apache Spark, Hortonworks Data Platform, and IBM InfoSphere BigInsights [EOL] based on real PeerSpot user reviews.
"The scalability has been the most valuable aspect of the solution."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"I found the solution stable. We haven't had any problems with it."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"It provides a scalable machine learning library."
"The fault tolerant feature is provided."
"We use Spark to process data from different data sources."
"Ambari Web UI: user-friendly."
"It is a scalable platform."
"We use it for data science activities."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"Ranger for security; with Ranger we can manager user’s permissions/access controls very easily."
"The scalability is the key reason why we are on this platform."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"The data platform is pretty neat. The workflow is also really good."
"InfoSphere Streams was the one core product from the platform in which we were using. We were building a real-time response system and we built it on InfoSphere Streams."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"The solution’s integration with other platforms should be improved."
"The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"The solution must improve its performance."
"The initial setup was not easy."
"It's not easy to install."
"It would also be nice if there were less coding involved."
"The version control of the software is also an issue."
"Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
"I would like to see more support for containers such as Docker and OpenShift."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"Security and workload management need improvement."
"The cost of the solution is high and there is room for improvement."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"The UI was not interactive: Responses used to be very slow and hang up at times."