We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and IBM InfoSphere BigInsights [EOL] based on real PeerSpot user reviews.
"I found the solution stable. We haven't had any problems with it."
"It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"Apache Spark can do large volume interactive data analysis."
"It provides a scalable machine learning library."
"We use Spark to process data from different data sources."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
"It is helpful to gather and process data."
"The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
"It has the best proxy, security, and support features compared to open-source products."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"The data science aspect of the solution is valuable."
"The scalability of Cloudera Distribution for Hadoop is excellent."
"Very good end-to-end security features."
"InfoSphere Streams was the one core product from the platform that we were using. We were building a real-time response system and we built it on InfoSphere Streams."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"The solution must improve its performance."
"They could improve the issues related to programming language for the platform."
"Apache Spark's GUI and scalability could be improved."
"One limitation is that not all machine learning libraries and models support it."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"It needs to provide an internal scheduler to schedule Spark jobs, with monitoring capability."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS."
"The solution is not fit for on-premise distributions."
"While the deployed product is generally functional, there are instances where it presents difficulties."
"The tool's ability to be deployed on a cloud model is an area of concern where improvements are required."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"They should focus on upgrading their technical capabilities in the market."
"The Cloudera training has deteriorated significantly."
"The UI was not interactive: responses used to be very slow and would hang at times."