We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and IBM Spectrum Computing based on real PeerSpot user reviews.
"There's a lot of functionality."
"The main feature that we find valuable is that it is very fast."
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained."
"The processing time is very much improved over the data warehouse solution that we were using."
"The tool can be deployed using different container technologies, which makes it very scalable."
"The solution is reliable and stable, it fits our requirements."
"The solution is stable."
"It has the best proxy, security, and support features compared to open-source products."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
"The data science aspect of the solution is valuable."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"Easy to operate and use."
"The most valuable aspect of the product is the policy driving resource management, to optimize the computing across data centers."
"This solution is working for both VTL and tape."
"We are satisfied with the technical support, we have no issues."
"The most valuable feature is the backup capability."
"Spectrum Computing's best features are its speed, robustness, and data processing and analysis."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"The solution’s integration with other platforms should be improved."
"Apart from the restrictions that come with its in-memory implementation. It has been improved significantly up to version 3.0, which is currently in use."
"The logging for the observability platform could be better."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"Cloudera Distribution for Hadoop is not always completely stable in some cases, which can be a concern for big data solutions."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"This is a very expensive solution."
"Cloudera's support is extremely bad and cannot be relied on."
"The Cloudera training has deteriorated significantly."
"It could be faster and more user-friendly."
"We have not been able to use deduplication."
"Lack of sufficient documentation, particularly in Spanish."
"We'd like to see some AI model training for machine learning."
"SMB storage and HPC is not compatible and it should be supported by IBM Spectrum Computing."
"Spectrum Computing is lagging behind other products, most likely because it hasn't been shifted to the cloud."
"This solution is no longer managing tapes correctly."