Cloudera Distribution for Hadoop vs Spark SQL comparison

 

Comparison Buyer's Guide


Categories and Ranking

Cloudera Distribution for Hadoop
Ranking in Hadoop: 2nd
Average Rating: 8.0
Reviews Sentiment: 6.4
Number of Reviews: 50
Ranking in other categories: NoSQL Databases (8th)

Spark SQL
Ranking in Hadoop: 5th
Average Rating: 7.8
Reviews Sentiment: 7.6
Number of Reviews: 14
Ranking in other categories: None
 

Mindshare comparison

As of May 2025, Cloudera Distribution for Hadoop holds a 25.7% mindshare in the Hadoop category, up from 24.2% a year earlier. Spark SQL holds 10.5%, down from 12.1% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

Rok Dolinsek - PeerSpot reviewer
Enables on-premise implementation with powerful data processing capabilities
This is the only solution that is possible to install on-premise. Cloudera provides a hybrid solution that combines compute on cloud or on-premises. It includes all machine learning algorithms in the Spark machine learning library. All functionalities needed for a big data platform and ETL are on the platform, eliminating the need for other tools. It is scalable, ready for vertical scaling, and very powerful, offering numerous functionalities and configurations for generative AI.
Sahil Taneja - PeerSpot reviewer
Easy to use and does not require a learning curve
Spark SQL's documentation could be improved. It can be unclear at times and should be easier to understand. SparkUI could also offer more advanced views of performance and queries.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"We had a data warehouse before all the data. We can process a lot more data structures."
"The search function is the most valuable aspect of the solution."
"This is the only solution that is possible to install on-premise."
"Cloudera, as a whole, is designed to provide organizations with solutions for big data."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"The product as a whole is good."
"The most valuable feature is Kubernetes."
"This solution is useful to leverage within a distributed ecosystem."
"The performance is one of the most important features. It has an API to process the data in a functional manner."
"Certain data sets that are very large are very difficult to process with Pandas and Python libraries. Spark SQL has helped us a lot with that."
"Overall the solution is excellent."
"Offers a variety of methods to design queries and incorporates the regular SQL syntax within tasks."
"The team members don't have to learn a new language and can implement complex tasks very easily using only SQL."
"I find the Thrift connection valuable."
"The solution is easy to understand if you have basic knowledge of SQL commands."
 

Cons

"The dashboard could be improved."
"This is a very expensive solution."
"The solution is not fit for on-premise distributions."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there is a lot of things that need to improve."
"Currently, we are using many other tools such as Spark and Blade Job to improve the performance."
"Cloudera's support is extremely bad and cannot be relied on."
"It could be faster and more user-friendly."
"Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."
"There should be better integration with other solutions."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"It would be beneficial for aggregate functions to include a code block or toolbox that explains its calculations or supported conditional statements."
"SparkUI could have more advanced versions of the performance and the queries and all."
"It would be useful if Spark SQL integrated with some data visualization tools."
"Anything to improve the GUI would be helpful."
"This solution could be improved by adding monitoring and integration for the EMR."
 

Pricing and Cost Advice

"When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive."
"The product’s price depends from project to project."
"The tool is expensive...For the SMB market or customers whose environments are not that complex and do not have multiple systems running, Cloudera might not be a good option."
"It is an expensive product."
"The price is very high. The solution is expensive."
"The tool is not expensive."
"I wouldn't recommend CDH to others because of its high cost."
"The pricing must be improved."
"The on-premise solution is quite expensive in terms of hardware, setting up the cluster, memory, hardware and resources. It depends on the use case, but in our case with a shared cluster which is quite large, it is quite expensive."
"The solution is bundled with Palantir Foundry at no extra charge."
"We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small."
"The solution is open-sourced and free."
"We use the open-source version, so we do not have direct support from Apache."
"There is no license or subscription for this solution."
849,963 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Cloudera Distribution for Hadoop
Financial Services Firm: 25%
Computer Software Company: 15%
Educational Organization: 14%
Manufacturing Company: 6%

Spark SQL
Financial Services Firm: 22%
Computer Software Company: 16%
Manufacturing Company: 8%
Retailer: 7%
 

Company Size

By reviewers (chart categories: Large Enterprise, Midsize Enterprise, Small Business)
 

Questions from the Community

What do you like most about Cloudera Distribution for Hadoop?
The tool can be deployed using different container technologies, which makes it very scalable.
What is your experience regarding pricing and costs for Cloudera Distribution for Hadoop?
The price for Cloudera is average, yet it is very good compared to other solutions. It can be deployed on-premises, unlike competitors' cloud-only solutions.
What needs improvement with Cloudera Distribution for Hadoop?
It is quite complicated to configure and install. Integrating the platform into an information system is always a challenge, especially when starting with on-premise implementation. Integrating wit...
What do you like most about Spark SQL?
Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline.
What is your experience regarding pricing and costs for Spark SQL?
We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small.
What needs improvement with Spark SQL?
In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL. There could be additional features that I haven't explored but the current solution for working ...
 

Sample Customers

Cloudera Distribution for Hadoop: 37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Spark SQL: UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Find out what your peers are saying about Cloudera Distribution for Hadoop vs. Spark SQL and other solutions. Updated: April 2025.