
Cloudera Distribution for Hadoop vs Spark SQL comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Cloudera Distribution for Hadoop
Ranking in Hadoop: 2nd
Average Rating: 8.0
Reviews Sentiment: 6.3
Number of Reviews: 51
Ranking in other categories: NoSQL Databases (8th)

Spark SQL
Ranking in Hadoop: 5th
Average Rating: 7.8
Reviews Sentiment: 7.6
Number of Reviews: 14
Ranking in other categories: None
 

Mindshare comparison

As of August 2025, Cloudera Distribution for Hadoop holds a 23.3% mindshare in the Hadoop category, down from 25.0% a year earlier. Spark SQL holds a 10.4% mindshare, down from 11.3% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

Rok Dolinsek - PeerSpot reviewer
Enables on-premise implementation with powerful data processing capabilities
This is the only solution that can be installed on-premises. Cloudera provides a hybrid solution that lets you run compute in the cloud or on-premises. It includes all the machine learning algorithms in the Spark machine learning library. All the functionality needed for a big data platform and ETL is on the platform, eliminating the need for other tools. It is scalable, ready for vertical scaling, and very powerful, offering numerous functionalities and configurations for generative AI.
SurjitChoudhury - PeerSpot reviewer
Offers the flexibility to handle large-scale data processing
My experience with the initial setup of Spark SQL was relatively smooth. Understanding the system wasn't overly difficult because the data was structured in databases, and we could use notebooks to write code in Python or Java. Configuring networks and running scripts to load data into the database were routine tasks that didn't pose significant challenges. The flexibility to code in different languages and to process data as key-value pairs in Python made the setup adaptable. Once we received the source data, processing it in Spark SQL involved writing scripts to create dimension and fact tables, which became a standard part of our workflow.

Setting up Spark SQL was reasonably quick, but we sometimes face performance issues, especially when loading data into the SQL Server data warehouse. Sequencing notebooks so that jobs run efficiently is crucial, and managing complex tasks across multiple notebooks requires careful tracking. Exploring ways to optimize this process could be beneficial. However, once you are familiar with the database architecture and project tools, understanding and adapting to the system becomes more straightforward.

Quotes from Members


Pros

Cloudera Distribution for Hadoop
"The product provides better data processing features than other tools."
"The product as a whole is good."
"With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
"The file system is a valuable feature."
"Cloudera Distribution for Hadoop provides numerous features and capabilities combined into one platform, offers power processing, supports different file systems and query engines, and provides parallel processing for handling many requests."
"The product is completely secure."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"Customer service and support were able to fix whatever the issue was."

Spark SQL
"This solution is useful to leverage within a distributed ecosystem."
"The performance is one of the most important features. It has an API to process the data in a functional manner."
"Data validation and ease of use are the most valuable features."
"Certain data sets that are very large are very difficult to process with Pandas and Python libraries. Spark SQL has helped us a lot with that."
"I find the Thrift connection valuable."
"Offers a variety of methods to design queries and incorporates the regular SQL syntax within tasks."
"The solution is easy to understand if you have basic knowledge of SQL commands."
"Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline."
 

Cons

Cloudera Distribution for Hadoop
"The procedure for operations could be simplified."
"Cloudera Distribution for Hadoop is not always completely stable in some cases, which can be a concern for big data solutions."
"The pricing needs to improve."
"The price of this solution could be lowered."
"The security of this solution could be improved. There should also be a way to basically have a blockchain enabled storage with the HDFS."
"Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."
"I would like to see an improvement in how the solution helps me to handle the whole cluster."
"If they could support modifying the data more easily than the current implementation, it would be beneficial."

Spark SQL
"There are many inconsistencies in syntax for the different querying tasks."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"It takes a bit of time to get used to using this solution versus Pandas as it has a steep learning curve."
"Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."
"It would be beneficial for aggregate functions to include a code block or toolbox that explains its calculations or supported conditional statements."
"I've experienced some incompatibilities when using the Delta Lake format."
"There should be better integration with other solutions."
"In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL."
 

Pricing and Cost Advice

Cloudera Distribution for Hadoop
"Cloudera requires a license to use."
"The pricing must be improved."
"The price is very high. The solution is expensive."
"It is an expensive product."
"The tool is expensive...For the SMB market or customers whose environments are not that complex and do not have multiple systems running, Cloudera might not be a good option."
"The price could be better for the product."
"I wouldn't recommend CDH to others because of its high cost."

Spark SQL
"When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive."
"There is no license or subscription for this solution."
"The solution is bundled with Palantir Foundry at no extra charge."
"The solution is open-sourced and free."
"The on-premise solution is quite expensive in terms of hardware, setting up the cluster, memory, hardware and resources. It depends on the use case, but in our case with a shared cluster which is quite large, it is quite expensive."
"We use the open-source version, so we do not have direct support from Apache."
"We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small."

Top Industries

By visitors reading reviews

Cloudera Distribution for Hadoop
Financial Services Firm: 19%
Educational Organization: 17%
Computer Software Company: 12%
Energy/Utilities Company: 6%

Spark SQL
Financial Services Firm: 17%
University: 10%
Retailer: 10%
Manufacturing Company: 9%
 

Company Size

[Chart: reviewers by company size (Large Enterprise, Midsize Enterprise, Small Business)]
 

Questions from the Community

What do you like most about Cloudera Distribution for Hadoop?
The tool can be deployed using different container technologies, which makes it very scalable.
What is your experience regarding pricing and costs for Cloudera Distribution for Hadoop?
The price for Cloudera is average, yet it is very good compared to other solutions. It can be deployed on-premises, unlike competitors' cloud-only solutions.
What needs improvement with Cloudera Distribution for Hadoop?
It is quite complicated to configure and install. Integrating the platform into an information system is always a challenge, especially when starting with on-premise implementation. Integrating wit...
What do you like most about Spark SQL?
Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline.
What is your experience regarding pricing and costs for Spark SQL?
We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small.
What needs improvement with Spark SQL?
In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL. There could be additional features that I haven't explored but the current solution for working ...
 


Sample Customers

Cloudera Distribution for Hadoop: 37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Spark SQL: UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Find out what your peers are saying about Cloudera Distribution for Hadoop vs. Spark SQL and other solutions. Updated: July 2025.
865,384 professionals have used our research since 2012.