
Cloudera Data Platform vs Spark SQL comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Jan 18, 2026

 

Categories and Ranking

Cloudera Data Platform
Average Rating: 7.6
Reviews Sentiment: 5.5
Number of Reviews: 37
Ranking in other categories: Cloud Master Data Management (MDM) (7th), Data Management Platforms (DMP) (4th), AI Data Analysis (6th)

Spark SQL
Average Rating: 7.8
Reviews Sentiment: 7.6
Number of Reviews: 15
Ranking in other categories: Hadoop (5th)
 

Mindshare comparison

Cloudera Data Platform and Spark SQL aren’t in the same category and serve different purposes. Cloudera Data Platform is designed for Data Management Platforms (DMP) and holds a mindshare of 9.0%, up 1.4% compared to last year.
Spark SQL, on the other hand, focuses on Hadoop and holds a mindshare of 6.1%, down 10.2% compared to last year.
Data Management Platforms (DMP) Mindshare Distribution
Product                                                 Mindshare (%)
Cloudera Data Platform                                  9.0%
Palantir Foundry                                        15.6%
Informatica Intelligent Data Management Cloud (IDMC)    10.1%
Other                                                   65.3%

Hadoop Mindshare Distribution
Product                             Mindshare (%)
Spark SQL                           6.1%
Cloudera Distribution for Hadoop    14.1%
HPE Data Fabric                     13.5%
Other                               66.3%
 

Featured Reviews

T Sarwar - PeerSpot reviewer
Data architect at SentientAI, Karachi
Has enabled efficient big data processing and querying but remains complex to manage and configure
Cloudera Data Platform should use fewer tools and remove the complexity between them. It should make it easier for the end user to change the configuration and understand it better. The UI tool for jobs in Cloudera Data Platform can be improved to provide a proper image of ETL jobs and detailed consolidated graphs to monitor Spark-based Hue jobs.
Kemal Duman - PeerSpot reviewer
Team Lead, Data Engineering at Nesine.com
Data pipelines have run faster and support flexible batch and streaming transformations
We do not have any performance problems, but we do have some resource problems. Spark SQL consumes so many resources that we migrated our streaming job from Spark to Apache Flink. Resource management in Spark SQL should be better. It consumes more resources, which is normal. The main reason we switched from Spark is memory and CPU consumption. The major reason is the resource problem because the number of streaming jobs has been increasing in our company. That is why we considered resource management as a priority. Because of the resource consumption, I would say the development of Spark SQL is better. For development purposes, it is a top product and not difficult to work with, but resources are the major problem. We changed to Flink regardless of development time. Development time is less in Spark compared with Flink.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The best features Cloudera Data Platform offers are the processing power with Spark and the distributed data storage, HDFS, which helps us handle massive volumes of data."
"Hortonworks is the best, comparing all three flavors."
"The most valuable part of this product is what Cloudera Data Science Workbench can do as a whole for modeling and analysis."
"The Hadoop value proposition is in expanded functionality, linear scalability, and reduced software and infrastructure costs."
"The Hortonworks solution is very stable, it works as a production system without any error and without any downtime and if I have downtime, it is mostly caused by the hardware of the computers."
"The data platform is pretty neat. The workflow is also really good."
"My main use case for Cloudera Data Platform is to support a multi-source system with different data types, handling semi-structured, structured, and unstructured data to support those kinds of workloads."
"Ambari Web UI: user-friendly."
"Offers a variety of methods to design queries and incorporates the regular SQL syntax within tasks."
"The speed of getting data."
"Spark SQL gives us a handful of methods to design queries based on its own syntax and also incorporates the regular SQL syntax within tasks."
"The team members don't have to learn a new language and can implement complex tasks very easily using only SQL."
"Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline."
"This solution is useful to leverage within a distributed ecosystem."
"The scalability of the solution is good."
"This solution is useful to leverage within a distributed ecosystem."
 

Cons

"Cloudera Data Platform could improve by innovating more in terms of full-fledged support for AI workloads, enriching machine learning or LLM, as there haven't been updates in that aspect over the last one and a half years."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases (Oracle/Teradata, etc.), which would save a lot of money for the company."
"The version control of the software is also an issue."
"Cloudera Data Platform can be improved in several areas. I recently attended their roadmap session. Whatever limitations they have identified involve moving data from on-premises to cloud as a single-pane view and better lineage."
"For on-premise use, I would not recommend Cloudera Data Platform as it is expensive and complicated to upgrade."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"I have not seen enough innovation in Cloudera Data Platform, particularly in support for machine learning."
"The only issue that I had was when I tried to reinstall the software on every node."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"The initial setup is a bit complex."
"In the next release, maybe the visualization of some command-line features could be added."
"In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL."
"There are many inconsistencies in syntax for the different querying tasks like selecting columns and joining between two tables so I'd like to see a more consistent syntax."
"In the next update, we'd like to see better performance for small points of data. It is possible but there are better tools that are faster and cheaper."
"There are many inconsistencies in syntax for the different querying tasks."
"This solution could be improved by adding monitoring and integration for the EMR."
 

Pricing and Cost Advice

"It is priced well and it is affordable"
"Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results."
"The on-premise solution is quite expensive in terms of hardware, setting up the cluster, memory, hardware and resources. It depends on the use case, but in our case with a shared cluster which is quite large, it is quite expensive."
"We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small."
"We use the open-source version, so we do not have direct support from Apache."
"The solution is bundled with Palantir Foundry at no extra charge."
"There is no license or subscription for this solution."
"The solution is open-sourced and free."
885,837 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Manufacturing Company: 12%
Construction Company: 10%
Financial Services Firm: 9%
Marketing Services Firm: 8%

Financial Services Firm: 18%
University: 13%
Retailer: 11%
Healthcare Company: 8%
 

Company Size

By reviewers
Company Size          Count
Small Business        8
Midsize Enterprise    7
Large Enterprise      26

Company Size          Count
Small Business        5
Midsize Enterprise    6
Large Enterprise      4
 

Questions from the Community

What is your experience regarding pricing and costs for Hortonworks Data Platform?
The experience with pricing, setup cost, and licensing is very good.
What needs improvement with Hortonworks Data Platform?
Areas for improvement with Cloudera Data Platform could be the initial learning curve, which can be steep for teams new to big data ecosystems. Platform setup and configuration require careful ...
What is your primary use case for Hortonworks Data Platform?
Cloudera Data Platform on AWS was adopted as the core enterprise data platform, covering the full data lifecycle from ingestion to analytics and advanced use cases. Cloudera Data Platform was used ...
What needs improvement with Spark SQL?
We do not have any performance problems, but we do have some resource problems. Spark SQL consumes so many resources that we migrated our streaming job from Spark to Apache Flink. Resource manageme...
What is your primary use case for Spark SQL?
Spark SQL has been in our stack for less than one year, though some of our colleagues are using it. It is a useful product for transformation jobs. We generally use Spark SQL for batch processing. ...
What advice do you have for others considering Spark SQL?
Regarding the Catalyst query optimizer, I think we are using it. We were using it in the past, but I am not certain if we use it now. We used it a long time ago. I rate my experience with Spark SQL...
 

Overview

 

Sample Customers

Cloudera Data Platform: Information Not Available
Spark SQL: UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Find out what your peers are saying about Palantir, Informatica, Denodo and others in Data Management Platforms (DMP). Updated: March 2026.