
Apache Spark Reviews

Vendor: Apache
4.2 out of 5
Ranked 1
1,133 followers

Apache Spark mindshare

As of May 2025, Apache Spark's mindshare in the Hadoop category stands at 17.8%, down from 21.4% a year earlier, according to calculations based on PeerSpot user engagement data.

PeerResearch reports based on Apache Spark reviews

Type | Title | Date
Category | Hadoop | May 29, 2025
Product | Reviews, tips, and advice from real users | May 29, 2025
Comparison | Apache Spark vs Cloudera Distribution for Hadoop | May 29, 2025
Comparison | Apache Spark vs Amazon EMR | May 29, 2025
Comparison | Apache Spark vs HPE Ezmeral Data Fabric | May 29, 2025
Suggested products
Title | Rating | Mindshare | Recommending | Interviews
Spring Boot | 4.2 | N/A | 95% | 38
Jakarta EE | 3.7 | N/A | 66% | 3

Top industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 6%
University: 5%
Retailer: 5%
Government: 5%
Insurance Company: 4%
Educational Organization: 4%
Healthcare Company: 3%
Real Estate/Law Firm: 3%
Media Company: 2%
Hospitality Company: 2%
Construction Company: 2%
Energy/Utilities Company: 2%
Non Profit: 1%
Recreational Facilities/Services Company: 1%
Transportation Company: 1%
Pharma/Biotech Company: 1%
Legal Firm: 1%
Outsourcing Company: 1%
Engineering Company: 1%
Consumer Goods Company: 1%

Apache Spark reviews

Dunstan Matekenya - PeerSpot user
Data Scientist at a financial services firm with 10,001+ employees
Verified user of Apache Spark
Jul 30, 2024
Open-source solution for data processing with portability

Pros

"Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly."

Cons

"Apache Spark lacks geospatial data."
Bharghava Raghavendra Beesa - PeerSpot user
Senior Developer at Infosys
Verified user of Apache Spark
Jan 22, 2025
Faster data transformations achieved but scheduling dependencies require external solutions

Pros

"Spark is used for transformations from large volumes of data, and it is usefully distributed."

Cons

"The Spark solution could improve in scheduling tasks and managing dependencies."
Madhan Potluri - PeerSpot user
Head of Data at an energy/utilities company with 51-200 employees
Verified user of Apache Spark
Aug 13, 2024
Offers user-friendliness, clarity and flexibility

Pros

"The product's initial setup phase was easy."

Cons

"From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable."
KamleshPant - PeerSpot user
Senior Software Architect at USEReady
Verified user of Apache Spark
Apr 24, 2025
Handles both batch and streaming data efficiently for real-time processing
AM
Head of Data Science center of excellence at Ameriabank CJSC
Verified user of Apache Spark
Sep 25, 2024
Product version discussed: 3.0
Enhanced data processing with good support and helpful integration with Pandas syntax in distributed mode

Pros

"The most significant advantage of Spark 3.0 is its support for DataFrame UDF Pandas UDF features."

Cons

"The main concern is the overhead of Java when distributed processing is not necessary."
SS
Sr Manager at a transportation company with 10,001+ employees
Verified user of Apache Spark
Dec 11, 2023
Offers real-time and near-real-time data processing

Pros

"We use it for ETL purposes as well as for implementing the full transformation pipelines."

Cons

"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
PeerSpot user
Manager Data Analytics at a consultancy with 10,001+ employees
Verified user of Apache Spark
Aug 13, 2024
A flexible solution with real-time processing capabilities

Pros

"I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems."

Cons

"For improvement, I think the tool could make things easier for people who aren't very technical. There's a significant learning curve, and I've seen organizations give up because of it. Making it quicker or easier for non-technical people would be beneficial."
SurjitChoudhury - PeerSpot user
Data engineer at Cocos pt
Verified user of Apache Spark
Mar 16, 2024
Offers batch processing of data and in-memory processing in Spark greatly enhances performance

Pros

"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."

Cons

"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."