Apache Spark Reviews

Name: Apache Spark
Brand: Apache
Rating: 4.2 out of 5 (69 reviews)
Vendor: Apache

90% willing to recommend

What is Apache Spark?

Apache Spark is a leading open-source processing tool known for scalability and speed in managing large datasets. It supports both real-time and batch processing and is widely used for building data pipelines, machine learning applications, and analytics.

Apache Spark is ranked #1 among Hadoop solutions, #2 among Java Frameworks, and #6 among Compute Service solutions. PeerSpot users give Apache Spark an average rating of 8.4 out of 10. It is most commonly compared to Spring Boot. Apache Spark is popular among large enterprises, which account for 57% of users researching this solution on PeerSpot. The top industry researching this solution is financial services, accounting for 23% of all views.

Featured Apache Spark reviews

Devindra Weerasooriya

Data Architect at Devtech

The in-memory computation feature is certainly helpful for my processing tasks. When I am working with structures that can be held in memory rather than stored during the computation, I go for the in-memory option. There are limitations related to holding data in memory that need to be addressed, but I still prefer in-memory computation. The solution is also beneficial in that it provides a stable, long-held base-level understanding of the framework that does not vary day by day, which is very helpful in my prototyping activity as an architect assessing Apache Spark, Great Expectations, and Vault-based solutions against those proposed by clients, such as TIBCO or Informatica.

Michael Lierheimer

Consultant, Chief Engineer, Teamleiter at infoteam Software AG

I lack the technical depth to make recommendations for future updates of Apache Spark. I used it for two years for our prototype work and for testing, but because I had no final project that was released and running at a customer site, I cannot say what I would expect if I wanted to use it in a real project. Regarding the current licensing cost, I would say it is in the medium range. However, because I do not have a licensed project for our customer, I do not know whether it would be too high for our customers if they had to buy the license themselves. For me, compared to other things, the licensing was acceptable.

Omar Khaled

Data Engineer at a tech company with 10,001+ employees

I can improve the organization's functions by taking less time to make decisions. To make the right decision, you need the right data, and a solution can provide this by hiring talent and employees who can consolidate data from different sources and organize it. Not all solutions can make this data fast enough to be used, except for solutions such as Apache Spark Structured Streaming. To make the right decision, you should have both accurate and fast data. Apache Spark itself is similar to the Python programming language. Python is a language with many libraries for mathematics and machine learning. Apache Spark is the solution, and within it, you have PySpark, which is the API for Apache Spark to write and run Python code. Within it, there are many APIs, including SQL APIs, allowing you to write SQL code within a Python function in Apache Spark. You can also use Apache Spark Structured Streaming and machine learning APIs.

Apache Spark mindshare

Product category: Hadoop

As of May 2026, the mindshare of Apache Spark in the Hadoop category stands at 13.6%, down from 17.6% the previous year, according to calculations based on PeerSpot user engagement data.

Hadoop Mindshare Distribution
Product	Mindshare (%)
Apache Spark	13.6%
Cloudera Distribution for Hadoop	14.8%
HPE Data Fabric	10.5%
Other	61.1%

PeerResearch reports based on Apache Spark reviews

Type	Title	Date
Category	Hadoop	May 9, 2026
Product	Reviews, tips, and advice from real users	May 9, 2026
Comparison	Apache Spark vs Cloudera Distribution for Hadoop	May 9, 2026
Comparison	Apache Spark vs Amazon EMR	May 9, 2026
Comparison	Apache Spark vs HPE Data Fabric	May 9, 2026

Valuable Features

Apache Spark excels in speed, scalability, and flexibility. Users find its in-memory data processing efficient, allowing large-scale data handling. Features like Spark SQL, machine learning libraries, and real-time streaming are highly valuable. Its distributed computing framework facilitates processing across multiple nodes. Integration with various platforms and support for multiple languages enhances its usability. Organizations appreciate its fault tolerance, ease of use, and capability to process both batch and streaming data effectively.

"One of Apache Spark's most valuable features is that it supports in-memory processing; the execution of jobs is very fast compared to traditional tools."
"As it uses in-memory data processing, Spark is very fast."
"It is useful for handling large amounts of data, and it is very useful for scientific purposes."

Room for Improvement

Apache Spark requires better real-time querying, improved user interface, and enhanced documentation. Its complexity and steep learning curve present challenges. Integration with more languages and machine learning tools is needed. Users face issues with garbage collection affecting performance and memory usage. Monitoring and debugging capabilities should be more user-friendly. Stability and scalability concerns exist, and better connectors for databases are necessary. Stream processing improvements and enhanced API stability are also suggested.

"Apache Spark could improve the connectors that it supports."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"It is useful for scientific purposes, but for commercial use of big data, it gives some trouble."

ROI

Reviewers report significant cost savings and performance gains from Apache Spark. Users experience reduced operational expenses, with one reporting a 50% decrease. Because Spark is open source, ROI varies from one deployment to the next, yet users report substantial reductions in both time and money, with high returns within a medium timeframe. The wide availability of Spark expertise helps keep operating costs down, though the additional memory and infrastructure needed for performance can raise costs. Overall, Spark considerably impacts cost-efficiency.

Pricing

Apache Spark is an open-source tool, primarily free of charge, but operational costs can vary based on deployment. While utilizing open-source versions incurs no licensing fees, cloud and infrastructure expenses can be significant. Certain services, such as Cloudera or Databricks, may add costs for enhanced support or bundled packages. Setup time generally spans four to five weeks. Licensing and costs depend on project specifics and can be influenced by existing infrastructure and platform choices.

"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."

Popular Use Cases

Organizations primarily use Apache Spark for processing large datasets, executing data analytics, and conducting predictive analytics. Common tasks include real-time data streaming, data integration, data transformation, machine learning, and building ETL pipelines. Apache Spark's in-memory computing capabilities make it efficient for big data processing, enabling tasks like clustering, segmentation, and batch processing. It supports multiple environments, from cloud to on-premises, and integrates with tools like Databricks and machine learning programs.

Service and Support

As an open-source project, Apache Spark lacks official technical support and relies on community forums and documentation for guidance. Some users find this sufficient, pointing out the vibrant community and the resources available online or via vendors like Cloudera. Others mention limitations in response times or quality due to the nature of open-source support. While free versions depend on the community, paid services from Databricks or Cloudera provide more structured assistance, which users find beneficial.

Deployment

Apache Spark's initial setup varies in complexity based on environment and expertise. Many find deploying in cloud environments like Databricks straightforward, often taking just minutes. In contrast, on-premise setups can be more challenging, requiring extensive configuration and time. Experience with distributed systems influences ease, with knowledgeable teams finding the process simpler. Some users note integration difficulties with additional services, while security configurations significantly increase complexity. Documentation and specialized consulting can facilitate smoother installations.

Scalability

Apache Spark is highly scalable, supporting both large and small teams across varied industries. Users appreciate its capacity for expansion, often employing additional monitoring and technical expertise for optimization. They highlight its versatility across different user types and its reliable performance with large data sets. While some find node addition challenging, others praise the straightforward scaling in cloud environments. Proper infrastructure management is key, ensuring strong performance and efficient resource use.

Stability

Apache Spark is widely regarded as stable, according to user feedback. Companies find it reliable for large-scale operations, citing few bugs or crashes. It handles tasks effectively, though some users experience challenges with initial setup or streaming data. Memory issues and optimization needs occur but are manageable with proper configuration. Users rate its stability highly, appreciating its robust performance on big data workloads, especially in newer versions, which have addressed previous difficulties.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.

Review data by company size

By reviewers
Company Size	Count
Small Business	25
Midsize Enterprise	14
Large Enterprise	25

By visitors reading reviews
Company Size	Count
Small Business	131
Midsize Enterprise	48
Large Enterprise	242

Top industries

By visitors reading reviews

Financial Services Firm (23%)
Comms Service Provider
Manufacturing Company
Computer Software Company
Marketing Services Firm
Government
University
Healthcare Company
Construction Company
Retailer
Insurance Company
Outsourcing Company
Educational Organization
Media Company
Performing Arts
Real Estate/Law Firm
Transportation Company
Legal Firm
Non Profit
Consumer Goods Company
Pharma/Biotech Company
Recreational Facilities/Services Company
Renewables & Environment Company
Religious Institution

Learn more about Apache Spark

Apache Spark's strengths lie in its ability to process large data volumes efficiently through real-time and batch capabilities. Its in-memory computation delivers fast data processing and significant performance gains. Its wide range of APIs, including those for machine learning, SQL, and analytics, makes it versatile in handling complex data operations. While Spark is popular for ease of use and fault tolerance, its management, debugging, and user-friendliness could benefit from improvements. Users would like better GUIs, integration with BI tools, and enhanced monitoring, alongside shuffling optimization and compatibility with more programming languages.

What are Apache Spark's key features?

Scalability: Efficiently manages large datasets across nodes.
Performance: In-memory computation for faster data processing.
Real-time Processing: Supports real-time analytics and data streaming.
APIs: Offers extensive APIs for machine learning, SQL, and analytics.

What benefits or ROI should users look for in reviews?

Ease of Use: Simplifies complex data tasks through intuitive operations.
Fault Tolerance: Ensures data reliability and continuous operations.
Integration Flexibility: Easily integrates with big data platforms and tools.

Organizations use Apache Spark predominantly for in-memory data processing, often integrated with other big data frameworks. It is applied in security analytics and predictive modeling, and helps facilitate secure data transmissions in AI deployments. Industries leverage Spark's speed for sentiment analysis, data integration, and efficient ETL transformations.

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Product Categories

Hadoop

Compute Service

Java Frameworks

Popular Comparisons

Spring Boot vs Apache Spark

Spot by Flexera vs Apache Spark

IBM Netezza Performance Server vs Apache Spark

AWS Lambda vs Apache Spark

Cloudera Distribution for Hadoop vs Apache Spark

Amazon EC2 vs Apache Spark

Amazon EMR vs Apache Spark

AWS Fargate vs Apache Spark

Jakarta EE vs Apache Spark

Apache NiFi vs Apache Spark

IBM Spectrum Computing vs Apache Spark

AWS Batch vs Apache Spark

HPE Data Fabric vs Apache Spark

Amazon EC2 Auto Scaling vs Apache Spark

Helidon vs Apache Spark

Apache Spark Reviews Summary
Author info	Rating	Review Summary
Data Architect at Devtech	4.5	I’ve used Apache Spark for four years, mainly for data integration and access. Its in-memory processing and open-source flexibility suit my needs, despite some stability issues. I prefer it over commercial tools like Informatica due to cost and adaptability.
Consultant, Chief Engineer, Teamleiter at infoteam Software AG	4.0	I used Apache Spark for two years in an on-prem prototype; setup was straightforward and support was good. I liked its fast database access, transformation, and reliable data exchange/integration. Licensing seemed midrange, but the customer ultimately chose another technology.
Data Engineer at a tech company with 10,001+ employees	5.0	I use Apache Spark for real-time data processing and transformation across multiple sources like CRM and Siebel. It's reliable, fast, and improves our decision-making, though I see future needs for better integration with emerging cloud solutions.
Data Scientist at a financial services firm with 10,001+ employees	4.5	I primarily use Apache Spark for data processing tasks involving large datasets, appreciating its ease of use and portability. While it's efficient for both small and large datasets, the lack of support for geospatial data is a limitation.
Head of Data at an energy/utilities company with 51-200 employees	4.0	Apache Spark reduced operational costs by 50%. Although it supports parallel processing, it needs improvements in scalability and user-friendliness. Working with datasets isn't as straightforward as with Pandas, though it's flexible and functional.
Senior Software Architect at USEReady	4.0	I use Apache Spark for big data engineering, valuing its batch and streaming capabilities. While stable and scalable, its ecosystem is complex for beginners, and clustering setup can be tricky. I rate it 8/10.
Manager Data Analytics at an outsourcing company with 5,001-10,000 employees	3.5	We use Apache Spark to handle real-time data streaming and machine learning, significantly improving efficiency and reducing costs. It offers flexibility in scaling and integrates well with other tools, though its learning curve could be challenging for non-technical users.
Senior Developer at Infosys	3.5	My experience with Spark for large-scale distributed data transformations is positive due to its speed and cost reduction. While setup is complex and scheduling needs external tools, I recommend it for big data processing.

Title	Rating	Mindshare	Recommending	Interviews
Spring Boot	4.2	N/A	95%	43
Spot by Flexera	4.3	N/A	100%	6
