Apache Spark pros and cons

Vendor: Apache

4.2 out of 5

67 reviews
90% willing to recommend

Pros & Cons summary

Apache Spark performs well because it processes data in memory and distributes work across a cluster, which speeds up computation. It supports machine learning and streaming and scales to large datasets. SQL compliance and flexible APIs ease integration, and reviewers credit its extensive libraries with a high ROI. Challenges include tuning stability between APIs, limited programming language support, connectors for cloud databases that need improvement, a significant learning curve, and a need for broader machine learning algorithm support.

Prominent pros & cons

PROS

Apache Spark delivers exceptional performance through in-memory processing, offering significant improvements over traditional tools.

The system's fault tolerance and scalability are highly valued, enabling efficient data handling for extensive datasets.

Apache Spark's support for machine learning, real-time streaming, data processing, and integration with AI libraries is outstanding.

The ease of use, combined with SQL compliance and a clear syntax, makes Apache Spark user-friendly and efficient for analytical tasks.

Apache Spark's distributed computing framework and workload distribution capabilities lead to enhanced processing speed and efficiency.

CONS

Apache Spark could benefit from expanding its support for more diverse programming languages and database connectors.

Apache Spark's stream processing capabilities and dynamic DataFrame options are areas that require development.

Apache Spark's machine learning and algorithmic features need enhancement to provide more comprehensive solutions for developers.

Apache Spark has a significant learning curve, requiring technical expertise that might hinder adoption by less technical users.

Apache Spark could improve its resource management and optimization techniques to enhance overall performance.

Apache Spark Pros review quotes

VM

Cloud solution architect at 0

Feb 20, 2024

The solution is scalable.

Head of Data at an energy/utilities company with 51-200 employees

Aug 5, 2024

The product's initial setup phase was easy.

Data Engineer at a tech company with 10,001+ employees

Aug 12, 2025

Apache Spark resolves many problems in the MapReduce solution and Hadoop, such as the inability to run effective Python or machine learning algorithms.

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: September 2025.

867,676 professionals have used our research since 2012.

Technical Consultant at a tech services company with 1-10 employees

Dec 23, 2019

I feel the streaming is its best feature.

SurjitChoudhury

Data engineer at Cocos pt

Feb 20, 2024

Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more.

KK

Software Architect at Akbank

Oct 30, 2020

AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI.

Miodrag Milojevic

Senior Data Architect at Yettel

Jul 25, 2023

It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance.

Senior Software Development Engineer at Yahoo!

Aug 3, 2022

There's a lot of functionality.

Rajendran Veerappan

Director at Nihil Solutions

Jul 23, 2020

The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.

NK

Director of Engineering at Sigmoid

Aug 1, 2022

Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.

Apache Spark Cons review quotes

VM

Cloud solution architect at 0

Feb 20, 2024

The setup I worked on was really complex.

Head of Data at an energy/utilities company with 51-200 employees

Aug 5, 2024

From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.

Data Engineer at a tech company with 10,001+ employees

Aug 12, 2025

The basic improvement would be to have integration with these solutions.

Technical Consultant at a tech services company with 1-10 employees

Dec 23, 2019

When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources.

SurjitChoudhury

Data engineer at Cocos pt

Feb 20, 2024

There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance.

KK

Software Architect at Akbank

Oct 30, 2020

Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.

Miodrag Milojevic

Senior Data Architect at Yettel

Jul 25, 2023

If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of D allocation.

Senior Software Development Engineer at Yahoo!

Aug 3, 2022

I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it.

Rajendran Veerappan

Director at Nihil Solutions

Jul 23, 2020

The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate.

NK

Director of Engineering at Sigmoid

Aug 1, 2022

Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available.

Product Categories

Compute Service

Java Frameworks

Popular Comparisons

Spring Boot vs Apache Spark

Jakarta EE vs Apache Spark

Amazon EMR vs Apache Spark

AWS Lambda vs Apache Spark

Cloudera Distribution for Hadoop vs Apache Spark

AWS Fargate vs Apache Spark

Apache NiFi vs Apache Spark

AWS Batch vs Apache Spark

Spot vs Apache Spark

Amazon EC2 Auto Scaling vs Apache Spark

Amazon EC2 vs Apache Spark

Vert.x vs Apache Spark

HPE Ezmeral Data Fabric vs Apache Spark

Spring MVC vs Apache Spark

Amazon Corretto vs Apache Spark

Related questions

44

Which is the best RDBMS solution for big data?

102

Apache Spark without Hadoop -- Is this recommended?

64

Which solution has better performance: Spring Boot or Apache Spark?

12

AWS EMR vs Hadoop

8

Handling real and fast data - how do BigInsight and other solutions perform?

4

When evaluating Hadoop, what aspect do you think is the most important to look for?

6

Should we choose InfoSphere BigInsights or Cloudera?