
Apache Spark pros and cons

Vendor: Apache
4.2 out of 5
Ranked 1

Pros & Cons summary

Prominent pros & cons

PROS

Apache Spark delivers exceptional performance through in-memory processing, offering significant improvements over traditional tools.
The system's fault tolerance and scalability are highly valued, enabling efficient data handling for extensive datasets.
Apache Spark's support for machine learning, real-time streaming, data processing, and integration with AI libraries is outstanding.
The ease of use, combined with SQL compliance and a clear syntax, makes Apache Spark user-friendly and efficient for analytical tasks.
Apache Spark's distributed computing framework and workload distribution capabilities lead to enhanced processing speed and efficiency.

CONS

Apache Spark could benefit from expanding its support for more diverse programming languages and database connectors.
Apache Spark's stream processing capabilities and dynamic DataFrame options are areas that require development.
Apache Spark's machine learning and algorithmic features need enhancement to provide more comprehensive solutions for developers.
Apache Spark has a significant learning curve, requiring technical expertise that might hinder adoption by less technical users.
Apache Spark could improve its resource management and optimization techniques to enhance overall performance.
 

Apache Spark Pros review quotes

VM
Feb 20, 2024
The solution is scalable.
Madhan Potluri - PeerSpot reviewer
Aug 5, 2024
The product's initial setup phase was easy.
Omar Khaled - PeerSpot reviewer
Aug 12, 2025
Apache Spark resolves many problems in the MapReduce solution and Hadoop, such as the inability to run effective Python or machine learning algorithms.
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: August 2025.
865,164 professionals have used our research since 2012.
reviewer879201 - PeerSpot reviewer
Dec 23, 2019
I feel the streaming is its best feature.
SurjitChoudhury - PeerSpot reviewer
Feb 20, 2024
Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more.
KK
Oct 30, 2020
AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI.
Miodrag Milojevic - PeerSpot reviewer
Jul 25, 2023
It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance.
Ilya Afanasyev - PeerSpot reviewer
Aug 3, 2022
There's a lot of functionality.
RV
Jul 23, 2020
The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.
NK
Aug 1, 2022
Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.
 

Apache Spark Cons review quotes

VM
Feb 20, 2024
The setup I worked on was really complex.
Madhan Potluri - PeerSpot reviewer
Aug 5, 2024
From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.
Omar Khaled - PeerSpot reviewer
Aug 12, 2025
The basic improvement would be to have integration with these solutions.
reviewer879201 - PeerSpot reviewer
Dec 23, 2019
When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources.
SurjitChoudhury - PeerSpot reviewer
Feb 20, 2024
There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance.
KK
Oct 30, 2020
Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.
Miodrag Milojevic - PeerSpot reviewer
Jul 25, 2023
If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of deallocation.
Ilya Afanasyev - PeerSpot reviewer
Aug 3, 2022
I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it.
RV
Jul 23, 2020
The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate.
NK
Aug 1, 2022
Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available.