Apache Spark pros and cons

Vendor: Apache
4.2 out of 5
Ranked 1

Pros & Cons summary

Prominent pros & cons

PROS

Apache Spark offers exceptional speed and efficiency in data processing, significantly outperforming traditional tools by effectively managing parallel processing and handling large data volumes.
It boasts strong scalability, enabling efficient handling of extensive datasets by distributing workloads across multiple nodes, which enhances performance and flexibility.
The in-memory processing engine is highly valuable, allowing for rapid data handling and processing by utilizing RAM rather than disk storage, which leads to enhanced execution speed.
Apache Spark supports a wide range of machine learning processes with a scalable library, providing valuable functionalities for real-time data processing, streaming, and analytical tasks.
The flexibility and user-friendly nature of Apache Spark make it easy to deploy and integrate seamlessly with existing processes, supported by clear documentation, which increases its adoption in various industries.
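The first two pros describe splitting a large dataset into pieces and processing them in parallel. The sketch below illustrates that general idea in plain Python; it is not Spark code and does not use Spark's API. The helper names (`split_into_partitions`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are hypothetical, and threads stand in for cluster nodes as a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal contiguous chunks (hypothetical helper)."""
    size, remainder = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` partitions take one extra element.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def partial_sum_of_squares(partition):
    """Work done independently on one partition (the "map" side)."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(data, num_partitions=4):
    """Process partitions concurrently, then combine the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    # Assumption: threads stand in for cluster nodes. A real engine would
    # send each partition to a worker process on some machine.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_of_squares, partitions))
    # The "reduce" side: merge the per-partition results.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Because each partition is processed independently, adding workers mostly adds throughput, which is the scalability reviewers point to. The final combine step is the part that requires coordination.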

CONS

Apache Spark requires significant technical expertise to deploy and operate, making it challenging for users without a technical background.
Apache Spark lacks support for certain machine learning libraries, models, and neural network-related algorithms, limiting its use in some applications.
Apache Spark's initial setup and installation are complex, demanding a considerable learning curve for practitioners.
Optimization techniques in Apache Spark have limitations, particularly when handling large data sets, affecting performance and efficiency.
Integration with popular databases and third-party platforms needs improvement, as current support often requires workarounds.
 

Apache Spark Pros review quotes

it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25, 2017
The good performance. The nice graphical management console. The long list of ML algorithms.
it_user326142 - PeerSpot reviewer
Architect at a healthcare company with 51-200 employees
Sep 26, 2017
ETL and streaming capabilities.
it_user746673 - PeerSpot reviewer
Sr. Software Engineer at a tech vendor with 1-10 employees
Oct 1, 2017
The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics.
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
884,192 professionals have used our research since 2012.
it_user746943 - PeerSpot reviewer
Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees
Oct 2, 2017
DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort.
it_user786777 - PeerSpot reviewer
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10, 2017
With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
reviewer894894 - PeerSpot reviewer
Solutions Architect at a computer software company with 51-200 employees
Jun 27, 2018
Features include machine learning, real time streaming, and data processing.
it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees
Jul 10, 2019
I found the solution stable. We haven't had any problems with it.
LC
Snr Security Engineer at a tech vendor with 201-500 employees
Jul 14, 2019
The scalability has been the most valuable aspect of the solution.
reviewer1046250 - PeerSpot reviewer
Senior Consultant & Training at a tech services company with 51-200 employees
Oct 13, 2019
The most valuable feature of this solution is its capacity for processing large amounts of data.
MG
Director of BigData Offer at IVIDATA
Dec 9, 2019
The solution is very stable.
 

Apache Spark Cons review quotes

it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25, 2017
Apache Spark provides very good performance. The tuning phase is still tricky.
it_user326142 - PeerSpot reviewer
Architect at a healthcare company with 51-200 employees
Sep 26, 2017
Stability in terms of API (things were difficult when transitioning from RDD to DataFrames, then to Datasets).
it_user746673 - PeerSpot reviewer
Sr. Software Engineer at a tech vendor with 1-10 employees
Oct 1, 2017
More ML based algorithms should be added to it, to make it algorithmic-rich for developers.
it_user746943 - PeerSpot reviewer
Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees
Oct 2, 2017
Dynamic DataFrame options are not yet available.
it_user786777 - PeerSpot reviewer
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10, 2017
Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing.
reviewer894894 - PeerSpot reviewer
Solutions Architect at a computer software company with 51-200 employees
Jun 27, 2018
It should support more programming languages.
it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees
Jul 10, 2019
It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster.
LC
Snr Security Engineer at a tech vendor with 201-500 employees
Jul 14, 2019
The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive.
reviewer1046250 - PeerSpot reviewer
Senior Consultant & Training at a tech services company with 51-200 employees
Oct 13, 2019
When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data.
MG
Director of BigData Offer at IVIDATA
Dec 9, 2019
The solution needs to optimize shuffling between workers.
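For readers unfamiliar with the term in the last quote, a shuffle is the step where records are redistributed so that all values for a given key end up on the same worker. The plain-Python sketch below is an illustration of that idea only, not Spark's implementation. The names `owner_of` and `shuffle_by_key` are hypothetical, and the CRC32-based placement is an assumption chosen for determinism.

```python
import zlib
from collections import defaultdict


def owner_of(key, num_workers):
    """Pick the worker responsible for a key using a stable hash (assumed scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_workers


def shuffle_by_key(worker_partitions):
    """Redistribute (key, value) records so each key lives on one worker.

    Returns the new per-worker partitions and the number of records
    that had to move to a different worker.
    """
    num_workers = len(worker_partitions)
    new_partitions = [defaultdict(list) for _ in range(num_workers)]
    moved = 0
    for source, records in enumerate(worker_partitions):
        for key, value in records:
            target = owner_of(key, num_workers)
            if target != source:
                moved += 1  # this record would cross the network
            new_partitions[target][key].append(value)
    return [dict(p) for p in new_partitions], moved


if __name__ == "__main__":
    partitions = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4)]]
    regrouped, moved = shuffle_by_key(partitions)
    print(regrouped, moved)
```

The `moved` counter is the point of the sketch: every record counted there represents data transferred between workers, which is why operations that group or join by key tend to dominate runtime.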