
Apache Spark pros and cons

Vendor: Apache
4.2 out of 5
Ranked #1


Prominent pros & cons

PROS

Apache Spark offers exceptional speed and efficiency in data processing. It significantly outperforms traditional tools by managing parallel processing effectively and handling large data volumes.
It scales well, handling extensive datasets efficiently by distributing workloads across multiple nodes, which improves both performance and flexibility.
Its in-memory processing engine is highly valued: it keeps working data in RAM rather than on disk, which speeds up execution considerably.
Apache Spark supports a wide range of machine learning tasks through its scalable library, and it also provides capabilities for real-time processing, streaming, and analytics.
Reviewers find Apache Spark flexible and user-friendly. It is easy to deploy, integrates smoothly with existing processes, and has clear documentation, all of which have helped its adoption across many industries.

CONS

Deploying and running Apache Spark requires significant technical expertise, which makes it challenging for users without a technical background.
Apache Spark lacks support for certain machine learning libraries, models, and neural-network algorithms, which limits its use in some applications.
Apache Spark's initial setup and installation are complex, and practitioners face a considerable learning curve.
Apache Spark's optimization options have limitations, particularly with large datasets, which affects performance and efficiency.
Integration with popular databases and third-party platforms needs improvement, because current support often requires workarounds.
 

Apache Spark Pros review quotes

it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees
Jan 5, 2016
Spark Streaming, Spark SQL, and MLlib, in that order.
it_user365304 - PeerSpot reviewer
Software Consultant at a tech services company with 10,001+ employees
Mar 27, 2016
Apache Spark is a framework that allows an organization to perform business and data analytics at a very low cost compared to Ab Initio or Informatica.
it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees
Jan 17, 2016
It allows the loading and investigation of very large datasets, and it has MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2026.
892,487 professionals have used our research since 2012.
it_user371334 - PeerSpot reviewer
CEO at a tech consulting company with 51-200 employees
Jan 17, 2016
We have seen a 1000x improvement in performance over other techniques.
it_user74256 - PeerSpot reviewer
Engineer at a tech vendor with 10,001+ employees
Jan 18, 2016
Spark Streaming's micro-batch mode helps improve performance.
it_user371832 - PeerSpot reviewer
Chief System Architect at a marketing services firm with 501-1,000 employees
Mar 30, 2016
With Spark SQL, we now have the capability to analyze very large quantities of data stored in Amazon S3 at a very low cost compared to the other solutions we evaluated.
it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25, 2017
The good performance. The nice graphical management console. The long list of ML algorithms.
it_user373173 - PeerSpot reviewer
Lead Big Data Engineer at a non-profit with 51-200 employees
Jan 20, 2016
Spark is relatively easy to deploy, with rich features in handling big data.
it_user374040 - PeerSpot reviewer
Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees
Jan 21, 2016
Apache Spark’s ability to perform batch processing at intervals of one second or less is the most transformative and least pervasive capability for any data processing application.
it_user374028 - PeerSpot reviewer
Core Engine Engineer at a computer software company with 51-200 employees
Jan 21, 2016
Faster time to parse and compute data makes web-based queries for plotting data easier.
 

Apache Spark Cons review quotes

it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees
Jan 5, 2016
Like I said, scalability is still an issue, and so is stability.
it_user365304 - PeerSpot reviewer
Software Consultant at a tech services company with 10,001+ employees
Mar 27, 2016
The main problem is that there are currently not many people on the market who are certified in Apache Spark.
it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees
Jan 17, 2016
The initial setup was complex. It was not easy getting the correct version and dependencies set up.
it_user371334 - PeerSpot reviewer
CEO at a tech consulting company with 51-200 employees
Jan 17, 2016
Better integration with BI tools would be a much appreciated improvement.
it_user74256 - PeerSpot reviewer
Engineer at a tech vendor with 10,001+ employees
Jan 18, 2016
I have to say it is bad. I can only ask for help in the Google group.
it_user371832 - PeerSpot reviewer
Chief System Architect at a marketing services firm with 501-1,000 employees
Mar 30, 2016
Spark Streaming is difficult to stabilize because you are always dependent on your stream flow.
it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25, 2017
Apache Spark provides very good performance. The tuning phase is still tricky.
it_user373173 - PeerSpot reviewer
Lead Big Data Engineer at a non-profit with 51-200 employees
Jan 20, 2016
I ran into Spark application performance issues.
it_user374040 - PeerSpot reviewer
Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees
Jan 21, 2016
Although you are able to perform complex transformations using Spark libraries, the support for SQL to perform transformations is still limited.
it_user374028 - PeerSpot reviewer
Core Engine Engineer at a computer software company with 51-200 employees
Jan 21, 2016
It should be simpler to use machine learning algorithms like those supported by Octave (for example, polynomial regression and polynomial interpolation).