There are several valuable features.
- Interactive data access (low latency)
- Batch ETL-style processing
- Schema-free data models
- Algorithms
We've seen up to a 1000x performance improvement over other techniques, which has enabled interactive, self-service access to data.
Better integration with BI tools would be a much appreciated improvement.
I've used it for about 14 months.
I haven't had any issues with deployment.
It's been stable for us.
It's scaled without issue.
Customer service is excellent.
Technical Support: Excellent.
Yes, we previously used Oracle, from which we ported our data.
The initial setup was simple.
We implemented it with our in-house team.
Be sure to use the Apache versions and avoid vendor-specific extensions.
It allows the loading and investigation of very large datasets, and it includes MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.
We're able to perform data discovery on large datasets without too much difficulty.
It needs better documentation as well as examples for all the Spark libraries. That would be very helpful in maximizing its capabilities and results.
I've used it for over nine months now.
I haven't encountered any issues with deployment.
There have been no stability issues.
I haven't had any scalability issues. It scales better than Python and R.
I haven't had to use customer service.
Technical Support: I haven't had to use it.
I previously used Python and R, but neither of these scaled particularly well.
The initial setup was complex. It was not easy getting the correct version and dependencies set up.
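One way to reduce the version-and-dependency pain described above is to pin exact versions up front. A minimal sketch for a pip-based PySpark setup (the version numbers are examples, not recommendations):

```shell
# Pin an exact PySpark version so the driver and cluster match (version is an example).
pip install pyspark==3.5.1

# Spark also needs a compatible JVM on the machine; check what is installed.
java -version
```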
I implemented it in-house on my own!
It's open-source, so ROI is inapplicable.
Learn Scala as this will greatly reduce the pain in starting off with Spark.
Spark Streaming, Spark SQL and MLlib, in that order.
We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.
As I said, scalability is still an issue, and so is stability. Spark on YARN still doesn't seem to have a programmatic job-submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs also have performance differences, which sometimes requires coding in Scala.
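For reference, the spark-submit invocation mentioned above looks roughly like this (the main class, jar name, and resource sizes are hypothetical):

```shell
# Submit a batch job to YARN in cluster mode (class, jar, and sizes are examples).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  --class com.example.IngestJob \
  ingest-job.jar
```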
Have Scala developers on hand. Basic Java competency will not be enough during optimization rounds.
I am using Apache Spark for data migration from databases. We have customers who use one database as a data lake.
The most valuable feature of Apache Spark is its ease of use.
Apache Spark could improve the use-case scenarios on its website. There is no information on how you can use the solution across relational databases, or with multiple databases.
I have been using Apache Spark for approximately 18 months.
Apache Spark is stable.
We are using Apache Spark across multiple nodes and it is scalable.
We have approximately five people using this solution.
The technical support from Apache Spark is very good.
I rate Apache Spark an eight out of ten.
