DevOps engineer at Vvolve management consultants
Real User
Top 5
Handles large datasets and is relatively easy to manage, especially with cloud technologies, but scalability features could be enhanced
Pros and Cons
  • "Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way."
  • "The debugging aspect could use some improvement."

What is our primary use case?

I've used it mostly for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes, and loading and unloading data is fast and easy.

In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.
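The incremental loading described above, where each new delivery is picked up and pushed to the target without reprocessing earlier data, is roughly what Spark Structured Streaming does by tracking already-processed input in a checkpoint (set through the `checkpointLocation` option on a streaming write). The following is a minimal plain-Python sketch of that idea under stated assumptions, not Spark code: the `MicroBatchLoader` class, its `trigger` method, and the in-memory landing zone and target list are hypothetical stand-ins for a real file source and target database.

```python
from dataclasses import dataclass, field


@dataclass
class MicroBatchLoader:
    """Hypothetical illustration of checkpointed, incremental ingestion.

    Not Spark's API: it only mimics the idea that each trigger loads the
    files that arrived since the previous trigger, and each file only once.
    """
    # Names of files already loaded (stands in for a streaming checkpoint).
    processed: set = field(default_factory=set)
    # Rows written so far (stands in for the target database table).
    target: list = field(default_factory=list)

    def trigger(self, landing_zone: dict) -> int:
        """Load every file in landing_zone that is not yet checkpointed.

        landing_zone maps a file name to the list of rows it contains.
        Returns the number of rows appended in this micro-batch.
        """
        new_files = sorted(name for name in landing_zone if name not in self.processed)
        appended = 0
        for name in new_files:
            rows = landing_zone[name]
            self.target.extend(rows)
            appended += len(rows)
            # Record the file only after its rows have been written.
            self.processed.add(name)
        return appended


if __name__ == "__main__":
    landing = {"orders_001.csv": [{"id": 1}, {"id": 2}]}
    loader = MicroBatchLoader()
    print(loader.trigger(landing))  # prints 2: first batch loads both rows
    landing["orders_002.csv"] = [{"id": 3}]
    print(loader.trigger(landing))  # prints 1: only the new file is loaded
    print(loader.trigger(landing))  # prints 0: nothing new has arrived
```

In an actual Spark job the checkpoint lives in durable storage, so a restarted stream resumes where it stopped instead of reloading the whole data lake.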

How has it helped my organization?

Apache Spark is a versatile technology useful not only for data solutions but also for data creation. This is especially valuable given GDPR regulations and limited access to production, which make tasks like testing quite difficult. It helps with data creation and alignment for both consumers and developers.

What is most valuable?

Apache Spark Streaming is particularly good at handling real-time data. It has built-in data streaming integration, which allows it to stream data from any source as soon as it becomes available.

What needs improvement?

The scalability features are already good, but they could be further enhanced. Additionally, the debugging aspect could use some improvement.

Buyer's Guide
Streaming Analytics
June 2025
Find out what your peers are saying about Apache, Amazon Web Services (AWS), Microsoft and others in Streaming Analytics. Updated: June 2025.
856,873 professionals have used our research since 2012.

What do I think about the stability of the solution?

The stability is very good. Since everything runs as code, it's easy to understand what's happening under the hood. It's not a black-box system, which makes it quite transparent.

What do I think about the scalability of the solution?

On my team, there are about six or seven people using it. However, on the analytics side, where users view the reports, there are many more, perhaps over a hundred.

How was the initial setup?

The deployment process is quite easy and not very complicated.

Since it's an open-source technology, it can be deployed in various environments, including local machines and all kinds of clouds. If you're using the cloud, scaling is quite easy.

What about the implementation team?

If there are knowledgeable, experienced team members, it doesn't require a large team. One or two developers are enough.

What was our ROI?

It can handle large datasets and is relatively easy to manage, especially with cloud technologies. This means you can process a lot of data even with a low-configuration environment, which helps with cost savings.
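One reason a modest environment can work through large datasets is that Spark splits data into partitions and aggregates each partition locally before merging the partial results (the RDD `reduceByKey` operation, for example, combines values on each partition before shuffling). The following is a minimal plain-Python sketch of that partial-aggregation idea, not Spark's API; `partitioned_sum_by_key`, `partitions`, and the `partition_size` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def partitions(records: Iterable[Tuple[str, int]], partition_size: int) -> Iterator[List[Tuple[str, int]]]:
    """Yield the input in fixed-size chunks so only one chunk is materialized at a time."""
    it = iter(records)
    while True:
        chunk = list(islice(it, partition_size))
        if not chunk:
            return
        yield chunk


def partitioned_sum_by_key(records: Iterable[Tuple[str, int]], partition_size: int = 1000) -> dict:
    """Sum values per key: aggregate each partition locally, then merge the partials."""
    total = Counter()
    for chunk in partitions(records, partition_size):
        partial = Counter()
        for key, value in chunk:
            partial[key] += value  # local, per-partition aggregation
        total.update(partial)      # merge step: only the small partial results are combined
    return dict(total)


if __name__ == "__main__":
    # A generator stands in for a large source that is never fully loaded into memory.
    stream = ((f"sensor-{i % 3}", 1) for i in range(10_000))
    print(partitioned_sum_by_key(stream, partition_size=500))
```

In Spark the partitions are processed in parallel across executors rather than one after another, but the memory argument is the same: each worker only holds its own partition and a small partial result.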

What other advice do I have?

I would rate it a seven out of ten. Apache Spark's machine learning capabilities are quite extensive and can be used in a low-code way. This can be much more efficient than stitching together several separate technologies. You can also combine its batch processing capabilities with new technologies and machine learning.

It's quite useful for AI because of its machine-learning capabilities, which allow for model training and output generation.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1516182 - PeerSpot reviewer
Chief Innovation & Technology Leader at a mining and metals company with 1,001-5,000 employees
Real User
Efficient, better than average, but overly developer-focused
Pros and Cons
  • "The solution is better than average and some of the valuable features include efficiency and stability."
  • "The user configuration section could be improved: it should be less developer-focused and more business-user-focused."

What is our primary use case?

The primary use of the solution is to implement predictive maintenance capabilities.

What is most valuable?

The solution is better than average and some of the valuable features include efficiency and stability.

What needs improvement?

The user configuration section could be improved: it should be less developer-focused and more business-user-focused. For example, it is still not as plug-and-play as some of the cloud offerings that come ready to use. It is not at the leading edge.

For how long have I used the solution?

I have been using this solution for approximately one and a half years.

What do I think about the stability of the solution?

The solution is very stable.

How was the initial setup?

The initial setup is developer-focused, but it is not very complex. I can set up a stream in less than an hour. It will stream, but it will not be a production-ready stream.

What other advice do I have?

I rate Apache Spark Streaming a six out of ten. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer2392494 - PeerSpot reviewer
Enterprise Data Architect at a pharma/biotech company with 11-50 employees
Real User
Top 5 Leaderboard
Provides real-time data processing capabilities with efficient reliability
Pros and Cons
  • "The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams."
  • "Integrating event-level streaming capabilities could be beneficial."

What is most valuable?

The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams.

What needs improvement?

The product's event handling capabilities, particularly compared to Kafka, need improvement. Integrating event-level streaming capabilities could be beneficial. This aligns with the idea of expanding Spark's functionality to cover areas it does not yet address, potentially enhancing its competitiveness.

For how long have I used the solution?

We have been using Apache Spark Streaming for five years.

What's my experience with pricing, setup cost, and licensing?

Spark is an affordable solution, especially considering its open-source nature. However, it could use support from experienced companies to resolve any issues effectively.

What other advice do I have?

Spark does not run into integration issues, largely because it uses JDBC connectors, which make integration with third-party solutions seamless. Its successful integration with tools like SAP HANA also shows its versatility in handling various data sources. In addition, its performance surpasses Informatica in certain scenarios, especially when real-time streaming is crucial. It remains a preferred choice for businesses that need efficient real-time data processing.

I rate it an eight out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user