Apache Spark Streaming Reviews

Name: Apache Spark Streaming
Brand: Apache
Rating: 3.9 (17 reviews)

94% willing to recommend

What is Apache Spark Streaming?

Spark Streaming makes it easy to build scalable fault-tolerant streaming applications.

Apache Spark Streaming is the #7 ranked solution in Streaming Analytics tools. PeerSpot users give Apache Spark Streaming an average rating of 7.8 out of 10. Apache Spark Streaming is most commonly compared to Databricks. Apache Spark Streaming is popular among the large enterprise segment, which accounts for 56% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, accounting for 23% of all views.

Featured Apache Spark Streaming reviews

Himansu Jena

Sr Project Manager at Raj Subhatech

There are various ways we can improve Apache Spark Streaming through best practices. The first is batch interval tuning: choosing small micro-batch intervals based on latency requirements helps prevent back pressure. We can use data formats such as Parquet or ORC for storage that needs faster reads, leveraging predicate push-down optimizations. We can implement efficient serialization, such as Kryo, for .NET or Java workloads. We have boxing and unboxing serialization for XML and JSON for converting key-value pairs stored in the browser. We can also implement caching mechanisms to store results and avoid recomputing them across multiple operations. We can use specified joins which help with smaller databases, and distributed joins can minimize users. We can rely on Project Tungsten, which optimizes memory and CPU efficiency. Additionally, load balancing, checkpointing, and schema evolution are areas to consider based on performance and bottlenecks. We can use Bugzilla tools for tracking and Splunk to monitor system performance and utilization for workloads based on data frames or data sets.
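The link between batch interval and back pressure mentioned in this review can be illustrated with a small simulation. This is a plain-Python sketch, not Spark code: the function name `scheduling_delays` and the timing values are hypothetical, and it assumes batches are processed one at a time in order. It shows that when processing time exceeds the batch interval, each new micro-batch waits longer than the previous one.

```python
def scheduling_delays(batch_interval_s, processing_times_s):
    """Return how long each micro-batch waits before processing starts.

    Batch i holds data collected during interval i, so it becomes ready
    at (i + 1) * batch_interval_s. Batches run one at a time, in order.
    """
    delays = []
    free_at = 0.0  # time at which the processor finishes the previous batch
    for i, processing_time in enumerate(processing_times_s):
        ready_at = (i + 1) * batch_interval_s
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays


# Processing (1.5 s) fits inside the 2 s interval: no backlog builds up.
stable = scheduling_delays(2.0, [1.5, 1.5, 1.5, 1.5])

# Processing (3 s) exceeds the 2 s interval: the delay grows every batch.
overloaded = scheduling_delays(2.0, [3.0, 3.0, 3.0, 3.0])

print(stable)      # [0.0, 0.0, 0.0, 0.0]
print(overloaded)  # [0.0, 1.0, 2.0, 3.0]
```

In the overloaded case the delay grows by one second per batch, which is the kind of steadily increasing backlog that interval tuning or rate limiting is meant to avoid.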

Kuldeep Pal

Data Engineer at Walmart Global Tech

The positive impact from Apache Spark Streaming is its near real-time capability. It has a good ecosystem that provides good support. However, if you need purely real-time data, you would be going with Flink. Apache Spark Streaming is good for near-to-real-time data and requires less maintenance, which is beneficial for developers and companies. The new feature coming in Apache Spark Streaming 4 is continuous streaming. If continuous streaming becomes stable and performs comparably to Flink, then Apache Spark Streaming would be preferred everywhere due to its good maintenance and support system. While it is reliable, there are some issues with Apache Spark Streaming as it is not 100% reliable. Sometimes it fails, requiring numerous configurations such as checkpointing, watermarking, and other features. If you select a 10-minute window and the data arrives at the 30th minute, it sometimes loses data in between. You also have to apply back pressure when numerous messages are coming in. It requires constant monitoring and maintenance. I would say it is 90-95% reliable, but multiple configurations and frequent maintenance make it slightly less reliable. The continuous deployment feature being in beta phase could benefit everyone if released earlier.
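The late-data loss this reviewer describes (a 10-minute window with data arriving around the 30th minute) follows from how watermarks work in general. The sketch below is a simplified plain-Python model, not the Spark API: the function `aggregate_with_watermark`, the event values, and the per-event watermark update are assumptions made for illustration (Spark advances its watermark between micro-batches rather than per event).

```python
from collections import defaultdict

WINDOW_MIN = 10  # tumbling window length in minutes


def aggregate_with_watermark(events, allowed_lateness_min):
    """Sum values per 10-minute window, dropping events behind the watermark.

    events: (event_time_min, value) tuples in arrival order.
    The watermark trails the largest event time seen so far by
    allowed_lateness_min; a window that ends at or before the
    watermark is treated as closed.
    """
    totals = defaultdict(int)
    dropped = []
    max_event_time = 0
    for event_time, value in events:
        watermark = max_event_time - allowed_lateness_min
        window_start = (event_time // WINDOW_MIN) * WINDOW_MIN
        window_end = window_start + WINDOW_MIN
        if window_end <= watermark:
            dropped.append((event_time, value))  # too late: window closed
        else:
            totals[window_start] += value
        max_event_time = max(max_event_time, event_time)
    return dict(totals), dropped


# The last event belongs to minute 7 but arrives after minute 31 was seen.
events = [(1, 1), (4, 1), (12, 1), (31, 1), (7, 1)]

strict_totals, strict_dropped = aggregate_with_watermark(events, 10)
lenient_totals, lenient_dropped = aggregate_with_watermark(events, 30)

print(strict_totals, strict_dropped)    # {0: 2, 10: 1, 30: 1} [(7, 1)]
print(lenient_totals, lenient_dropped)  # {0: 3, 10: 1, 30: 1} []
```

A longer lateness allowance keeps the late event, at the cost of holding window state open for longer, which is the trade-off behind the configuration effort the reviewer mentions.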

Khoa Dang Le

Principal AI Engineer at IMT Solutions

I find the fault tolerance feature beneficial because I use it for serving data from a landing area. I understand all of the structures we have for Spark SQL, Spark Streaming, and MLlib. The ability of Apache Spark Streaming to handle out-of-order data using watermarking and windowing is something we use in our pipeline. Nearly 50% of our usage is based on that because we use it for landing data, and we appreciate that we can work with it. The main benefits of Apache Spark Streaming include cost savings, time savings, and efficiency improvements about data storage. The fast storage capability is crucial because Apache Spark replaces Hadoop's MapReduce, allowing us to manage our data more efficiently.
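The fault tolerance praised here generally depends on checkpointing: persisting progress so a restarted job resumes where it stopped instead of starting over. The following is a minimal plain-Python sketch of that idea, not Spark's checkpoint mechanism. The function `process_stream`, the JSON file layout, and the simulated crash are all hypothetical.

```python
import json
import os
import tempfile


def process_stream(records, checkpoint_path, fail_at=None):
    """Sum records, resuming from the offset saved in checkpoint_path."""
    state = {"offset": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved progress

    for i in range(state["offset"], len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state = {"offset": i + 1, "total": state["total"] + records[i]}
        # Write to a temporary file and rename it, so the checkpoint
        # file is never left half-written.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


records = [5, 10, 15, 20]
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    try:
        process_stream(records, path, fail_at=2)  # crashes after two records
    except RuntimeError:
        pass
    recovered_total = process_stream(records, path)  # resumes at offset 2

print(recovered_total)  # 50
```

The restarted run neither skips nor double-counts records because the offset and the running total are saved together in one atomic file replacement.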

Apache Spark Streaming mindshare

As of October 2025, the mindshare of Apache Spark Streaming in the Streaming Analytics category stands at 3.6%, up from 3.4% the previous year, according to calculations based on PeerSpot user engagement data.

Streaming Analytics Market Share Distribution
Product	Market Share (%)
Apache Spark Streaming	3.6%
Apache Flink	14.8%
Databricks	12.5%
Other	69.1%

PeerResearch reports based on Apache Spark Streaming reviews

Type	Title	Date
Category	Streaming Analytics	Oct 5, 2025
Product	Reviews, tips, and advice from real users	Oct 5, 2025
Comparison	Apache Spark Streaming vs Databricks	Oct 5, 2025
Comparison	Apache Spark Streaming vs Amazon Kinesis	Oct 5, 2025
Comparison	Apache Spark Streaming vs Confluent	Oct 5, 2025

Title	Rating	Mindshare	Recommending	Interviews
Databricks	4.1	12.5%	96%	91
Confluent	4.1	8.5%	95%	25

Valuable Features

Valuable features of Apache Spark Streaming include real-time analytics, scalability, efficient data processing, integration with other technologies, and low latency. Users appreciate its versatility in supporting multiple languages, stability, fault tolerance, and ease of deployment. Features like checkpointing, windowing, and watermarking enhance its capability to manage out-of-order data. It enables handling large-scale data, provides native Python support, and facilitates integration with databases and other data sources. Users benefit from cost efficiency and strong community support.

"The main benefits of Apache Spark Streaming include cost savings, time savings, and efficiency improvements about data storage."
"For Apache Spark Streaming, the feature I appreciated most is that it provides live data delivery; additionally, it provides the capability to send a larger amount of data in parallel."

Room for Improvement

Apache Spark Streaming needs more business-friendly user configuration, easier installation, and improved cloud-native support. Further areas for improvement include real-time analytics handling, memory management, and latency. Users want more robust monitoring, a better UI, support for arbitrary stateful functions in Python, and handling of unstructured data. Auto-tuning, continuous deployment features, and the ability to stop jobs instantly from the UI need attention. Improvements in Spark SQL, scalability, debugging, and adaptation to real-time processing are also necessary.

"The problem is we need to use it in a certain manner. After that, we need to apply another pipeline for the machine learning processes, and that's what we work on."
"The downside is when you have this the other way around in the columns, it becomes really hard to use."

Pricing

Enterprise users find Apache Spark Streaming cost-effective due to its open-source nature. While using Apache Spark as a service incurs costs, the open-source version has no licensing fees. Cloud setups like AWS EMR, Google Cloud's DataProc, and similar services provide managed solutions, influencing cost through additional features. On-premises setups are generally more expensive. Costs can vary based on additional service dependencies, but the core software remains free, offering a flexible pricing structure for businesses.

"Spark is an affordable solution, especially considering its open-source nature."
"On a scale from one to ten, where one is expensive, or not cost-effective, and ten is cheap, I rate the price a seven."
"I was using the open-source community version, which was self-hosted."

Popular Use Cases

Apache Spark Streaming is utilized for predictive maintenance, anomaly detection, and real-time streaming. Entities employ it for healthcare data processing, ETL tasks, GIS, and IoT-data processing. It's integrated with diverse data ecosystems like Kafka, Google Cloud, and HDFS, enhancing real-time decision-making in telecommunications and order management. Organizations use Apache Spark Streaming with micro-batching for tasks like fraud detection and building Customer 360 profiles, benefiting from reduced latency and improved data integration.
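The micro-batching approach mentioned for fraud detection can be sketched in a few lines. This is an illustrative plain-Python model rather than Spark code: the helper names `micro_batches` and `flag_accounts`, the account identifiers, and the transaction-count threshold are all hypothetical, and real fraud rules would be considerably richer.

```python
from collections import Counter


def micro_batches(events, batch_interval_s):
    """Group (arrival_time_s, account, amount) events into fixed-interval batches."""
    batches = {}
    for arrival, account, amount in events:
        index = int(arrival // batch_interval_s)
        batches.setdefault(index, []).append((account, amount))
    return [batches[index] for index in sorted(batches)]


def flag_accounts(batch, max_txns):
    """Return accounts with more than max_txns transactions in one batch."""
    counts = Counter(account for account, _ in batch)
    return sorted(account for account, n in counts.items() if n > max_txns)


events = [
    (0.2, "acct-1", 20.0),
    (0.9, "acct-2", 5.0),
    (1.1, "acct-1", 30.0),
    (1.8, "acct-1", 25.0),
    (2.4, "acct-2", 7.5),
    (5.5, "acct-1", 12.0),
]

# Each 5-second batch is evaluated as a unit once it is complete.
flags = [flag_accounts(batch, max_txns=2) for batch in micro_batches(events, 5)]
print(flags)  # [['acct-1'], []]
```

Because a decision is made only when a batch closes, detection latency is bounded below by the batch interval, which is why reviewers describe the results as near real-time rather than real-time.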

Service and Support

Apache Spark Streaming offers strong documentation and robust open-source community support. Users frequently rely on available online resources and community assistance rather than directly contacting Apache's team. Many appreciate the community, especially major contributions from Databricks. External consulting is sometimes used to enhance support. Apache's technical support is often unnecessary as users find ample guidance through public information and community channels for managing databases or storage-related tasks.

Deployment

Apache Spark Streaming's initial setup varies; some find it developer-focused and straightforward, while others see it as complex, requiring Java or Scala knowledge. Installation can be done in minutes in hosted or hybrid cloud environments. Extensive documentation and community support make smaller-scale setups easier. Scaling is simple in cloud settings, and maintenance benefits from active development and version migration. Multiple users highlight easy deployment when it is supported by other tools like Hadoop, Data Lake, or Kafka.

Scalability

Apache Spark Streaming appears highly scalable with support for large-scale data processing and many active users. Its distributed compute architecture, horizontal scalability, and features like auto-scaling, adaptive query planning, and handling of data skewness enhance performance. Adaptable across several domains, it manages workload balancing globally. Users didn't face issues scaling operations, even with significant data loads, and its capabilities extend to handling real-time processing, machine learning, and data visualizations.

Stability

Apache Spark Streaming is regarded as stable and mature, with no significant bugs or crashes noted. Users report it as reliable, especially version 3.0.1. Crashes may occur with large datasets or improper configuration but are infrequent and manageable. It benefits from being open and transparent, allowing users to understand its operations. Although maintenance on platforms like EMR, EKS, or Azure Databricks is necessary, its stability is highly rated and suitable for various use cases.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.

Review data by company size

By reviewers
Company Size	Count
Small Business	7
Midsize Enterprise	2
Large Enterprise	5

By visitors reading reviews
Company Size	Count
Small Business	50
Midsize Enterprise	22
Large Enterprise	91

Top industries

By visitors reading reviews

Computer Software Company	23%
Financial Services Firm	20%

Healthcare Company

University

Manufacturing Company

Outsourcing Company

Comms Service Provider

Educational Organization

Media Company

Insurance Company

Performing Arts

Retailer

Government

Real Estate/Law Firm

Legal Firm

Recreational Facilities/Services Company

Marketing Services Firm

Hospitality Company

Pharma/Biotech Company

Transportation Company

Energy/Utilities Company

Sports Company

Aerospace/Defense Firm

Leisure / Travel Company

Logistics Company

Wholesaler/Distributor

Learn more about Apache Spark Streaming

Apache Spark Streaming was previously known as Spark Streaming.

Apache Spark Streaming customers

UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, eBay Inc.

Product Categories

Streaming Analytics

Popular Comparisons

Databricks vs Apache Spark Streaming

Confluent vs Apache Spark Streaming

Apache Flink vs Apache Spark Streaming

Azure Stream Analytics vs Apache Spark Streaming

Spring Cloud Data Flow vs Apache Spark Streaming

Amazon Kinesis vs Apache Spark Streaming

Amazon MSK vs Apache Spark Streaming

Starburst Enterprise vs Apache Spark Streaming

Informatica Data Engineering Streaming vs Apache Spark Streaming

Apache Pulsar vs Apache Spark Streaming

Aiven Platform vs Apache Spark Streaming

Talend Data Streams vs Apache Spark Streaming

SAS Event Stream Processing vs Apache Spark Streaming

Apache Spark Streaming Reviews Summary
Author info	Rating	Review Summary
Sr Project Manager at Raj Subhatech	4.0	I've used Apache Spark Streaming for real-time GIS and data processing, benefiting from its scalability, integration with Python tools, and predictive analytics, though handling varied data types sometimes presents challenges with missing or incomplete values.
Data Engineer at Walmart Global Tech	4.0	I've used Apache Spark Streaming for near real-time fraud detection with Kafka. Its flexible windowing, checkpointing, and scalability work well, though it requires careful configuration. It's reliable but not perfect, and continuous monitoring is essential.
Principal AI Engineer at IMT Solutions	4.0	I've used Apache Spark Streaming for three years for real-time data processing and machine learning, appreciating its fault tolerance and scalability, though retraining MLlib models for each pipeline remains a notable limitation.
Sr. Manager Data Engineer at a tech consulting company with 51-200 employees	3.5	I've used Apache Spark Streaming for years to process network data in near real-time. It's scalable and easy to deploy on AWS, but lacks support for certain features, monitoring, and handling of slowly changing dimensions.
Data Engineer III at a tech consulting company with 10,001+ employees	4.0	I've used Apache Spark Streaming to improve data latency for real-time customer profiling and ML features, though I’d like true real-time processing instead of micro-batches; setup was easy, and scalability and community support are excellent.
Gen AI Lead/Architect at Alvaria	3.5	I used Apache Spark Streaming during my academics for live data transmission and appreciated its real-time capabilities, though it lacks support for unstructured data, which limits some use cases; overall, I’d rate it seven out of ten.
Chief Data-strategist and Director at Theworkshop.es	4.5	I use Apache Spark Streaming for processing real-time data in web analytics. Its versatility in supporting multiple languages makes it ideal for integrating diverse data sources. While the UI could improve, it effectively handles various scenarios and requires careful use case consideration.
Engineering Leader at Walmart	4.0	No summary available
