I use Apache Spark Streaming for GIS (Geographic Information System) work: satellite image processing, longitude/latitude data, and predicting electricity, road, and other changes in those areas. I process all the information in real time, ingesting terabytes to petabytes of data from XML, Excel, and other structured, semi-structured, and unstructured sources. Using micro-batching, streams, and transformations on this information, I predict and create models that can be applied to regular expressions and image processing. Then, using TensorFlow, I create dynamic views, and I build models that report the accuracy of the predictive analytics. Through Apache Spark Streaming's integration with Python via Anaconda and Miniconda, I interact with databases using DataFrames or Datasets in micro-batches. I create solutions based on what the business expects for decision-making, whether logistic regression, linear regression, or other machine learning applied to image, voice, or graphical data to provide more accuracy. These features are implemented based on client requirements, and we make sure we stay on track using AML and real-time processing from various data sources, whether structured, unstructured, or semi-structured.
We have used Apache Spark Streaming for ingestion from streaming sources such as Apache Kafka. Another use case is building a Customer 360 (C360) in real time. We have also used it to build a real-time feature store whose features feed machine learning and AI models. When building the customer profile with Apache Spark Streaming, we use a one-minute micro-batch for customer profiling; for example, we track email changes, contact changes, and address changes. Because of the micro-batch, this approach is near real time. With Apache Spark Streaming, data freshness and latency have improved significantly: we used to run 24-hour batches, and now the cycle is under one minute, so any customer change, such as an address or email update, reaches downstream systems like Adobe Experience Platform for marketing within a minute, capturing all changes in near real time. We have integrated Apache Spark Streaming with Google Cloud Storage (GCS) and Google BigQuery, as well as native HDFS and Hive.
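A plain-Python sketch of the change-detection step this review describes (in a Spark job this logic would run per micro-batch, for example inside `foreachBatch`). The tracked field names come from the review; the function, record shapes, and sample values are assumptions for illustration:

```python
# Detect which tracked profile fields changed between two snapshots of a
# customer record. In a Spark Streaming job this would run once per
# micro-batch; here it is shown Spark-free. Field names follow the review:
# email, contact, address.
TRACKED_FIELDS = ("email", "contact", "address")

def changed_fields(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for tracked fields that differ."""
    changes = {}
    for field in TRACKED_FIELDS:
        if old.get(field) != new.get(field):
            changes[field] = (old.get(field), new.get(field))
    return changes

# Example: a customer updates their email within one micro-batch window.
before = {"customer_id": "c-1", "email": "a@x.com", "contact": "111", "address": "1 Main St"}
after  = {"customer_id": "c-1", "email": "b@x.com", "contact": "111", "address": "1 Main St"}
print(changed_fields(before, after))  # {'email': ('a@x.com', 'b@x.com')}
```

Only the changed fields would then be forwarded downstream, which is what keeps the per-minute updates small.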
I've used it more for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes, and it makes loading and unloading data fast and easy. In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Jan 25, 2024
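The "tracking the latest data delivery" pattern from the ETL review above can be sketched without Spark: records from several sources are upserted into a data-lake table, keeping only the newest version of each key, and only the rows that were actually newer are forwarded to the target database. All record and field names here are illustrative assumptions:

```python
# Upsert incoming records into a data-lake table keyed by id, keeping the
# newest version (by updated_at). The returned list is what would be
# streamed on to the target database.
def merge_latest(lake: dict, incoming: list) -> list:
    """Apply incoming records; return only those newer than what the lake held."""
    delivered = []
    for rec in incoming:
        current = lake.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            lake[rec["id"]] = rec
            delivered.append(rec)
    return delivered

lake = {}
batch1 = [{"id": 1, "updated_at": 10, "value": "a"},
          {"id": 2, "updated_at": 10, "value": "b"}]
batch2 = [{"id": 1, "updated_at": 12, "value": "a2"},   # newer -> delivered
          {"id": 2, "updated_at": 9,  "value": "old"}]  # stale -> skipped
merge_latest(lake, batch1)
print(merge_latest(lake, batch2))  # only id 1's newer record
```

Skipping the stale record is what prevents out-of-order source deliveries from overwriting fresher data in the target.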
As a data engineer, I use Apache Spark Streaming to process real-time data for web page analytics and integrate diverse data sources into centralized data warehouses.
Data Engineer at a comms service provider with 201-500 employees
Real User
Top 10
Jul 24, 2023
The solution has industry-specific use cases: orders flow in from the order management system, and we use Apache Spark Streaming to collect and store these orders in our database.
Chief Technology Officer at Teslon Technologies Pvt Ltd
Real User
Top 20
Jun 8, 2023
We used Spark and Spark Streaming, as well as Spark ML, for multiple use cases, particularly streaming IoT-related data. We also applied Spark ML for various machine learning algorithms on the streaming data, primarily in the healthcare domain.
The primary use case of this solution is streaming data. It can stream large amounts of data in small chunks, which we use with Databricks. I've been using the solution for personal research purposes only and not for business applications. I'm a customer of Apache.
We have built services around Apache Spark Streaming. We use it for real-time streaming use cases, many of them last-minute delivery use cases. We are trying to build on Apache Spark Streaming, but the latency has to be better.
We use Spark Streaming in micro-batch mode. It's not a fully real-time system, but it offers high performance and low latency.
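The micro-batch trade-off this reviewer mentions can be illustrated without Spark: events are bucketed by a fixed trigger interval, so an event waits at most one interval before it is processed, which bounds the added latency. A minimal sketch, where the interval length and sample events are made-up values:

```python
# Group (timestamp, payload) events into fixed windows of `interval`
# seconds, mimicking a micro-batch trigger: everything arriving within a
# window is processed together when the window closes.
from collections import defaultdict

def micro_batches(events, interval):
    """Return {window_index: [payloads]} for windows of `interval` seconds."""
    batches = defaultdict(list)
    for ts, payload in events:
        batches[int(ts // interval)].append(payload)
    return dict(batches)

events = [(0.5, "a"), (1.2, "b"), (2.9, "c"), (3.1, "d")]
print(micro_batches(events, 2))  # {0: ['a', 'b'], 1: ['c', 'd']}
```

Shrinking the interval lowers latency but raises per-batch overhead, which is the tuning knob behind "not fully real-time, but low latency".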
We're primarily using the solution for anomaly detection.
Near-real-time analytics using near-real-time data ingestion.
The primary use of the solution is to implement predictive maintenance capabilities.