I use Apache Spark Streaming for GIS (Geographic Information System) work: satellite image processing, longitude/latitude data, and predicting electricity, road, and other changes in those areas. I process all the information in real time, ingesting terabytes to petabytes of data from XML, Excel, and other structured, semi-structured, and unstructured sources. Using micro-batching, streams, and transformations on this information, I predict and create models that can be applied to regular expressions and image processing. Then, using TensorFlow, I create dynamic views, and I build models that report the accuracy of the predictive analytics. Through Apache Spark Streaming's integration with Python via Anaconda and Miniconda, I interact with databases using DataFrames or Datasets in micro-batches. I create solutions based on what the business expects for decision-making, whether logistic regression, linear regression, or other machine learning applied to image, voice, or graphical data to provide more accuracy. These features are implemented based on client requirements, and we make sure we stay on track using AML and real-time processing from various data sources, whether structured, unstructured, or semi-structured.
We have used Apache Spark Streaming for ingestion from streaming sources such as Apache Kafka. Another use case is building a Customer 360 (C360) in real time. We have also used it to build a real-time feature store whose features feed machine learning and AI models. When building the customer profile with Apache Spark Streaming, we use a one-minute micro-batch for customer profiling; for example, we track email changes, contact changes, and address changes. Because of the micro-batch, this approach is near real time. With Apache Spark Streaming, data freshness and latency have improved significantly: we used to run 24-hour batches, and now the cycle is under one minute, so any customer change, such as an address or email update, reaches downstream systems like Adobe Experience Platform for marketing within a minute, capturing all changes in near real time. We have integrated Apache Spark Streaming with Google Cloud Storage (GCS) and Google BigQuery, as well as native HDFS and Hive.
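A plain-Python sketch of the change-detection step this review describes (in a Spark job this logic would run per micro-batch, for example inside `foreachBatch`). The tracked field names come from the review; the function, record shapes, and sample values are assumptions for illustration:

```python
# Detect which tracked profile fields changed between two snapshots of a
# customer record. In a Spark Streaming job this would run once per
# micro-batch; here it is shown Spark-free. Field names follow the review:
# email, contact, address.
TRACKED_FIELDS = ("email", "contact", "address")

def changed_fields(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for tracked fields that differ."""
    changes = {}
    for field in TRACKED_FIELDS:
        if old.get(field) != new.get(field):
            changes[field] = (old.get(field), new.get(field))
    return changes

# Example: a customer updates their email within one micro-batch window.
before = {"customer_id": "c-1", "email": "a@x.com", "contact": "111", "address": "1 Main St"}
after  = {"customer_id": "c-1", "email": "b@x.com", "contact": "111", "address": "1 Main St"}
print(changed_fields(before, after))  # {'email': ('a@x.com', 'b@x.com')}
```

Only the changed fields would then be forwarded downstream, which is what keeps the per-minute updates small.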
I've used it more for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes, and it makes loading and unloading data fast and easy. In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Jan 25, 2024
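The "tracking the latest data delivery" pattern from the ETL review above can be sketched without Spark: records from several sources are upserted into a data-lake table, keeping only the newest version of each key, and only the rows that were actually newer are forwarded to the target database. All record and field names here are illustrative assumptions:

```python
# Upsert incoming records into a data-lake table keyed by id, keeping the
# newest version (by updated_at). The returned list is what would be
# streamed on to the target database.
def merge_latest(lake: dict, incoming: list) -> list:
    """Apply incoming records; return only those newer than what the lake held."""
    delivered = []
    for rec in incoming:
        current = lake.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            lake[rec["id"]] = rec
            delivered.append(rec)
    return delivered

lake = {}
batch1 = [{"id": 1, "updated_at": 10, "value": "a"},
          {"id": 2, "updated_at": 10, "value": "b"}]
batch2 = [{"id": 1, "updated_at": 12, "value": "a2"},   # newer -> delivered
          {"id": 2, "updated_at": 9,  "value": "old"}]  # stale -> skipped
merge_latest(lake, batch1)
print(merge_latest(lake, batch2))  # only id 1's newer record
```

Skipping the stale record is what prevents out-of-order source deliveries from overwriting fresher data in the target.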
As a data engineer, I use Apache Spark Streaming to process real-time data for web page analytics and integrate diverse data sources into centralized data warehouses.
Data Engineer at a comms service provider with 201-500 employees
Real User
Top 10
Jul 24, 2023
The solution has industry-specific use cases: orders flow in from the order management system, and we use Apache Spark Streaming to collect and store these orders in our database.
Chief Technology Officer at Teslon Technologies Pvt Ltd
Real User
Top 20
Jun 8, 2023
We used Spark and Spark Streaming, as well as Spark ML, for multiple use cases, particularly streaming IoT-related data. We also applied Spark ML for various machine learning algorithms on the streaming data, primarily in the healthcare domain.
The primary use case of this solution is streaming data. It can stream large amounts of data in small chunks, which we use with Databricks. I've been using the solution for personal research purposes only and not for business applications. I'm a customer of Apache.
We have built services around Apache Spark Streaming. We use it for real-time streaming use cases, many of them last-minute delivery use cases. We are trying to build on Apache Spark Streaming, but the latency has to be better.
We use Spark Streaming in micro-batch mode. It's not a fully real-time system, but it offers high performance and low latency.
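The micro-batch trade-off this reviewer mentions can be illustrated without Spark: events are bucketed by a fixed trigger interval, so an event waits at most one interval before it is processed, which bounds the added latency. A minimal sketch, where the interval length and sample events are made-up values:

```python
# Group (timestamp, payload) events into fixed windows of `interval`
# seconds, mimicking a micro-batch trigger: everything arriving within a
# window is processed together when the window closes.
from collections import defaultdict

def micro_batches(events, interval):
    """Return {window_index: [payloads]} for windows of `interval` seconds."""
    batches = defaultdict(list)
    for ts, payload in events:
        batches[int(ts // interval)].append(payload)
    return dict(batches)

events = [(0.5, "a"), (1.2, "b"), (2.9, "c"), (3.1, "d")]
print(micro_batches(events, 2))  # {0: ['a', 'b'], 1: ['c', 'd']}
```

Shrinking the interval lowers latency but raises per-batch overhead, which is the tuning knob behind "not fully real-time, but low latency".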
We're primarily using the solution for anomaly detection.
Near-real-time analytics using near-real-time data ingestion.
The primary use of the solution is to implement predictive maintenance capabilities.