Most features in Apache Spark Streaming are used for database operations, with a focus on speed, fault tolerance, and scalability across batch processing, real-time processing, SQL analytics, machine learning, and graph processing, along with lazy evaluation and compatibility. Distributed systems provide more accuracy and clustering of machines across large data sets. The data is divided into partitions and processed in parallel across multiple worker nodes, which significantly reduces processing time compared to single-machine solutions. In-memory computing keeps data in memory, reduces frequent disk input/output, and enables faster algorithms. We use NumPy and Pandas for matrix operations and build algorithms that generate models for our deep learning or machine learning techniques. Accuracy typically reaches 90% or above, depending on data quality. When dealing with varied data sources, including COBOL, Excel, JSON, video, audio, and MPG files, challenges can arise with incomplete or missing values. This particularly affects the accuracy of GIS data, such as predictions of transport routes or electrical pole placements. While we achieve 90% efficiency, working with historical data versus current data presents challenges in business growth predictions. When we encounter fault tolerance issues, we communicate directly with the Apache Spark Streaming development team through LinkedIn channels or their on-site team. They provide customer support, and issues can be reported via SMS or email with the file name to get help with a solution. The team helps address issues with data frames, data sets, RDD functionality, version migrations, and integration with tools such as Miniconda, Anaconda, and Node.js server. I rate Apache Spark Streaming 9 out of 10.
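The partition-and-process-in-parallel idea mentioned in this review can be illustrated with a minimal plain-Python sketch. It does not use Spark: threads stand in for worker nodes, and the function names (`split_into_partitions`, `process_partition`, `parallel_sum_of_squares`) are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(partition):
    """Per-partition work: here, a partial sum of squares."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(records, num_partitions=4):
    partitions = split_into_partitions(records, num_partitions)
    # Threads stand in for worker nodes in this sketch (an assumption,
    # not how a real cluster schedules tasks).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    # Combine the partial results, similar to a final reduce step.
    return sum(partial_results)


total = parallel_sum_of_squares(list(range(10)))
```

Each partition is processed independently and only the small partial results are combined at the end, which is the property that lets this kind of work spread across machines.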
I would suggest Apache Spark for stream processing if you want to manage clusters on your own. For serverless options, exploring other use cases could be beneficial. Apache Spark is a good starting point, considering its strong open source community contributions. We have not experienced downtime with Apache Spark Streaming, but we have had crashes, though not very often. Sometimes the state memory keeps piling up, so we have to tune it. We were able to check specific load times and resolve those issues. On a scale of 1-10, I rate Apache Spark Streaming an 8.
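One common way to keep streaming state from piling up is to evict windows once a watermark has passed them. The sketch below shows that idea in plain Python under stated assumptions: the `WindowedCounter` class and its parameters are hypothetical, event times are integer seconds, and this is not the Spark API (Structured Streaming exposes a related mechanism through `withWatermark`). Whether it matches the tuning this reviewer applied is not known.

```python
from collections import defaultdict


class WindowedCounter:
    """Counts events per tumbling window and evicts expired windows."""

    def __init__(self, window_seconds, allowed_lateness_seconds):
        self.window = window_seconds
        self.lateness = allowed_lateness_seconds
        self.max_event_time = 0
        self.counts = defaultdict(int)  # window start -> event count

    def add(self, event_time):
        """Record an event; return False if it arrived too late."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.lateness
        window_start = event_time - (event_time % self.window)
        # Reject events whose window already ended before the watermark.
        if window_start + self.window <= watermark:
            return False
        self.counts[window_start] += 1
        self._evict(watermark)
        return True

    def _evict(self, watermark):
        # Remove every window that ended at or before the watermark,
        # so the state stays bounded instead of growing forever.
        expired = [s for s in self.counts if s + self.window <= watermark]
        for start in expired:
            del self.counts[start]


counter = WindowedCounter(window_seconds=10, allowed_lateness_seconds=5)
for t in [1, 3, 12, 14, 27]:
    counter.add(t)
late_accepted = counter.add(2)  # far behind the watermark by now
```

After the event at time 27, the watermark is 22, so the windows starting at 0 and 10 are dropped and only the window starting at 20 remains in memory. The trade-off is that very late events are discarded rather than counted.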
Spark does not encounter integration issues, particularly due to its utilization of JDBC connectors. These connectors facilitate seamless integration with third-party solutions. Furthermore, successful integration with tools like SAP HANA indicates its versatility in handling various data sources. Additionally, its performance surpasses Informatica in certain scenarios, especially when real-time streaming capabilities are crucial. It remains a preferred choice for businesses requiring efficient real-time data processing. I rate it an eight.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
2024-01-25T11:39:24Z
Jan 25, 2024
For those starting with Apache Spark Streaming, I recommend studying and understanding data relationships. While it might seem complex at first, there are helpful resources available. Overall, I would rate Apache Spark Streaming as a nine out of ten.
Data Engineer at a comms service provider with 201-500 employees
Real User
Top 10
2023-07-24T08:33:07Z
Jul 24, 2023
Apache Spark Streaming has very specific use cases and needs to be evaluated based on the needs of an individual before choosing it. Overall, I rate the solution an eight out of ten.
Chief Technology Officer at Teslon Technologies Pvt Ltd
Real User
Top 20
2023-06-08T10:44:00Z
Jun 8, 2023
I would highly recommend Spark Streaming for standard streaming or IoT use cases. The entire Spark ecosystem, including Spark Core, streaming, ML, and other components, can be highly beneficial. It's better to stick with the Spark ecosystem rather than use other platforms and frameworks. For streaming and IoT, Spark Streaming is a great choice. Overall, I would rate the solution an eight out of ten. The only issue I found, at least during the time I actively worked with it, was that it was resource-intensive, even for small-scale applications. In comparison, some other platforms, like Pulsar, had lighter resource consumption and performed better in terms of resource usage and associated costs, at least to begin with. Spark Streaming is a bit heavy and resource-intensive at the start, which is why I rate it an eight.
It's important to be familiar with Spark Streaming and the Spark libraries, because familiarity with those scripts and coding languages makes it easier to work with the Spark code ecosystem, whether you are integrating Spark Streaming or creating Spark clusters. I rate this solution eight out of 10.
We need 18 to 20 people for maintenance with our 1,000 users. My advice to others is that they need to fine-tune their jobs, so testing becomes important. It would be beneficial to have consulting or some background in the solution before using it. I rate Apache Spark Streaming an eight out of ten.
Head of Data Science at an energy/utilities company with 10,001+ employees
Real User
2022-04-11T16:30:40Z
Apr 11, 2022
We are a customer and end-user. We're using it in Azure, in Databricks. I don't know the exact version of Spark I'm using; it's one of the recent ones. I would rate the product an eight out of ten.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
2021-08-18T14:55:15Z
Aug 18, 2021
It's cheaper for companies to use cloud systems; however, you can implement it on-premises. We use the cloud. Because it is in the cloud, it's always on the latest version and updates itself regularly. I would rate the product at a nine out of ten. It's very good in terms of its capabilities and I have been very happy with it. I would recommend the solution to other users.
The solution rates a nine out of ten.
I rate Apache Spark Streaming a six out of ten.