We compared Apache Spark and Azure Stream Analytics based on our users’ reviews in five categories. Our conclusion, drawn from all of the collected data, appears below.
Comparison Results: Apache Spark and Azure Stream Analytics come out about equal in this comparison. Some users are more satisfied with Apache Spark’s stability and pricing, but Azure Stream Analytics has an edge when it comes to ROI and technical support.
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"One of the key features is that Apache Spark is a distributed computing framework. You can help multiple slaves and distribute the workload between them."
"There's a lot of functionality."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"The solution has been very stable."
"This solution provides a clear and convenient syntax for our analytical tasks."
"The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex."
"The integrations for this solution are easy to use and there is flexibility in integrating the tool with Azure Stream Analytics."
"We find the query editor feature of this solution extremely valuable for our business."
"Real-time analytics is the most valuable feature of this solution. I can send the collected data to Power BI in real time."
"The solution has a lot of functionality that can be pushed out to companies."
"I like the IoT part. We have mostly used Azure Stream Analytics services for it."
"It's a product that can scale."
"The life cycle, report management and crash management features are great."
"The initial setup was not easy."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"The logging for the observability platform could be better."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"Sometimes when we connect Power BI, there is a delay or it throws up some errors, so we're not sure."
"Azure Stream Analytics could improve by having clearer metrics as to the scale, more metrics around the data set size that is flowing through it, and performance tuning recommendations."
"The solution offers a free trial, however, it is too short."
"The collection and analysis of historical data could be better."
"The initial setup is complex."
"The solution doesn't handle large data packets very efficiently, which could be improved upon."
"The solution could be improved by providing better graphics and including support for UI and UX testing."
"The UI should be a little bit better from a usability perspective."
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
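The dataflow described above can be illustrated with a small sketch. This is plain Python on a single machine, not Spark's API: the input lines and the `map_phase` and `reduce_phase` helpers are hypothetical names chosen for illustration. It shows the linear read, map, reduce sequence, and then reuses the in-memory intermediate data for a second query, which is the idea behind a working set.

```python
from collections import Counter
from functools import reduce

# Hypothetical input; in MapReduce these lines would be read from disk.
lines = [
    "spark keeps data in memory",
    "mapreduce writes data to disk",
    "spark reuses data",
]

def map_phase(line):
    """Map step: turn one input line into per-word counts."""
    return Counter(line.split())

def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

# Linear dataflow: read -> map -> reduce (the result would then be stored).
mapped = [map_phase(line) for line in lines]
word_counts = reduce(reduce_phase, mapped, Counter())

# Working-set idea: the mapped data stays in memory, so a second query
# can reuse it without re-reading or re-mapping the input.
distinct_words = len(set().union(*mapped))

print(word_counts.most_common(2))
print(distinct_words)
```

In a real cluster the mapped data would be partitioned across machines; the sketch only mirrors the order of the steps and the reuse of intermediate results.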
Azure Stream Analytics is a robust real-time analytics service designed for critical business workloads. Users can build an end-to-end serverless streaming pipeline in minutes. Using SQL, users can go from zero to production in a few clicks, and pipelines are easily extensible with custom code and automated machine learning capabilities for the most advanced scenarios.
Azure Stream Analytics can analyze and accurately process very large volumes of high-speed streaming data from numerous sources at the same time. Patterns and scenarios are quickly identified and information is gathered from various input sources, such as social media feeds, applications, clickstreams, sensors, and devices. These patterns can then be used to trigger actions and launch workflows, such as feeding data to a reporting tool, storing data for later use, or creating alerts. Azure Stream Analytics is also offered on the Azure IoT Edge runtime, so the data can be processed on IoT devices.
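To make the "identify a pattern, then trigger an action" flow concrete, here is a minimal sketch in plain Python. It is not the Azure Stream Analytics query language or any Azure SDK call: the event tuples, `WINDOW_SECONDS`, `ALERT_THRESHOLD`, and the `tumbling_averages` helper are all assumptions made for illustration. It groups sensor readings into fixed, non-overlapping time windows per device and flags windows whose average exceeds a threshold.

```python
from collections import defaultdict

WINDOW_SECONDS = 10     # hypothetical window size
ALERT_THRESHOLD = 30.0  # hypothetical alert threshold

# Hypothetical sensor events: (timestamp_seconds, device_id, temperature)
events = [
    (1, "dev-a", 25.0),
    (4, "dev-a", 27.0),
    (12, "dev-a", 31.0),
    (15, "dev-a", 33.0),
    (18, "dev-b", 20.0),
]

def tumbling_averages(events, window_seconds):
    """Average temperature per (window_start, device) over fixed windows."""
    buckets = defaultdict(list)
    for ts, device, temp in events:
        # Each event belongs to exactly one non-overlapping window.
        window_start = (ts // window_seconds) * window_seconds
        buckets[(window_start, device)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

averages = tumbling_averages(events, WINDOW_SECONDS)

# "Trigger an action": collect the windows that should raise an alert.
alerts = sorted(key for key, avg in averages.items() if avg > ALERT_THRESHOLD)

print(alerts)
```

In a streaming service the windows would be evaluated continuously as events arrive; this batch version only shows the grouping and alerting logic.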
Reviews from Real Users
“Azure Stream Analytics is something that you can use to test out streaming scenarios very quickly in the general sense and it is useful for IoT scenarios. If I was to do a project with IoT and I needed a streaming solution, Azure Stream Analytics would be a top choice. The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex.” - Olubisi A., Team Lead at a tech services company.
“It's used primarily for data and mining - everything from the telemetry data side of things. It's great for streaming and makes everything easy to handle. The streaming from the IoT hub and the messaging are aspects I like a lot.” - Sudhendra U., Technical Architect at Infosys
Apache Spark is ranked 1st in Hadoop with 13 reviews while Azure Stream Analytics is ranked 4th in Streaming Analytics with 11 reviews. Apache Spark is rated 8.2, while Azure Stream Analytics is rated 7.8. The top reviewer of Apache Spark writes "Provides fast aggregations, AI libraries, and a lot of connectors". On the other hand, the top reviewer of Azure Stream Analytics writes "A serverless scalable event processing engine with a valuable IoT feature". Apache Spark is most compared with Spring Boot, AWS Batch, AWS Lambda, SAP HANA and Apache NiFi, whereas Azure Stream Analytics is most compared with Amazon Kinesis, Databricks, Apache Flink, Apache Spark Streaming and Amazon MSK.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.