
Apache Spark Streaming vs Starburst Galaxy comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Metric | Apache Spark Streaming | Starburst Galaxy
Ranking in Streaming Analytics | 7th | 12th
Average Rating | 7.8 | 9.8
Reviews Sentiment | 6.4 | 1.0
Number of Reviews | 17 | 9
Ranking in other categories | None | Data Science Platforms (9th)
 

Mindshare comparison

As of October 2025, in the Streaming Analytics category, Apache Spark Streaming's mindshare is 3.6%, up from 3.4% a year earlier. Starburst Galaxy's mindshare is 1.3%, a slight decrease that still rounds to the same 1.3% as the previous year. Mindshare is calculated from PeerSpot user engagement data.
Streaming Analytics Market Share Distribution
Product | Market Share (%)
Apache Spark Streaming | 3.6
Starburst Galaxy | 1.3
Other | 95.1
 

Featured Reviews

Himansu Jena - PeerSpot reviewer
Efficient real-time data management and analysis with advanced features
There are various ways we can improve Apache Spark Streaming through best practices. The first area is batch interval tuning: sizing micro-batch intervals to match latency requirements helps prevent back pressure. For storage that needs faster reads, we can use formats such as Parquet or ORC and leverage predicate push-down optimizations. We can implement efficient serialization, such as Kryo, in Java. For XML and JSON, we handle boxing and unboxing (serialization and deserialization) of key-value pairs stored in the browser. We can also use caching to store intermediate results that multiple operations reuse instead of recomputing them. We can use specialized joins that help with smaller datasets, as well as distributed joins. We can take advantage of Project Tungsten's memory and CPU efficiency optimizations. Additionally, load balancing, checkpointing, and schema evolution are areas to consider based on performance and bottlenecks. We can use Bugzilla for issue tracking and Splunk to monitor process performance and resource utilization for DataFrame- or Dataset-based jobs.
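The batch-interval advice above rests on a stability rule from the Spark Streaming tuning guide: a micro-batch stream keeps up only if each batch is processed faster than new batches are generated; otherwise batches queue up and back pressure builds. The Python sketch below assumes you have already measured the average per-batch processing time (for example, from the Spark UI). The helper `pick_batch_interval` and its 1.2 headroom factor are hypothetical illustrations, not part of any Spark API.

```python
from typing import Optional, Sequence


def pick_batch_interval(avg_processing_s: float,
                        candidate_intervals_s: Sequence[float],
                        headroom: float = 1.2) -> Optional[float]:
    """Hypothetical helper (not a Spark API): return the smallest candidate
    batch interval, in seconds, that is at least `headroom` times the measured
    average processing time, so each micro-batch finishes before the next one
    is scheduled. Returns None if no candidate is long enough."""
    if avg_processing_s < 0 or headroom < 1:
        raise ValueError("processing time must be >= 0 and headroom >= 1")
    required = avg_processing_s * headroom
    # Shorter intervals mean lower latency, so check them first.
    for interval in sorted(candidate_intervals_s):
        if interval >= required:
            return interval
    return None


# Batches take about 1.5 s; with 20% headroom the interval must be >= 1.8 s.
print(pick_batch_interval(1.5, [1, 2, 5, 10]))  # prints 2
# Batches take about 9 s; even 10 s leaves less than 20% headroom.
print(pick_batch_interval(9.0, [1, 2, 5, 10]))  # prints None
```

The sketch favors the shortest safe interval because smaller micro-batches lower end-to-end latency. A `None` result means no candidate is safe, so either longer intervals must be considered or the per-batch work must shrink.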
Stephen-Howard - PeerSpot reviewer
Federated querying delivers integrated data at record speed and reduces processing time
The biggest win has been the ability to combine data from multiple sources and deliver it to the business at record speed. This capability has allowed us to query directly through Starburst Galaxy, enabling teams to access integrated data that would otherwise be hard to pull together. This has reduced both our ETL processing time and storage costs. We are answering questions that would have been hard, if not impossible, to answer previously because the data came from disparate, disconnected sources.

Quotes from Members

 

Pros

Apache Spark Streaming
"Spark Streaming is critical, quite stable, full-featured, and scalable."
"The main benefits of Apache Spark Streaming include cost savings, time savings, and efficiency improvements in data storage."
"Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way."
"The solution is better than average and some of the valuable features include efficiency and stability."
"The solution is very stable and reliable."
"I appreciate Apache Spark Streaming's micro-batching capabilities; the watermarking functionality and related features are quite good."
"By integrating Apache Spark Streaming, data freshness and latency have improved significantly, from 24-hour batch processing to under one minute, which speeds up communication to downstream systems and aids marketing campaigns."
"Apache Spark Streaming has features like checkpointing and Streaming API that are useful."
Starburst Galaxy
"Starburst has provided us with virtually guaranteed performance on complex queries across datasets that are in the tens of gigabytes which complete in seconds."
"The most fundamental feature is the query engine, which is much faster than any of the competitors; Starburst is able to finish most queries within 10 seconds, which is especially important for many non-technical employees."
"Starburst Galaxy has improved our organization by unifying access to all major data sources, reducing the need for complex ETL processes."
"Starburst Galaxy is becoming a cornerstone of our data platform, empowering us to make smarter and faster decisions across the organization."
"Starburst Galaxy serves as our primary SQL-based data processing engine, a strategic decision driven by its seamless integration with our AWS cloud infrastructure and its ability to deliver high performance with low-latency responses."
"Starburst on Trino, combined with our SQL-native data transformation tool SQLMesh, has delivered anywhere from a two to five times improvement in compute performance across our transformation DAG."
 

Cons

Apache Spark Streaming
"The solution itself could be easier to use."
"The initial setup is quite complex."
"We don't have enough experience to be judgmental about its flaws."
"Monitoring is an area where they could definitely improve Apache Spark Streaming. When you have a streaming application, it generates numerous logs. After some time, the logs become meaningless because they're quite large and impossible to open."
"There could be an improvement in the user configuration section; it should be less developer-focused and more business-user-focused."
"Integrating event-level streaming capabilities could be beneficial."
"The downside is when you have this the other way around in the columns, it becomes really hard to use."
"One improvement I would expect is real-time processing instead of micro-batch or near real-time."
Starburst Galaxy
"Multi-tenancy could be improved. In order to have multiple environments for SSO, we maintain multiple tenants that are connected to different AWS accounts via the Marketplace."
"Cluster startup time is another pain point, typically 3 to 5 minutes, which is not the worst with proper planning but can be annoying for ad-hoc work."
"The most persistent issue is the cluster spin-up time."
"Cluster startup time can be slow, sometimes taking over a minute."
"I would like Starburst to leverage AI to improve usability. Data lakes are complicated and difficult for users to explore."
 

Pricing and Cost Advice

Apache Spark Streaming
"People pay for Apache Spark Streaming as a service."
"I was using the open-source community version, which was self-hosted."
"On a scale from one to ten, where one is expensive, or not cost-effective, and ten is cheap, I rate the price a seven."
"Spark is an affordable solution, especially considering its open-source nature."
Starburst Galaxy
Information not available
869,202 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Apache Spark Streaming: Computer Software Company (23%), Financial Services Firm (20%), Healthcare Company (6%), University (6%)
Starburst Galaxy: Financial Services Firm (29%), Computer Software Company (14%), Government (8%), Consumer Goods Company (6%)
 

Company Size

By reviewers
Company Size | Apache Spark Streaming | Starburst Galaxy
Small Business | 9 | 4
Midsize Enterprise | 2 | 2
Large Enterprise | 7 | 1
 

Questions from the Community

What do you like most about Apache Spark Streaming?
Apache Spark Streaming is versatile. You can use it for competitive intelligence, gathering data from competitors, or for internal tasks like monitoring workflows.
What needs improvement with Apache Spark Streaming?
I believe the downsides of Apache Spark Streaming are that it primarily supports structured data. Currently, in my organization, we require thousands of transcripts that need to be handled during l...
What is your primary use case for Apache Spark Streaming?
My use cases for Apache Spark Streaming were during my academic studies. During that time, I used Apache Spark Streaming to transmit data live from one source to another.
What is your experience regarding pricing and costs for Starburst Galaxy?
You pay for cluster uptime. It is important to be aggressive about autoscaling, as a single worker will get you a long way. I recommend never connecting a BI tool to your Galaxy cluster. Instead, w...
What needs improvement with Starburst Galaxy?
As a hosted option, I wish I had more control over the cluster configuration, specifically regarding some of the more advanced options. Trino is extremely flexible and powerful, but some of this fu...
What is your primary use case for Starburst Galaxy?
I use Starburst as a cost-efficient hosted option for Trino for data integration and ad-hoc analysis across a broad range of data sources. It is surprisingly useful to query SQL Server, a Google Sh...
 

Also Known As

Apache Spark Streaming: Spark Streaming
Starburst Galaxy: No data available
 


Sample Customers

Apache Spark Streaming: UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, eBay Inc.
Starburst Galaxy: Information not available
Find out what your peers are saying about Apache Spark Streaming vs. Starburst Galaxy and other solutions. Updated: September 2025.