AWS Glue vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

AWS Glue
Average Rating: 7.8
Number of Reviews: 39
Ranking in other categories: Cloud Data Integration (1st)

StreamSets
Average Rating: 8.4
Number of Reviews: 24
Ranking in other categories: Data Integration (8th)
 

Mindshare comparison

As of July 2024, AWS Glue holds a 29.3% mindshare in the Cloud Data Integration category, up from 24.7% the previous year. StreamSets holds 4.0%, up from 3.0% the previous year. Mindshare is calculated from PeerSpot user engagement data.
Cloud Data Integration
Unique Categories:
No other categories found
Data Integration
1.9%
 

Featured Reviews

Ajaykumar Myana - PeerSpot reviewer
Jul 31, 2023
Provides serverless mechanism, easy data transformation and automated infrastructure management
I had the source data, which was unstructured and non-fixable, and my responsibility was to convert it into structured data. For this task, I used PySpark as the programming language. With Python, I implemented the creation of a data frame using Glue jobs. Since Glue jobs are a serverless…
MI
Mar 17, 2023
It's lightweight and well-integrated, and it saves a lot of money and time
There are so many things that need to be improved. For the StreamSets cloud user interface, there aren't enough use cases and examples for the main problems. In addition, the hybrid data sets cannot be joined in a data connector, which is a significant limitation. There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline. It isn't helpful when you need to apply the same logic for multiple sources. It becomes difficult because you need to create more pipelines and then add coordination between them. Initially, it's hard to find out or master the logic behind it. It can be hard if you aren't technical enough. There is scope for improvement because it's not straightforward. You need to go through the documentation and make sure that you understand every step. For me, it was a challenging model.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly."
"The most valuable feature of AWS Glue is scalability."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"The most valuable features currently are Glue Studio, jobs, and triggers."
"I appreciate AWS Glue for its cost-effectiveness."
"AWS Glue is a good solution for developers; they have the ability to write code in different languages and other software."
"The most valuable feature for me is the visual interface of AWS Glue."
"I like the fact that AWS Glue works with Python scripts."
"The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
"For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
"It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one DataOps solution."
"It is really easy to set up and the interface is easy to use."
 

Cons

"The monitoring is not that good."
"The product is expensive for data streaming. This area needs improvement."
"There should be more connectors for different databases."
"The mapping area and the use of the data catalog from Glue could be better."
"If there's a cluster-related configuration, we have to make worker nodes, which is quite a headache when processing a large amount of data."
"The solution's visual ETL tool is of no use for actual implementation."
"I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."
"I haven't looked into Glue in terms of seeking out flaws. I've not come across missing features."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the 'freeware version.' The user community, as well as the documentation they provide for the standard user, are difficult, at best."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
 

Pricing and Cost Advice

"The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive."
"AWS Glue is quite costly, especially for small organizations."
"If you are using the solution for an enterprise business, it will be expensive."
"I would rate the solution a six or seven on a scale of one to ten, with ten being very expensive. Specifically, I rate its pricing a six out of ten."
"AWS Glue follows a pay-as-you-go model, wherein the cost of the data you use will be counted as a monthly bill."
"I rate the tool an eight on a scale of one to ten, where one is expensive, and ten is expensive."
"I rate the tool's pricing a four out of ten."
"I rate the product's pricing a five on a scale of one to ten, where one is a high price, and ten is a low price."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"It has a CPU core-based licensing, which works for us and is quite good."
"StreamSets is an expensive solution."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"The overall cost for small and mid-size organizations needs to be better."
"The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data."
"We are running the community version right now, which can be used free of charge."
792,905 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 20%
Computer Software Company: 14%
Manufacturing Company: 8%
Insurance Company: 7%

Financial Services Firm: 17%
Computer Software Company: 13%
Manufacturing Company: 8%
Government: 7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Sample Customers

bp, Cerner, Expedia, FINRA, Hess, Intuit, Kellogg's, Philips, TIME, Workday
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about AWS Glue vs. StreamSets and other solutions. Updated: July 2024.