Try our new research platform with insights from 80,000+ expert users

AWS Glue vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

AWS Glue
Average Rating
7.8
Number of Reviews
41
Ranking in other categories
Cloud Data Integration (1st)
StreamSets
Average Rating
8.4
Number of Reviews
24
Ranking in other categories
Data Integration (9th)
 

Featured Reviews

Ajaykumar Myana - PeerSpot reviewer
Jul 31, 2023
Provides serverless mechanism, easy data transformation and automated infrastructure management
I had the source data, which was unstructured and non-fixable, and my responsibility was to convert it into structured data. For this task, I used PySpark as the programming language. With Python, I implemented the creation of a data frame using Glue jobs. Since Glue jobs are a serverless…
Reyansh Kumar - PeerSpot reviewer
Mar 10, 2023
We no longer need to hire highly skilled data engineers to create and monitor data pipelines
The things I like about StreamSets are its * overall user interface * efficiency * product features, which are all good. Also, the scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy. You just need to configure the data sources, the paths and their configurations, and you are ready to go. It is very efficient and very easy to use for ETL pipelines. It is a GUI-based interface in which you can easily create or design your own data pipelines with just a few clicks. As for moving data into modern analytics systems, we are using it with Microsoft Power BI, AWS, and some on-premises solutions, and it is very easy to get data from StreamSets into them. No hardcore coding or special technical expertise is required. It is also a no-code platform in which you can configure your data sources and data output for easy configuration of your data pipeline. This is a very important aspect because if a tool requires code development, we need to hire software developers to get the task done. By using StreamSets, it can be done with a few clicks.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"It's fairly straightforward as a product; it's not very complicated."
"Our entire use case was very easily handled or solved using this solution."
"AWS Glue is quite better than other tools, but you have to learn it properly before you start using it."
"It's very good to manage."
"The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature."
"Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"What I like best about AWS Glue is its real-time data backup feature. Last week, there was a production push, and what used to take almost ten days to send out around fifty-six thousand emails now takes only two hours."
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
"The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
"Important features include that it comprises lots of functionality to connect data from various sources through connector availability, scheduling pipelines at any time, and integration with third-party and security solutions for encryption."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"It is really easy to set up and the interface is easy to use."
"The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
 

Cons

"AWS Glue is more costly compared to other tools like Airflow."
"Cost-wise, AWS Glue is expensive, so that's an area for improvement. The process for setting up the solution was also complex, which is another area for improvement."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"AWS Glue's error handling is difficult."
"If there's a cluster-related configuration, we have to make worker notes, which is quite a headache when processing a large amount of data."
"I have encountered challenges with multi-region support."
"The product has only a few built-in transformations."
"There is a learning curve to this tool."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
 

Pricing and Cost Advice

"I rate the product's pricing a five on a scale of one to ten, where one is a high price, and ten is a low price."
"This solution is affordable and there is an option to pay for the solution based on your usage."
"AWS Glue is a high-priced solution that bills the client $150,000 to $250,000 annually."
"The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
"AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."
"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
"The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive."
"If you are using the solution for an enterprise business, it will be expensive."
"We are running the community version right now, which can be used free of charge."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"StreamSets is an expensive solution."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"The overall cost for small and mid-size organizations needs to be better."
"The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data."
"It has a CPU core-based licensing, which works for us and is quite good."
"I believe the pricing is not equitable."
report
Use our free recommendation engine to learn which Cloud Data Integration solutions are best for your needs.
813,418 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
21%
Computer Software Company
14%
Manufacturing Company
8%
Insurance Company
6%
Financial Services Firm
18%
Computer Software Company
13%
Manufacturing Company
10%
Government
7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data factory for ELT best performance cloud services.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Learn More

Video not available
 

Overview

 

Sample Customers

bp, Cerner, Expedia, Finra, HESS, intuit, Kellog's, Philips, TIME, workday
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about AWS Glue vs. StreamSets and other solutions. Updated: October 2024.
813,418 professionals have used our research since 2012.