We are using StreamSets for batch loading.
StreamSets streamlines data pipeline creation, connecting data from multiple sources to destinations like cloud platforms with minimal coding. Its centralized platform and intuitive design enhance ETL and data migration processes.


| Product | Mindshare (%) |
|---|---|
| StreamSets | 1.2% |
| Informatica Intelligent Data Management Cloud (IDMC) | 3.7% |
| SSIS | 3.6% |
| Other | 91.5% |

| Title | Rating | Mindshare | Recommending | Interviews |
|---|---|---|---|---|
| Informatica Intelligent Data Management Cloud (IDMC) | 4.0 | 3.7% | 92% | 215 |
| SSIS | 3.8 | 3.6% | 80% | 76 |

| Company Size | Count |
|---|---|
| Small Business | 7 |
| Midsize Enterprise | 2 |
| Large Enterprise | 9 |

| Company Size | Count |
|---|---|
| Small Business | 106 |
| Midsize Enterprise | 42 |
| Large Enterprise | 190 |

StreamSets integrates seamlessly with analytics platforms, offering tools such as Data Collector and Control Hub to facilitate data ingestion, transformation, and machine learning integrations. Its user-friendly interface and ready connectors aid in configuring complex data pipelines. With built-in data drift resilience and scheduling options, users experience efficient, scalable data management, despite challenges like latency in cloud storage and interface enhancement needs. Users often employ StreamSets for batch loading, real-time data processing, and smart data pipeline management, offering comprehensive data integration solutions.
In industries like finance and technology, StreamSets supports data migration, machine learning integrations, and analytics by simplifying data transformation and enhancing decision-making capabilities through its robust pipeline management.
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
| Author info | Rating | Review Summary |
|---|---|---|
| Enterprise Solutions Architect at an energy/utilities company with 1,001-5,000 employees | 4.5 | I find StreamSets excellent for batch loading, praising its GUI and hybrid model. My main concern is memory usage with large datasets, requiring EC2 upgrades. I also desire a lineage graph and smoother customer support transitions. |
| Senior Data Platform Manager at a manufacturing company with 10,001+ employees | 4.0 | I rate StreamSets 8/10. It excels at data transformation, offering useful plugins and scalability for large data. However, I faced issues with SAP ERP data types, frequent pipeline failures, and expensive licensing. Despite this, it's a cost-effective alternative. |
| Sales Manager at Soft Hostings Limited | 4.0 | I use StreamSets for developing data feeds and managing scheduling options, valuing its integration with various protocols and ease of use. However, issues with reconnection and outdated documentation need improvement. Despite its cost, it has delivered significant ROI and improved client trust. |
| Technical Specialist at Accenture | 4.5 | We use StreamSets for healthcare data engineering, appreciating its user-friendly interface, easy connectivity with sources like SQL Server and Azure, and efficient pipeline management. Despite some UI improvements needed, it offers a significant ROI and reduces coding needs. |
| Senior Software Developer at a tech vendor with 10,001+ employees | 4.5 | We use StreamSets for data integration and analytics, appreciating its Control Hub and Data Collector features. It's cost-effective, requires minimal coding, and improves efficiency by 20-25%. Despite great features, error logging could be more detailed. |
| Principal Engineer at Tata Consultancy Services | 4.5 | I find StreamSets very user-friendly, enabling easy data pipeline building for diverse use cases without coding. Its data drift resilience saves significant time and reduces the need for specialized skills, accelerating data analytics and meeting SLAs, despite minor UI improvements needed in Control Hub. |
| Software Engineer at Soft Hostings Limited | 5.0 | StreamSets is used in our IT department for stable configuration, data analytics, and real-time predictions. It's user-friendly, integrates well with platforms like Kafka, but needs better debugging tools. We've achieved a 40% ROI using AWS. |
| Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees | 4.0 | I use StreamSets for cloud data migration, appreciating its many connectors, media format support, and easy pipeline management. It's stable, scalable, and economical. My main concern is the lack of built-in data quality assessment during data movement. |
| Product Manager at a hospitality company with 51-200 employees | 4.0 | StreamSets effectively bifurcated healthcare data, improving accuracy and saving my organization significant time and cost by automating tasks. Although setup was tricky and manual data entry is missing, it proved stable and valuable for our needs. |
| CEO-founder at Tubayo | 4.5 | I use StreamSets for data lakes and transformation, valuing its no-code pipelines, ease of integration, and significant time and cost savings. Despite some node setup and UX issues, it's stable and delivers great ROI. |
StreamSets is GUI-based and takes care of load balancing. It allows a hybrid installation approach, rather than being completely cloud-based or on-premises. Additionally, StreamSets provides good enterprise support with a quick turnaround.
One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the Amazon AWS infrastructure. I had to switch to a new EC2 box, even though the processor was not fully utilized. It would be beneficial if StreamSets addressed any potential memory leak issues to prevent unnecessary upgrades. Additionally, it would be a great enhancement if StreamSets could produce a lineage graph to visualize how the data has passed through the system.
I started using StreamSets in 2022, so it's been almost four years now.
From one to ten, I would rate the stability of the product at eight point five.
For scalability, I would also rate it at eight point five.
IBM technical support sometimes transfers tickets between different teams due to shift changes, which can be frustrating. The transition can make resolution slow, as I have to explain the issue multiple times. Overall, I would rate the technical support as eight out of ten.
Positive
The initial setup of StreamSets isn't simple, but it's not too complex either. It’s a standard setup and is fine.
StreamSets is the leader in the market. There are many products, and the choice depends on needed features and use cases, but I view StreamSets as the leader due to its capabilities.
If asked, I definitely recommend StreamSets to other users. My overall rating for the solution is nine.

StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data is loaded into Amazon Redshift or other data warehousing solutions.
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up.
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered.
I have been working with the product for five years.
The tool's flexibility and performance are good. It allows for task dependency management so others won't be affected if one task fails. It can handle large volumes of data and supports features like change data capture for tracking changes.
Around six months ago, many people in my company were using StreamSets. In the US team, about 42 people across different projects were using it. Similarly, in 2021, there were around 43 users. About 16-18 people in Mumbai used it in my previous company.
The tool's support is good.
Installing StreamSets can take time because it has two engines: Data Collector and Transformer. Data Collector is easier to install, but Transformer is more complicated and requires more steps, like setting up tasks and configurations.
It is best to make sure the environment is ready first, including that it works well with your other servers. The process can be both easy and difficult, but if you follow the documentation, it should be manageable.
Whether the tool is worth the money depends on the situation. If you don't want to spend a lot on competing products like Databricks or Glue, then StreamSets might be a better option. It's particularly valuable if you prefer not to invest heavily in training your team on new technologies. If your ETL developers or data engineers are comfortable with StreamSets, it can be worth the money.
The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good.
We use various tools and alerting systems to notify us of pipeline errors or failures. StreamSets supports data governance and compliance by allowing us to encrypt incoming data based on specified rules. We can easily encrypt columns by providing the column name and hash key.
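As a rough illustration of the column-level protection described above, the sketch below replaces named columns with a keyed hash. It is plain Python rather than StreamSets configuration: the helper `mask_columns`, the sample record, and the key are all hypothetical, and HMAC-SHA256 is an assumed choice of algorithm. A keyed hash is one-way, so this shows masking rather than reversible encryption.

```python
import hashlib
import hmac


def mask_columns(record, columns, key):
    """Return a copy of `record` with the named columns replaced by a keyed hash.

    Hypothetical helper for illustration; it is not part of any StreamSets API.
    """
    masked = dict(record)
    for name in columns:
        if name in masked and masked[name] is not None:
            value = str(masked[name]).encode("utf-8")
            # HMAC-SHA256 is an assumed algorithm choice; the digest is one-way.
            masked[name] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return masked


# Made-up sample data.
row = {"patient_id": 1001, "ssn": "123-45-6789", "city": "Austin"}
protected = mask_columns(row, ["ssn"], key=b"example-hash-key")
print(protected)
```

Only the listed columns change; the original record is left untouched, and the same input and key always produce the same digest, so masked values can still be joined on.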
If you're considering using StreamSets for the first time, I would advise first understanding why you want to use it and how it will benefit you. If you're dealing with change tracking or handling large amounts of data, it could be cost-effective compared to services like Amazon. It's easy to schedule and manage tasks with the tool, and you can enhance your skills as an ETL developer. You can easily migrate traditional pipelines built on platforms like Informatica or Talend to StreamSets. I rate the overall solution an eight out of ten.

I use StreamSets to develop data feeds for different balance streams, to control the scheduling options for my data plane, and for internal version control.
The design experience when implementing batch, streaming, or ETL pipelines is very easy and straightforward.
When we initially attempted to integrate StreamSets with Kafka, it was somewhat challenging until we consulted the documentation, after which it became straightforward.
We use StreamSets to move data into modern analytics platforms. Moving the data into modern analytics platforms is still complex. It requires a lot of understanding of logic.
StreamSets enables us to build data pipelines without knowing how to code. StreamSets' ability to build data pipelines without requiring us to know complex programming is very important, as it allows us to focus on our projects without spending time writing code.
StreamSets' Transformer for Snowflake is simple to use for designing both simple and complex transformation logic. StreamSets' Transformer for Snowflake is extremely important to me as it helps me to connect external data sources and keep my internal workflow organized. Transformer for Snowflake's functionality is a perfect ten out of ten.
It is important and cost-effective that Transformer for Snowflake is a serverless engine embedded within the platform, as without this feature, it would be very expensive. This feature helps us to sell at lower budget costs, which would otherwise be at a high cost with other servers.
StreamSets has helped improve our organization. StreamSets simplified pipelines for our organization. It is easier to complete a project when we know where and how to start, and working with the team remotely makes it more efficient. This helps us to save time and be more organized when creating data pipelines. Being a structured company that produces reliable resources for our application benefits both our clients and contacts.
StreamSets' built-in data drift resilience plays a part in our ETL operations.
With prior knowledge, the built-in data drift resilience is very effective, but it can be challenging to implement without the preexisting knowledge.
The built-in data drift resilience reduced the time it takes us to fix data drift breakages by 45 percent.
StreamSets helped us break down data silos within our organization.
The use of StreamSets to break down data silos enabled us to be confident in the services and products we provide, as well as the real-time streaming we offer. This has had a positive impact on our business, as it allowed us to accurately determine the analytics we need to present to stakeholders, clients, and our sources while ensuring that the process is secure and transparent.
StreamSets saved us time because anyone can use it, not just developers. We save around 40 percent of our time. StreamSets' reusable assets helped us reduce our workload by around 25 percent.
StreamSets saved us money by not having to hire developers with specialized skills. We saved around $2,000 US.
StreamSets helped us scale our data operations. Since StreamSets makes it easy to scale our data operations, it enabled us to know exactly where to start at any time. We are aware of the timeline for completing the project, and depending on our familiarity with the software, we can come up with a solution quickly.
The most valuable features are the option of integration with a variety of protocols, languages, and origins. I used the solution to integrate with Kafka and send emails and different types of data feeds. The UI is quite nice and easy to use, making it a simple task for me to find the processes, execute them, and achieve my goals.
I found that if the connection drops and the pipeline is restarted, it sometimes does not reconnect. That has room for improvement.
The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base. This leads to discrepancies between the software and the documentation, making it difficult to understand.
I have used the solution for two years.
StreamSets is stable and we have never had downtime.
StreamSets is scalable and we can add as many protocols as required to meet our needs.
If we contact the technical support team, they will ensure that our issue is resolved promptly.
Neutral
The initial setup was complex; we had to contact the technical team for assistance in order to begin the deployment and get synced. This was difficult because we lacked the necessary knowledge. However, once we read the instructions and contacted support, the process became much simpler.
The deployment took around seven business days.
We used a team of three for the deployment.
The implementation was completed in-house.
We have seen a great return on investment from the package we are paying for, which is expensive. On the other hand, we are collaborating with other people in the organization to ensure it is delivered on time. We are producing live resources and reliable data, creating everything in a very secure and transparent way. I think StreamSets improved trust with the clients using our services, which can attract more sales. Since we started using StreamSets, we have saved around 40 percent.
StreamSets is an expensive solution.
I give the solution an eight out of ten.
We use StreamSets in multiple departments, such as the IT department and the software team, not just in different locations. We have ten people using StreamSets in our organization.
StreamSets does not require any maintenance.
I highly recommend StreamSets as it is a highly customizable streaming application. However, before selecting this application, the user must analyze their data transfer requirements and mode of transfer. For example, StreamSets is expensive. The data combined nodes are good, but they still need to be configured correctly. Data processing and file conversion can slow down the process, so it is important to have enough memory to support the requirements.

Our company builds products mainly for healthcare divisions and we use StreamSets for all our data engineering tasks.
StreamSets helps us with our data engineering, creating data sets and pipelines, and streaming data sets to enable us to utilize all the databases and resources for our back-end use cases.
It saves our organization time, about 25 to 30 percent, by automating the data pipelines. And automating the data pipelines helps with our overall efficiency because we are eliminating all of our manual efforts. We have multiple ETL processes running within its pipelines, scheduled and running directly from StreamSets, and they are all running very efficiently.
And because it has built-in log analytics capabilities, you can easily analyze the logs in case a pipeline fails. That helps save time and configure them as required.
Another benefit is that now, every employee is empowered on their own to monitor the data activities within their own credentials. That helps break the data silos. It also has a dashboard in which you can monitor the progress of your data pipeline in a single place. Whenever our C-level requests data operations reports we can easily fetch them from StreamSets.
Previously, we had five to six resources dedicated to designing and deploying our data pipelines, but now we have eliminated three or four resources. We have only one working on the data pipelines and monitoring StreamSets. We used to need to hire highly skilled data engineers for these tasks, but after deploying StreamSets we saved on the cost of resources.
The things I like about StreamSets are its
Also, the scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy. You just need to configure the data sources, the paths and their configurations, and you are ready to go.
It is very efficient and very easy to use for ETL pipelines. It is a GUI-based interface in which you can easily create or design your own data pipelines with just a few clicks.
As for moving data into modern analytics systems, we are using it with Microsoft Power BI, AWS, and some on-premises solutions, and it is very easy to get data from StreamSets into them. No hardcore coding or special technical expertise is required.
It is also a no-code platform in which you can configure your data sources and data output for easy configuration of your data pipeline. This is a very important aspect because if a tool requires code development, we need to hire software developers to get the task done. By using StreamSets, it can be done with a few clicks.
The user interface requires some corrections in terms of the menu settings, menu items, and report generation. Also, report generation takes some time.
I have been using StreamSets for three years.
It is stable now. Earlier it was not. All the products in the market are fully matured and stable now, and StreamSets is one of them. We are fully satisfied with that aspect. The stability is a 10 out of 10.
It is a highly scalable solution. It can be scaled up as and when required. When our data velocity is high, we can scale it. We pay for the scalability charges, nothing else. The scalability is also a 10 out of 10.
Our number of end-users is between 80 and 100 in finance, sales, and primarily the data team.
They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved.
Positive
We have used many open-source solutions and Azure Data Factory, as well as solutions for AWS. We have also evaluated Talend and Alteryx.
We switched to StreamSets because, for our requirements, including cost, deployment time, and feature availability, we felt that it was the right choice for our organization.
I was a part of the team that evaluated, onboarded, and implemented StreamSets for the organization. We had a team of four to five people involved. It is straightforward to deploy. All software these days uses the software as a service model, so implementation is quite easy.
On our side, there is no maintenance. It is a managed service that provides all the upgrades and updates. We don't have to deploy any patches or framework updates.
For some tasks, we got some initial help from their consulting team and the solution-architect people. But later on, we were able to handle it.
Our ROI is 30 to 35 percent with StreamSets.
The overall cost is very flexible so it is not a burden for our organization. We have a dedicated budget for it. It saves us 30 to 50 percent.
However, the cost should be improved. For small and mid-size organizations it might be a challenge.
We frequently use StreamSets' Transformer for Snowflake functionality for large amounts of data. Using it is not that simple but not that difficult. It is moderate. You need some training or an introduction to design pipelines within the transformers. Still, it's a useful feature because we have a large amount of data that we need to transform. Being serverless, Transformer for Snowflake doesn't depend so much on infrastructure or cloud users. That aspect is beneficial for any organization because you don't need to deploy additional resources for it.
I highly recommend this solution, if you have the budget for deployment and your requirements are fulfilled by it.

The main use case of StreamSets is to work on data integration and ingesting data for DataOps and modern analytics. We also use it for integrating data files from multiple sources. We use it to build, monitor, and manage smart, continuous data pipelines.
The introduction of StreamSets in our organization has improved things in a significant way. The efficiency of our entire process has increased a lot and we derive high value from it. The integration of data files from multiple sources is what makes it great software for us.
The transfer of information between our teams is very smooth and efficient as well. It saves us time in transferring, collating, and integrating all of the data.
The integration part has been customized for our particular systems. Previously, we had different data silos. Now, with the introduction of StreamSets, the data silo approach has been eradicated. It has integrated all the data files into one software system, creating a central point for it.
And it has reduced our workload by 50 to 60 percent and that has definitely saved us some money on human resources.
There are two features that are most valuable for us. One is the Control Hub and the other is the Data Collector. With Data Collector, data migration has become much easier for us.
Also, the ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.
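The extract, transform, and load flow described above can be sketched in a few lines of plain Python. This is only an illustration, not how StreamSets implements it: the two source lists, their field names, and the `normalize` helper are made up, and an in-memory SQLite database stands in for the single consolidated data store.

```python
import sqlite3

# Two made-up sources with different field names and value types.
crm_rows = [{"customer": "Acme", "amount_usd": "120.50"}]
billing_rows = [{"client_name": "Globex", "total": 80}]


def normalize(rows, name_field, amount_field):
    """Map source-specific field names onto one common (name, amount) shape."""
    return [(r[name_field].strip(), float(r[amount_field])) for r in rows]


# Extract and transform: bring both sources into the same shape.
records = normalize(crm_rows, "customer", "amount_usd") + normalize(
    billing_rows, "client_name", "total"
)

# Load: in-memory SQLite stands in for the consolidated data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Once both sources share one shape, downstream queries no longer need to know which system a row came from.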
We use the platform to incorporate modern analytics as well. That is one of our main use cases. It integrates well with our requirements. It is quite easy to move data into these analytics platforms using StreamSets because there are minimal coding requirements. The built-in applications and systems allow us to do it with ease. A first-time user could easily do it.
If there were coding requirements, it would take three or four extra resources to get things done. That aspect is very important for us. It saves us money by not needing coding manpower.
In addition, the system's data drift resilience is very effective and efficient. On our particular team, it has reduced the time it takes to fix data drift breakages by 10 to 12 man-hours per week.
The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information. Apart from that, I don't think much improvement is required, because the software and features are very good.
I have been using StreamSets for the past year.
The software is very stable. The stability is a solid 10 out of 10.
It's definitely scalable. We started with around 10 to 12 users, and now it has reached 35 to 40 users in our particular organization. We are now using it across four to five teams.
There are a lot of other teams in our company that are trying out the free version of the software. If it's suitable for them, they will obviously go for it as well.
Through email, they have been very good at supporting us and they're very knowledgeable as well. They are going to various lengths to provide us with clear-cut answers.
Positive
We didn't use any other similar software.
It took three to four months to assess the efficiency improvements in our team. There's definitely a return on investment from the use of StreamSets. Our efficiency has been increased by 20 to 25 percent and it has helped increase revenue by 7 to 10 percent.
I imagine the pricing is moderate because our company is renewing its license, but I'm not sure about the exact price. There are no hidden costs that I have come across.
It's cloud-based software, so there are only minimal maintenance requirements. Our IT team takes care of the maintenance of the software, but I don't think much time is required for that. Only regular updates need to be done. It is a minimal task that can be done by one or two personnel.
Overall, it gives us a lot of efficiency and increases the effectiveness of our transformation of data sets. The value and increase in revenue it has helped us achieve make it a very good software package.
Try the free version and, if the software meets your requirements, I would definitely say get the Enterprise version. It's pretty easy to understand and it generates a great deal of smoothness for your business processes. It's a must-have for every business to improve its efficiency and effectiveness.
The major takeaway for me has to be the improvement in the efficiency of our entire process. That stands out for us. StreamSets is a great platform. And the best thing about it is that there are minimal coding requirements. Any person, even someone with a non-technical background, can easily get accustomed to the software and start using it.

I worked mostly on data ingestion use cases when I was using Data Collector. Later on, I got involved with some Spark-based transformations using Transformer.
Currently, we are not using CI/CD. We are not using automated deployments. We are manually deploying in prod, but going forward, we are planning to use CI/CD to have automated deployments.
I worked on on-prem and cloud deployments. The current implementation is on-prem, but in my previous project, we worked on AWS-based implementation. We did a small PoC with GCP as well.
It is very easy to use when connecting to enterprise data stores such as OLTP databases or messaging systems such as Kafka. I have had integration with OLTP as well as Kafka. Until a few years ago, we didn't have a good way of connecting to the streaming databases or streaming products. This ability is important because most of our use cases in recent times are of streaming nature. We have to deliver certain messages or data as per our SLA, and the combination of Kafka and StreamSets helps us meet those timelines. I'm not sure what I would have used to achieve the same five years ago. The combination of Kafka and StreamSets has opened up a new world of opportunities to explore.
I recently used orchestration wherein you can have multiple jobs, and you can orchestrate them. For example, you can specify to let Job A run first, then Job B, and then Job C in an automated fashion. You don't need any manual intervention. In one of my projects, I had a data hub from 10 different databases. It was all automated by using Kafka and StreamSets.
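The Job A, then Job B, then Job C ordering mentioned above can be pictured with a minimal sketch. This is generic Python, not the StreamSets orchestration feature itself: `run_in_order` and the three placeholder jobs are hypothetical, and stopping at the first failure is an assumed policy.

```python
def run_in_order(jobs):
    """Run (name, callable) pairs one after another; stop at the first failure."""
    completed = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # Assumed policy: a failed job prevents the later ones from starting.
            print(f"{name} failed: {exc}; skipping remaining jobs")
            break
        completed.append(name)
    return completed


# Placeholder jobs that only record that they ran.
log = []
jobs = [
    ("Job A", lambda: log.append("A")),
    ("Job B", lambda: log.append("B")),
    ("Job C", lambda: log.append("C")),
]
finished = run_in_order(jobs)
print(finished)
```

Each job starts only after the previous one returns, which is the "no manual intervention" property the reviewer describes.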
It enables you to build data pipelines without knowing how to code. You can just drag and drop. If you know how to code, you can do some custom coding as well, but you don't need to know coding to work with StreamSets, which is important if somebody in your team is not familiar with coding. The nature of coding is changing, and the number of technologies is changing. The range is so wide right now. Even if I know Java or Oracle, it may not be enough in today's times because we might have databases in Teradata. We might have Snowflake or other different kinds of databases. StreamSets is a great solution because you don't need to know all different databases or all different coding mechanisms to work with StreamSets. Rather than learning each and every technology and building your data pipelines, you can just plug and play at a faster pace.
StreamSets’ built-in data drift resilience plays a part in our ETL operations. It is a very helpful feature. Previously, we had a lot of jobs coming from different source systems, and whenever there was any change in columns, we were not informed. It required a lot of changes on our end, which would take from a couple of weeks to a month. Because of the data drift feature, which is embedded in StreamSets, we don't have to spend that much time taking care of the columns and making sure they are in sync. All this is taken care of. We don't have to worry about it. It is a very helpful feature to have.
StreamSets' data drift resilience reduces the time to fix data drift breakages. It has definitely saved around two to three weeks of development time. Previously, any kind of changes in our jobs used to require changing our code or table structure and doing some testing. It required at least two to three weeks of effort, which is now taken care of because of StreamSets.
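To make the column-change problem concrete, here is a minimal sketch of detecting drift between an expected column list and an incoming record. It illustrates the general idea only and says nothing about how StreamSets implements drift handling: `detect_drift`, the column names, and the sample record are hypothetical.

```python
def detect_drift(expected_columns, record):
    """Compare a record's keys with the expected column set."""
    expected = set(expected_columns)
    actual = set(record)
    return {
        "added": sorted(actual - expected),    # columns the source started sending
        "missing": sorted(expected - actual),  # columns the source stopped sending
    }


# Made-up schema and record: the source dropped "email" and added "phone".
expected = ["id", "name", "email"]
incoming = {"id": 7, "name": "Ana", "phone": "555-0100"}
drift = detect_drift(expected, incoming)
print(drift)
```

A check like this, done by hand for every source, is the kind of work the reviewer says used to take weeks of code and table changes.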
StreamSets’ reusable assets helped to reduce workload. We can use pipeline fragments across multiple projects, which saves development time. The time saved varies from team to team.
It saves us money by not having to hire people with specialized skills. Without StreamSets, for example, I would've had to hire someone to work on Teradata or Db2. We definitely save some money on creating a new position or hiring a new developer. StreamSets provides a lot of features from AWS, Azure, or Snowflake. So, we don't have to find specialized, skilled resources for each of these technologies to create data pipelines. We just need to have StreamSets and one or two DBAs from each team to get the right configuration items, and we can just use it. We don't have to find a specialized resource for each database or technology.
It has helped us to scale our data operations. It saves the licensing costs on some legacy software, and we can reuse pipelines. Once we have a template for a certain use case, we can reuse the same template across different projects to move data to the cloud, which saves us money.
I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technical or core development background find it easy to get started, build data pipelines, and connect to databases. Within a couple of weeks, they are as comfortable as any technical person. I really like its user-friendliness. It is easy to use. They have a single snapshot across different products, which is very helpful for learning and using the product based on your use case.
Its interface is very cool. If I'm using a batch project or an ETL, I just have to configure appropriate stages. It is the same process if you go with streaming. The only difference is that the stages will change. For example, in a batch, you might connect to Oracle Database, or in streaming, you may connect to Kafka or something like that. The process is the same, and the look-and-feel is the same. The interface is the same across different use cases.
It is a great product if you are looking to ramp up your teams and you are working with different databases or different transformations. Even if you don't have any skilled developers in Spark, Python, Java, or any kind of database, you can still use this product to ramp up your team and scale up your data migration to cloud or data analytics. It is a fantastic product.
There are a few things that can be better. We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back.
There are certain features that are only available at certain stages. For example, HTTP Client has some great features when it is used as a processor, but those features are not available in HTTP Client as a destination.
There could be some improvements on the group side. Currently, if I want to know which users are a part of certain groups, it is not straightforward to see. You have to go to each and every user and check the groups he or she is a part of. They could improve it in that direction. Currently, we have to put in a manual effort. In case something goes wrong, we have to go to each and every user account to check whether he or she is a part of a certain group or not.
I got exposed to StreamSets in late 2018. Initially, I worked on StreamSets Data Collector, and then, for a year or so, I got exposed to Transformer as well.
It is stable, and they're growing rapidly.
It is pretty scalable, but it also depends on where it is installed, which is something a lot of developers misunderstand. Most of the time, the implementation is done on on-prem servers, which is not very scalable. If you install it on cloud-based servers, it is fast. So, the problem is not with StreamSets; the problem is with the underlying hardware. I have worked on both sides. Therefore, I'm aware of the scenarios, but if I were to work purely in the development team, I might not be aware that it is underlying hardware that is causing problems.
In terms of its usage, it is available enterprise-wide. I don't know the exact number of users now because I am not a part of the platform or admin team, but at one time, we had more than 200 users working on this platform. We had one implementation on AWS Cloud and one on GCP. We had Dev, QA, and prod environments. Even now, we have about four environments. We have SIT and NFT, and in prod, we have two environments.
We plan to increase its usage. We are rapidly increasing its usage in our projects. There is a lot of excitement around it. A lot of people want to explore this tool in our organization. A lot of people are trying to learn this technology or use it to migrate their data from legacy databases to the cloud. This will actually encourage more folks to join the data engineering or analytics team. There is a lot of curiosity around the product.
Currently, I'm not involved with them on a daily basis. I'm no longer a part of the platform team, but when I was involved with them two years back, their support was good. Most of the interactions I have had with them were pretty good. They were responsive, and they responded within a day or two. I would rate them a nine out of ten. They were good most of the time, but it could be a challenge to get the right person. They are still a growing company. You need to be a little patient with them to get to the right person to help you with the issues you have.
Positive
About three or four years ago, I worked on Trifacta, which is now acquired by Alteryx. The features were different, and the requirements were different.
Talend is a good product. It seems quite close to StreamSets, but I have not worked on Talend. I just got a demo of Talend a couple of years ago, but I never worked on it. I felt that StreamSets had more features. Its UI was good, and functionality-wise, I found it a little bit more comfortable to use.
I was involved with AWS deployment. At that time, I was a part of the platform team. Now, I work with the application development team, and I'm not involved in that. It was complex at that time. About four years ago, when StreamSets was new, we had a tough time deploying because the documentation was not very clear at that time. A lot of the documents were very good and available on the web, but the documentation wasn't exhaustive or elaborate. We also had our own learning curve. We had someone from StreamSets to help us with the deployment. So, it went well. Now, it is better, but when we did it, it was very complex.
We implemented it in phases. We just implemented or installed the StreamSets platform in our company, and we let a couple of teams use it. We started with Data Collector, and we allowed teams to use and feel it. When they said that this is a good tool to use, we got the enterprise license, and we installed Control Hub and Data Collector. It was not implemented enterprise-wide at the same time. It was released to teams in phases.
It was a mix of a consultant and reseller. It probably was Access Group that helped us with this implementation. At that time, I was in the US, and they were good. Our experience with them was fantastic. We had a couple of consultants from their team to help us with the installation. Now, we have a different vendor in the UK. We have a different partner to help us with that.
We started with about three people, and now, we have more than 20 people on the team. It requires regular maintenance in terms of user management. It is not because of StreamSets; it is because of the underlying software. Data Collector can support a certain number of jobs in parallel. In case we have more tenants on board, we have to increase the Data Collector or Transformer instances to support the increased number of users.
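The capacity point above is simple arithmetic, sketched here in Python. The function name and the numbers are hypothetical; the actual number of parallel jobs one Data Collector instance can handle depends on the hardware and the pipelines, and is not stated in this review.

```python
import math


def instances_needed(parallel_jobs: int, jobs_per_instance: int) -> int:
    """Smallest number of instances that can run `parallel_jobs` at once,
    given an assumed per-instance limit of `jobs_per_instance`."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    return math.ceil(parallel_jobs / jobs_per_instance)


# Hypothetical example: 45 parallel jobs, 20 jobs per instance.
required = instances_needed(45, 20)
```

With these assumed figures, two instances would cover only 40 jobs, so a third is needed as soon as the extra tenants come on board.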
We have definitely seen an ROI. It has helped us in moving into the data analytics world at a faster pace than any other tool would've done. The traditional tools we had didn't provide the functionality that StreamSets offers.
The time for realizing its benefits from deployment depends on the use case or the end requirement. For example, we deployed one project last year, and within a couple of months, we could see a lot of benefits for that team. For some use cases, it could be two months to six months or one year. You can build data pipelines, and you can move data to Snowflake or any cloud database using StreamSets in a matter of a few weeks.
There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it.
The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets.
It is very user-friendly, and I promote it big time in my organization among my peers, my juniors, and across different departments.
They're growing rapidly. I can see them having a lot of growth based on the features they are bringing. They could capture a lot more market in coming times. They're providing a lot of new features.
I love the way they are constantly upgrading and improving the product. They're working on the product, and they're upgrading it to close the gaps. They have developed a data portal recently, and they have made it free. Anyone who doesn't know StreamSets can just create an account and start using that portal. It is a great initiative. I learned directly on the corporate license portal, but if I were to train somebody on my team who doesn't yet have a license, I would recommend that they go to the free portal, register, and learn how to use StreamSets. It is available for anyone who wants to learn how to work with the tool.
We use StreamSets' ability to move data into modern analytics platforms. We use it for Tableau, and we use it for ThoughtSpot. It is quite easy to move data into these analytics platforms. It is not very complicated. The problems that we had were mostly outside of StreamSets. For example, most of our databases were on-prem, and StreamSets was installed on the cloud, such as AWS Cloud. There were some issues with that. It wasn't a drawback because of StreamSets. It was pretty straightforward to plug and play.
I have used StreamSets Transformer, but I haven't yet used it with Snowflake. We are planning to use it. We have a couple of use cases we are trying to migrate to Snowflake. I've seen a couple of demos, and I found it to be very easy to use. I didn't see any complications there. It is a great product with the integration of StreamSets Transformer and Snowflake. When we move data from legacy databases to Snowflake, I anticipate there could be a lot of data drift. There could be some column mismatches or table mismatches, but what I saw in the demo was really fantastic because it was creating tables during runtime. It was creating or taking care of the missing columns at runtime. It is a great feature to have, and it will definitely be helpful because we will be migrating our databases to Snowflake on the cloud. It will definitely help us meet our customer goals at a faster pace.
I would rate it a nine out of ten. They're improving it a lot, and they need to improve a lot, but it is a great product to use.
StreamSets is being used in the IT department to make sure that we have a stable solution and that our configuration is secure and running smoothly. We are using it for our data analytics tool as well as for real-time prediction for various real-life business use cases. It's helping us generate new business ideas. It's a tool that allows us to share data between platforms, which also removes the dependency on other ETL tools, such as SSIS.
StreamSets is straightforward to use for implementing batch, streaming, or ETL pipelines once you know how to use it. The pipeline can be integrated with Azure Key Vault, which eliminates the need to share credentials with developers. The same goes for parameters. It's very easy and straightforward.
It's easy for me to connect StreamSets to enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka. I've had a good experience with it, and I've been working with it for a long time. It's very easy for me to connect and integrate. However, if you are a beginner, it might not go that well at first.
It's easy to move data into analytics platforms using StreamSets.
StreamSets enables us to build data pipelines without knowing how to code. We don't require the best coding skills. We can use the code-free environment to quickly create pipelines. It's very helpful for that.
StreamSets is a helpful tool for pipelines. It's very easy, and we can register Data Collector instances with Control Hub using provisioning agents.
StreamSets has helped to break down data silos within our organization. It hasn't negatively affected our business. It has fortunately enhanced our development time. We are able to develop secure, stable platforms faster and even remotely.
StreamSets has saved us a lot of time. It saved us the time that we were spending developing applications manually. One budget can be used by the team to come up with a stable solution. Our time savings are 30%. Out of five hours, it has saved us around two hours.
StreamSets has reduced our workload by 35%. It has also saved us money. When you subscribe to StreamSets, it seems very expensive, but when you get to know how their integration and documentation are and how things move, it's definitely efficient. It saves a lot of money. Before implementing it, we spent around 10,000 USD to hire experts. It has saved us 10,000 USD that we would have spent on hiring experts.
What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes.
It has a very easy and user-friendly interface. It only takes a few days for new developers to start and deploy their first pipeline. It provides an easy and powerful integrated environment with different platforms such as Kafka, Salesforce, Oracle Database, REST API, etc. The user interface is a powerful feature of StreamSets.
There are so many things that need to be improved. For the StreamSets cloud user interface, there aren't enough use cases and examples for the main problems. In addition, the hybrid data sets cannot be joined in a data connector, which is a significant limitation.
There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline. It isn't helpful when you need to apply the same logic for multiple sources. It becomes difficult because you need to create more pipelines and then add coordination between them.
Initially, it's hard to find out or master the logic behind it. It can be hard if you aren't technical enough. There is scope for improvement because it's not straightforward. You need to go through the documentation and make sure that you understand every step. For me, it was a challenging model.
I've been using StreamSets for two and a half years.
It's stable enough.
It's good enough. We don't use it at multiple locations. We use it at one location, and it's being used by the IT and development departments. We have five users who are using it.
Its deployment was hard. I had to contact them so that they could help me set things up. They are good people. They make sure that you are getting the best experience and that you are getting things in the right way. Their support is good and technical. I'd rate them a 10 out of 10 because of the fact that they were able to troubleshoot the issue.
Positive
We did not use a different solution.
In the beginning, it's very hard, but after reading the documentation, you can set up things easily. The documentation is very good and helpful.
For me, deployment was initially very hard because it required a lot of technical skills that I didn't have at that time. I had to contact the team, and they helped me with how to deploy it. The following day, I was able to set up everything. So, deployment is initially very hard, but after you become familiar with StreamSets, you can deploy it more easily.
I deployed it myself. It doesn't require any maintenance because they take care of that.
There has been a great return on investment. We can use a single package of one thousand USD to have different applications with different people and different skills. It has saved us the money that we would have spent individually to develop those applications. Using StreamSets has saved us expenses. We have seen 40% ROI.
It's not so favorable for small companies.
We didn't evaluate other options. We found StreamSets to be aligned with our expectations.
To those evaluating this solution, I'd advise ensuring that you have someone who is an expert in StreamSets so that you can deploy it in less time. Otherwise, it won't be a great option.
I'd recommend StreamSets if you want to design a very good pipeline, but you also have to think about the budget. Its budget is not so favorable for small companies, but it's great software for businesses that want to create good data pipelines and have secure platforms. It will help your business in making sure that you are providing a stable solution to your clients.
Overall, I'd rate StreamSets a 10 out of 10.
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
StreamSets should provide a mechanism to perform data quality assessment while the data is being moved from the source to the target, that is, the ability to validate the data against various data rules. Then, when a data quality assessment fails, it should be able to send alerts or information to help people understand the data validation issues.
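The kind of check requested above can be sketched in a few lines of Python. This is an illustration of the idea, not a StreamSets feature or API: the rule names, the record fields, and the `validate` function are all hypothetical.

```python
from typing import Any, Callable

# A rule takes one record and returns True when the record is acceptable.
Rule = Callable[[dict[str, Any]], bool]

# Hypothetical data rules, keyed by a readable name used in alerts.
RULES: dict[str, Rule] = {
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
    "currency_present": lambda r: bool(r.get("currency")),
}


def validate(
    records: list[dict[str, Any]], rules: dict[str, Rule]
) -> tuple[list[dict[str, Any]], list[str]]:
    """Split records into those that pass every rule and a list of alert
    messages naming the rules each rejected record failed."""
    passed: list[dict[str, Any]] = []
    alerts: list[str] = []
    for index, record in enumerate(records):
        failed = [name for name, rule in rules.items() if not rule(record)]
        if failed:
            alerts.append(f"record {index} failed: {', '.join(failed)}")
        else:
            passed.append(record)
    return passed, alerts


batch = [
    {"amount": 10, "currency": "USD"},
    {"amount": -5, "currency": ""},
]
good, alerts = validate(batch, RULES)
```

Here only the first record moves on to the target, and the alert text names both rules the second record broke, which is the information a reviewer would need to understand the validation issue.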
I have been using StreamSets for a year and a half.
It's reasonably stable.
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
Customer service and support are good.
Positive
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.
Just one person is enough for the maintenance.
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
It's a very good tool. Overall, I would rate the solution an eight out of ten.

We were receiving data from hospitals or any kind of healthcare service providers in the country. We were dominantly operating in the US. When we received that data, we had to classify it into different repositories or different datasets. This data was sent to different vendors, and for that, the data needed to get processed in different ways. We needed to bifurcate data at many steps with different kinds of filters. For that, we used StreamSets.
We could bifurcate the datasets that we received from different hospitals. We could bifurcate it on the basis of the medical requirements of the hospitals, and sometimes, on the basis of the schedule or purpose. We were obtaining data that we could then supply to some consulting firms or other sources.
StreamSets saved us time. The accuracy was pretty good, and it was definitely better than what we were using previously. Earlier, we had hired two people who were doing the job manually, and we were also using some other platform. We had to pay for them. Overall, we have saved a lot of time, and the accuracy has improved as well. We didn't calculate the time savings, but I believe we saved about three days in a week, so there were about 30% to 40% time savings.
StreamSets reduced the workload. There was a 10% to 15% reduction in the workload.
StreamSets helped us to scale our data operations. The limit at which we purchased this solution was incredible. We were never able to reach the limit that we purchased, but it helped us to increase or scale our operation. Especially in months when we received a higher number of entries, we were able to perform our work on time.
The ability to have a good bifurcation rate and fewer mistakes is valuable. In the scenario we had, when we had to bifurcate the data, we did not completely cut the data. We made a different route for one set of data, which went into a different operating system. There was also a complete set of data along with the original data that got cut, which once again went through the filtration process, and in this way, it kept on happening. Different solutions that were in place were not providing this feasibility. With the other solutions that we were using earlier, we had to reuse the data again and again from the start. It was a time-taking process.
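One reading of the routing described above is that matching records are copied onto a separate route while the complete set continues through further filters, so nothing has to be reloaded from the start. A minimal Python sketch of that pattern follows. It is not how StreamSets implements it, and the route names and record fields are hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]
Filter = Callable[[Record], bool]


def bifurcate(records: list[Record], routes: dict[str, Filter]) -> dict[str, list[Record]]:
    """Copy each record onto every route whose filter matches.
    The input list is left untouched, so it can feed later filters."""
    branches: dict[str, list[Record]] = {name: [] for name in routes}
    for record in records:
        for name, matches in routes.items():
            if matches(record):
                branches[name].append(record)
    return branches


# Hypothetical hospital records and routing filters.
records = [
    {"id": 1, "dept": "cardiology", "scheduled": True},
    {"id": 2, "dept": "oncology", "scheduled": True},
    {"id": 3, "dept": "cardiology", "scheduled": False},
]
routes = {
    "cardiology": lambda r: r["dept"] == "cardiology",
    "scheduled": lambda r: r["scheduled"],
}
branches = bifurcate(records, routes)
```

A record can land on more than one route, and the original list still holds all three records for the next filtering step.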
Their support system was pretty good. When we were setting up the bifurcation protocols that we wanted to set up, we had a few support calls with them, and those were really helpful.
The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer.
Its initial setup could also be a bit easier.
I used this solution for about a year.
It's a stable product. We used it for about a year, and we hardly had to shut it down.
We are a medium enterprise. We only have three departments in our company, and only one of the departments is using it. Salespeople don't use it. The development people don't use it. We are the ones using it, and our job is to process the information, so only one department is using the solution. We have about 18 people in the department.
It's a good choice for enterprises up to medium size. You can scale from one million to ten million data files. I don't believe they offer the service for a hundred million or one billion datasets. It isn't very scalable for large enterprises, but for small and medium enterprises, it's good.
I'd rate them an eight out of ten. The only reason for not giving them a ten out of ten is that if you're doing very important work and you need to get the solution the same day, it's a bit tough to have the team support you in a very short period of time. They usually give you appointments about a day or two days later. Other than that, everything is good.
Positive
We were using another solution previously. The major reason for switching to StreamSets was that we needed to scale our operations. Our prior solution could have been scaled, but the cost of scaling was a bit higher. We would have had to hire one more person to be able to scale, but we did not want to hire more people, so we decided to use a completely automated solution for this part so that it could be handled by only one of our team members. That was the primary requirement. The cost-benefit analysis was done by one of our peers. His proposal was pretty good, and everyone agreed to it.
Its initial setup is a bit tough. You need to have the technical expertise to do that. The support team is good. They help you around, but if they could make it a bit easier, it would be better.
I believe it operates only from the cloud. We also received the data from our associations on the cloud. We processed it on the cloud, and everything happened on the cloud.
The initial setup was complex because we were not able to directly link the data we were receiving with the StreamSets solution. Linking it required us to fill in or enter some information in StreamSets, but we were not able to figure out what to enter. For that part, we needed their help.
We spent about a week. For the first three days, our team members were trying their best to do it, but then we had to schedule a meeting with them. In terms of the number of people, only one person was working with our team, and there were three people working with the product. I was also involved in the product as a product manager, but I was not directly operating that system.
It didn't require any maintenance as such. Any maintenance activities were related to our side of things. There were mistakes on our end. When we were entering different data, we had to do different configurations in the system.
We did the cost-benefit analysis before buying the solution, and it performed even better than that. We were able to replace two of our staff members who were doing this work. The cost that we paid for this solution was much lower than their salaries, so on the cost-benefit side of things, it was a good deal. We saved about two people's wages, which is about $6,000 a month, and we also saved 15% of a week's time. These two were the biggest returns on the investment. The accuracy was also a bit higher.
Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled. Simultaneously, you need a solution that you don't want to use on a very long-term basis. This solution could not be applied if we were operating with all the hospital chains in the US. We were operating just with one hospital. That's why it worked pretty well, so for medium enterprises, I believe it's very good.
To those evaluating StreamSets, I'd advise doing a cost-benefit analysis because the way of using StreamSets differs from person to person. Someone else might have a very different use case, and they may not run into profit using the solution. For us, it was a good solution because we were hiring people for this work. People were doing the job manually. We saved both time and money, so doing a cost-benefit analysis would be the best thing.
If you are looking to expand your domain or range of operations, StreamSets is very helpful. If you are just looking for a better data analytics tool that can do bifurcation on data, I believe there are other tools or services available in the market that do not focus on the expansion of operations. They focus on doing better and more complex bifurcations.
StreamSets enables you to build data pipelines without knowing how to code. After generating a few responses, you have to enter some basic syntax or code, but generally, one can do a lot of no-code stuff, which was not an important aspect for us because we were operating in the IT space, and our entire team was capable of entering all the syntaxes that were required. It was not an issue for us at any point in time. In fact, in the operations that we were performing, we only used code. When we were testing out our initial datasets, we used some no-code features that were there, but at the later stage, we used only syntaxes.
We did not connect to the messaging systems, but we connected some enterprise databases. We were operating with a set of hospitals in the US, and we had to connect with them only the first time. Afterward, it was the data that was passing through the pipeline. Initially, for a completely new user, it's a bit tricky. Some technical expertise is required. It's a bit tough, but because the support team is there, one would be able to do it.
Overall, I would rate StreamSets an eight out of ten.

We use it for building a data lake in our content. We have sales multiple times during the day, and a sale is the trigger. Sales use the lake as a landing zone. We also use it for various types of data transformation.
It enables us to create data streams and pipelines that our team can use to identify areas for improvement. Our marketing team can read the data generated on sales to understand how we can integrate our product and focus on the areas in which we need more improvement. By the end of the day, we have an improved solution.
The lack of coding makes work easier and faster, and after creating a template you can immediately transform any source. It saves a lot of time and makes things efficient. You complete things on time.
The impact that it has had on my company is that when we have a variety of data that we want to convert or transform, StreamSets is helpful. We can store a maximum amount of data, and transfer various data from different departments and use the analysis to understand how to improve our business.
And because it's a service, it's very helpful to me as a CEO. It's serverless and secure.
In addition, the data drift resilience has reduced the time it takes to fix data drift breakages by 35 percent. Overall, StreamSets, as a solution, saves me about 45 percent of time, and has reduced workload by 25 percent. It also saves me about $500 a month.
Another benefit is that breaking down sums of data gives you the ability to create graphical reports and present them to any team, and they will be understood.
One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill.
We have also integrated it with Kafka messaging, and it is not complex to do. It is really easy to connect or integrate with data interfaces. Moving data into analytics platforms using StreamSets is also easy. It doesn't require any coding, meaning you can transfer or move data into data payloads without coding skills. It's a good fit for a beginner who doesn't have any prior knowledge because it's quite easy.
Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful.
Also, it doesn't provide a very good user experience.
I have been using StreamSets for three years.
It is stable. I've never seen any downtime.
Their technical support is very supportive. They really know what to do, and they are very good people, very friendly.
Positive
It took me three days to deploy it. I did it on my own. We use it in two departments in one location and there are four users.
There is no maintenance of the solution on our side.
Since I implemented StreamSets, we have generated more sales, on the order of 50 percent more.
The pricing is affordable for any business.
The transformation logic is a bit complex when you begin and you may need to read the documentation. When you create logic, you have to be sure of the scenarios in the logic.
Any company that is looking for data engineering should use StreamSets because the pricing is quite favorable. I would recommend it.