reviewer1889181 - PeerSpot reviewer
Data Engineer at a consultancy with 11-50 employees
Real User
Effective, and helps scale data operations, but sometimes the support's response is slow
Pros and Cons
  • "In StreamSets, everything is in one place."
  • "If you use JDBC Lookup, for example, it generally takes a long time to process data."

What is our primary use case?

The project which I work on is developed in StreamSets and I lead the team. I'm the team leader and the Solution Architect. I also train my juniors and my team.

I've been using this tool for the last year and a half, and it is very effective for processing data from source to destination. I have developed many integrations with it.

How has it helped my organization?

The solution is really effective.

What is most valuable?

It's very effective in project delivery. This month, at the end of June, I will deploy all the integrations which I developed in StreamSets to production remit. The business users and customers are happy with the data flow optimizer from the SoPlex cloud. It all looks good.

Not many challenges are there in terms of learning new technologies and using new tools. We will try and do an R&D analysis more often.

Everything is in place and it comes as a package. They install everything. The package includes Python, Ruby, and others. I just need to configure the correct details in the pipeline and go ahead with my work.

The ease of the design experience when implementing batch streaming and ETL pipelines is very good. The streaming is very good. Currently, I'm using Data Collector and it’s effective. If I'm going to use less streaming, like in Java core, I need to follow up on different processes to deploy the core and connect the database. There are not so many cores that I need to write.

In StreamSets, everything is in one place. If you want to connect a database and configure it, it is easy. If you want to connect to HTTP, it’s simple. If I'm going to do the same with my other tools, I don’t need many configurations or installations. StreamSets' ability to connect enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka is good. I also send data to both the database and Kafka as well.

You will get all the drivers that you will need to install with the database. If you use other databases, you're going to need a JDBC driver, which is not difficult to use.

I'm sending data to different CDP URL databases, cloud areas, and Azure areas.

StreamSets' built-in data drift resilience plays a part in our ETL operations. We have some processors in StreamSets, and it will tell us what data has been changed and how data needs to be sent.

It's an easy tool. If you're going to use it as a customer, then it should take a long time to process data. I'm not sure if in the future, it will take some time to process the billions of records that I'm working on. We generally process billions of records on a daily basis. I will need to see when I work on this new project with Snowflake. We might need to process billions of records, and that will happen from the source. We’ll see how long it needs to take and how this system is handling it. At that point, I’ll be able to say how effectively StreamSets processes it.

The Data Collector saves time. However, there are some issues with the DPL.

StreamSets helped us break down data silos within our organizations.

One advantage is that everything happens in one place. If you want to develop or create something, you can get those details from StreamSets. The portal, however, takes time, though they are focusing on this.

StreamSets' reusable assets have helped to reduce workload by 32% to 40%.

StreamSets helped us to scale our data operations.

If you get a request to process data for other processing tools, it might take a long time, like two to three hours. With this, I can do it within half an hour, 20 or 30 minutes. It’s more effective. I have everything in one place and I can configure everything. It saves me time as it is so central. 

What needs improvement?

If you use JDBC Lookup, for example, it generally takes a long time to process data.

StreamSets enables us to build data pipelines without knowing how to code. You can do it, however, you need to know data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills if you are to create a data pipeline. When procuring the data pipeline, for example, you need the original processor and destination. If you don't know where you're going to read the data, where to send the data, and if you have to send the data, you have to configure it. If the destination you're looking for is some particular message permit or data permit, then you should write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.

StreamSets data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.

If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.

Buyer's Guide
StreamSets
May 2025
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: May 2025.
857,028 professionals have used our research since 2012.

For how long have I used the solution?

I've been using StreamSets for the last two years.

What do I think about the stability of the solution?

In terms of stability, there have been one or two issues. Good people work on the solutions when we have issues. However, sometimes we don't get a good solution. 

As a user, I expect a lot more and that the solution will come quicker as compared to keeping projects on hold or keeping them for a long time. If they do not have any solution, then we can plan accordingly how to use the other processors. They just need to let us know quickly. 

What do I think about the scalability of the solution?

The scalability is good.

We do plan to increase usage. 

How are customer service and support?

In terms of technical support, they generally do a detailed analysis from their end. They always try to give a proper solution. However, sometimes, they won't get to any proper solution. They'll come back and look into it and sometimes it takes time. If they can speed up the process a little bit that would be ideal. We are always sitting on the edge. If we don't get a proper response from them, then it will be very difficult for us to answer to higher management. 

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

This is my first solution of this kind. Previously, I was working in open source systems, with scripting, et cetera. This is the first time I've worked in the data area. I've got full support. As a new data user, I'm still getting used to it.

How was the initial setup?

The setup is straightforward and simple.

We treat it like a pipeline. We are not writing code and putting things in. In the case of a pipeline, you can export it and input it, or you can make it a pipeline. It can be auto-deployed into a respective environment. That's what we did.

We have different destinations we need to send to. We aren't using a single destination. In that sense, we do have multiple computations. We set up, send the data and do the deployments. 

There is occasional maintenance needed. Sometimes, if something goes wrong, we'll have to correct the data. We just check here and there for the most part.

What about the implementation team?

We did not need an integrator or consultant to assist with the setup. 

As a team, we do the deployment ourselves. We don't send it to others; whatever we develop, we test and deploy. We already have the system in place and it is really helpful for the deployment of the solution.

What was our ROI?

I haven't seen an ROI. 

It's not exactly saving us money as it's a new tool. If I'm going to hire someone new, I will not hire based on the StreamSets tool or some specific tools, and I might save money right away. However, I'm spending time on my side. StreamSets is not being used by many horizons. In some places in Europe, fewer companies are using StreamSets. People should get to know StreamSets and they should get some expertise in the area, the way AWS and Azure do. I’m spending a lot more time and therefore I’m not saving money. That said, I’m also not losing money.

What's my experience with pricing, setup cost, and licensing?

Higher management handled the licensing. However, I can't say how much it costs. I'm more on the user side.

Which other solutions did I evaluate?

I did not evaluate other options. 

What other advice do I have?

I have not yet used StreamSets' Transformer for Snowflake functionality. I created one POC, though not with Snowflake. However, I'm going to use Snowflake in my next project.

I'd rate the solution seven out of ten. They are doing a good job. Using this solution I can feel the data and see the user flows. 

If you are going to withdraw on-premise, and you're just copying the data to a table, you're not going to see how much data has been copied. With this, I'm seeing how much data has been transferred, and where the processor is. It gives a clear picture with metric details and notifications. That's the reason I used this tool for the last two years. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
AbhishekKatara - PeerSpot reviewer
Technical Lead at Sopra Steria
Real User
Easy-to-use tool with no coding required
Pros and Cons
  • "StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
  • "The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."

What is our primary use case?

StreamSets is a wonderful data engineering, data ops tool where we can design and create data pipelines, loading on-prem data to the cloud. One of our major projects was to move data from on-premises to Azure and GCP Cloud. From there, once data is loaded, the data scientist and data analyst teams use that data to generate patterns and insights. 

For a US healthcare service provider company, we designed a StreamSets pipeline to connect to relational database sources. We did generate schema from the source data loaded into Azure Data Lake Storage (ADLS) or any cloud, like S3 or GCP. This was one of our batch use cases. 

With StreamSets, we have also tried to solve our real-time streaming use cases as well, where we were streaming data from source Kafka topic to Azure Event Hubs. This was a trigger-based streaming pipeline, which moved data when it appeared in a Kafka topic. Since this pipeline was a streaming pipeline, it was continuously streaming data from Kafka to Azure for further analysis.

How has it helped my organization?

We can securely fetch the passwords and credentials stored in Azure Key Vault. This is a fundamentally very strong feature that has improved our day-to-day life.

What is most valuable?

It is a pretty easy tool to use. There is no coding required. StreamSets provides us a canvas to design our pipeline. At the beginning of any project, it gives us a picture, which is an advantage. For example, if I want to do a data migration from on-premise to cloud, I will draw it for easier understanding based on my target system, and StreamSets does exactly the same thing by giving us a canvas where I can design our pipeline.

There are a wide range of available stages: various sources, relational sources, streaming sources. There are also various processors to transform the source data. The point is not only to migrate data from source to destination; we can utilize different processors to transform the data. When I was working on the healthcare project, there was personal identification information in the personal health information (PHI) data that we needed to mask. We couldn't simply move it from source to destination. Therefore, StreamSets provides masking of that sensitive data.
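As an illustration of the masking step described above, here is a minimal sketch in plain Python. It is not StreamSets' own masking processor or its configuration. The field names and the `mask_record` helper are hypothetical, and hashing is only one of several possible masking strategies (an unsalted hash alone would not normally be considered sufficient protection for real PHI).

```python
import hashlib

# Hypothetical sensitive field names; a real pipeline would use its own schema.
SENSITIVE_FIELDS = {"patient_name", "ssn"}


def mask_record(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of the record with sensitive fields replaced by a short hash."""
    masked = dict(record)  # copy so the source record is left untouched
    for field in sensitive_fields:
        value = masked.get(field)
        if value is not None:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[field] = "masked:" + digest[:12]
    return masked


record = {"patient_name": "Jane Doe", "ssn": "123-45-6789", "visit_count": 3}
masked = mask_record(record)
print(masked)
```

Non-sensitive fields such as `visit_count` pass through unchanged, so downstream analysis can still use them.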

It provides us a facility to generate schema. There are different executors available, e.g., Pipeline Finisher executor, which helps us in finishing the pipeline. 

There are different destinations, such as S3, Azure Data Lake, Hive, and Kafka Hadoop-based systems. There are a wide range of available stages. It supports both batch and streaming. 

Scheduling is quite easy in StreamSets. From a security perspective, there is integration with keywords, e.g., for password fetching or secrets fetching. 

It is pretty easy to connect to Hadoop using StreamSets. Someone just needs to be aware about the configuration details, such as which Hadoop cluster to connect and what credentials will be available. For example, if I am trying with my generic user, how do I connect with the Hadoop distributed system? Once we have the details of our cluster and the credential, we can load data to the Hadoop standalone file system. In our use case, we collected data from our RDBMS sources using JDBC Query Consumer. We queried the data from the source table, captured that data, and then loaded the data into the destination Hadoop distributed file system. Thus, configuration details are required. Once we have the configuration details, i.e., the required credentials, we can connect with Hadoop and Hive. 

It takes care of data drift. There are certain data rules, matrix rules, or capabilities provided by StreamSets that we can set. So, if the source schema gets deviated somehow, StreamSets will automatically notify us or send alerts in automated fashion about what is going wrong. StreamSets also provides Change Data Capture (CDC). As soon as the source data is changed, it can capture that and update the details into the required destination. 
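To make the idea of a schema deviation concrete, the sketch below compares an expected schema with an observed one and reports what changed. It is a generic illustration in plain Python, not how StreamSets implements its data rules or alerts. The function names, column names, and type labels are all assumptions.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type} mappings and report the differences."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    changed = sorted(
        col
        for col in set(expected) & set(observed)
        if expected[col] != observed[col]
    )
    return {"added": added, "removed": removed, "changed": changed}


def has_drift(report):
    """True if any category of the drift report is non-empty."""
    return any(report.values())


# Hypothetical schemas: the source added a column and changed a type.
expected = {"id": "int", "name": "string", "amount": "decimal"}
observed = {"id": "int", "name": "string", "amount": "string", "region": "string"}

report = detect_schema_drift(expected, observed)
if has_drift(report):
    print("Schema drift detected:", report)
```

A report like this is the kind of information an alert would need to carry: which columns appeared, which disappeared, and which changed type.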

What needs improvement?

The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time. For example, if I am starting with StreamSets, everything is fine. However, if I want to dig into problems that my pipeline ran into, it initially takes some time to get familiar with it and understand it.

I feel the visualization part can be simplified or enhanced a bit, so I can easily see what happened with my job seven days earlier and how many records it transmitted. 

For how long have I used the solution?

I have been using StreamSets for close to four and a half years when creating my data pipelines in our projects.

What do I think about the stability of the solution?

Stability-wise, it is wonderful and quite good. Mostly, since the solution is completely cloud-based in our project, we just need to hit a URL and then we are logged into StreamSets with our credentials. Everything is present there. Other than some rare occasions, StreamSets behaves pretty well. 

There were certain memory leak issues for a few stages, like Azure Data Lake, but those were corrected with immediate solutions, like patches and version upgrades. 

Stability-wise, I would rate it as eight and a half or nine out of 10.

What do I think about the scalability of the solution?

I would like auto scaling for heavy load transfer. This applied particularly when we were doing our data migration project. The tables had more than 10 million records in them. When we utilized StreamSets, it took a huge amount of time. Though we were doing every schema generation, we were using ADLS as a destination, and it hung for a good amount of time. So, we considered PySpark processes for our tables that have more than 10 million records. Usually, it works pretty well when the source tables hold close to five to six million records, but when it is closer to 10 million, I personally feel the auto scaling feature could be improved.

How are customer service and support?

We have spent a good amount of time dealing with their technical support team. The first step is to check the documentation, then work with them. 

I had a chance to work with StreamSets during our use case. They helped us out in a good manner with a memory leak issue that we were facing in our production pipeline. Our pipelines were running fine in the lower environments, i.e., dev and QA, but when we moved those pipelines into production, we were getting a memory leak issue where the JVM threw an out-of-memory exception.

We tried reducing the number of threads and the batch size for the small table, but it was still creating issues. Then, we connected with StreamSets' support team. They gave us a customized patch, which our platform team installed in our production environment. With some collaborative effort of around a week, we were finally able to run our pipeline pretty well.

I would rate the customer support and the technical support as quite good and knowledgeable (eight out of 10). They helped with issues that were occurring in our work. They accepted that there were some issues with the version, which StreamSets released and we were using. They accepted that the version particularly had some issues with the memory management. Therefore, the immediate solution that they provided was a patch, which our platform team installed. However, the long-term solution was to update or upgrade our StreamSets Data Collector platform from version 3.11 to 4.2, and that solved our problem.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were using Cloudera distribution. All our projects were running, utilizing Hadoop, and the distribution was Cloudera Hortonworks. We were utilizing Sqoop and Hive as well as PySpark or Scala-based processes to code. However, StreamSets helped us a lot in designing our data pipelines quickly.

It has made our job pretty easy in terms of designing, managing, and running our data engineering pipeline. Previously, if I needed to transfer data from source to destination, I would need to use Sqoop, which is a Hadoop stack technology used to establish connectivity with the RDBMS, then load it to the Hadoop distributed file system. With Sqoop, I needed to have my coding skills ready. I needed to be very precise about the connection details and syntax. I needed to be very aware of them. StreamSets solved this problem. 

Its greatest feature is that it provides an easy way to design your pipeline. I just need to drag and drop source JDBC Query Consumer to my canvas as well as drag and drop my destination to the canvas. I then need to connect both these stages and be ready with my configuration details. As soon as I am done with that, I will validate the pipeline. I can create a job out of it and schedule it, even the monitoring. All these things can be achieved by a single control panel. So, it not only solves the developer's basic problems, but it also has greatly improved the experience.

We were previously completely using the Hadoop technology stack. Slowly, we started converting our processes into data engineering pipelines, which are designed into StreamSets. Earlier, the problem area was to write code into Sqoop or create Sqoop scripts to capture data from source, then put it into HDFS. Once data was in HDFS, we would write another PySpark process, which did the optimization and faster loading of the data, which is in Hadoop Distributed File System to a cloud-based storage data lake, like ADLS or S3. However, when StreamSets came into picture, we didn't need an intermediary, three-storage distributed file system like HDFS. We could simply create a pipeline that connects to RDBMS and load data directly to the cloud-based Azure Data Lake. So there is no requirement for an intermediary Hadoop Distributed File System (HDFS), which saves us a great amount of time and also helps us a lot in creating our data engineering pipelines.

Microsoft provided Change Data Capture tools, which one of our team members was using. Performance-wise, I personally feel StreamSets is way faster. A few of the support team members were using Informatica as well, but it does not provide powerful features that can handle big amounts of data.

How was the initial setup?

For our deployment model, we were following three environments: dev, QA and prod. Our team's main responsibility is to hydrate Azure Data Lake and GCP from the source system. Control Hub is hosted on GCP, and we were hitting the URL to log into StreamSets. All the data collector machines are created on Google Cloud Platform, and we use a dev environment. Whenever we create and do a PoC, we work in a dev environment. Once our pipeline and jobs are working fine, we move our pipelines to our QA environment, which is export and import. It is pretty easy to do via StreamSets Control Hub. We can simply select a job and export it, then log back into the QA environment and import the job. Once we import the job, the associated pipeline, and all the parameters, we have an option to import the whole bundle, like the pipeline, parameter, and instances. We can import everything. Once this is also working fine, we have another final environment, which is the production which is based on the source refresh frequencies. 

What about the implementation team?

In our company, we have a good data engineering team. We have a separate administrator team who is mainly responsible for deploying it on cloud, providing us libraries whenever required. There is a separate team who is taking care of all the installations and platform-related activities. We are primarily data engineers who utilize the product for solutions.

What was our ROI?

StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes.
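The arithmetic behind the saving can be checked with a short calculation. This sketch uses only the durations quoted above (60 to 90 minutes before, 15 minutes after). The helper name is an assumption made for illustration.

```python
def time_saving(before_minutes, after_minutes):
    """Return (minutes saved, percentage reduction) for one pipeline run."""
    saved = before_minutes - after_minutes
    percent = 100.0 * saved / before_minutes
    return saved, percent


# Durations quoted in the review: 60-90 minutes with Sqoop on Hadoop,
# about 15 minutes with StreamSets.
for before in (60, 90):
    saved, percent = time_saving(before, 15)
    print(f"{before} min -> 15 min: saves {saved} min ({percent:.1f}% less time)")
```

On these figures the quoted 45 minutes corresponds to the one-hour case. For a run that previously took an hour and a half, the same calculation gives 75 minutes saved.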

What's my experience with pricing, setup cost, and licensing?

StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub. 

What other advice do I have?

For people who are starting out, the simple advice is to first try out the cloud login of StreamSets. It is freely available for everyone these days. StreamSets has released its online practice platform to design and create pipelines. Someone simply needs to go to cloud.login.streamsets.com, which is StreamSets official website. It is there that people who are starting out can log into StreamSets cloud and spin up their StreamSets Data Collector machines. Then, they can choose their execution mode. It is all in a Docker-containerized fashion. You don't need to do anything. 

You simply need to have your laptop ready and step-by-step instructions are given. You just simply spin up your Data Collector, the execution mode, and then you are ready with the canvas. You can design your pipeline, practice, and test there. So, if you want to evaluate StreamSets in basic mode, you can take a look online. This is the easiest way to evaluate StreamSets.

It is a drag-and-drop, UI-based approach with a canvas, where you design the pipeline. It is pretty easy to follow. So, once your team feels confident, then they can purchase the StreamSets add-ons, which will provide them end-to-end solutions and vendor support. The best way is to log into their cloud practice platform and create some pipelines.

In my current project, there is a requirement to integrate with Snowflake, but I don't have Snowflake experience. I have not integrated Snowflake with StreamSets yet.

I personally love working on StreamSets. It is part of my day-to-day activities. I do a lot of work on StreamSets, so I would rate them pretty well as nine out of 10.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
BahatiAsher Faith - PeerSpot reviewer
Software Developer at Appnomu Business Services
Real User
Simplifies the way we perform tasks and engineer pipelines at all stages
Pros and Cons
  • "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
  • "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."

What is our primary use case?

It is primarily being used by our IT department to configure things and see what is missing and what the issues are. 

How has it helped my organization?

I'm using StreamSets to find issues with our software and it is helping us to do so, and to make sure that we are able to debug on time. It makes things much simpler. We can use the solution to know what issue is happening at the moment. We are able to easily identify a leak and resolve it on time.

It reduces our workload by about 30 percent. And it saves us a lot on having to hire expensive technical experts or software engineers. You purchase a package with a reasonable pricing model, and then you can use it with your team. It saves us from hiring a technical person to carry out the tasks. With StreamSets, you can do a task easily.

It also makes it easy to send data from one place to another.

StreamSets is doing a lot in our IT operations because it is simplifying the way we perform tasks and the way we engineer pipelines at all stages, including the sources, processes, and destination use. We can schedule data pipelines and that's easy.

And because it is low-code software, you don't need to develop the code and that really saves a lot of time. Using the canvas to create and engineer data pipelines is very easy. StreamSets saves me three hours that it would take me to manually do a task.

What is most valuable?

StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall. They really help you to come up with a solution more quickly. The Transformer logic is very easy, as long as you understand the concept of what you intend to develop. It doesn't require any technical skills.

The overall GUI and user interface are also good because you don't need to write complex programming for any implementation. You just drag and configure what you want to implement. It's very easy and you can use it without knowing any programming language.

The design experience is much easier when you want to integrate other systems and tools and make them work in a particular format. It helps you improve the topologies. You can view the status of all the pipelines you have developed and monitor them.

Connecting to enterprise data stores is also very easy, as is monitoring and managing things in one place.

What needs improvement?

The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date. 

I would also like better, detailed logging of error information. 

It also needs a fragment drill-down feature when monitoring a data flow. That needs a lot of improvement, especially when you are running a job.

For how long have I used the solution?

This is my second year using StreamSets.

What do I think about the stability of the solution?

It's stable.

What do I think about the scalability of the solution?

It is a scalable solution for any company that needs to know about its data processing.

How are customer service and support?

It is hard to get technical support from the company. To receive one-on-one communication requires a budget, which we don't really have. The way we get technical support is through the documentation and knowledge base.

It is missing a live instant chat on the dashboard.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not have a previous solution.

How was the initial setup?

Initially, the deployment could be very hard if you do not have a lot of technical skills, but as you get used to the software, within a day, the deployment becomes straightforward. It took two weeks to have everything configured in the right manner. I worked with one other colleague to set everything up.

It is hard, especially when you are a beginner, but when you read the documentation you can set things up quickly. The documentation helps out if you don't have good knowledge of the solution.

It doesn't require maintenance.

What was our ROI?

The solution is helping a lot because we are not spending a lot of money on a technical team. We just subscribe to the software and we're able to configure things. It has helped us save on resources by 30 percent.

What's my experience with pricing, setup cost, and licensing?

The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data. They process a lot of debugging. The pricing is not so favorable for a small enterprise because it is too limited.

What other advice do I have?

I would recommend the software to any business that needs to do data engineering. If they design data pipelines, it is really a great idea to test out StreamSets. Unfortunately, you need a good budget for it. If a small business doesn't have the budget, I cannot recommend it. But if they have a good budget, I really recommend it because it has so many features that can really help data scientists and analysts generate patterns or insights for their businesses. And it will benefit their customers as well.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
reviewer1339230 - PeerSpot reviewer
Data Engineer at an energy/utilities company with 10,001+ employees
Real User
Easy to set up and use, and the functionality for transforming data is good
Pros and Cons
  • "It is really easy to set up and the interface is easy to use."
  • "We've seen a couple of cases where it appears to have a memory leak or a similar problem."

What is our primary use case?

We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.

What is most valuable?

It is really easy to set up and the interface is easy to use.

We found it pretty easy to transform data.

The online documentation is pretty good.

What needs improvement?

We've seen a couple of cases where it appears to have a memory leak or a similar problem. Memory usage grows for a while, and then we have to restart the container, maybe once a month, when it gets high.

For how long have I used the solution?

We have been using StreamSets for about one year. We may have been experimenting with it slightly before that time.

What do I think about the stability of the solution?

Other than the memory issue that we occasionally see, the stability has been really good.

What do I think about the scalability of the solution?

We haven't seen a problem with scaling it.

How are customer service and technical support?

I haven't had to deal with technical support. We first check the online documentation, and we usually find what we need. We haven't had to call them.

Which solution did I use previously and why did I switch?

Prior to using StreamSets, we were using Microsoft CDC (Change Data Capture). It was a fairly old product, and we had lots of workarounds and lots of issues with it. We were looking for something more user-friendly. It was pretty stable, so that was not an issue.

How was the initial setup?

This product was a lot easier to use than the one we had before it. It took us half an hour to get set up and running the first time.

What's my experience with pricing, setup cost, and licensing?

We are running the community version right now, which can be used free of charge. We were debating whether to move it to the commercial version, but we haven't had the need to, just yet.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer912129 - PeerSpot reviewer
Senior Technical Manager at a financial services firm with 501-1,000 employees
Real User
The ease of configuration for pipes is amazing, and the GUI is very nice
Pros and Cons
  • "The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
  • "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
  • "StreamSets works great for batch processing but we are looking for something that is more real-time. We need sub-millisecond latency."

What is our primary use case?

It performs very well. The main use is to extract information from some of our Kafka topics and put it into our internal systems and flat files, and to integrate with Java.

How has it helped my organization?

It facilitates the consumption of the data in batch mode to the system where it is required. We don't do a lot of transformations or joining or forking of the information. It's more point-to-point connectivity that we implement over StreamSets.

What is most valuable?

The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too. It's pretty nice.

What needs improvement?

I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.

StreamSets works great for batch processing but we are looking for something that is more real-time. We need sub-millisecond latency.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

It's pretty stable. StreamSets has been up and running for months without any intervention from the operations team. It's great.

I don't know if they can implement some kind of high availability. I haven't gone deep into that kind of configuration because, with only one node running as stably as it is, we have no problem with that. But for critical operations, I'd like to know if I can set up some kind of high availability in case one of the nodes goes down.

What do I think about the scalability of the solution?

It's pretty scalable.

How are customer service and technical support?

I don't use support. I mainly use the community or web searches; self-learning.

How was the initial setup?

The initial setup is pretty straightforward.

What other advice do I have?

If you are looking for something to do batch processing in Java, this is the right solution. We did the exploration when we were trying to implement a batch processing system and decided that StreamSets is the best for that. If you're looking for real-time, you may want to look at another system or the next version of this one.

Because of the kind of system that we need to implement with this kind of solution, the most important factors I look at when selecting a vendor are things like latency and real-time processing.

I would rate it nine out of 10. What would make it a 10 is, as I said, more integration with other kinds of languages or frameworks and more real-time processing, not batch.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Buyer's Guide
Download our free StreamSets Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2025
Product Categories
Data Integration