Al Mercado - PeerSpot reviewer
AI Engineer at Techvanguard
Real User
A no-code solution with a drag-and-drop UI, but the execution engine should be better
Pros and Cons
  • "The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
  • "The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."

What is our primary use case?

I was working on an integration project where I was using the StreamSets platform. I was looking at both their data collector and their transformer. The idea was to integrate it with AWS SageMaker Canvas. Both of them are what they call no-code options. StreamSets is for data pipelining, managing your data flow, and transforming your data. SageMaker is AWS, and Canvas is basically their no-code option for machine learning.

I was trying to connect it to a data object repository. For AWS, that's a specific managed service called S3. I wasn't trying to run it with a data warehouse.

How has it helped my organization?

It's still in the trial stage. I don't get a 30-day trial period or anything like that. I just got to write about what's involved and then see if that's something that justifies the use case for going ahead and purchasing the license for it.

It enables you to build data pipelines without knowing how to code. It abstracts away the need for Spark or anything like that. This ability is highly important because it reduces development time.

It saves time because you don't have to write code. 

It saves money by not having to hire people with specialized skills. You don't need Spark or anything like that for doing the same thing.

It helps to scale your data operations. You can get to the execution engine and provision bigger machines or bigger clusters. You can scale out to however much data you need to scale out to.

What is most valuable?

The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows.

What needs improvement?

The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that.

It can break down data silos within the organization. One person can do the whole thing with StreamSets and SageMaker Canvas, but it hasn't yet had any effect on our operations or business because it's one of those situations where you can either get a demo from them or you basically have to go to one of these sessions and they give you temporary credentials and try to work with your use case. Personally, I would change their model a bit and give a two-week trial license for a cloud platform at the very least. You can then try to get something to work or call up their technical department and say, "Look, I've been evaluating this thing for the last few days. I don't know exactly how to resolve this issue."

Buyer's Guide
StreamSets
March 2024
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
768,857 professionals have used our research since 2012.

For how long have I used the solution?

I started using it in June of this year. 

What do I think about the stability of the solution?

The whole issue of the execution engine needs to be better resolved. If you pick a cloud, why isn't it working with this cloud? Or what do I need to do to get it to work with one specific cloud service if it can be deployed across multiple clouds?

What do I think about the scalability of the solution?

It seems pretty highly scalable to me. That's not going to be an issue. Just the administration of it could be an issue.

It's currently being used in a dev department for machine learning. It's being used by the business analyst team.

How are customer service and support?

I haven't contacted their support.

Which solution did I use previously and why did I switch?

AWS has native solutions. There are AWS Data Wrangler and others that come bundled with their services, like AWS Glue. We haven't yet switched to StreamSets. It's still in the evaluation stage, but the no-code and the drag-and-drop option with a GUI are some of the things that seem to resonate with people. 

How was the initial setup?

I was involved in its setup. I was the one who basically had to try to get it to run with whatever process or custom processor I developed. 

It was complex to set up. I had to go to the sessions. On a couple of occasions, I was doing it directly from the cloud platform, and apparently, that wasn't the way to do it. You have to go through their universal designer platform first. 

In terms of maintenance, once you're deployed from the cloud, that's all handled for you. It's managed for you directly from the cloud service. So, you don't have to worry about that. They maintain their design platform.

What about the implementation team?

I didn't use any consultant.

What's my experience with pricing, setup cost, and licensing?

I didn't get into that with the StreamSets representative. It seems to be pay-as-you-go, but I don't know exactly how they do it.

Which other solutions did I evaluate?

Alteryx is another option. It's a similar tool, and it looks almost the same as StreamSets. Alteryx is something that's available for any cloud. It doesn't matter which cloud. You go on the various clouds, and you look and see what they have.

What other advice do I have?

To those evaluating this solution, I would advise looking into how it integrates with the cloud service that they're going to try it with. Does it naturally integrate better with AWS or Azure? It's one of those situations.

I used StreamSets' ability to move data into a modern analytics platform. That's what AWS SageMaker Canvas is. It's like predictive analytics. In terms of ease of moving data into this analytics platform, doing the design on the StreamSets platform is one thing, but having the execution engine and getting it provisioned is a totally different ball game. Basically, that's where its limitation comes in.

Overall, I would rate it a seven out of ten. The issue that was never resolved for me was if you're running a compute or execution engine on AWS versus Azure versus GCP, how does that integration work because that has got nothing to do with StreamSets? That is outside of StreamSets. You're now dealing with the cloud service, and there's a good reason for that.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Ramesh Kuppuswamy - PeerSpot reviewer
Senior Software Developer at a tech vendor with 10,001+ employees
Real User
Top 5
Eradicated our data silos, integrating all data files into one central system
Pros and Cons
  • "The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
  • "The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."

What is our primary use case?

The main use case of StreamSets is to work on data integration and ingesting data for DataOps and modern analytics. We also use it for integrating data files from multiple sources. We use it to build, monitor, and manage smart, continuous data pipelines.

How has it helped my organization?

The introduction of StreamSets in our organization has improved things in a significant way. The efficiency of our entire process has increased a lot and we derive high value from it. The integration of data files from multiple sources is what makes it great software for us.

The transfer of information between our teams is very smooth and efficient as well. It saves us time in transferring, collating, and integrating all of the data.

The integration part has been customized for our particular systems. Previously, we had different data silos. Now, with the introduction of StreamSets, the data silo approach has been eradicated. It has integrated all the data files into one software system, creating a central point for it.

And it has reduced our workload by 50 to 60 percent and that has definitely saved us some money on human resources.

What is most valuable?

There are two features that are most valuable for us. One is the Control Hub and the other is the Data Collector. With Data Collector, data migration has become much easier for us.

Also, the ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.

We use the platform to incorporate modern analytics as well. That is one of our main use cases. It integrates well with our requirements. It is quite easy to move data into these analytics platforms using StreamSets because there are minimal coding requirements. The built-in applications and systems allow us to do it with ease. A first-time user could easily do it. 

If there were coding requirements, it would take three or four extra resources to get things done. That aspect is very important for us. It saves us money by not needing coding manpower.

In addition, the system's data drift resilience is very effective and efficient. On our particular team, it has reduced the time it takes to fix data drift breakages by 10 to 12 man-hours per week.

What needs improvement?

The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information. Apart from that, I don't think much improvement is required, because the software and features are very good.

For how long have I used the solution?

I have been using StreamSets for the past year.

What do I think about the stability of the solution?

The software is very stable. The stability is a solid 10 out of 10.

What do I think about the scalability of the solution?

It's definitely scalable. We started with around 10 to 12 users, and now it has reached 35 to 40 users in our particular organization. We are now using it across four to five teams.

There are a lot of other teams in our company that are trying out the free version of the software. If it's suitable for them, they will obviously go for it as well.

How are customer service and support?

Through email, they have been very good at supporting us and they're very knowledgeable as well. They are going to various lengths to provide us with clear-cut answers.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We didn't use any other similar software.

What was our ROI?

It took three to four months to assess the efficiency improvements in our team. There's definitely a return on investment from the use of StreamSets. Our efficiency has been increased by 20 to 25 percent and it has helped increase revenue by 7 to 10 percent.

What's my experience with pricing, setup cost, and licensing?

I imagine the pricing is moderate because our company is renewing its license, but I'm not sure about the exact price. There are no hidden costs that I have come across.

What other advice do I have?

It's cloud-based software, so there are only minimal maintenance requirements. Our IT team takes care of the maintenance of the software, but I don't think much time is required for that. Only regular updates need to be done. It is a minimal task that can be done by one or two personnel.

Overall, it provides us with a lot of efficiency and increases the effectiveness of our transformation of data sets. The value and increase in revenue it has helped us achieve make it a very good software package.

Try the free version and, if the software meets your requirements, I would definitely say get the Enterprise version. It's pretty easy to understand and it generates a great deal of smoothness for your business processes. It's a must-have for every business to improve its efficiency and effectiveness.

The major takeaway for me has to be the improvement in the efficiency of our entire process. That stands out for us. StreamSets is a great platform. And the best thing about it is that there are minimal coding requirements. Any person, even someone with a non-technical background, can easily get accustomed to the software and start using it.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
BahatiAsher Faith - PeerSpot reviewer
Software Developer at Appnomu Business Services
Real User
Top 5
Simplifies the way we perform tasks and engineer pipelines at all stages
Pros and Cons
  • "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
  • "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."

What is our primary use case?

It is primarily being used by our IT department to configure things and see what is missing and what the issues are. 

How has it helped my organization?

I'm using StreamSets to find issues with our software and it is helping us to do so, and to make sure that we are able to debug on time. It makes things much simpler. We can use the solution to know what issue is happening at the moment. We are able to easily identify a leak and resolve it on time.

It reduces our workload by about 30 percent. And it saves us a lot on having to hire expensive technical experts or software engineers. You purchase a package with a reasonable pricing model, and then you can use it with your team. It saves us from hiring a technical person to carry out the tasks. With StreamSets, you can do a task easily.

It also makes it easy to send data from one place to another.

StreamSets is doing a lot in our IT operations because it is simplifying the way we perform tasks and the way we engineer pipelines at all stages, including the sources, processes, and destination use. We can schedule data pipelines and that's easy.

And because it is low-code software, you don't need to develop the code and that really saves a lot of time. Using the canvas to create and engineer data pipelines is very easy. StreamSets saves me the three hours it would otherwise take to do a task manually.

What is most valuable?

StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall. It really helps you to come up with a solution more quickly. The Transformer logic is very easy, as long as you understand the concept of what you intend to develop. It doesn't require any technical skills.

The overall GUI and user interface are also good because you don't need to write complex programming for any implementation. You just drag and configure what you want to implement. It's very easy and you can use it without knowing any programming language.

The design experience is much easier when you want to integrate other systems and tools and make them work in a particular format. It helps you improve the topologies. You can view the status of all the pipelines you have developed and monitor them.

Connecting to enterprise data stores is also very easy, as is monitoring and managing things in one place.

What needs improvement?

The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date. 

I would also like better, detailed logging of error information. 

It also needs a fragment drill-down feature when monitoring a data flow. That needs a lot of improvement, especially when you are running a job.

For how long have I used the solution?

This is my second year using StreamSets.

What do I think about the stability of the solution?

It's stable.

What do I think about the scalability of the solution?

It is a scalable solution for any company that needs to know about its data processing.

How are customer service and support?

It is hard to get technical support from the company. To receive one-on-one communication requires a budget, which we don't really have. The way we get technical support is through the documentation and knowledge base.

It is missing a live instant chat on the dashboard.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not have a previous solution.

How was the initial setup?

Initially, the deployment could be very hard if you do not have a lot of technical skills, but as you get used to the software, within a day, the deployment becomes straightforward. It took two weeks to have everything configured in the right manner. I worked with one other colleague to set everything up.

It is hard, especially when you are a beginner, but when you read the documentation you can set things up quickly. The documentation helps out if you don't have good knowledge of the solution.

It doesn't require maintenance.

What was our ROI?

The solution is helping a lot because we are not spending a lot of money on a technical team. We just subscribe to the software and we're able to configure things. It has helped us save on resources by 30 percent.

What's my experience with pricing, setup cost, and licensing?

The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data. They process a lot of debugging. The pricing is not so favorable for a small enterprise because it is too limited.

What other advice do I have?

I would recommend the software to any business that needs to do data engineering. If they design data pipelines, it is really a great idea to test out StreamSets. Unfortunately, you need a good budget for it. If a small business doesn't have the budget, I cannot recommend it. But if they have a good budget, I really recommend it because it has so many features that can really help data scientists and analysts generate patterns or insights for their businesses. And it will benefit their customers as well.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Data Engineer at a consultancy with 11-50 employees
Real User
Effective, and helps scale data operations, but sometimes the support's response is slow
Pros and Cons
  • "In StreamSets, everything is in one place."
  • "If you use JDBC Lookup, for example, it generally takes a long time to process data."

What is our primary use case?

The project which I work on is developed in StreamSets and I lead the team. I'm the team leader and the Solution Architect. I also train my juniors and my team.

For the last year and a half, I’ve been using this tool, and it is very effective for data processing from source to destination. I have developed many integrations in it.

How has it helped my organization?

The solution is really effective.

What is most valuable?

It's very effective in project delivery. This month, at the end of June, I will deploy all the integrations which I developed in StreamSets to production remit. The business users and customers are happy with the data flow optimizer from the SoPlex cloud. It all looks good.

Not many challenges are there in terms of learning new technologies and using new tools. We will try and do an R&D analysis more often.

Everything is in place and it comes as a package. They install everything. The package includes Python, Ruby, and others. I just need to configure the correct details in the pipeline and go ahead with my work.

The ease of the design experience when implementing batch, streaming, and ETL pipelines is very good. The streaming is very good. Currently, I'm using Data Collector and it’s effective. If I'm going to use less streaming, like in Java code, I need to follow up on different processes to deploy the code and connect the database. There is not so much code that I need to write.

In StreamSets, everything is in one place. If you want to connect a database and configure it, it is easy. If you want to connect to HTTP, it’s simple. If I'm going to do the same with my other tools, I don’t need many configurations or installations. StreamSets' ability to connect enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka is good. I also send data to both the database and Kafka as well.

You will get all the drivers that you will need to install with the database. If you use other databases, you're going to need a JDBC driver, which is not difficult to use.

I'm sending data to different CDP URL databases, cloud areas, and Azure areas.

StreamSets' built-in data drift resilience plays a part in our ETL operations. We have some processors in StreamSets, and it will tell us what data has been changed and how data needs to be sent.

It's an easy tool. If you're going to use it as a customer, then it should take a long time to process data. I'm not sure if in the future, it will take some time to process the billions of records that I'm working on. We generally process billions of records on a daily basis. I will need to see when I work on this new project with Snowflake. We might need to process billions of records, and that will happen from the source. We’ll see how long it needs to take and how this system is handling it. At that point, I’ll be able to say how effectively StreamSets processes it.

The Data Collector saves time. However, there are some issues with the DPL.

StreamSets helped us break down data silos within our organizations.

One advantage is that everything happens in one place. If you want to develop or create something, you can get those details from StreamSets. The portal, however, takes time, although they are focusing on this.

StreamSets' reusable assets have helped to reduce workload by 32% to 40%.

StreamSets helped us to scale our data operations.

If you get a request to process data for other processing tools, it might take a long time, like two to three hours. With this, I can do it within half an hour, 20 or 30 minutes. It’s more effective. I have everything in one place and I can configure everything. It saves me time as it is so central. 

What needs improvement?

If you use JDBC Lookup, for example, it generally takes a long time to process data.

StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to know data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills if you are to create a data pipeline. When procuring the data pipeline, for example, you need an origin, a processor, and a destination. If you don't know where you're going to read the data, where to send the data, and if you have to send the data, you have to configure it. If the destination you're looking for is some particular message permit or data permit, then you should write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.

StreamSets data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.

If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.

For how long have I used the solution?

I've been using StreamSets for the last two years.

What do I think about the stability of the solution?

In terms of stability, there have been one or two issues. Good people work on the solutions when we have issues. However, sometimes we don't get a good solution. 

As a user, I expect a lot more and that the solution will come quicker as compared to keeping projects on hold or keeping them for a long time. If they do not have any solution, then we can plan accordingly how to use the other processors. They just need to let us know quickly. 

What do I think about the scalability of the solution?

The scalability is good.

We do plan to increase usage. 

How are customer service and support?

In terms of technical support, they generally do a detailed analysis from their end. They always try to give a proper solution. However, sometimes, they won't get to any proper solution. They'll come back and look into it and sometimes it takes time. If they can speed up the process a little bit that would be ideal. We are always sitting on the edge. If we don't get a proper response from them, then it will be very difficult for us to answer to higher management. 

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

This is my first solution of this kind. Previously, I was working in open source systems, with scripting, et cetera. This is the first time I've worked in the data area. I've got full support. As a new data user, I'm still getting used to it.

How was the initial setup?

The setup is straightforward, it's not complex and it is simple. 

We treat it like a pipeline. We are not writing code and putting things in. In the case of a pipeline, you can export it and import it, or you can make it a pipeline. It can be auto-deployed into a respective environment. That's what we did.

We have different destinations we need to send to. We aren't using a single destination. In that sense, we do have multiple computations. We set up, send the data and do the deployments. 

There is occasional maintenance needed. Sometimes, if something goes wrong, we'll have to correct the data. We just check here and there for the most part.

What about the implementation team?

We did not need an integrator or consultant to assist with the setup. 

As a team, we do the deployment. We won't send it to others, whatever we develop, we will test and deploy. We already have the system in place and it is really helpful for the deployment of the solution.

What was our ROI?

I haven't seen an ROI. 

It's not exactly saving us money as it's a new tool. If I'm going to hire someone new, I will not hire based on the StreamSets tool or some specific tools, and I might save money right away. However, I'm spending time on my side. StreamSets is not being used by many organizations. In some places in Europe, fewer companies are using StreamSets. People should get to know StreamSets and they should get some expertise in the area, the way AWS and Azure do. I’m spending a lot more time and therefore I’m not saving money. That said, I’m also not losing money.

What's my experience with pricing, setup cost, and licensing?

Higher management handled the licensing. However, I can't say how much it costs. I'm more on the user side.

Which other solutions did I evaluate?

I did not evaluate other options. 

What other advice do I have?

I have not yet used StreamSets' Transformer for Snowflake functionality. I created one POC, not with Snowflake, however, I'm going to use Snowflake in my next project.

I'd rate the solution seven out of ten. They are doing a good job. Using this solution I can feel the data and see the user flows. 

If you are going to withdraw on-premise, and you're just copying the data to a table, you're not going to see how much data has been copied. With this, I'm seeing how much data has been transferred, and where the processor is. It gives a clear picture with metric details and notifications. That's the reason I used this tool for the last two years. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Product Marketer at a media company with 1,001-5,000 employees
Real User
Top 5
We have been able to eliminate the vast majority of our break/fix costs and maintenance time
Pros and Cons
  • "The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
  • "One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."

What is our primary use case?

Our major use case with StreamSets is to build data pipelines from multiple sources to multiple destinations. We mainly use the StreamSets Data Collector Engine for seamless streaming from any source to any destination.

We also use it to deliver continuous data for database operations and modern analytics.

How has it helped my organization?

One great thing is that now, with the implementation of StreamSets, we have been able to eliminate about 80 percent of our break/fix costs and maintenance time. It is very easy to connect with streaming platforms and streaming services.

Also, we can integrate and stream databases by connecting with multiple streaming services. Before StreamSets, data transfer from source to destination took about three hours and was prone to errors. Now, with the introduction of StreamSets, we primarily use the Data Collector and this has enabled us to complete the same job in less than 30 minutes. We save that much time per day or about 15 hours per week.

Another definite benefit is that it has helped us to break down data silos within our organization. We are able to work together, with the interaction of StreamSets. Previously, the data silos were extremely perilous because data would come from multiple, scattered sources. We were not able to consolidate it on time and we were not able to exactly pinpoint errors. But StreamSets has helped us streamline the use of multiple sources and destinations, completely eliminating the silos. That saves us a lot of time and we have reduced the number of errors by a lot.

What is most valuable?

The most valuable features of StreamSets, for me, are the Data Collector and the Control Hub platform. They are both very straightforward to use and user-friendly. And with the Data Collector and Control Hub, we get canvas selection for designing all our pipelines, which is very intuitive and useful for us.

In fact, the entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth. A great thing about StreamSets is that it is a single, centralized platform. All our design-pattern requirements are met with a single design experience through StreamSets. 

We can also easily build pipelines with minimal coding and minimal technical knowledge. It is very easy to start and very easy to scale as well. That is very important to me, personally, because I'm from a non-technical background. One of the most important criteria was for me to be able to use this platform efficiently.

Also, moving data to modern analytics platforms is very straightforward. That is why StreamSets is one of the top players in the market right now.

And one of the major advantages for us is the built-in functionality. StreamSets has a plethora of features that combine well with ETL.

What needs improvement?

In terms of features, I don't have any complaints so far. But one area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there.

For how long have I used the solution?

I have been using StreamSets for about eight months.

What do I think about the stability of the solution?

It is stable. It's a cloud-based solution, so there is a little bit of latency, some server speed issues, but apart from that, there is no question about the stability of the solution.

What do I think about the scalability of the solution?

The platform is definitely scalable.

Maybe in the future we will increase our usage of StreamSets, but I don't see any immediate scalability requirements for us.

How are customer service and support?

I have not contacted their customer support, but my team contacts them. From what I understand they have a pretty healthy conversation with the StreamSets customer support. All of our queries are sent via email and they get them sorted out. They also join Google Meet sessions or calls, if required, to sort out our queries. It has been a very smooth journey so far. I don't have any complaints with regard to their customer service.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

StreamSets is the first solution that we are using in this space.

How was the initial setup?

I was not fully involved in the initial implementation, but we did the implementation in phases. We wanted to get it on board as soon as possible, so instead of doing a complete implementation, we did it in phases and it didn't take a lot of time. We were able to get on with the work as soon as possible with this model.

The initial setup was simple. We didn't require any additional training or third-party vendors. We were able to do it along with the StreamSets team, so it was smooth for us.

We have 15 people using StreamSets, all at one location. They are developers and users.

Because it is a cloud platform there isn't much maintenance required other than server updates, but that is expected with any cloud platform. No extensive maintenance is required. We have a team of two people who maintain it and handle updates and all the latest releases.

What was our ROI?

Tasks that took three hours can now be done in less than 30 minutes. This is one of the prime data points in terms of ROI for this product.

In terms of money saved, we still haven't seen any direct results from StreamSets. With its automation, we are able to focus on other tasks because StreamSets is taking care of the operations side. Theoretically, it should save us some money but it hasn't until now. We still have the same number of employees.

We are moving in a positive direction. Hopefully, this trend continues. We were able to see the time savings and reduced errors within three months of deployment.

What's my experience with pricing, setup cost, and licensing?

There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced. I wouldn't say it's cheap or moderate, but it's also not a high price.

What other advice do I have?

We have been experimenting with Hadoop, but apart from that, we do not use it to establish a connection with other services. As an organization, we have not faced any issues with connectivity using StreamSets. The platform is very stable.

Overall, StreamSets is very efficient and effective. It has helped us save a lot of time and also reduced errors a lot. I would definitely rate it very highly. The major reason is that it gives us a single, centralized platform for all our design-pattern requirements and we are able to produce results efficiently. With StreamSets, we are able to transfer or stream data from any source to any destination. It has increased the overall efficiency of our organization.

Software AG is constantly improving and evolving the product, and that is something that I like: using a product that is ever-evolving and being upgraded.

After deploying StreamSets, I learned a lot about how data planning works and how easy it is to stream from multiple sources to multiple destinations. That is one of my major takeaways. I thought it would be a very complex task, but that myth was broken by StreamSets. The complexity was made very simple for me.

My advice is to try the free edition. It's a very user-friendly and intuitive product as well. Try it to get a grasp of what's happening inside the product. Once you try the free edition, you'll definitely go for the Professional edition. I don't have any doubt about that. The product itself will lure you. That is the power of the product.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Senior Network Administrator at an energy/utilities company with 201-500 employees
Real User
Top 20
Helped us break down data silos and produce better, up-to-date reports, as well as save money
Pros and Cons
  • "The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
  • "The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."

What is our primary use case?

We use the whole Data Collector application.

How has it helped my organization?

We now consume many more hundreds of terabytes of data than we used to before we had StreamSets. It has definitely enabled us to do things a lot faster, and be a lot more agile, with a lot more data consumption and a lot more reporting.

Another benefit is that it has helped us to break down data silos. We now consume data across different silos and then we aggregate it together so that we can do reporting that is not just for that one silo of people but for a number of different people across the entire organization. That has had a positive effect, enabling us to save money, spend money more effectively, and have more up-to-date data in reports, as well as in auditing. Our safety processes are better too.

One way we have saved money is thanks to how the solution streamlines the data that we pull in, data that we weren't pulling in before.

StreamSets allows more people to know what's going on. It helps us with better allocation of resources, better allocation of staff, and right-sizing. We're in oil and gas and, in our case, it allows us to optimize what we're pulling out of the ground and then what we're selling.

It has helped to scale our data operations and as a result, in addition to saving money and right-sizing, it's helped our field operations and provided us with more management reporting.

Also, the data drift resilience reduces the time it takes to fix data drift breakages.

What is most valuable?

The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them.

We use StreamSets to connect to enterprise data stores, including OLTP databases and Hadoop. Connecting to them is pretty easy. It's the data manipulation and the data streaming that are the harder parts behind that, just because of the way the tool is written.

What needs improvement?

The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices.

We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.

However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up.

Also, the built-in data drift resilience for ETL operations requires a bunch of custom code development to be able to handle that. It's somewhat difficult because you have to customize it a fair amount.

I also would like a more user-friendly interface and better error-trap handling.

For how long have I used the solution?

We have been using StreamSets for about four years.

What do I think about the stability of the solution?

We just patched ourselves up to the latest release about a month ago, so it's actually pretty stable at this point. It used to be quite buggy, going back over the last little while, but it's pretty stable now.

What do I think about the scalability of the solution?

This software is very scalable.

Which solution did I use previously and why did I switch?

We did not have a previous solution.

How was the initial setup?

The initial setup was somewhere between straightforward and complex. It was pretty straightforward to start with, but then it started ramping up to be more difficult as we wanted to add more stuff in.

The difficulty depends upon your data sources. If you have just one data source and you want to consume a lot of different types of data from that one source, it's pretty straightforward. But when you have 20 or 25 different data sources, and you need to pipeline all that data into a couple of data warehouses so that you can use advanced data analytics software to do reporting, analysis, and notifications, it's a lot more complicated. With every data source, it becomes exponentially more complicated to manage.

We spent a significant amount of time doing it, but otherwise, it was seamless because it was our own staff. We didn't have to worry about trying to find money or resource time or do any of the prep work needed to get external resources.

Ours is a single deployment, but it is used across our entire staff base of 200-plus people. We need three people for deployment and maintenance, whose responsibilities include software management, application management, and data analysis and management.

What was our ROI?

The ROI we have seen is in savings of time and money.

What's my experience with pricing, setup cost, and licensing?

We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that.

We tried to go and get them to look at their licensing and support model and they said they were not interested in reevaluating that in any way.

Which other solutions did I evaluate?

We tried to use another freeware ETL tool. It's fairly well-known. We ran it for a couple of months, but it was going to be even more difficult than StreamSets, so we chose StreamSets in the end.

What other advice do I have?

The ease of using StreamSets to move data into modern analytics platforms, on a scale of one to 10, is about a five.

The solution enables you to build data pipelines without knowing how to code if it's the latest, state-of-the-art cloud connecting stuff. If it's for anything structured for Oracle and SQL Server and other data sources, it's difficult. Without knowing how to write code, some of it's easy and some of it is not.

My advice to someone who is considering this software is to be very aware that their integrator and data analysis people will need a very specific skill set.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Senior Technical Manager at a financial services firm with 501-1,000 employees
Real User
The ease of configuration for pipes is amazing, and the GUI is very nice
Pros and Cons
  • "The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
  • "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
  • "StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."

What is our primary use case?

It performs very well. The main use is to extract information from some of our Kafka topics and put it in our internal systems, flat files, and integration with Java.

How has it helped my organization?

It facilitates the consumption of the data in batch mode to the system where it is required. We don't do a lot of transformations or joining or forking of the information. It's more point-to-point connectivity that we implement over StreamSets.

What is most valuable?

The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too. It's pretty nice.

What needs improvement?

I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.

StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

It's pretty stable. StreamSets has been up and running for months without any intervention from the operations team. It's great.

I don't know if they can implement some kind of high availability. I really don't go deep into that kind of configuration because, with only one node and running as stably as it is, we have no problem with that. But for critical operations, I'd like to know if I can set up some kind of high availability, in case one of the nodes goes down.

What do I think about the scalability of the solution?

It's pretty scalable.

How are customer service and technical support?

I don't use support. I mainly use the community or web searches; self-learning.

How was the initial setup?

The initial setup is pretty straightforward.

What other advice do I have?

If you are looking for something to do batch processing in Java, this is the right solution. We did the exploration when we were trying to implement a batch processing system and decided that StreamSets is the best for that. If you're looking for real-time, you may want to look at another system or the next version of this one.

Because of the kind of system that we need to implement with this kind of solution, the most important factors I look at when selecting a vendor are things like latency and real-time processing.

I would rate it at nine out of 10. What would make it a 10 would be, as I said, I'd like to have more integration with other kinds of languages or frameworks and also more real-time processing, not batch.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Engineer at an energy/utilities company with 10,001+ employees
Real User
Easy to set up and use, and the functionality for transforming data is good
Pros and Cons
  • "It is really easy to set up and the interface is easy to use."
  • "We've seen a couple of cases where it appears to have a memory leak or a similar problem."

What is our primary use case?

We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.

What is most valuable?

It is really easy to set up and the interface is easy to use.

We found it pretty easy to transform data.

The online documentation is pretty good.

What needs improvement?

We've seen a couple of cases where it appears to have a memory leak or a similar problem. It grows for a bit and then we'd have to restart the container, maybe once a month when it gets high.

For how long have I used the solution?

We have been using StreamSets for about one year. We may have been experimenting with it slightly before that time.

What do I think about the stability of the solution?

Other than the memory issue that we occasionally see, the stability has been really good.

What do I think about the scalability of the solution?

We haven't seen a problem with scaling it.

How are customer service and technical support?

I haven't had to deal with technical support. We would first check the online documentation or web documentation, and usually found what we needed. We haven't had to call them.

Which solution did I use previously and why did I switch?

Prior to using StreamSets, we were using Microsoft CDC (Change Data Capture). It was a fairly old product and there were lots of workarounds and lots of issues that we had with it. We were looking for something more user-friendly. It was pretty stable, so that was not an issue.

How was the initial setup?

This product was a lot easier to use than the one we had before it. It took us half an hour to get it set up and running the first time.

What's my experience with pricing, setup cost, and licensing?

We are running the community version right now, which can be used free of charge. We were debating whether to move it to the commercial version, but we haven't had the need to, just yet.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free StreamSets Report and get advice and tips from experienced pros sharing their opinions.
Updated: March 2024