StreamSets Room for Improvement

Reyansh Kumar - PeerSpot reviewer
Technical Specialist at Accenture

The user interface requires some corrections in terms of the menu settings, menu items, and report generation. Also, report generation takes some time.

Prateek Agarwal - PeerSpot reviewer
Manager at Indian Institute of Management Visakhapatnam

Sometimes, when we have large amounts of data that are stored very efficiently in Hadoop or Kafka, running them through StreamSets is not very efficient because of the resources that StreamSets uses.

Also, the hierarchy of names within the dropdowns and the drag-and-drop features are not familiar to users who do not have a technical or programming background. In those cases, the naming conventions are a challenge.

Nantabo Jackie - PeerSpot reviewer
Sales Manager at Soft Hostings Limited

I found that if the connection drops and the pipeline is restarted, it sometimes does not reconnect. That has room for improvement.

The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base. This leads to discrepancies between the software and the documentation, making it difficult to understand.

Buyer's Guide
StreamSets
March 2024
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
767,667 professionals have used our research since 2012.
Karthik Rajamani - PeerSpot reviewer
Principal Engineer at Tata Consultancy Services

There are a few things that can be better. We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back.

There are certain features that are only available at certain stages. For example, HTTP Client has some great features when it is used as a processor, but those features are not available in HTTP Client as a destination.

There could be some improvements on the group side. Currently, if I want to know which users are a part of certain groups, it is not straightforward to see. You have to go to each and every user and check the groups he or she is a part of. They could improve it in that direction. Currently, we have to put in a manual effort. In case something goes wrong, we have to go to each and every user account to check whether he or she is a part of a certain group or not.

Namanya Brian - PeerSpot reviewer
CEO-founder at Tubayo

Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful. 

Also, it doesn't provide a very good user experience.

Saket Pandey - PeerSpot reviewer
Product Manager at a hospitality company with 51-200 employees

The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer. 

Its initial setup could also be a bit easier.

JA
IT Project Manager at Orange España

I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions.

MI
Software Engineer at Soft Hostings Limited

There are so many things that need to be improved. For the StreamSets cloud user interface, there aren't enough use cases and examples for the main problems. In addition, the hybrid data sets cannot be joined in a data connector, which is a significant limitation. 

There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline. It isn't helpful when you need to apply the same logic for multiple sources. It becomes difficult because you need to create more pipelines and then add coordination between them.

Initially, it's hard to find out or master the logic behind it. It can be hard if you aren't technical enough. There is scope for improvement because it's not straightforward. You need to go through the documentation and make sure that you understand every step. For me, it was a challenging model.

AbhishekKatara - PeerSpot reviewer
Technical Lead at Sopra Steria

The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time. For example, if I am starting with StreamSets, everything is fine. However, if I want to dig into problems that my pipeline ran into, it initially takes some time to get familiar with it and understand it.

I feel the visualization part can be simplified or enhanced a bit, so I can easily see what happened with my job seven days earlier and how many records it transmitted. 

Avinash Mukesh - PeerSpot reviewer
IT Specialists at Soft Hostings

When using Transformer for Snowflake, it's a bit complex to understand the transformation logic. You need someone who has some technical skills to handle it. You need to have some skills to transform the data. However, it's important that Transformer for Snowflake is a serverless engine embedded within the platform, so there is no need for maintenance. Having a serverless engine makes it easy for any enterprise to not think about or worry about the cost of maintaining the software.

The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed.

JM
Software Engineer at ZIDIYO

Using ETL pipelines is a bit complicated and requires some technical aid.

The Transformer for Snowflake functionality is complex and requires a lot of logic.

Kevin Kathiem Mutunga - PeerSpot reviewer
Chief software engineer at Appnomu Business Services

There should be a concept of creating double variables because it's still missing.

The loading machine mechanism needs to be simplified. Currently, it takes some time to get familiar with and understand that. 

Visualization and monitoring need to be improved and refined. For example, it is difficult to monitor a job to see what happened in the past seven days when a transfer occurred.

The licensing model also has room for improvement. The solution is currently expensive.

Sumesh Gansar - PeerSpot reviewer
Product Marketing Manager at a tech vendor with 10,001+ employees

In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time.

Some visual explanation or some visually appealing knowledge-based content would be very good. That is something that I could have done with, once I started using it, because I found it very difficult.

SS
Senior Data Engineer at an energy/utilities company with 1,001-5,000 employees

One area for improvement is probably the GUI. It is pretty basic and a lot of improvement is required there.

In terms of security, from an architecture perspective, when we want to implement something, and because our organization is very strict when it comes to cybersecurity, we have been struggling a bit because the platform has a few gaps. Those gaps are really gaps based on our organization's requirements. These are not gaps on StreamSets' side. The solution could improve a lot in terms of having more features added to the security model, which would help us.

There are quite a few features that we wanted. One is SAP HANA. Currently, we can only use a query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, you can just point it to the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA because StreamSets currently does not have a multi-table feature for it. Therefore, a multi-table origin for SAP HANA would be helpful.

MB
Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees

StreamSets should provide a mechanism to perform data quality assessment while data is being moved from a source to a target, meaning the ability to validate the data against various data rules. Then, when a data quality assessment fails, it should be able to send alerts or information to help people understand the data validation issues.

Al Mercado - PeerSpot reviewer
AI Engineer at Techvanguard

The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that.

It can break down data silos within the organization. One person can do the whole thing with StreamSets and SageMaker Canvas, but it hasn't yet had any effect on our operations or business because it's one of those situations where you can either get a demo from them or you basically have to go to one of these sessions and they give you temporary credentials and try to work with your use case. Personally, I would change their model a bit and give a two-week trial license for a cloud platform at the very least. You can then try to get something to work or call up their technical department and say, "Look, I've been evaluating this thing for the last few days. I don't know exactly how to resolve this issue."

Ramesh Kuppuswamy - PeerSpot reviewer
Senior Software Developer at a tech vendor with 10,001+ employees

The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information. Apart from that, I don't think much improvement is required, because the software and features are very good.

Ved Prakash Yadav - PeerSpot reviewer
Senior Data Platform Manager at a manufacturing company with 10,001+ employees

We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered.

BahatiAsher Faith - PeerSpot reviewer
Software Developer at Appnomu Business Services

The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date. 

I would also like better, detailed logging of error information. 

It also needs a fragment drill-down feature when monitoring a data flow. That needs a lot of improvement, especially when you are running a job.

BR
Data Engineer at a consultancy with 11-50 employees

If you use JDBC Lookup, for example, it generally takes a long time to process data.

StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to understand data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills if you are to create a data pipeline. When building the data pipeline, for example, you need the origin, processor, and destination. You have to know where you're going to read the data from and where to send it, and you have to configure that. If the destination you're looking for is some particular message permit or data permit, then you should write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.

StreamSets data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.

If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.

SR
Product Marketer at a media company with 1,001-5,000 employees

In terms of features, I don't have any complaints so far. But one area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there.

TH
Senior Network Administrator at an energy/utilities company with 201-500 employees

The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices.

We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.

However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up.

Also, the built-in data drift resilience for ETL operations requires a bunch of custom code development to be able to handle that. It's somewhat difficult because you have to customize it a fair amount.

I also would like a more user-friendly interface and better error-trap handling.

AC
Senior Technical Manager at a financial services firm with 501-1,000 employees

I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.

StreamSets works great for batch processing, but we are looking for something that is more real-time. We need sub-millisecond latency.

MP
Data Engineer at an energy/utilities company with 10,001+ employees

We've seen a couple of cases where it appears to have a memory leak or a similar problem. Memory usage grows for a bit, and then we have to restart the container, maybe once a month, when it gets high.
