StreamSets Room for Improvement

Karthik Rajamani - PeerSpot reviewer
Principal Engineer at Tata Consultancy Services

There are a few things that can be better. We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back.

There are certain features that are only available at certain stages. For example, HTTP Client has some great features when it is used as a processor, but those features are not available in HTTP Client as a destination.

There could be some improvements on the group side. Currently, if I want to know which users are a part of certain groups, it is not straightforward to see. You have to go to each and every user and check the groups he or she is a part of. They could improve it in that direction. Currently, we have to put in a manual effort. In case something goes wrong, we have to go to each and every user account to check whether he or she is a part of a certain group or not.

AbhishekKatara - PeerSpot reviewer
Technical Lead at Sopra Steria

The logging mechanism could be improved. If I work on a pipeline and then create a job out of it, the job generates logs constantly while it is running. So, the logging mechanism could be simplified. Right now, it is a bit difficult to understand and filter the logs, and it takes some time. For example, when I am just starting with StreamSets, everything is fine. However, if I want to dig into problems that my pipeline ran into, it initially takes some time to get familiar with the logs and understand them.

I feel the visualization part can be simplified or enhanced a bit, so I can easily see what happened with my job seven days earlier and how many records it transmitted. 

SS
Senior Data Engineer at an energy/utilities company with 1,001-5,000 employees

One area for improvement is probably the GUI. It is pretty basic and a lot of improvement is required there.

In terms of security, from an architecture perspective, we have been struggling a bit when we want to implement something, because our organization is very strict when it comes to cybersecurity and the platform has a few gaps. Those gaps are relative to our organization's requirements; they are not gaps on StreamSets' side. The solution could improve a lot by adding more features to the security model, which would help us.

There are quite a few features that we wanted. One is for SAP HANA. Currently, we can only use a query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, you can just point StreamSets to the schema or the 100 tables and ingest the information. However, you can't do that with SAP HANA, since StreamSets does not currently have a multi-table feature for it. Therefore, a multi-table origin for SAP HANA would be helpful.

Buyer's Guide
StreamSets
November 2022
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: November 2022.
653,757 professionals have used our research since 2012.
BR
Data Engineer at a consultancy with 11-50 employees

If you use JDBC Lookup, for example, it generally takes a long time to process data.

StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to understand data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills to create a data pipeline. When creating a data pipeline, for example, you need an origin, a processor, and a destination. You have to know where you're going to read the data from and where to send it, and you have to configure that. If the destination you're looking for is some particular message permit or data permit, then you have to write your own code there. You need some knowledge of coding, as StreamSets does not provide any coding.

StreamSets' data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues with data processing and in other scenarios.

If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.

Prateek Agarwal - PeerSpot reviewer
Manager at NISG

Sometimes, when we have large amounts of data that are very efficiently stored in Hadoop or Kafka, it is not very efficient to run that data through StreamSets, because of StreamSets' own lack of efficiency or the resources it uses.

Also, the hierarchy of names within the dropdowns and the drag-and-drop features are not familiar to users who do not have a technical or programming background. In those cases, the naming conventions are a challenge.
