We performed a comparison between IBM InfoSphere DataStage and StreamSets based on real PeerSpot user reviews.
Find out in this report how the two Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI."The Hierarchical Data Stage is good."
"I am impressed with the tool's ETL tracing."
"The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage."
"Offers great flexibility."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
"The solution has improved the time it takes to perform tasks related to batch applications."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one day data ops solution."
"Important features include that it comprises lots of functionality to connect data from various sources through connector availability, scheduling pipelines at any time, and integration with third-party and security solutions for encryption."
"In StreamSets, everything is in one place."
"One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"So, there are some features that are missing. If I compare DataStage to Talend, Talend allows you to write custom code in Java or use these tools in your applications as well if you are building a job application. But in DataStage, it does not allow you to write custom code for any component."
"The setup is extremely difficult."
"The interface needs improvement."
"The pricing should be lower."
"It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies."
"Currently lacking virtualization ability."
"The graphical user interface (GUI) feels a lot like the interfaces from the 1980s."
"It would be great if they can include some basic version of data quality checking features."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. IBM InfoSphere DataStage is rated 7.8, while StreamSets is rated 8.4. The top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Informatica PowerCenter, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and Talend Open Studio. See our IBM InfoSphere DataStage vs. StreamSets report.
See our list of best Data Integration vendors.
We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.