We performed a comparison between IBM InfoSphere DataStage and Spring Cloud Data Flow based on real PeerSpot user reviews.
Find out in this report how the two Data Integration Tools solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI.
"StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"It is really easy to set up and the interface is easy to use."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one day data ops solution."
"In StreamSets, everything is in one place."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
"The Hierarchical Data Stage is good."
"It's a robust solution."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
"Offers great flexibility."
"The most valuable feature is the data integration for data warehousing."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"When we have needed help from the IBM team, they were helpful. Our company is a premium partner so we get fast responses."
"The most valuable feature is real-time streaming."
"There are a lot of options in Spring Cloud. It's flexible in terms of how we can use it. It's a full infrastructure."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"In the future, I would like to see more integration with cloud technologies."
"It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies."
"The response time from support is slow and needs to be improved."
"Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
"The initial setup could be more straightforward."
"The interface needs improvement."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"It would be useful to provide support for Python, AR, and Java."
"The configurations could be better. Some configurations are a little bit time-consuming in terms of trying to understand using the Spring Cloud documentation."
"Some of the features, like the monitoring tools, are not very mature and are still evolving."
StreamSets offers an end-to-end data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps, and power the modern data ecosystem and hybrid integration.
Only StreamSets provides a single design experience for all design patterns for 10x greater developer productivity; smart data pipelines that are resilient to change for 80% less breakages; and a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures to eliminate blind spots and control gaps.
With StreamSets, you can deliver the continuous data that drives the connected enterprise.
Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines.
Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. This makes Spring Cloud Data Flow suitable for a range of data processing use cases, from import/export to event streaming and predictive analytics. Use Spring Cloud Data Flow to connect your Enterprise to the Internet of Anything—mobile devices, sensors, wearables, automobiles, and more.
IBM InfoSphere DataStage is ranked 7th in Data Integration Tools with 10 reviews while Spring Cloud Data Flow is ranked 17th in Data Integration Tools with 2 reviews. IBM InfoSphere DataStage is rated 7.8, while Spring Cloud Data Flow is rated 8.0. The top reviewer of IBM InfoSphere DataStage writes "Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data". On the other hand, the top reviewer of Spring Cloud Data Flow writes "Good logging mechanisms, a strong infrastructure and pretty scalable". IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, AWS Glue, Informatica PowerCenter and Azure Data Factory, whereas Spring Cloud Data Flow is most compared with Apache Flink, Amazon Kinesis, Apache Spark Streaming, Cloudera DataFlow and SSIS. See our IBM InfoSphere DataStage vs. Spring Cloud Data Flow report.
See our list of best Data Integration Tools vendors.
We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.