
IBM InfoSphere DataStage vs SAP Data Hub comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

IBM InfoSphere DataStage
Average Rating: 7.8
Reviews Sentiment: 6.7
Number of Reviews: 43
Ranking in other categories: Data Integration (9th)

SAP Data Hub
Average Rating: 7.6
Reviews Sentiment: 6.8
Number of Reviews: 3
Ranking in other categories: Data Governance (33rd), Metadata Management (15th)
 

Mindshare comparison

IBM InfoSphere DataStage and SAP Data Hub aren’t in the same category and serve different purposes. IBM InfoSphere DataStage is designed for Data Integration and holds a mindshare of 1.9%, down 5.4% compared to last year.
SAP Data Hub, on the other hand, focuses on Data Governance and holds a mindshare of 1.1%, up 1.0% since last year.
Data Integration Mindshare Distribution
Product: Mindshare (%)
IBM InfoSphere DataStage: 1.9%
SSIS: 3.6%
Informatica Intelligent Data Management Cloud (IDMC): 3.6%
Other: 90.9%
Data Governance Mindshare Distribution
Product: Mindshare (%)
SAP Data Hub: 1.1%
Microsoft Purview Data Governance: 11.5%
Collibra Platform: 8.6%
Other: 78.8%
 

Featured Reviews

Prasad Bodduluri - PeerSpot reviewer
Senior Data Warehouse Developer at itcinfotech
Has required complex workarounds for scripts and struggles with unstructured data processing
There is no issue with IBM InfoSphere DataStage's graphical interface for designing data flows, but I will provide feedback that we are gathering the source from the Oracle database mainly, as well as from some spreadsheets. With respect to the Oracle DB Connector, if you write any PL/SQL or SQL with the connectors, there aren't many options, such as executing procedures in the PL/SQL, executing functions, or executing packages. The Oracle connector doesn't have many features and needs improvement. Nowadays many people are writing programs in Python or in PL/SQL with respect to Oracle, so especially in IBM InfoSphere DataStage, there are no features to call programs directly instead of calling them as a script. What I am facing, especially with parallel processing, is that a developer and admin have to sit together. They have to run the job multiple times with different combinations of parallel processing to get the best performance. Instead of that, if the job itself gave some guidance, such as running this parallel processing with this many nodes, it would help; I think that is missing. An additional feature I would want to see in the next release is the ability to work on logs, especially machine logs or artificial logs, to pull semi-structured or unstructured data without having to write extensive code in Python and integrate it. If IBM InfoSphere DataStage provided some feature for this, it would help.
VM
GTM Lead at Capgemini
The solution is seamless, but the database sometimes leads to confusion
We used to have multiple different kinds of databases, which internally, had different compliance levels. Retention management is very different now. If the policy is live and the claim has been completed, I couldn't archive the claim. I needed to keep a reference integrity of that claim and understand which policy paid out the claim. With this solution, the policy came in six months ago and qualified for archiving. The claim had been paid and in every environment, the claim had been closed, including the reporting system, the claims system, etc. With the payment set gateway, I can just go and archive. But, we had a hard time during this process. I rate the overall solution a seven out of ten.

Quotes from Members

 

Pros

"Offers great flexibility."
"I have leveraged IBM InfoSphere DataStage's integration with IBM's Information Server suite, and it is indeed beneficial."
"It's a robust solution."
"Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."
"The performance optimization is quite good in DataStage, and it provides parallelism and pipelining mechanisms."
"We are mostly using transmission rules. It has a lot of functions and logic related to transmission. It is a user-friendly tool with in-built functions."
"The solution is very easy to use."
"IBM InfoSphere DataStage is a good product; it is quite useful and powerful."
"Having this solution enables us to approach our clients to upgrade their databases, and we upgrade them according to their business requirements."
"Its connection to on-premise products is the most valuable. We mostly use the on-premise connection, which is seamless. This is what we prefer in this solution over other solutions. We are using it the most for the orchestration where the data is coming from different categories. Its other features are very much similar to what they are giving us in open source. Their push-down approach is the most advantageous, where they push most of the processing on to the same data source. This means that they have a serverless kind of thing, and they don't process the data inside a product such as Data Hub. They process the data from where the data is coming out. If it is coming from HANA, to capture the data or process it for analytics, orchestration, or management, they go to the HANA database and give it out. They don't process it on Data Hub. This push-down approach increases the processing speed a little bit because the data is processed where it is sitting. That's the best part and an advantage. I have used another product where they used to capture the data first and then they used to process it and give it. In Data Hub, it is in reverse. They process it first and give it, and then they put their own manipulations. They lead in terms of business functions. No other solution has business functions already implemented to perform business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get an analysis out from the existing data. Most of the data is sitting as enterprise data there. That's a major advantage that they have."
"The most valuable feature is the S/4HANA 1909 On-Premise"
"SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database."
"They lead in terms of business functions, and no other solution has business functions already implemented to perform business analysis, with a lot of prebuilt business functions for machine learning and orchestration that we can use directly to get an analysis out from the existing enterprise data."
 

Cons

"For reading the machine log in IBM InfoSphere DataStage, I would rate it only a two because there's not much improvement in this area."
"The setup is extremely difficult."
"The platform also needs more stability; it caches a lot and crashes on the application servers that the host allows on the platform."
"The troubleshooting guide is very bad."
"The solution should be more user-friendly."
"Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
"I am unsure of whether it supports other databases like Postgres or Redshift."
"Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."
"Nowadays there are some inconsistencies in databases; however, they upgrade and release the versions to market."
"Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better than this."
"In 2018, connecting it to outside sources, such as IoT products or IoT-enabled big data Hadoop, was a little complex. It was not smooth at the beginning. It was unstable. It took a lot of time for the initial data load. Sometimes, the connection broke, and we had to restart the process, which was a major issue, but they might have improved it now. It is very smooth with SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud. It could be because these are their own products, and they know how to integrate them. With Hadoop, they might have used open-source technologies, and that's why it was breaking at that time. They are providing less embedded integration because they want us to use their other products. For example, they don't want to go and remove SAP Analytics Cloud and put everything in Data Hub. They want us to use SAP Analytics Cloud somewhere else and not inside the Data Hub. On the integration part, it lacks real-time analytics, and it is slow. They should embed the SAP Analytics Cloud inside Data Hub or support some kind of analysis. They do provide some analysis, but it is not extensive. They are moreover open source. So, we need a lot of developers or data scientists to go in and implement Python algorithms. It would be better if they can provide their own existing algorithms and give some connections and drop-down menus to go and just configure those. It will make things really quick by increasing the embedded integrations. It will also improve the process efficiency and processing power. Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better than this. In terms of technology and performance, it is a little slow as compared to Microsoft and other data orchestration products. I haven't used other products, but I have read about those products, their settings, and the milliseconds that they do. 
In Azure Purview, they say that they can copy, manage, or transform the data within milliseconds. They say that they can transform 100 gigabytes of data within three to five seconds, which is something SAP cannot do. It generally takes a lot of time to process that much amount of data. However, I have never tested out Azure."
"The company has everything offshore."
 

Pricing and Cost Advice

"It's very expensive."
"High-cost of ownership: They could take a page from open source software."
"Small and medium-sized companies cannot afford to pay for this solution."
"It's quite expensive."
"The pricing is competitive but on the higher side of the pricing scale."
"It is quite expensive."
"The product is expensive."
"The cost is too high."
"The Cloud is very expensive, but SAP HANA previous service is okay."
885,444 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 24%
Government: 9%
Manufacturing Company: 8%
Computer Software Company: 6%

Manufacturing Company: 18%
Financial Services Firm: 13%
Construction Company: 9%
Government: 9%
 

Company Size

By reviewers
Company Size: Count
Small Business: 23
Midsize Enterprise: 4
Large Enterprise: 26

No data available
 

Questions from the Community

Would you upgrade to more premium versions of IBM InfoSphere DataStage?
My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For ...
Is IBM InfoSphere DataStage more difficult to use compared to other tools in the field?
I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work ...
Do you rely on IBM Cloud Paks for your data? Have you utilized this product, or do you use IBM InfoSphere DataStage without it?
IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands...
 

Overview

 

Sample Customers

Dubai Statistics Center, Etisalat Egypt
Kaeser Kompressoren, HARTMANN
Find out what your peers are saying about Microsoft, Informatica, Qlik and others in Data Integration. Updated: March 2026.