IBM InfoSphere DataStage vs SAP Data Hub comparison

Cancel
You must select at least 2 products to compare!
IBM Logo
11,157 views|9,214 comparisons
82% willing to recommend
SAP Logo
821 views|686 comparisons
66% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between IBM InfoSphere DataStage and SAP Data Hub based on real PeerSpot user reviews.

Find out what your peers are saying about Microsoft, Informatica, Oracle and others in Data Integration.
To learn more, read our detailed Data Integration Report (Updated: April 2024).
768,924 professionals have used our research since 2012.
Featured Review
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
"The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage.""The solution's scalability is really good...we are using multi-instance jobs where you can scale them easily.""When we have needed help from the IBM team, they were helpful. Our company is a premium partner so we get fast responses.""As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables.""Offers great flexibility.""Finding logs is very easy on the solution.""I am impressed with the tool's ETL tracing.""The most valuable feature is the ability to transfer information via notes."

More IBM InfoSphere DataStage Pros →

"Its connection to on-premise products is the most valuable. We mostly use the on-premise connection, which is seamless. This is what we prefer in this solution over other solutions. We are using it the most for the orchestration where the data is coming from different categories. Its other features are very much similar to what they are giving us in open source. Their push-down approach is the most advantageous, where they push most of the processing on to the same data source. This means that they have a serverless kind of thing, and they don't process the data inside a product such as Data Hub. They process the data from where the data is coming out. If it is coming from HANA, to capture the data or process it for analytics, orchestration, or management, they go to the HANA database and give it out. They don't process it on Data Hub. This push-down approach increases the processing speed a little bit because the data is processed where it is sitting. That's the best part and an advantage. I have used another product where they used to capture the data first and then they used to process it and give it. In Data Hub, it is in reverse. They process it first and give it, and then they put their own manipulations. They lead in terms of business functions. No other solution has business functions already implemented to perform business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get an analysis out from the existing data. Most of the data is sitting as enterprise data there. That's a major advantage that they have.""SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database.""The most valuable feature is the S/4HANA 1909 On-Premise"

More SAP Data Hub Pros →

Cons
"The documentation and in-application help for this solution need to be improved, especially for new features.""I'd like to be able to do more with the data and metadata, including copy and pasting, et cetera.""The setup is extremely difficult.""I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers.""The response time from support is slow and needs to be improved.""The interface needs improvement. It is really too technical. That is the main problem.""So, there are some features that are missing. If I compare DataStage to Talend, Talend allows you to write custom code in Java or use these tools in your applications as well if you are building a job application. But in DataStage, it does not allow you to write custom code for any component.""The interface needs improvement."

More IBM InfoSphere DataStage Cons →

"In 2018, connecting it to outside sources, such as IoT products or IoT-enabled big data Hadoop, was a little complex. It was not smooth at the beginning. It was unstable. It took a lot of time for the initial data load. Sometimes, the connection broke, and we had to restart the process, which was a major issue, but they might have improved it now. It is very smooth with SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud. It could be because these are their own products, and they know how to integrate them. With Hadoop, they might have used open-source technologies, and that's why it was breaking at that time. They are providing less embedded integration because they want us to use their other products. For example, they don't want to go and remove SAP Analytics Cloud and put everything in Data Hub. They want us to use SAP Analytics Cloud somewhere else and not inside the Data Hub. On the integration part, it lacks real-time analytics, and it is slow. They should embed the SAP Analytics Cloud inside Data Hub or support some kind of analysis. They do provide some analysis, but it is not extensive. They are moreover open source. So, we need a lot of developers or data scientists to go in and implement Python algorithms. It would be better if they can provide their own existing algorithms and give some connections and drop-down menus to go and just configure those. It will make things really quick by increasing the embedded integrations. It will also improve the process efficiency and processing power. Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better than this. In terms of technology and performance, it is a little slow as compared to Microsoft and other data orchestration products. I haven't used other products, but I have read about those products, their settings, and the milliseconds that they do. In Azure Purview, they say that they can copy, manage, or transform the data within milliseconds. They say that they can transform 100 gigabytes of data within three to five seconds, which is something SAP cannot do. It generally takes a lot of time to process that much amount of data. However, I have never tested out Azure.""Nowadays there are some inconsistencies in data bases, however, they upgrade and release the versions to market.""The company has everything offshore."

More SAP Data Hub Cons →

Pricing and Cost Advice
  • "High-cost of ownership: They could take a page from open source software."
  • "Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
  • "Small and medium-sized companies cannot afford to pay for this solution."
  • "The cost is too high."
  • "It's very expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "The price is expensive but there are no licensing fees."
  • "It is quite expensive."
  • More IBM InfoSphere DataStage Pricing and Cost Advice →

  • "The Cloud is very expensive, but SAP HANA previous service is okay."
  • More SAP Data Hub Pricing and Cost Advice →

    report
    Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
    768,924 professionals have used our research since 2012.
    Questions from the Community
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the… more »
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work… more »
    Top Answer:IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands… more »
    Top Answer:SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database.
    Top Answer:We moved from Oracle. If you're aware of your monitoring system, the RPU market, and the managed system, you should move to HANA, which is an innovative database built by SAP itself. However, this… more »
    Top Answer:I technically handle the database, like cycle management projects. When transaction data comes in, we see it based on the retention periods. We have to move the data to some secure storage rather than… more »
    Ranking
    7th
    out of 100 in Data Integration
    Views
    11,157
    Comparisons
    9,214
    Reviews
    15
    Average Words per Review
    452
    Rating
    7.9
    25th
    out of 57 in Data Governance
    Views
    821
    Comparisons
    686
    Reviews
    1
    Average Words per Review
    469
    Rating
    7.0
    Comparisons
    Learn More
    Overview

    IBM InfoSphere DataStage is a high-quality data integration tool that aims to design, develop, and run jobs that move and transform data for organizations of different sizes. The product works by integrating data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.

    The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool can be accessed from a subscription model, where its capabilities are a part of IBM DataStage on IBM Cloud Park for Data as a Service. This option offers full management on IBM Cloud.

    • On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.

    • Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through parallel engine and load balancing, which maximizes throughput.

    • Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.

    • Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.

    • Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.

    • IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.

    • IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.

    • Distributed data processing: Cloud runtimes can be executed remotely through this feature while maintaining its sovereignty and decreasing costs.

    IBM InfoSphere DataStage Benefits

    This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

    • Increased speed of workload execution due to better balancing and a parallel engine.

    • Reduction of data movement costs through integrations and seamless design of jobs.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.

    • Effective data manipulation allows data to be merged before being mapped and transformed.

    • Creating easier access of users to their data by providing visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    The SAP® Data Hub solution enables sophisticated data operations management. It gives you the capability and flexibility to connect enterprise data and Big Data and gain a deep understanding of data and information processes across sources and systems throughout the distributed landscape. The unified solution provides visibility and control into data opportunities, integrating cloud and on-premise information and driving data agility and business value. Distributed processing power enables greater speed and efficiency.

    Sample Customers
    Dubai Statistics Center, Etisalat Egypt
    Kaeser Kompressoren, HARTMANN
    Top Industries
    REVIEWERS
    Computer Software Company50%
    Insurance Company14%
    Transportation Company7%
    Healthcare Company7%
    VISITORS READING REVIEWS
    Financial Services Firm26%
    Manufacturing Company11%
    Computer Software Company10%
    Insurance Company8%
    VISITORS READING REVIEWS
    Computer Software Company15%
    Manufacturing Company13%
    Financial Services Firm12%
    Government7%
    Company Size
    REVIEWERS
    Small Business45%
    Midsize Enterprise6%
    Large Enterprise49%
    VISITORS READING REVIEWS
    Small Business16%
    Midsize Enterprise9%
    Large Enterprise75%
    VISITORS READING REVIEWS
    Small Business15%
    Midsize Enterprise11%
    Large Enterprise74%
    Buyer's Guide
    Data Integration
    April 2024
    Find out what your peers are saying about Microsoft, Informatica, Oracle and others in Data Integration. Updated: April 2024.
    768,924 professionals have used our research since 2012.

    IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews while SAP Data Hub is ranked 25th in Data Governance with 3 reviews. IBM InfoSphere DataStage is rated 7.8, while SAP Data Hub is rated 7.6. The top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". On the other hand, the top reviewer of SAP Data Hub writes "The solution is seamless, but the database sometimes leads to confusion". IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio and Informatica PowerCenter, whereas SAP Data Hub is most compared with Microsoft Purview, SAP Data Services, Alation Data Catalog, Azure Data Factory and Palantir Foundry.

    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.