Azure Data Factory vs IBM InfoSphere DataStage comparison

Azure Data Factory (Microsoft): 26,170 views | 20,469 comparisons | 91% willing to recommend
IBM InfoSphere DataStage (IBM): 11,157 views | 9,214 comparisons | 82% willing to recommend
Comparison Buyer's Guide
Executive Summary
Updated on Sep 5, 2022

We compared Azure Data Factory and IBM InfoSphere DataStage based on our users’ reviews in four categories. Our conclusion, drawn from all of the collected data, is below.

  • Ease of Deployment: For the most part, users of both solutions feel they are easy and straightforward to deploy.
  • Features: Azure Data Factory allows users to create ETL pipelines easily. The visual drag-and-drop feature saves time and makes things easy to customize. Users can move data from on-premises systems to cloud solutions seamlessly. Users would like to see machine learning capabilities and more connectors for third-party solutions.

    IBM InfoSphere DataStage is robust and can handle huge amounts of data with ease. The solution is very user-friendly, providing drag-and-drop features with a large number of capabilities. Users feel the solution lacks virtualization features and is a bit dated. They would like to see more focus on cloud technologies so that it is more competitive in the marketplace.
  • Pricing: Azure Data Factory users feel the pricing is reasonable. Users feel IBM InfoSphere DataStage is an expensive solution.
  • Service and Support: Overall, users tell us support could be better for both solutions.

Comparison Results: Azure Data Factory is mature, robust, and consistent. Its more than 100 built-in connectors, together with the ability to onboard data from many different sources to the cloud environment, make it easier for users to understand the data flow. Users are happier with its pricing as well. Once IBM InfoSphere DataStage moves toward a focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.

To learn more, read our detailed Azure Data Factory vs. IBM InfoSphere DataStage Report (Updated: March 2024).
768,246 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Azure Data Factory:
  • "The function of the solution is great."
  • "The solution is okay."
  • "I think it makes it very easy to understand what data flow is and so on. You can leverage the user interface to do the different data flows, and it's great. I like it a lot."
  • "I like that it's a monolithic data platform. This is why we propose these solutions."
  • "I am one hundred percent happy with the stability."
  • "The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
  • "The feature I found most helpful in Azure Data Factory is the pipeline feature, including being able to connect to different sources. Azure Data Factory also has built-in security, which is another valuable feature."
  • "The most valuable aspect is the copy capability."

IBM InfoSphere DataStage:
  • "The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms."
  • "Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
  • "The product is easy to deploy."
  • "The solution is stable."
  • "The most valuable feature is the ability to transfer information via notes."
  • "It works with multiple servers and offers high availability."
  • "The solution's scalability is really good...we are using multi-instance jobs where you can scale them easily."
  • "As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."

Cons
Azure Data Factory:
  • "Lacks in-built streaming data processing."
  • "There are limitations when processing more than one GD file."
  • "I would like to see this time travel feature in Snowflake added to Azure Data Factory."
  • "Data Factory would be improved if it were a little more configuration-oriented and not so code-oriented and if it had more automated features."
  • "Azure Data Factory uses many resources and has issues with parallel workflows."
  • "Areas for improvement in Azure Data Factory include connectivity and integration. When you use integration runtime, whenever there's a failure, the backup process in Azure Data Factory takes time, so this is another area for improvement."
  • "There is no built-in function for automatically adding notifications concerning the progress or outline of a pipeline run."
  • "There aren't many third-party extensions or plugins available in the solution."

IBM InfoSphere DataStage:
  • "The interface needs improvement."
  • "In the future, I would like to see more integration with cloud technologies."
  • "It would be great if they can include some basic version of data quality checking features."
  • "In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations."
  • "The solution should be more user-friendly."
  • "The initial setup can be complex."
  • "We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
  • "The solution can be a bit more user-friendly, similar to Informatica."

Pricing and Cost Advice
Azure Data Factory:
  • "In terms of licensing costs, we pay somewhere around $14,000 USD per month. There are some additional costs. For example, we would have to subscribe to some additional computing and for elasticity, but they are minimal."
  • "This is a cost-effective solution."
  • "The price you pay is determined by how much you use it."
  • "Understanding the pricing model for Data Factory is quite complex."
  • "I would not say that this product is overly expensive."
  • "The licensing is a pay-as-you-go model, where you pay for what you consume."
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."

IBM InfoSphere DataStage:
  • "High-cost of ownership: They could take a page from open source software."
  • "Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
  • "Small and medium-sized companies cannot afford to pay for this solution."
  • "The cost is too high."
  • "It's very expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "The price is expensive but there are no licensing fees."
  • "It is quite expensive."
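The itemized figures in the DataStage review that quotes about $6,000 per month can be checked with a little arithmetic. The sketch below only restates the reviewer's approximate numbers; the variable and function names are illustrative, and the interpretation of the remainder is an assumption, since the reviewer does not break it down.

```python
# Figures as quoted by the reviewer (all approximate, USD per month).
HOURLY_RATE = 40  # "They're charging $40 for one-hour execution"
quoted_monthly = {
    "production executions": 2000,
    "lower-environment executions": 1000,
    "Db2 mainframe database": 1000,
}
quoted_total = 6000


def implied_hours(monthly_cost, hourly_rate=HOURLY_RATE):
    """Execution hours per month implied by a monthly spend at the quoted rate."""
    return monthly_cost / hourly_rate


itemized = sum(quoted_monthly.values())
# Assumption: the remainder covers storage and other items the reviewer does not itemize.
unitemized = quoted_total - itemized

print(implied_hours(quoted_monthly["production executions"]))         # production hours
print(implied_hours(quoted_monthly["lower-environment executions"]))  # test/dev hours
print(itemized, unitemized)
```

At the quoted rate, the production spend corresponds to roughly 50 execution hours per month and the lower environment to roughly 25. The three itemized amounts add up to about $4,000, which leaves about $2,000 of the stated total unexplained by the quote itself.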

    Questions from the Community
    Top Answer: AWS Glue and Azure Data Factory for ELT best performance cloud services.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up and…
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power…
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the…
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work…
    Top Answer: IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands…
    Ranking
    Azure Data Factory: 1st out of 100 in Data Integration
      Views: 26,170 | Comparisons: 20,469 | Reviews: 46 | Average Words per Review: 489 | Rating: 8.0
    IBM InfoSphere DataStage: 7th out of 100 in Data Integration
      Views: 11,157 | Comparisons: 9,214 | Reviews: 15 | Average Words per Review: 452 | Rating: 7.9
    Overview

    Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.

    IBM InfoSphere DataStage is a high-quality data integration tool for designing, developing, and running jobs that move and transform data for organizations of different sizes. The product works by integrating data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.
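The difference between the ETL and ELT patterns mentioned above can be sketched in a few lines. This is a generic illustration using Python's built-in `sqlite3` module as a stand-in target database, not DataStage code; the table names, sample rows, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system: (customer, amount as text).
SOURCE_ROWS = [("alice", " 10.50 "), ("bob", "3"), ("alice", "4.5")]


def run_etl(conn, rows):
    """ETL: transform in the integration layer, then load only the cleaned rows."""
    cleaned = [(name.title(), float(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform inside the target using SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT upper(substr(customer, 1, 1)) || substr(customer, 2) AS customer, "
        "CAST(trim(amount) AS REAL) AS amount FROM sales_raw"
    )


def totals(conn, table):
    """Total amount per customer, used here to compare the two results."""
    query = f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)
print(totals(conn, "sales_etl"))
print(totals(conn, "sales_elt"))
```

Both paths end with the same cleaned data. What differs is where the transformation runs: in the integration tool before loading (ETL) or in the target system after loading (ELT).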

    The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool can be accessed through a subscription model, where its capabilities are part of IBM DataStage on IBM Cloud Pak for Data as a Service. This option offers full management on IBM Cloud.

    • On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.

    • Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through a parallel engine and load balancing, which maximize throughput.

    • Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.

    • Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.

    • Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.

    • IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.

    • IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.

    • Distributed data processing: Through this feature, cloud runtimes can be executed remotely while maintaining data sovereignty and decreasing costs.
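The idea behind the parallel engine listed above, splitting data into partitions and processing them concurrently, can be illustrated with a small generic sketch. This is not how DataStage is implemented; the function names and the simple key-based partitioning are assumptions made for illustration. A thread pool keeps the example short, although in CPython a real CPU-bound workload would need processes or a distributed engine to gain speed.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_partitions):
    """Split (key, value) rows into buckets so that equal keys share a bucket."""
    buckets = [[] for _ in range(n_partitions)]
    for key, value in rows:
        # Deterministic toy hash: sum of the key's bytes modulo the partition count.
        buckets[sum(key.encode()) % n_partitions].append((key, value))
    return buckets


def transform(bucket):
    """Per-partition work: aggregate values by key."""
    totals = {}
    for key, value in bucket:
        totals[key] = totals.get(key, 0) + value
    return totals


def parallel_aggregate(rows, n_partitions=4):
    """Aggregate all rows by running transform() on each partition concurrently."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for partial in pool.map(transform, partition(rows, n_partitions)):
            merged.update(partial)  # safe: a key never appears in two partitions
    return merged


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(parallel_aggregate(rows))
```

Because rows are partitioned by key, each partition can be aggregated independently and the partial results merged without conflicts. Load balancing, which the feature description also mentions, is not modeled here.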

    IBM InfoSphere DataStage Benefits

    This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

    • Increased speed of workload execution due to better balancing and a parallel engine.

    • Reduction of data movement costs through integrations and seamless design of jobs.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.

    • Effective data manipulation allows data to be merged before being mapped and transformed.

    • Easier access for users to their data through visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    Sample Customers
    Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
    IBM InfoSphere DataStage: Dubai Statistics Center, Etisalat Egypt
    Top Industries
    Azure Data Factory
      Reviewers: Computer Software Company 34%, Insurance Company 11%, Manufacturing Company 8%, Financial Services Firm 8%
      Visitors reading reviews: Computer Software Company 13%, Financial Services Firm 13%, Manufacturing Company 8%, Healthcare Company 7%
    IBM InfoSphere DataStage
      Reviewers: Computer Software Company 50%, Insurance Company 14%, Healthcare Company 7%, Financial Services Firm 7%
      Visitors reading reviews: Financial Services Firm 26%, Manufacturing Company 11%, Computer Software Company 10%, Insurance Company 7%
    Company Size
    Azure Data Factory
      Reviewers: Small Business 29%, Midsize Enterprise 19%, Large Enterprise 52%
      Visitors reading reviews: Small Business 18%, Midsize Enterprise 13%, Large Enterprise 70%
    IBM InfoSphere DataStage
      Reviewers: Small Business 45%, Midsize Enterprise 6%, Large Enterprise 49%
      Visitors reading reviews: Small Business 16%, Midsize Enterprise 9%, Large Enterprise 75%

    Azure Data Factory is ranked 1st in Data Integration with 81 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. Azure Data Factory is rated 8.0, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". Azure Data Factory is most compared with Informatica PowerCenter, Informatica Cloud Data Integration, Alteryx Designer, Snowflake and Palantir Foundry, whereas IBM InfoSphere DataStage is most compared with IBM Cloud Pak for Data, SSIS, Talend Open Studio, Informatica PowerCenter and IBM InfoSphere Information Server.

    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.