Azure Data Factory vs IBM InfoSphere DataStage comparison

Comparison Buyer's Guide
Executive Summary
Updated on Sep 5, 2022

We performed a comparison between Azure Data Factory and IBM InfoSphere DataStage based on our users’ reviews in four categories. After reading all of the collected data, you can find our conclusion below.

  • Ease of Deployment: For the most part, users of both solutions feel they are easy and straightforward to deploy.
  • Features: Azure Data Factory allows users to create ETL pipelines easily. The visual drag-and-drop feature saves time and makes things easy to customize. Users can move data from on-premises to cloud solutions seamlessly. Users would like to see machine learning capabilities and more connectors for third-party solutions.

    IBM InfoSphere DataStage is robust and can handle huge amounts of data with ease. The solution is very user-friendly, providing drag-and-drop features with a large number of capabilities. Users feel the solution lacks virtualization features and is a bit dated. They would like to see more focus on cloud technologies so that it can be more competitive in the marketplace.
  • Pricing: Azure Data Factory users feel the pricing is reasonable. Users feel IBM InfoSphere DataStage is an expensive solution.
  • Service and Support: Overall, users tell us support could be better for both solutions.

Comparison Results: Azure Data Factory is mature, robust, and consistent. Its more than 100 built-in connectors, and its ability to onboard data from many different sources into the cloud environment, make it easier for users to understand the data flow. Users are happier with its pricing as well. Once IBM InfoSphere DataStage moves toward a focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.

To learn more, read our detailed Azure Data Factory vs. IBM InfoSphere DataStage Report (Updated: May 2023).
708,544 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Azure Data Factory:
  • "I like how you can create your own pipeline in your space and reuse those creations. You can collaborate with other people who want to use your code."
  • "The most important feature is that it can help you do the multi-threading concepts."
  • "The feature I found most helpful in Azure Data Factory is the pipeline feature, including being able to connect to different sources. Azure Data Factory also has built-in security, which is another valuable feature."
  • "The trigger scheduling options are decently robust."
  • "It's cloud-based, allowing multiple users to easily access the solution from the office or remote locations. I like that we can set up the security protocols for IP addresses, like allow lists. It's a pretty user-friendly product as well. The interface and build environment where you create pipelines are easy to use. It's straightforward to manage the digital transformation pipelines we build."
  • "We haven't had any issues connecting it to other products."
  • "The data mapping and the ability to systematically derive data are nice features. It worked really well for the solution we had. It is visual, and it did the transformation as we wanted."
  • "The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."

IBM InfoSphere DataStage:
  • "The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage."
  • "The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
  • "It works with multiple servers and offers high availability."
  • "It is quite useful and powerful."
  • "Offers great flexibility."
  • "The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms"
  • "We like the flexibility of modeling."
  • "The solution is stable."

Cons
Azure Data Factory:
  • "This solution is currently only useful for basic data movement and file extractions, which we would like to see developed to handle more complex data transformations."
  • "Data Factory's performance during heavy data processing isn't great."
  • "We require Azure Data Factory to be able to connect to Google Analytics."
  • "I have not found any real shortcomings within the product."
  • "Lacks in-built streaming data processing."
  • "There's space for improvement in the development process of the data pipelines."
  • "There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation."
  • "There is always room to improve. There should be good examples of use that, of course, customers aren't always willing to share. It is Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments, which also includes me, you usually got, "No, I'm not going to do that." They don't have enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them as a user base."

IBM InfoSphere DataStage:
  • "It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
  • "What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement in the solution stems from a lot of new types of databases, for example, databases in the cloud and big data have become available, and IBM InfoSphere DataStage is working on various connectors for different data sources, but that still isn't up-to-date, meaning that some connectors are missing for modern data sources. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well."
  • "Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
  • "Currently lacking virtualization ability."
  • "The error messaging needs to be improved."
  • "The solution can be a bit more user-friendly, similar to Informatica."
  • "In the future, I would like to see more integration with cloud technologies."
  • "It would be useful to provide support for Python, AR, and Java."

Pricing and Cost Advice
Azure Data Factory:
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."
  • "It's not particularly expensive."
  • "Product is priced at the market standard."
  • "There's no licensing for Azure Data Factory, they have a consumption payment model. How often you are running the service and how long that service takes to run. The price can be approximately $500 to $1,000 per month but depends on the scaling."
  • "I don't see a cost; it appears to be included in general support."
  • "Pricing appears to be reasonable in my opinion."
  • "Pricing is comparable, it's somewhere in the middle."

IBM InfoSphere DataStage:
  • "It's quite expensive."
  • "I have no information on the exact pricing for IBM InfoSphere DataStage because the solution is usually procured by the clients my company works with, though the pricing is higher compared to other solutions, so many clients choose to go with a different solution rather than IBM InfoSphere DataStage."
  • "The pricing depends on the setup. However, we paid $100,000 as a one-time cost for an on-premises setup."

    Questions from the Community
    Top Answer: AWS Glue and Azure Data factory for ELT best performance cloud services.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up and… more »
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power… more »
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the… more »
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work… more »
    Top Answer: IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands… more »
    Ranking
                                      Azure Data Factory    IBM InfoSphere DataStage
    Rank in Data Integration Tools    1st                   13th
    Views                             43,047                15,135
    Comparisons                       33,917                12,290
    Reviews                           49                    10
    Average Words per Review          489                   439
    Rating                            8.0                   7.7
    Overview

    Azure Data Factory is a managed cloud service built for extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. It is a data integration tool that allows users to create, schedule, and manage data pipelines in the cloud or on premises. The use cases of the product include data engineering, operational data integration, analytics, ingesting data into data warehouses, and migrating on-premises SQL Server Integration Services (SSIS) packages to Azure.

    The tool allows users to create data-driven workflows for initiating data movement and data transformation at scale. Data can be ingested from disparate data stores via pipelines. Companies can utilize this product to build complex ETL processes for transforming data visually with data flows. Azure Data Factory also works with services such as Azure HDInsight Hadoop, Azure Databricks, Azure Synapse Analytics, and Azure SQL Database. These integrations facilitate data management and control for organizations, providing them with better visibility of their data for improved decision-making.

    Azure Data Factory allows companies to create schedules for moving and transforming data into their pipelines. This can be done hourly, daily, weekly, or according to the specific needs of the organization. The steps through which the data-driven workflows work in Azure Data Factory are the following:

    1. Connecting to required sources and collecting data. After connecting to the various sources where data is stored, the pipelines move the data to a centralized location for further processing.

    2. Transforming and enriching the data. Once the data is moved to a centralized data store in the cloud, the pipelines transform it through services like HDInsight Hadoop, Azure Data Lake Analytics, Spark, and Machine Learning.

    3. Delivering the transformed data to on-premise sources or keeping it in cloud storage sources for usage by different tools and applications.
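
    The three steps above can be illustrated with a minimal Python sketch. This is not Azure Data Factory code and does not use any Azure API; the source names, rows, and function names are hypothetical, and in-memory dictionaries stand in for real data stores.

```python
from collections import defaultdict

# Hypothetical in-memory "sources"; a real pipeline would read from data stores.
SOURCES = {
    "crm": [{"customer": "a", "amount": "10.5"}, {"customer": "b", "amount": "4"}],
    "web": [{"customer": "a", "amount": "2.5"}],
}


def connect_and_collect(sources):
    """Step 1: gather rows from every source into one staging list."""
    staged = []
    for name, rows in sources.items():
        for row in rows:
            staged.append({**row, "source": name})
    return staged


def transform_and_enrich(staged):
    """Step 2: cast amounts to float and total them per customer."""
    totals = defaultdict(float)
    for row in staged:
        totals[row["customer"]] += float(row["amount"])
    return dict(totals)


def deliver(totals, sink):
    """Step 3: write the transformed result to a destination (a dict here)."""
    sink.update(totals)
    return sink


warehouse = {}
deliver(transform_and_enrich(connect_and_collect(SOURCES)), warehouse)
print(warehouse)  # {'a': 13.0, 'b': 4.0}
```

    Keeping each step as a separate function mirrors the way the workflow separates data movement from transformation, so either stage can be swapped out without touching the other.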

    Azure Data Factory Concepts

    The solution consists of a series of interconnected systems that provide data integration and related services for users. The following concepts create the end product for users:

    • Pipelines: A pipeline is a logical grouping of activities that together perform a unit of work.

    • Mapping data flows: Azure Data Factory lets its users create and manage graphs of data transformation logic for transforming any-sized data. The logic is executed on a Spark cluster, which does not have to be managed or maintained personally by the user.

    • Linked services: The linked services in the tool define the connection to the data source. There are various services used for two main purposes - to represent a data store that the solution supports and to represent a compute resource that can host the execution of an activity.

    • Integration runtime: The integration runtime in the tool provides the bridge between the activity and linked services needed for it.

    • Triggers: There are various types of triggers in the solution, created for different types of events. They determine when a pipeline execution should be initiated.

    • Pipeline runs: Pipeline runs are instantiated by passing the arguments to the parameters that are defined in pipelines, executing the pipelines' work.

    • Control flow: Control flow in Azure Data Factory is an orchestration of pipeline activities.

    • Connect and collect: This serves as the first step of the services that this tool offers. It connects all the required sources of data and processing in order to prepare the data for moving it to a centralized location for further processing. The step eliminates the need for companies to integrate expensive custom solutions for data movement. Through Copy Activity, Azure Blob storage, and Azure HDInsight Hadoop cluster, users can quickly initiate the first step of organizing their data.

    • Transform and enrich: The collected data can be processed or transformed by using the mapping data flows of the product. Data transformation graphs can be executed on Spark without the need to understand its clusters or how programming works.

    • CI/CD and publish: Through Azure DevOps and GitHub, the tool offers users full CI/CD support for their data pipelines, which allows ETL processes to be developed and delivered incrementally before the finished product is published.

    • Monitor: When users have successfully built and deployed their data integration pipelines, the service offers them the option to monitor the scheduled activities and pipelines. This is done through Azure Monitor, API, PowerShell, and health panels on the Azure portal.
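
    To make the relationship between pipelines, activities, triggers, pipeline runs, and control flow more concrete, here is a small Python model of those concepts. It is only a sketch under stated assumptions: the class names (`Activity`, `Pipeline`, `ScheduleTrigger`) and the `daily_load` example are hypothetical and are not the Azure Data Factory SDK or its JSON pipeline format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class Activity:
    """One step of work; `run` receives the run's arguments and shared state."""
    name: str
    run: Callable[[Dict[str, str], Dict[str, object]], None]


@dataclass
class Pipeline:
    """A logical grouping of activities with declared parameters."""
    name: str
    parameters: List[str]
    activities: List[Activity] = field(default_factory=list)

    def create_run(self, **arguments: str) -> Dict[str, object]:
        """A pipeline run: bind arguments to parameters, then execute in order."""
        missing = set(self.parameters) - set(arguments)
        if missing:
            raise ValueError(f"missing arguments: {sorted(missing)}")
        state: Dict[str, object] = {}
        for activity in self.activities:  # simple sequential control flow
            activity.run(arguments, state)
        return state


@dataclass
class ScheduleTrigger:
    """Decides when runs should start: every `interval` from `start`."""
    start: datetime
    interval: timedelta

    def due_times(self, until: datetime) -> List[datetime]:
        times, t = [], self.start
        while t <= until:
            times.append(t)
            t += self.interval
        return times


def copy(args, state):
    # Hypothetical copy step: fabricate three rows tagged with the source name.
    state["rows"] = [f"{args['source']}-row-{i}" for i in range(3)]


def count(args, state):
    state["row_count"] = len(state["rows"])


pipeline = Pipeline("daily_load", ["source"],
                    [Activity("copy", copy), Activity("count", count)])
trigger = ScheduleTrigger(datetime(2023, 5, 1), timedelta(days=1))

# One run per trigger time between May 1 and May 3 inclusive.
runs = [pipeline.create_run(source="sales")
        for _ in trigger.due_times(datetime(2023, 5, 3))]
print(len(runs), runs[0]["row_count"])  # 3 3
```

    The trigger only decides when a run starts, and the run only binds arguments to parameters; keeping those roles separate is what lets one pipeline definition be reused on different schedules and with different inputs.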

    Azure Data Factory Benefits

    Azure Data Factory offers clients several benefits. Some of these include:

    • An easy-to-use platform which is suitable for both beginner and expert users, as it offers code-free processes and built-in support.

    • Pay-as-you-go option for clients to pay only for the services that they are using.

    • Powerful tool with more than 90 built-in connectors, which allow companies to ingest on-premises and software as a service (SaaS) data quickly.

    • Provides autonomous ETL, which unlocks operational efficiencies and enables citizen integrators.

    • The tool is designed to handle large volumes of data and provide users with better scalability and performance than classic ETL systems.

    • Azure Data Factory allows users to easily migrate ETL workloads to the solution’s cloud.

    • The solution offers great security for its users, as it provides the option for assigning specific permissions and roles within the organization.

    • Azure Data Factory is highly automated, which allows users to orchestrate their data more efficiently.

    • The platform is a combination of GUI and scripting-based interfaces, which gives users more freedom over data management.

    • The tool provides organizations with the option to rely on Microsoft to fully manage the process. This eliminates the potential need of hiring a third-party expert.

    Reviews from Real Users

    According to Dan M., a Chief Strategist & CTO at a consultancy, Azure Data Factory is secure and reasonably priced.

    A Senior Manager at a tech services company describes the tool as reasonably priced, with good scalability and performance.

    IBM InfoSphere DataStage is a high-quality data integration tool that aims to design, develop, and run jobs that move and transform data for organizations of different sizes. The product works by integrating data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.

    The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool can be accessed through a subscription model, where its capabilities are a part of IBM DataStage on IBM Cloud Pak for Data as a Service. This option offers full management on IBM Cloud.

    • On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.

    • Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through parallel engine and load balancing, which maximizes throughput.

    • Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.

    • Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.

    • Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.

    • IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.

    • IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.

    • Distributed data processing: Cloud runtimes can be executed remotely through this feature while maintaining data sovereignty and decreasing costs.
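
    The parallel engine item in the list above refers to processing data at scale by spreading work across workers. The general idea of partition parallelism can be sketched in Python as follows. This is an illustration of the technique only, not DataStage's engine or any IBM API; the function names and the modulo-based partitioning rule are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n partitions using the row id modulo n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row["id"] % n].append(row)
    return parts


def transform(part):
    """The same transformation runs independently on each partition."""
    return [{"id": r["id"], "value": r["value"] * 2} for r in part]


def run_parallel(rows, workers=4):
    """Partition the input, transform partitions concurrently, then merge."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform, parts))  # keeps partition order
    merged = [row for part in results for row in part]
    return sorted(merged, key=lambda r: r["id"])


rows = [{"id": i, "value": i} for i in range(10)]
out = run_parallel(rows)
print(out[:3])  # [{'id': 0, 'value': 0}, {'id': 1, 'value': 2}, {'id': 2, 'value': 4}]
```

    Threads are used here only to keep the example short and self-contained; in CPython they show the structure of the approach rather than real CPU speedup, which would need separate processes or machines.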

    IBM InfoSphere DataStage Benefits

    This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

    • Increased speed of workload execution due to better balancing and a parallel engine.

    • Reduction of data movement costs through integrations and seamless design of jobs.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.

    • Effective data manipulation allows data to be merged before being mapped and transformed.

    • Easier access for users to their data through visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    Sample Customers
    Azure Data Factory: Milliman, Pier 1 Imports, Rockwell Automation, Ziosk, Real Madrid
    IBM InfoSphere DataStage: Dubai Statistics Center, Etisalat Egypt
    Top Industries
    Azure Data Factory
    REVIEWERS
    Computer Software Company: 36%
    Insurance Company: 9%
    Manufacturing Company: 7%
    Financial Services Firm: 7%
    VISITORS READING REVIEWS
    Computer Software Company: 15%
    Financial Services Firm: 12%
    Manufacturing Company: 7%
    Government: 7%
    IBM InfoSphere DataStage
    REVIEWERS
    Computer Software Company: 64%
    Healthcare Company: 9%
    Financial Services Firm: 9%
    Insurance Company: 9%
    VISITORS READING REVIEWS
    Financial Services Firm: 23%
    Computer Software Company: 13%
    Manufacturing Company: 9%
    Insurance Company: 8%
    Company Size
    Azure Data Factory
    REVIEWERS
    Small Business: 28%
    Midsize Enterprise: 19%
    Large Enterprise: 54%
    VISITORS READING REVIEWS
    Small Business: 17%
    Midsize Enterprise: 13%
    Large Enterprise: 70%
    IBM InfoSphere DataStage
    REVIEWERS
    Small Business: 44%
    Midsize Enterprise: 6%
    Large Enterprise: 50%
    VISITORS READING REVIEWS
    Small Business: 15%
    Midsize Enterprise: 9%
    Large Enterprise: 76%

    Azure Data Factory is ranked 1st in Data Integration Tools with 49 reviews while IBM InfoSphere DataStage is ranked 13th in Data Integration Tools with 10 reviews. Azure Data Factory is rated 8.0, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of Azure Data Factory writes "The good, the bad and the lots of ugly". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features". Azure Data Factory is most compared with Informatica PowerCenter, Microsoft Azure Synapse Analytics, Informatica Cloud Data Integration, Alteryx Designer and SSIS, whereas IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, AWS Glue, IBM Cloud Pak for Data and Informatica PowerCenter. See our Azure Data Factory vs. IBM InfoSphere DataStage report.


    We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.