We compared Azure Data Factory and IBM InfoSphere DataStage across four categories based on our users’ reviews. After reviewing all of the collected data, we reached the conclusion below.
Comparison Results: Azure Data Factory is mature, robust, and consistent. Its more than 100 built-in connectors and its ability to onboard data from many different sources into the cloud make it easier for users to understand their data flows. Users are also happier with its pricing. Once IBM InfoSphere DataStage shifts its focus toward cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"I like how you can create your own pipeline in your space and reuse those creations. You can collaborate with other people who want to use your code."
"The most important feature is that it can help you do the multi-threading concepts."
"The feature I found most helpful in Azure Data Factory is the pipeline feature, including being able to connect to different sources. Azure Data Factory also has built-in security, which is another valuable feature."
"The trigger scheduling options are decently robust."
"It's cloud-based, allowing multiple users to easily access the solution from the office or remote locations. I like that we can set up the security protocols for IP addresses, like allow lists. It's a pretty user-friendly product as well. The interface and build environment where you create pipelines are easy to use. It's straightforward to manage the digital transformation pipelines we build."
"We haven't had any issues connecting it to other products."
"The data mapping and the ability to systematically derive data are nice features. It worked really well for the solution we had. It is visual, and it did the transformation as we wanted."
"The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
"The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage."
"The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"It works with multiple servers and offers high availability."
"It is quite useful and powerful."
"Offers great flexibility."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms."
"We like the flexibility of modeling."
"The solution is stable."
"This solution is currently only useful for basic data movement and file extractions, which we would like to see developed to handle more complex data transformations."
"Data Factory's performance during heavy data processing isn't great."
"We require Azure Data Factory to be able to connect to Google Analytics."
"I have not found any real shortcomings within the product."
"Lacks in-built streaming data processing."
"There's space for improvement in the development process of the data pipelines."
"There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation."
"There is always room to improve. There should be good examples of use that, of course, customers aren't always willing to share. It is Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments, which also includes me, you usually got, 'No, I'm not going to do that.' They don't have enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them as a user base."
"It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
"What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement in the solution stems from a lot of new types of databases, for example, databases in the cloud and big data have become available, and IBM InfoSphere DataStage is working on various connectors for different data sources, but that still isn't up-to-date, meaning that some connectors are missing for modern data sources. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well."
"Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
"Currently lacking virtualization ability."
"The error messaging needs to be improved."
"The solution can be a bit more user-friendly, similar to Informatica."
"In the future, I would like to see more integration with cloud technologies."
"It would be useful to provide support for Python, AR, and Java."
Azure Data Factory is a managed cloud service built for extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. It lets users create, schedule, and manage data integration workflows that work with data in the cloud or on premises. The use cases of the product include data engineering, operational data integration, analytics, ingesting data into data warehouses, and migrating on-premises SQL Server Integration Services (SSIS) packages to Azure.
The tool allows users to create data-driven workflows that move and transform data at scale. Pipelines can ingest data from disparate data stores, and companies can use data flows to build complex ETL processes that transform data visually. Azure Data Factory also integrates with services such as Azure HDInsight Hadoop, Azure Databricks, Azure Synapse Analytics, and Azure SQL Database. Together, these services help organizations manage and control their data and give them better visibility into it for improved decision-making.
Azure Data Factory lets companies schedule the pipelines that move and transform their data, whether hourly, daily, weekly, or according to the organization's specific needs. Data-driven workflows in Azure Data Factory follow these steps:
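To make the ETL/ELT distinction concrete, here is a minimal Python sketch. It is not Azure Data Factory code; the functions `etl`, `elt`, and `normalize_email` and the `email` field are hypothetical. The only thing it illustrates is where the transformation happens: ETL transforms rows before loading them, while ELT loads the raw rows into the destination first and transforms them there.

```python
from typing import Callable

# Hypothetical row type: one record as a field -> value mapping.
Row = dict[str, str]


def normalize_email(row: Row) -> Row:
    """Example transformation: trim and lowercase a hypothetical 'email' field."""
    return {**row, "email": row["email"].strip().lower()}


def etl(rows: list[Row], target: list[Row],
        transform: Callable[[Row], Row]) -> None:
    # ETL: transform in the integration layer, then load only the cleaned rows.
    target.extend(transform(r) for r in rows)


def elt(rows: list[Row], raw_zone: list[Row], target: list[Row],
        transform: Callable[[Row], Row]) -> None:
    # ELT: load the raw rows into the destination first...
    start = len(raw_zone)
    raw_zone.extend(rows)
    # ...then transform what was just loaded. A real system would typically run
    # this step on the destination's own compute (for example, SQL in a warehouse).
    target.extend(transform(r) for r in raw_zone[start:])


source = [{"email": "  Ana@Example.COM "}]

etl_target: list[Row] = []
etl(source, etl_target, normalize_email)

raw: list[Row] = []
elt_target: list[Row] = []
elt(source, raw, elt_target, normalize_email)

print(etl_target)  # prints [{'email': 'ana@example.com'}]
print(raw)         # prints [{'email': '  Ana@Example.COM '}]
```

Both paths produce the same cleaned rows here; the difference is that ELT also leaves the untransformed data in the destination, where it can be reprocessed later.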
1. Connecting to required sources and collecting data. After connecting to the various sources where data is stored, the pipelines move the data to a centralized location for further processing.
2. Transforming and enriching the data. Once the data is moved to a centralized data store in the cloud, the pipelines transform it through services like HDInsight Hadoop, Azure Data Lake Analytics, Spark, and Machine Learning.
3. Delivering the transformed data to on-premises systems or keeping it in cloud storage for use by different tools and applications.
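The three steps above can be sketched in plain Python as collect, transform, and deliver functions chained into one pipeline. This is an illustrative model only, not Azure Data Factory's API: `Source`, `collect`, `transform`, `deliver`, `run_pipeline`, the `amount` field, and the 1000 threshold are all hypothetical. In Azure Data Factory itself, these stages would be configured as pipeline activities rather than written as Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical record type: one row of data as a field -> value mapping.
Record = dict[str, object]


@dataclass
class Source:
    """Hypothetical stand-in for a connected data store (e.g. a CRM or ERP system)."""
    name: str
    read: Callable[[], Iterable[Record]]


def collect(sources: list[Source]) -> list[Record]:
    """Step 1: read every source and stage all records in one central list."""
    staged: list[Record] = []
    for src in sources:
        for rec in src.read():
            # Tag each record with its origin so lineage survives centralization.
            staged.append({**rec, "_source": src.name})
    return staged


def transform(records: list[Record]) -> list[Record]:
    """Step 2: clean and enrich the centralized records."""
    out: list[Record] = []
    for rec in records:
        amount = float(rec.get("amount", 0))
        if amount < 0:
            continue  # drop rows that fail a simple validity rule
        # Enrichment: add a derived flag based on a hypothetical threshold.
        out.append({**rec, "amount": amount, "is_large": amount >= 1000})
    return out


def deliver(records: list[Record], sink: list[Record]) -> int:
    """Step 3: publish transformed records to a target store; return the count."""
    sink.extend(records)
    return len(records)


def run_pipeline(sources: list[Source], sink: list[Record]) -> int:
    return deliver(transform(collect(sources)), sink)


crm = Source("crm", lambda: [{"id": 1, "amount": "250.5"}, {"id": 2, "amount": "-3"}])
erp = Source("erp", lambda: [{"id": 3, "amount": 1200}])
warehouse: list[Record] = []
print(run_pipeline([crm, erp], warehouse))  # prints 2
```

Keeping each step in its own function means one stage can be tested or replaced without touching the others.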
Azure Data Factory Concepts
The solution consists of a series of interconnected systems that provide data integration and related services for users. The following concepts create the end product for users:
Azure Data Factory Benefits
Azure Data Factory offers clients many benefits. Some of these include:
Reviews from Real Users
According to Dan M., a Chief Strategist & CTO at a consultancy, Azure Data Factory is secure and reasonably priced.
A Senior Manager at a tech services company describes the tool as reasonably priced, scalable, and well-performing.
IBM InfoSphere DataStage is a high-quality data integration tool for designing, developing, and running jobs that move and transform data for organizations of different sizes. It integrates data across multiple systems using a high-performance parallel framework, and it supports extended metadata management, enterprise connectivity, and integration of all types of data.
The solution is the data integration component of IBM InfoSphere Information Server and provides a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data stores, and other enterprise applications. The tool supports both extract, transform, and load (ETL) and extract, load, and transform (ELT) patterns. The platform achieves scalability through parallel processing and enterprise connectivity.
The solution comes in several editions aimed at different types of companies, including the Server Edition, the Enterprise Edition, and the MVS Edition. Each edition supports different goals, including the following:
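Parallel processing of this kind is commonly done by partitioning: rows are split by a key, each partition is processed independently, and the partial results are combined. The Python sketch below illustrates that general idea only; it is not DataStage's engine, and the functions `hash_partition`, `sum_by_key`, and `parallel_sum` and the `region`/`sales` fields are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows: list[dict], key: str, n: int) -> list[list[dict]]:
    """Split rows into n partitions so that all rows sharing a key land together."""
    parts: list[list[dict]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def sum_by_key(partition: list[dict], key: str, value: str) -> dict:
    """Work done independently on one partition: total `value` per `key`."""
    totals: dict = defaultdict(float)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)


def parallel_sum(rows: list[dict], key: str, value: str, n_partitions: int = 4) -> dict:
    parts = hash_partition(rows, key, n_partitions)
    # Threads keep the sketch self-contained; a real parallel engine would spread
    # partitions across processes or nodes to get true CPU parallelism.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(lambda p: sum_by_key(p, key, value), parts))
    merged: dict = {}
    for partial in partials:
        merged.update(partial)  # a key never spans partitions, so totals don't collide
    return merged


rows = [
    {"region": "EU", "sales": 10.0},
    {"region": "US", "sales": 5.0},
    {"region": "EU", "sales": 2.5},
]
print(parallel_sum(rows, "region", "sales"))
```

Hash partitioning on the grouping key guarantees that every row for a given key is handled by the same worker, which is why the partial results can be merged without adding totals together.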
IBM InfoSphere DataStage can be deployed in various ways, including:
IBM InfoSphere DataStage Features
The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:
IBM InfoSphere DataStage Benefits
This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:
Reviews from Real Users
A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.
Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.
Azure Data Factory is ranked 1st in Data Integration Tools with 49 reviews while IBM InfoSphere DataStage is ranked 13th in Data Integration Tools with 10 reviews. Azure Data Factory is rated 8.0, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of Azure Data Factory writes "The good, the bad and the lots of ugly". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features". Azure Data Factory is most compared with Informatica PowerCenter, Microsoft Azure Synapse Analytics, Informatica Cloud Data Integration, Alteryx Designer and SSIS, whereas IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, AWS Glue, IBM Cloud Pak for Data and Informatica PowerCenter.
We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.