Azure Data Factory vs IBM Cloud Pak for Data comparison

StreamSets: 5,913 views | 3,583 comparisons
Microsoft Azure Data Factory: 37,092 views | 30,088 comparisons
IBM Cloud Pak for Data: 4,206 views | 2,535 comparisons
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Azure Data Factory and IBM Cloud Pak for Data based on real PeerSpot user reviews.

Find out in this report how the two Data Integration Tools solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Azure Data Factory vs. IBM Cloud Pak for Data Report (Updated: November 2022).
655,711 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

StreamSets:
  • "In StreamSets, everything is in one place."
  • "It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one day data ops solution."
  • "StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
  • "StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
  • "I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."

Azure Data Factory:
  • "I think it makes it very easy to understand what data flow is and so on. You can leverage the user interface to do the different data flows, and it's great. I like it a lot."
  • "An excellent tool for pipeline orchestration."
  • "The trigger scheduling options are decently robust."
  • "One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect."
  • "In terms of my personal experience, it works fine."
  • "The data mapping and the ability to systematically derive data are nice features. It worked really well for the solution we had. It is visual, and it did the transformation as we wanted."
  • "It's cloud-based, allowing multiple users to easily access the solution from the office or remote locations. I like that we can set up the security protocols for IP addresses, like allow lists. It's a pretty user-friendly product as well. The interface and build environment where you create pipelines are easy to use. It's straightforward to manage the digital transformation pipelines we build."
  • "I enjoy the ease of use for the backend JSON generator, the deployment solution, and the template management."

IBM Cloud Pak for Data:
  • "One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
  • "What I found most helpful in IBM Cloud Pak for Data is containerization, which means it's easy to shift and leave in terms of moving to other clouds. That's an advantage of IBM Cloud Pak for Data."
  • "The most valuable features of IBM Cloud Pak for Data are the Watson Studio, where we can initiate more groups and write code. Additionally, Watson Machine Learning is available with many other services, such as APIs which you can plug the machine learning models."

Cons

StreamSets:
  • "The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
  • "We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
  • "Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
  • "If you use JDBC Lookup, for example, it generally takes a long time to process data."
  • "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."

Azure Data Factory:
  • "They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas."
  • "The one element of the solution that we have used and could be improved is the user interface."
  • "Snowflake connectivity was recently added and if the vendor provided some videos on how to create data then that would be helpful."
  • "Azure Data Factory uses many resources and has issues with parallel workflows."
  • "We are too early into the entire cycle for us to really comment on what problems we face. We're mostly using it for transformations, like ETL tasks. I think we are comfortable with the facts or the facts setting. But for other parts, it is too early to comment on."
  • "When the record fails, it's tough to identify and log."
  • "Azure Data Factory should be cheaper to move data to a data center abroad for calamities in case of disasters."
  • "The deployment should be easier."

IBM Cloud Pak for Data:
  • "One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
  • "One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."
  • "There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."

Pricing and Cost Advice

StreamSets:
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."

Azure Data Factory:
  • "I would not say that this product is overly expensive."
  • "The licensing is a pay-as-you-go model, where you pay for what you consume."
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."
  • "It's not particularly expensive."
  • "Product is priced at the market standard."
  • "There's no licensing for Azure Data Factory, they have a consumption payment model. How often you are running the service and how long that service takes to run. The price can be approximately $500 to $1,000 per month but depends on the scaling."
  • "I don't see a cost; it appears to be included in general support."

IBM Cloud Pak for Data:
  • "I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."

    Questions from the Community
    Top Answer: It is really easy to set up and the interface is easy to use.
    Top Answer: We've seen a couple of cases where it appears to have a memory leak or a similar problem. It grows for a bit and then…
    Top Answer: We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.
    Top Answer: AWS Glue and Azure Data factory for ELT best performance cloud services.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability…
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; It has pre-load and post-load…
    Top Answer: The most valuable features are data virtualization and reporting.
    Top Answer: I think that this product is too expensive for smaller companies.
    Top Answer: The utilization of system resources is high. The technical support could be a little better. Having a "lite" version for…
    Also Known As
    Cloud Pak for Data
    Overview

    StreamSets offers an end-to-end data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps, and power the modern data ecosystem and hybrid integration.

    Only StreamSets provides a single design experience for all design patterns for 10x greater developer productivity; smart data pipelines that are resilient to change for 80% fewer breakages; and a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures to eliminate blind spots and control gaps.

    With StreamSets, you can deliver the continuous data that drives the connected enterprise.

    Azure Data Factory is a managed cloud service built for extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. It is a data integration service, not a data warehouse itself, that allows users to create, schedule, and manage data workflows across cloud and on-premises data stores. The use cases of the product include data engineering, operational data integration, analytics, ingesting data into data warehouses, and migrating on-premises SQL Server Integration Services (SSIS) packages to Azure.

    The tool allows users to create data-driven workflows for initiating data movement and data transformation at scale. Data can be ingested from disparate data stores via pipelines. Companies can utilize this product to build complex ETL processes for transforming data visually with data flows. Azure Data Factory also works with services such as Azure HDInsight Hadoop, Azure Databricks, Azure Synapse Analytics, and Azure SQL Database. Together, these services facilitate data management and control for organizations, providing them with better visibility of their data for improved decision-making.

    Azure Data Factory allows companies to schedule the pipelines that move and transform their data. Pipelines can run hourly, daily, weekly, or according to the specific needs of the organization. The data-driven workflows in Azure Data Factory proceed through the following steps:

    1. Connecting to required sources and collecting data. After connecting to the various sources where data is stored, the pipelines move the data to a centralized location for further processing.

    2. Transforming and enriching the data. Once the data is moved to a centralized data store in the cloud, the pipelines transform it through services like HDInsight Hadoop, Azure Data Lake Analytics, Spark, and Machine Learning.

    3. Delivering the transformed data to on-premises data stores or keeping it in cloud storage for use by different tools and applications.
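
    The three steps above can be sketched in a few lines of Python. This is an illustration of the collect, transform, deliver flow only and does not use Azure Data Factory or its SDK; the function names, the in-memory sources, and the `total` field are all hypothetical.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Source = Callable[[], list[Record]]  # hypothetical: a source is any function returning records


def collect(sources: Iterable[Source]) -> list[Record]:
    """Step 1: read from every source and stage the records in one central list."""
    staged: list[Record] = []
    for read in sources:
        staged.extend(read())
    return staged


def transform(records: list[Record]) -> list[Record]:
    """Step 2: enrich each record, here by adding a derived 'total' field."""
    return [{**r, "total": r["qty"] * r["price"]} for r in records]


def deliver(records: list[Record], sink: list[Record]) -> int:
    """Step 3: write the transformed records to a destination; return the count."""
    sink.extend(records)
    return len(records)


def run_workflow(sources: Iterable[Source], sink: list[Record]) -> int:
    """Chain the three steps in order."""
    return deliver(transform(collect(sources)), sink)


# Hypothetical in-memory stand-ins for a database and a file source.
def read_orders_db() -> list[Record]:
    return [{"id": 1, "qty": 2, "price": 5.0}]


def read_orders_file() -> list[Record]:
    return [{"id": 2, "qty": 1, "price": 3.5}, {"id": 3, "qty": 4, "price": 1.0}]


warehouse: list[Record] = []
delivered = run_workflow([read_orders_db, read_orders_file], warehouse)
```

    In the managed service, each of these steps would be a configured activity, not hand-written code; the sketch only shows the order in which data moves.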

    Azure Data Factory Concepts

    The solution consists of a series of interconnected systems that provide data integration and related services for users. It is built on the following concepts:

    • Pipelines: A pipeline is a logical grouping of activities that together perform a unit of work.

    • Mapping data flows: Azure Data Factory lets its users create and manage graphs of data transformation logic for transforming any-sized data. The logic is executed on a Spark cluster, which does not have to be managed or maintained personally by the user.

    • Linked services: The linked services in the tool define the connection to the data source. Linked services are used for two main purposes: to represent a data store that the solution supports, and to represent a compute resource that can host the execution of an activity.

    • Integration runtime: The integration runtime in the tool provides the bridge between the activity and linked services needed for it.

    • Triggers: There are various types of triggers in the solution, created for different types of events. They determine when a pipeline execution should be initiated.

    • Pipeline runs: A pipeline run is one execution of a pipeline. It is instantiated by passing arguments to the parameters that are defined in the pipeline.

    • Control flow: Control flow in Azure Data Factory is an orchestration of pipeline activities.

    • Connect and collect: This serves as the first step of the services that this tool offers. It connects all the required sources of data and processing in order to prepare the data for moving it to a centralized location for further processing. The step eliminates the need for companies to integrate expensive custom solutions for data movement. Through Copy Activity, Azure Blob storage, and Azure HDInsight Hadoop cluster, users can quickly initiate the first step of organizing their data.

    • Transform and enrich: The collected data can be processed or transformed by using the mapping data flows of the product. Data transformation graphs can be executed on Spark without the need to understand Spark clusters or Spark programming.

    • CI/CD and publish: Through Azure DevOps and GitHub, the tool offers full CI/CD support for data pipelines, which allows users to develop and deliver ETL processes incrementally before publishing the finished product.

    • Monitor: When users have successfully built and deployed their data integration pipelines, the service offers them the option to monitor the scheduled activities and pipelines. This is done through Azure Monitor, API, PowerShell, and health panels on the Azure portal.
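
    To make the relationship between pipelines, parameters, pipeline runs, and triggers concrete, here is a minimal Python model of those concepts. It is a sketch under stated assumptions, not the Azure Data Factory SDK or its JSON schema: the class names, the status strings, and the `source_table` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical: an activity is any callable that receives the run's arguments.
Activity = Callable[[dict[str, Any]], None]


@dataclass
class Pipeline:
    """A logical grouping of activities plus the parameters they expect."""
    name: str
    parameters: list[str]
    activities: list[Activity]


@dataclass
class PipelineRun:
    """One execution of a pipeline, created by binding arguments to parameters."""
    pipeline: Pipeline
    arguments: dict[str, Any]
    status: str = "Queued"  # illustrative status names, not the service's own

    def execute(self) -> None:
        missing = [p for p in self.pipeline.parameters if p not in self.arguments]
        if missing:
            self.status = "Failed"
            raise ValueError(f"missing arguments: {missing}")
        self.status = "InProgress"
        for activity in self.pipeline.activities:  # control flow: run in order
            activity(self.arguments)
        self.status = "Succeeded"


@dataclass
class Trigger:
    """Decides when a run starts; here it simply starts one each time it fires."""
    pipeline: Pipeline
    arguments: dict[str, Any]
    runs: list[PipelineRun] = field(default_factory=list)

    def fire(self) -> PipelineRun:
        run = PipelineRun(self.pipeline, dict(self.arguments))
        run.execute()
        self.runs.append(run)
        return run


log: list[str] = []
copy_sales = Pipeline(
    name="copy_sales",
    parameters=["source_table"],
    activities=[
        lambda args: log.append(f"copy {args['source_table']}"),
        lambda args: log.append(f"transform {args['source_table']}"),
    ],
)
trigger = Trigger(copy_sales, {"source_table": "sales_2022"})
run = trigger.fire()
```

    The same pipeline definition can be run many times with different arguments, which is why the pipeline and the pipeline run are modeled as separate objects.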

    Azure Data Factory Benefits

    Azure Data Factory offers clients several benefits. Some of these include:

    • An easy-to-use platform which is suitable for both beginner and expert users, as it offers code-free processes and built-in support.

    • Pay-as-you-go option for clients to pay only for the services that they are using.

    • Powerful tool with more than 90 built-in connectors, which allow companies to ingest on-premises and software-as-a-service (SaaS) data quickly.

    • Provides autonomous ETL, which unlocks operational efficiencies and enables citizen integrators.

    • The tool is designed to handle large volumes of data and provide users with better scalability and performance than classic ETL systems.

    • Azure Data Factory allows users to easily migrate ETL workloads to the solution’s cloud.

    • The solution offers great security for its users, as it provides the option for assigning specific permissions and roles within the organization.

    • Azure Data Factory is highly automated, which allows users to orchestrate their data more efficiently.

    • The platform is a combination of GUI and scripting-based interfaces, which gives users more freedom over data management.

    • The tool provides organizations with the option to rely on Microsoft to fully manage the process. This eliminates the potential need to hire a third-party expert.

    Reviews from Real Users

    According to Dan M., a Chief Strategist & CTO at a consultancy, Azure Data Factory is secure and reasonably priced.

    A Senior Manager at a tech services company describes the tool as reasonably priced, with good scalability and performance.

    IBM Cloud Pak® for Data is a fully-integrated data and AI platform that modernizes how businesses collect, organize and analyze data to infuse AI throughout their organizations. Cloud-native by design, the platform unifies market-leading services spanning the entire analytics lifecycle. From data management and DataOps to governance, business analytics and automated AI, IBM Cloud Pak for Data helps eliminate the need for costly, and often competing, point solutions while providing the information architecture you need to implement AI successfully.

    Building on the streamlined hybrid-cloud foundation of Red Hat® OpenShift®, IBM Cloud Pak for Data takes advantage of the underlying resource and infrastructure optimization and management. The solution fully supports multicloud environments such as Amazon Web Services (AWS), Azure, Google Cloud, IBM Cloud™ and private cloud deployments. Find out how IBM Cloud Pak for Data can lower your total cost of ownership and accelerate innovation.

    Sample Customers
    StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Azure Data Factory: Milliman, Pier 1 Imports, Rockwell Automation, Ziosk, Real Madrid
    IBM Cloud Pak for Data: Qatar Development Bank, GuideWell, Skanderborg Music Festival
    Top Industries
    VISITORS READING REVIEWS
    Financial Services Firm: 17%
    Computer Software Company: 14%
    Manufacturing Company: 7%
    Insurance Company: 7%
    REVIEWERS
    Computer Software Company: 35%
    Non Profit: 9%
    Manufacturing Company: 9%
    Insurance Company: 6%
    VISITORS READING REVIEWS
    Computer Software Company: 19%
    Financial Services Firm: 11%
    Comms Service Provider: 8%
    Energy/Utilities Company: 7%
    VISITORS READING REVIEWS
    Computer Software Company: 19%
    Financial Services Firm: 15%
    Government: 10%
    Comms Service Provider: 9%
    Company Size
    REVIEWERS
    Small Business: 13%
    Midsize Enterprise: 38%
    Large Enterprise: 50%
    VISITORS READING REVIEWS
    Small Business: 15%
    Midsize Enterprise: 11%
    Large Enterprise: 74%
    REVIEWERS
    Small Business: 26%
    Midsize Enterprise: 22%
    Large Enterprise: 52%
    VISITORS READING REVIEWS
    Small Business: 17%
    Midsize Enterprise: 13%
    Large Enterprise: 70%
    VISITORS READING REVIEWS
    Small Business: 15%
    Midsize Enterprise: 11%
    Large Enterprise: 74%

    Azure Data Factory is ranked 1st in Data Integration Tools with 40 reviews while IBM Cloud Pak for Data is ranked 24th in Data Integration Tools with 3 reviews. Azure Data Factory is rated 7.8, while IBM Cloud Pak for Data is rated 8.0. The top reviewer of Azure Data Factory writes "There's the good, the bad and the ugly....unfortunately lots of ugly". On the other hand, the top reviewer of IBM Cloud Pak for Data writes "Plenty of features, multiple services available, but machine learning could improve". Azure Data Factory is most compared with Informatica PowerCenter, Microsoft Azure Synapse Analytics, Informatica Cloud Data Integration, Alteryx Designer and Oracle Data Integrator (ODI), whereas IBM Cloud Pak for Data is most compared with IBM InfoSphere DataStage, Palantir Foundry, IBM InfoSphere Information Server, Denodo and Talend Data Fabric. See our Azure Data Factory vs. IBM Cloud Pak for Data report.


    We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.