Apache Hadoop vs Azure Data Factory comparison

Apache Hadoop: 2,765 views | 2,378 comparisons
Azure Data Factory: 8,490 views | 6,535 comparisons
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Apache Hadoop and Azure Data Factory based on real PeerSpot user reviews.

Find out in this report how the two Cloud Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Apache Hadoop vs. Azure Data Factory Report (Updated: March 2024).
765,386 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
"I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed.""It's open-source, so it's very cost-effective.""The ability to add multiple nodes without any restriction is the solution's most valuable aspect.""Hadoop File System is compatible with almost all the query engines.""The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable.""​​Data ingestion: It has rapid speed, if Apache Accumulo is used.""It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database.""The most valuable features are powerful tools for ingestion, as data is in multiple systems."

More Apache Hadoop Pros →

"The solution can scale very easily.""Powerful but easy-to-use and intuitive.""It has built-in connectors for more than 100 sources and onboarding data from many different sources to the cloud environment.""The most valuable features of Azure Data Factory are the flexibility, ability to move data at scale, and the integrations with different Azure components.""We use the solution to move data from on-premises to the cloud.""I like how you can create your own pipeline in your space and reuse those creations. You can collaborate with other people who want to use your code.""ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF.""From what we have seen so far, the solution seems very stable."

More Azure Data Factory Pros →

Cons
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support.""From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.""The upgrade path should be improved because it is not as easy as it should be.""I think more of the solution needs to be focused around the panel processing and retrieval of data.""Hadoop's security could be better.""The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data.""There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution.""General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."

More Apache Hadoop Cons →

"The thing we missed most was data update, but this is now available as of two weeks ago.""If the user interface was more user friendly and there was better error feedback, it would be helpful.""There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation.""My only problem is the seamless connectivity with various other databases, for example, SAP.""There is no built-in pipeline exit activity when encountering an error.""The initial setup is not very straightforward.""They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas.""You cannot use a custom data delimiter, which means that you have problems receiving data in certain formats."

More Azure Data Factory Cons →

Pricing and Cost Advice
Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently and hence purchasing a 'boxed' / 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

Azure Data Factory:
  • "In terms of licensing costs, we pay somewhere around $14,000 USD per month. There are some additional costs. For example, we would have to subscribe to some additional computing and for elasticity, but they are minimal."
  • "This is a cost-effective solution."
  • "The price you pay is determined by how much you use it."
  • "Understanding the pricing model for Data Factory is quite complex."
  • "I would not say that this product is overly expensive."
  • "The licensing is a pay-as-you-go model, where you pay for what you consume."
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."

    Questions from the Community
    Top Answer: Hadoop File System is compatible with almost all the query engines.
    Top Answer: The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with…
    Top Answer: AWS Glue and Azure Data factory for ELT best performance cloud services.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up and…
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power…
    Ranking
    Apache Hadoop:
      • Ranking: 5th out of 33 in Data Warehouse
      • Views: 2,765
      • Comparisons: 2,378
      • Reviews: 10
      • Average Words per Review: 539
      • Rating: 8.0
    Azure Data Factory:
      • Ranking: 3rd
      • Views: 8,490
      • Comparisons: 6,535
      • Reviews: 40
      • Average Words per Review: 495
      • Rating: 8.0
    Overview
    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
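
The overview above does not name a specific programming model. Hadoop's best-known one is MapReduce, and the following is a minimal single-process sketch of its map, shuffle, and reduce steps using a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration; this is plain Python, not the Hadoop API, and it runs on one machine rather than a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key; a real framework does this between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence within a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce steps would run in parallel on the nodes that hold the data, and the framework would rerun tasks from failed nodes. The sketch only shows the shape of the computation.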

    Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.

    Sample Customers
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
    Top Industries
    Apache Hadoop, reviewers:
      • Financial Services Firm: 40%
      • Comms Service Provider: 27%
      • Hospitality Company: 7%
      • Consumer Goods Company: 7%
    Apache Hadoop, visitors reading reviews:
      • Financial Services Firm: 27%
      • Computer Software Company: 10%
      • Comms Service Provider: 6%
      • University: 6%
    Azure Data Factory, reviewers:
      • Computer Software Company: 35%
      • Insurance Company: 12%
      • Manufacturing Company: 8%
      • Financial Services Firm: 8%
    Azure Data Factory, visitors reading reviews:
      • Computer Software Company: 13%
      • Financial Services Firm: 13%
      • Manufacturing Company: 8%
      • Healthcare Company: 7%
    Company Size
    Apache Hadoop, reviewers:
      • Small Business: 35%
      • Midsize Enterprise: 24%
      • Large Enterprise: 41%
    Apache Hadoop, visitors reading reviews:
      • Small Business: 15%
      • Midsize Enterprise: 10%
      • Large Enterprise: 75%
    Azure Data Factory, reviewers:
      • Small Business: 28%
      • Midsize Enterprise: 19%
      • Large Enterprise: 53%
    Azure Data Factory, visitors reading reviews:
      • Small Business: 18%
      • Midsize Enterprise: 13%
      • Large Enterprise: 70%

    Apache Hadoop is ranked 5th in Data Warehouse with 31 reviews, while Azure Data Factory is ranked 3rd in Cloud Data Warehouse with 79 reviews. Apache Hadoop is rated 7.8, while Azure Data Factory is rated 8.0. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". The top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake, Teradata, and BigQuery, whereas Azure Data Factory is most compared with Informatica PowerCenter, Alteryx Designer, Informatica Cloud Data Integration, Snowflake, and Microsoft Azure Synapse Analytics.


    We monitor all Cloud Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.