We performed a comparison between Apache Hadoop and Azure Data Factory based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."
"It's open-source, so it's very cost-effective."
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"Hadoop File System is compatible with almost all the query engines."
"The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
"Data ingestion: It has rapid speed, if Apache Accumulo is used."
"It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The solution can scale very easily."
"Powerful but easy-to-use and intuitive."
"It has built-in connectors for more than 100 sources and onboarding data from many different sources to the cloud environment."
"The most valuable features of Azure Data Factory are the flexibility, ability to move data at scale, and the integrations with different Azure components."
"We use the solution to move data from on-premises to the cloud."
"I like how you can create your own pipeline in your space and reuse those creations. You can collaborate with other people who want to use your code."
"ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF."
"From what we have seen so far, the solution seems very stable."
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"The upgrade path should be improved because it is not as easy as it should be."
"I think more of the solution needs to be focused around the panel processing and retrieval of data."
"Hadoop's security could be better."
"The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
"There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution."
"General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."
"The thing we missed most was data update, but this is now available as of two weeks ago."
"If the user interface was more user friendly and there was better error feedback, it would be helpful."
"There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation."
"My only problem is the seamless connectivity with various other databases, for example, SAP."
"There is no built-in pipeline exit activity when encountering an error."
"The initial setup is not very straightforward."
"They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas."
"You cannot use a custom data delimiter, which means that you have problems receiving data in certain formats."
Apache Hadoop is ranked 5th in Data Warehouse with 31 reviews, while Azure Data Factory is ranked 3rd in Cloud Data Warehouse with 79 reviews. Apache Hadoop is rated 7.8, while Azure Data Factory is rated 8.0. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". The top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake, Teradata and BigQuery, whereas Azure Data Factory is most compared with Informatica PowerCenter, Alteryx Designer, Informatica Cloud Data Integration, Snowflake and Microsoft Azure Synapse Analytics.
We monitor all Cloud Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.