We performed a comparison between Apache Hadoop and Snowflake based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."
"Data ingestion: It has rapid speed, if Apache Accumulo is used."
"The scalability of Apache Hadoop is very good."
"As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R."
"Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
"The most valuable feature is scalability and the possibility to work with major information and open source capability."
"The solution's computing time is less."
"Can be leveraged with respect to better performance, auto tuning and competition."
"I have found the solution's most valuable features to be storage, flexibility, ease of use, and security."
"Snowflake is an enormously useful platform. The Snowpipe feature is valuable because it allows us to load terabytes and petabytes of data into the data mart at a very low cost."
"Snowflake is faster than on-premise systems and allows for variable compute power based on need."
"Snowflake has a variety of other ETL provisions that they provide. You can use your own ETL pipeline. Additionally, they provide adapters, and they are always evolving, it is a well-developed solution."
"Data Science capabilities are the most valuable feature."
"The initial setup is very simple."
"The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
"Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
"It requires a great deal of learning curve to understand. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities are overlapping, and for a newcomer or our clients, it is very difficult to decide which of them to buy and which of them they don't really need. They require a consulting organization for it, which is good for organizations such as ours because that's what we do, but it is not easy for the end customers to gain so much knowledge and optimally use it."
"Hadoop's security could be better."
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."
"I think more of the solution needs to be focused around the panel processing and retrieval of data."
"There are some challenges with loading unstructured data and integrating some message queues or brokers. In one project, we had a problem connecting to one of the message queues and we had to take a different route altogether on Microsoft Azure."
"I think that Snowflake could improve its user interface. The current one is not interactive."
"Its pricing or affordability is one of the big challenges. Pricing was the only thing that we didn't like about Snowflake. In terms of technical features, it is a complete solution."
"Snowflake has to improve their spatial parts since it doesn't have much in terms of geo-spatial queries."
"For the Snowflake database, there should be some third-party features for the ETL. It would also be good to be able to use some kind of controls to get the data either from another database or a flat file. Its price should be improved. It should be cheaper than Microsoft."
"I see room for improvement when it comes to credit performance. The other thing I'd like to be improved is the warehouse facility."
"It needs a bit more rigor and governance, which is something you don't get with newer tools. This makes it less enterprise scalable. Its governance and structure can be enhanced, which would really be valuable. I would like to see some kind of prebuilt functionality in terms of having almost like a pre-built data warehouse. A functionality for generating automated kind of pieces would be good."
"The solution could use a little bit more UI."
Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews while Snowflake is ranked 1st in Data Warehouse with 92 reviews. Apache Hadoop is rated 7.8, while Snowflake is rated 8.4. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Snowflake writes "Good usability, good data sharing and elastic compute features, and requires less DBA involvement". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Teradata and BigQuery, whereas Snowflake is most compared with BigQuery, Azure Data Factory, Teradata, Vertica and Teradata Cloud Data Warehouse.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
Apache Hadoop is for data lake use cases. But getting data out of Hadoop for meaningful analytics does indeed need quite an amount of work, using Spark, Hive, Presto, and so on. The way I look at Snowflake and Hadoop is that they complement each other. For a data lake you can use Hadoop, and for a data warehouse companies can use Snowflake. Depending on the size of the company, you can turn Snowflake into a data lake use case too. Snowflake is SQL friendly, and you don't need to carry out any circus to get the data in and out of Snowflake.