Apache Hadoop Reviews and Pricing - page 3

Yevgen Manzhulyanov

CEO at AM-BITS LLC

Aug 8, 2023

A hybrid solution for managing enterprise data hubs, monitoring network quality, and implementing an AntiFraud system

Pros and Cons

"The most valuable feature is scalability and the possibility to work with major information and open source capability."

"The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."

What is our primary use case?

This solution is used for a variety of purposes, including managing enterprise data hubs, monitoring network quality, implementing an AntiFraud system, and establishing a conveyor system.

What is most valuable?

The most valuable feature is scalability and the possibility to work with major information and open source capability.

What needs improvement?

The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data.

For how long have I used the solution?

I have been using Apache Hadoop for ten years. Initially, we worked directly, but now we use Cloudera and Bigtop. We are the solution provider.

Free Report: Apache Hadoop Reviews and More

Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: May 2026.

896,510 professionals have used our research since 2012.

What do I think about the stability of the solution?

The tool's stability is good.

What do I think about the scalability of the solution?

We may have 15 people working on this solution.

I rate the solution’s scalability a ten out of ten.

How was the initial setup?

The setup is not easy for a financial or telecom company.

It takes around one month for basic development and around three to four months for enterprise. We require more than 50 engineers to do the engineering work and more than 20 for the data engineering team.

In terms of production, the most significant aspects are security and staging, with a focus on either a one-month or three-month timeframe for security considerations.

What other advice do I have?

The best advice is not to start a project based on Apache Hadoop alone. It is based on technology, and needs a skilled team.

Overall, I rate the solution an eight out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner

Aria Amini - PeerSpot reviewer

Data Engineer at Behsazan Mellat

Aug 7, 2023

A big-data engineering solution that integrates well into a variety of environments

Pros and Cons

"Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."

"It could be more user-friendly."

What is our primary use case?

We use the Apache Hadoop environment for use cases involving big data engineering. We have many applications, such as collecting, transforming, loading, and storing log event data for big organizations.

What is most valuable?

Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform. Hadoop can integrate all of these features in various environments and have use cases beyond all of the tools in the environment.

What needs improvement?

It could be more user-friendly. Other platforms, such as Cloudera, used for big data, are more user-friendly and presented in a more straightforward way. They are also more flexible than Hadoop. Hadoop's scrollback is not easy to use, either.

For how long have I used the solution?

I have used Apache Hadoop for three years, and I use Hadoop's open-source version.

What do I think about the stability of the solution?

Hadoop is stable because it runs on a cluster. If issues occur on some of the servers, Hadoop can continue operating.

What do I think about the scalability of the solution?

Apache Hadoop is very good for scalability because one of its main features is its scalability tool. For all the big data infrastructure, we have about ten employees working in the Hadoop environment as engineers and developers. One of our clients is a bank, and the Hadoop environment can retrieve a lot of data, so we could have an unlimited number of end users.

How was the initial setup?

The initial setup is, to some extent, difficult because additional skills are required, specifically knowledge of the operating system at installation. We need someone with professional skills to install the Hadoop environment. With one engineer who has those skills, it takes ten days to two weeks to deploy the solution.

Two or three people are needed to maintain the solution. At least two people are required to maintain the Hadoop stack, in case of unexpected situations, like when something gets corrupted, and they need to solve the problem as fast as possible. Hadoop is easy to maintain because of its governance feature, which helps maintain all the Hadoop stacks.

Which other solutions did I evaluate?

Some competitors include Kibana from Elasticsearch, Splunk, and Cloudera. Each of them has some advantages and disadvantages, but Hadoop is more flexible when working in a big data environment. Compared to Splunk and Cloudera, Apache Hadoop is platform-independent and works on any platform. It is also open-source.

What other advice do I have?

We use Hadoop's open-source version and do not receive direct support from Apache. There are good resources on the web, though, so we have no problem getting help, but not directly from the company.

If you want to use big data on a larger scale, you should use Hadoop. But you could use alternatives if you're going to use big data to analyze data in the short term and don't need cybersecurity. You could use your cloud's features. For example, if you are on Google or Amazon Cloud, you could use in-built features instead of Apache Hadoop. If you are, like us, working with banks that don't want to use the cloud or some commercial clouds or have large-scale data, Hadoop is a good choice for you.

I rate Apache Hadoop an eight out of ten because it could be more user-friendly and easier to install. Also, Hadoop has changed some features in the commercial version.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Business data analyst at RBSG Internet operations

Sep 19, 2022

A low-cost solution that allows us to download data, but has latency issues when running queries

Pros and Cons

"One valuable feature is that we can download data."
"Hadoop has also made it feasible to have all the data available in one area."

"I think more of the solution needs to be focused around the parallel processing and retrieval of data."
"We have plans to increase usage and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we are facing some latency issues."

What is our primary use case?

We use the solution as a data lake for our customer payment and SaaS information. We get data from various sources and then utilize and leverage that data.

What is most valuable?

One valuable feature is that we can download data. Another is that it is a low-cost solution. Hadoop has also made it feasible to have all the data available in one area.

What needs improvement?

We have plans to increase usage, and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we are facing some latency issues. I think more of the solution needs to be focused around the parallel processing and retrieval of data.

For how long have I used the solution?

I have been using this solution for about seven or eight years.

What do I think about the stability of the solution?

This is a stable product.

What do I think about the scalability of the solution?

The scalability of the solution is good. Approximately 100 people are currently using this solution within our company.

How are customer service and support?

I would rate the tech support as a four out of five.

How would you rate customer service and support?

Positive

What other advice do I have?

I would recommend this product to others. I would rate it as an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

R&D Head, Big Data Adjunct Professor at SK Communications Co., Ltd.

Jan 26, 2022

Not dependent on third-party vendors

Pros and Cons

"We selected Apache Hadoop because it is not dependent on third-party vendors."

"Real-time data processing is weak. This solution is very difficult to run and implement."
"Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products."

What needs improvement?

Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and other vendors.

Another weak point of this solution, technically speaking, is that it's very difficult to run and difficult to smoothly implement. Preparation and integration are important.

The integration of this solution with other data-related products and solutions, and having other functions, e.g. API connectivity, are what I want to see in the next release.

For how long have I used the solution?

We have been using Apache Hadoop since 2011.

Which solution did I use previously and why did I switch?

We selected Apache Hadoop because it is not dependent on third-party vendors. Previously, our main business unit was tied to big vendors such as IBM, Oracle, and EMC. We wanted to have a competitive advantage in technology, so we selected the Apache project and used Apache open source.

What about the implementation team?

The solution was implemented through a local vendor team here in Korea.

Which other solutions did I evaluate?

We evaluated IBM, Oracle, and EMC solutions.

What other advice do I have?

My position in the company falls under the research and development of new technologies and solutions. I investigate, research, download, and read information and reports as part of my job.

Our company has a big data business division, and we propose, develop, and implement things which are related to big data projects. We are using Cloud Hadoop open source versions, distributed versions, and commercial Hadoop distributed versions. We propose all these versions to our customers from any industry.

Our focus is on the public sector. Big data is our strong point in Korea. Our company is the leader in big data technology, including infrastructure and visualization. This is a solution we provide to our customers. We are also in partnership with IBM. Our main focus is on Apache Hadoop.

We provide Apache Hadoop to our customers. I work for a systems integrator and technical consulting company.

Overall, our satisfaction with this solution is so-so. We continuously investigate new technologies and other solutions.

The Hadoop open source version was implemented in 95% of our company's customer base. Our remaining customers had the local vendor's Hadoop platform package implemented for them.

Our company is in the big data business. Before the big data business back in 1976, we implemented BI (business intelligence), DW (data warehouse), EIS, and DSS (decision support system), so we are in partnership with IBM.

I don't have advice for people looking into implementing this solution because I'm not in the business unit. I'm in the research field. My role is to plan new technology and provide consultation to our customers for big data projects in the early stages.

My rating for Apache Hadoop from a technical standpoint is eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1384338

Vice President - Finance & IT at a consumer goods company with 1-10 employees

Jul 15, 2020

Great micro-partitions, helpful technical support and quite stable

Pros and Cons

"The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so."
"If the data volume is too big, it's IoT data, or the stream of data is too much, this solution can handle it and I would definitely recommend Apache Hadoop."

"The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"The solution needs a better tutorial."

What is our primary use case?

As an example of a use case, when I was a contractor for Cisco, we were processing mobile network data and the volume was too big. RDBMS was not supporting anything. We started using the Hadoop framework to improve the process and get the results faster.

What is most valuable?

The data is stored in micro-partitions, which makes processing very fast compared to RDBMS systems. Apache Spark processes data in memory, and it's much better than MapReduce.

Micro-partitions and the HDFS are both excellent features.

What needs improvement?

I'm not sure if I have any ideas as to how to improve the product.

Every year, the solution comes out with new features. Spark is one new feature, for example. If they could continue to release new helpful features, it will continue to increase the value of the solution.

The solution could always improve performance. This is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance.

The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning.

We would prefer it if users didn't just get pushed through to certification-based learning, as certifications are expensive. Perhaps they could arrange for the certification to cost less. The certification cost is currently around $2,500 or thereabouts.

For how long have I used the solution?

I've been using the solution for four years.

What do I think about the stability of the solution?

We haven't had too many problems with stability. For the POC we used a small amount of data and we started with 10 nodes. We're gradually increasing it now to 40 nodes. We haven't seen any issues after the small teething period in the beginning. The configuration issues and the performance issues have subsided. Once we learned how to stack everything, it has been much better.

What do I think about the scalability of the solution?

The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding ever since we started using it, because we started out so small. Companies that need to scale shouldn't have a problem doing so.

We are supporting a multitenancy model and we get the data on supporting the users. I would say, per organization, we have eight to 10 users and probably have a total of around 40 users across the board.

How are customer service and technical support?

We started on the solution as a POC. Once we got into production, we had some minor issues. We got great support. They shared advice and helped us tweak some things in terms of the configurations. We've been satisfied with the level of service we've been provided.

Which solution did I use previously and why did I switch?

We have only ever used Apache Hadoop, or a version of it. When we looked for the commercial tier, there was Cloudera and Hortonworks. We started with Hortonworks because at the time we felt it was cost-effective. However, Cloudera has since merged with Hortonworks, and now it's all basically the same solution.

How was the initial setup?

The initial setup was a little complex the first time around. We were new to the system, and we didn't have any expertise at that time. Once we got some support and insight into how to work everything properly, it went more smoothly.

First, we started with a POC (proof of concept). It took a couple of days to understand and configure everything. When we went to production, deployment took a couple of hours, and we put into practice everything we learned from the POC.

There's definitely a learning curve. It's stable for us now.

We have a team of developers doing multiple tasks on the solution, and a few of them take care of Hadoop, so we do have a few people handling maintenance.

What about the implementation team?

As we were new to the solution, we found we needed some outside assistance to guide us. However, that was for the POC. In the end, I did it myself.

What other advice do I have?

We're just a customer. We don't have a business relationship with Hadoop.

My day-to-day job is data modeling and architecting.

Originally we used it as an open-source solution. We downloaded it, then we went for a commercial version of it.

In terms of advice, I'd tell other potential users that whether the solution is right for them depends on a few items. If the data volume is too big, it's IoT data, or the stream of data is too much, this solution can handle it and I would definitely recommend Apache Hadoop.

Recently, over the last 18 months, I've been working with Snowflake on a data lake project, and I am really impressed with it. I got a certification, and we started using Snowflake for our data lake environment.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Data Analytics Practice head at bse

Apr 29, 2022

Stable, highly scalable, but integration could improve

Pros and Cons

"The scalability of Apache Hadoop is very good."

"Integrating Apache Hadoop with the many different technologies within your business can be a challenge."

What needs improvement?

Integrating Apache Hadoop with the many different technologies within your business can be a challenge.

For how long have I used the solution?

I have been using Apache Hadoop for approximately nine years.

What do I think about the stability of the solution?

Apache Hadoop is stable.

What do I think about the scalability of the solution?

The scalability of Apache Hadoop is very good.

What's my experience with pricing, setup cost, and licensing?

The price of Apache Hadoop could be less expensive.

What other advice do I have?

My advice to others is if you have a strong engineering team then this solution is excellent.

I rate Apache Hadoop an eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer901065 - PeerSpot reviewer

Partner at a tech services company with 11-50 employees

Oct 10, 2021

Highly elastic and stable, but it needs better security

Pros and Cons

"Hadoop is extensible — it's elastic."

"Hadoop's security could be better."

What is our primary use case?

There are several use cases for Hadoop. Sometimes it's used for data warehousing. Other times, it's analytics. And in some cases, it's used to do transformation. For example, I have one client using it to decompress, compress, or encrypt data on ingestion. So, he uses it like an ETL engine.

What is most valuable?

Hadoop is extensible — it's elastic.

What needs improvement?

Hadoop's security could be better.

For how long have I used the solution?

I've been using Hadoop for about eight years. I'm not sure exactly.

What do I think about the stability of the solution?

Performance is one of the reasons people choose Hadoop.

What do I think about the scalability of the solution?

Scalability is one of Hadoop's strong suits.

How are customer service and support?

I've never had to use Hadoop support.

How was the initial setup?

The complexity of Hadoop's setup depends on the customer and their needs. However, most of my customers wind up using Hadoop as a service, which makes it very easy. It doesn't need much maintenance. My staff maintains multiple systems, so it's not like there would ever be somebody dedicated to one, and Hadoop is not a high-touch platform.

What other advice do I have?

I rate Hadoop seven out of 10. It's very good, but it could always be better. To anyone considering Hadoop, I recommend that you be mindful of what you're trying to achieve.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: My company has a business relationship with this vendor other than being a customer. Implementer

reviewer1464630

Founder & CTO at a tech services company with 1-10 employees

Dec 15, 2020

Processes large data sets across clusters of computers

Pros and Cons

"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
"Real-time streaming and integration using Spark streaming and the ecosystem of Spark technologies inside Hadoop."

"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."

What is our primary use case?

We mainly use Apache Hadoop for real-time streaming and integration, using Spark Streaming and the ecosystem of Spark technologies inside Hadoop.

What is most valuable?

I actually like most of the capabilities, but I think Spark has added reposit capabilities on top of the Hadoop ecosystem. The Spark area includes the capabilities that I like the most with Hadoop.

What needs improvement?

I don't have any concerns because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but on the level of POCs, it's fine.

The community of Hadoop is now a cluster, I think there is room for improvement in the ecosystem.

From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.

For how long have I used the solution?

I have been using this solution for roughly five years.

What do I think about the stability of the solution?

I've never experienced any bugs or glitches.

What do I think about the scalability of the solution?

Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability.

How was the initial setup?

It's a well-known fact that Hadoop's configuration is pretty hard.

What other advice do I have?

Usually, people need to study and prepare for a few use cases and compare multiple ecosystems before choosing one. When people think of using a big data solution, Hadoop comes to mind. For certain use cases, Hadoop is comparable with other technologies. For example, when building a sort of real-time data warehouse (an enterprise data hub), people don't think about using Hadoop directly. People often use solutions like Druid for building one.

At the end of the day, you need to compare existing technologies against their use cases. You need to study your use case and select the technology inside of Hadoop that will fit it. You may find another ecosystem that solves your problem. Just keep in mind that Hadoop is not the only solution; there are a lot of solutions. It depends on the use case.

Overall, on a scale from one to ten, I would give Hadoop a rating of eight.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
