Cloudera Distribution for Hadoop Overview

Cloudera Distribution for Hadoop is the #2 ranked solution in top Hadoop tools and the #3 ranked solution in top NoSQL Databases. PeerSpot users give Cloudera Distribution for Hadoop an average rating of 7.6 out of 10. Cloudera Distribution for Hadoop is most commonly compared to Amazon EMR. It is popular among the large enterprise segment, which accounts for 72% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 20% of all views.
Buyer's Guide

Download the Hadoop Buyer's Guide including reviews and more. Updated: September 2022

What is Cloudera Distribution for Hadoop?
Cloudera Distribution for Hadoop is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Cloudera Distribution for Hadoop Customers
37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Archived Cloudera Distribution for Hadoop Reviews (more than two years old)

AD - Associate Director at a financial services firm with 10,001+ employees
Real User
Feature rich and scalable with good support, but there are performance issues and the security could be improved
Pros and Cons
  • "The main advantage is the storage is less expensive."
  • "Currently, we are using many other tools such as Spark and Blade Job to improve the performance."

What is our primary use case?

We are using this solution for storing Big Data in one centralized location.

How has it helped my organization?

It has been helpful in allowing data storage in one centralized location with data lakes and all of the surrounding applications.

All of the data processes are being stored into the Big Data Lake.

What is most valuable?

It allows us to store huge amounts of data, which is an advantage.

They have BI (Business Intelligence) tools. There are many AI tools.

We are able to connect and analyze the data to get reports. The reports are very good.

The main advantage is the storage is less expensive.

What needs improvement?

The performance can be improved. We have experienced some performance issues. It is not as sophisticated as Oracle Sybase.

Currently, we are using many other tools such as Spark and Blade Job to improve the performance.

The setup could be simplified; it's complex.

The security needs to be improved.

For how long have I used the solution?

I have been using this solution since 2015.

What do I think about the stability of the solution?

It's a stable solution.

What do I think about the scalability of the solution?

Scalability is good. The data is replicated; by default, with Big Data there is a replication factor.

We have grown over the years. When we started we had 10 nodes; now we have increased to a large number of nodes.

How are customer service and support?

Technical support is good. I have been able to learn from them. As a developer, I am learning every day.

I would rate the technical support a ten out of ten.

Which solution did I use previously and why did I switch?

Previously we were using Oracle Sybase SQL. We switched because we have now introduced Big Data.

How was the initial setup?

The initial setup was complex.

It's not as simple as Oracle Sybase.

It's a complex architecture because you have raw data and many engines.

What's my experience with pricing, setup cost, and licensing?

When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive.

What other advice do I have?

I am a part of security and software development. 

We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component.

I would rate this solution a five out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
EricLin - PeerSpot reviewer
Chairman at Athemaster co.,ltd.
Real User
Performs cost analysis tasks for our customers in the financial industry

What is our primary use case?

We are a solution provider and this is one of the systems that we implement for our clients.

Our clients for this product are in the financial industry and they use it to perform cost analysis tasks.

What is most valuable?

The most valuable feature is Kubernetes.

What needs improvement?

The price of this solution could be lowered.

For how long have I used the solution?

We have been using the Cloudera Distribution for Hadoop for five years.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

The Cloudera Distribution for Hadoop can be scaled. Our customers are enterprise-level companies and they have about 100 users for this solution.

How are customer service and technical support?

We offer technical support for this solution to our customers.

Which solution did I use previously and why did I switch?

We did not use another solution prior to this one.

How was the initial setup?

The initial setup is straightforward.

What's my experience with pricing, setup cost, and licensing?

The pricing is expensive.

Which other solutions did I evaluate?

Cloudera really has no competition.

What other advice do I have?

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: reseller
Data engineer at a tech services company with 11-50 employees
Real User
Supports a wide range of tools and has a good support community
Pros and Cons
  • "We also really like the Cloudera community. You can ask any question and will have an answer within a few hours."
  • "Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. Having a big data environment is not optional for us."

What is our primary use case?

Our primary use case for this solution is to host a large amount of data on our platform and to do the processing, analysis, and related work there.

What is most valuable?

Cloudera is always developing new tools and supports a wide range of tools. We also really like the Cloudera community. You can ask any question and will have an answer within a few hours. Cloudera is better than other competitors because they acquired Hortonworks.

What needs improvement?

We're processing a huge amount of data on our system. Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. Having a big data environment is not optional for us. Cloudera is trying to adopt new technologies.

I think the idea of open source tools now is dominating. So Cloudera has to decide how to deal with open-source tools. I subscribe to Cloudera to get an enterprise version but I have found that I can get some of its features from other vendors that would be at a lower cost than Cloudera. They should lower the price. 

For how long have I used the solution?

We have been using Cloudera for a year. 

What do I think about the stability of the solution?

It's stable. I have no issue regarding the stability.

What do I think about the scalability of the solution?

It's scalable. You can add more nodes and you can expand your cluster easily.

How are customer service and technical support?

After we open a ticket, the issue can be resolved very quickly. They have a management portal. I don't contact them directly, but I haven't heard of anybody having any problems with it.

How was the initial setup?

The initial setup is complicated. We needed the vendor to install it themselves. The deployment took around three weeks. Three people were involved on our side, but they only followed up and supervised the deployment; they did not deploy anything. The vendor did it.

What other advice do I have?

In terms of the advice, I would say to focus on what tools are available on the market. In terms of open-source, most companies are delivering open source technologies and providing support to these tools. Now I have the option to purchase a license for whatever platform for $1. I can deliver it with another small company at a lower cost. If I was the decision-maker, I'd invest in open-source tools. Cloudera and all of these companies are trying to adapt to these big data technologies and open source tools. Cloudera is trying to put it inside their platform so that we can have a compatible solution.

I would rate it an eight out of ten. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
NavneetKaur - PeerSpot reviewer
Senior Software Engineer at a tech services company with 10,001+ employees
Real User
Performs well and the technical support is helpful, but the upgrade process needs to be consolidated
Pros and Cons
  • "The most valuable feature is Impala, the querying engine, which is very fast."
  • "There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."

What is our primary use case?

We are dealing with data from the telecom industry. We were using an Oracle system but our volume has increased. We now have a lot of real-time data that needs to be transformed so that it can be made available and used.

What is most valuable?

The most valuable feature is Impala, the querying engine, which is very fast. We have been able to work with one terabyte of data in less than 20 minutes. The speed makes it easy for us to process all of the data that comes in, in time.

The support is very good.

All of the data has automatic triple replication in order to secure integrity.
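As a rough illustration of what the figures in this answer imply, the sketch below works through the arithmetic. It is not taken from the reviewer's environment. It assumes the triple replication mentioned above corresponds to a replication factor of 3 (the HDFS default), treats "less than 20 minutes" as exactly 20 minutes, and uses decimal units. The function names are hypothetical.

```python
def raw_storage_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw disk consumed when every block is stored `replication_factor` times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return logical_tb * replication_factor


def scan_rate_mb_per_s(data_tb: float, minutes: float) -> float:
    """Average throughput for processing `data_tb` in `minutes`.

    Uses decimal units (1 TB = 1,000,000 MB), which is an assumption.
    """
    return data_tb * 1_000_000 / (minutes * 60)


if __name__ == "__main__":
    # One terabyte of data occupies three terabytes of raw disk at factor 3.
    print(raw_storage_tb(1.0))
    # One terabyte in 20 minutes, expressed as an average rate in MB/s.
    print(round(scan_rate_mb_per_s(1.0, 20)))
```

Under these assumptions, replication triples the raw disk needed for a given volume of data, and the quoted Impala figure is a lower bound on average throughput, since the reviewer reports finishing in under 20 minutes.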

What needs improvement?

There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon.

When we are upgrading CDH, there are many things that need to be upgraded and it would be helpful if it were bundled. As it is now, we have to upgrade many different things separately.

For how long have I used the solution?

I have been working with the Cloudera Distribution for Hadoop for around two years.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

The scalability is good and it works on commodity hardware. One of the problems we have right now is that there is a lot of data and we're moving it from our Oracle solution. This means that there is a double cost, in terms of storage, during our transition to working with big data.

We are using a data lake that is a store for all of the data in our organization. There are more than 25 projects, with between 25 and 30 people in each one, for a total of almost 1,000 people. All of them are dependent on this solution.

Most of our users are technicians who have problems to solve using the data available to them. A couple of them are data scientists and the remainder are upper management, who do the analysis.

How are customer service and technical support?

The technical support is very good. Whenever we open a ticket, we get support right away.

Which solution did I use previously and why did I switch?

We did use another solution prior to this one but it could not keep up with our increase in data.

What other advice do I have?

The suitability of this solution depends on the size of the data that you are going to be working with. If you are going to be working with a huge dataset that contains many gigabytes of data then this is a good solution. For smaller datasets, you should also consider other technologies.

My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user900987 - PeerSpot reviewer
Data Management at BCX
Real User
Offers big data support for analytical applications but the technical support needs improvement
Pros and Cons
  • "In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
  • "The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."

What is our primary use case?

We primarily use it only for big data support for analytical applications.

What is most valuable?

The feature that we've used quite intensively is Spark, in how it specifically can speed up some of the data to assist with processing.

What needs improvement?

The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it.

In the next release, I think it would be helpful if there was easier integration into all the other existing data back corners. It will be a big plus as it's a favorite capability. We had to go with a third-party application in order to achieve that.

For how long have I used the solution?

I've been using the solution since 2016.

What do I think about the stability of the solution?

The stability is problematic. We did encounter quite a lot of issues with the cluster going down quite frequently.

What do I think about the scalability of the solution?

In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues. Currently, only about 10 people in total are using the solution. So we have about four business users and then four technical people. It's only limited to two environments.

How are customer service and technical support?

I think there's a lot of room for improvement on the technical support side. Mostly because we don't have a lot of local skills in South Africa that could have supported the solution. It was an issue.

Which solution did I use previously and why did I switch?

This is our first solution. We tested a bunch of other technologies, but that was our first one and we're still using it.

How was the initial setup?

The initial implementation was straightforward from an application side. There weren't any hiccups. In terms of deployment time, it's going to be difficult to say, because most of it was related to hardware problems. Software took about two months to deploy. We required four people for deployment.

What's my experience with pricing, setup cost, and licensing?

The pricing is very competitive. It's not bad.

Which other solutions did I evaluate?

We considered working with a few other companies, including IBM Bluemix.

What other advice do I have?

I would recommend the solution, provided that you have proven the business case and the technology. We have found that if you don't address the right business case you end up buying a technology that doesn't necessarily solve your business problems.

I would rate the solution seven out of ten. The main reason for not rating it higher is that I think that the overall support is not great and we've found some limitations. It wasn't mature when we started. It's getting there. It's getting better. The main reason for the score of seven is mainly the support as well as the limited functionality.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
BI Manager at an insurance company with 10,001+ employees
Real User
Open-source solution for intelligent data management and analysis
Pros and Cons
  • "Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
  • "The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."

What is our primary use case?

We make recommendations to clients for using different models of this solution to handle data intelligently.

How has it helped my organization?

It gives us the opportunity to offer more options to our clients and create better solution models.

What is most valuable?

We find CDSW useful and plan to use it as a one-stop application for model build and training. Currently, we use Zeppelin notebook and we want to gravitate to a single application for notebooks.

What needs improvement?

The Data Science Workbench doesn't support multiple languages. It needs to support multiple programming languages. We were trying to use Scala and Python for some solutions we wanted to deploy, but they didn't work properly. As a result, we had to come up with other workaround solutions. If the Data Science Workbench supported multiple programming languages our workflow would be easier and the solutions could be better.

Another aspect we would like to see improved is better opportunities for integration. For example, we would like to use H2O machine learning, which is an open-source product, and Cloudera doesn't support H2O.

If they could support H2O and also deploy multi-language support on the Cloudera Data Science that would be great. But the biggest thing that would help right now is H2O support.

Finally, one other improvement I would suggest is integrating data privacy software into Cloudera. It is not quite complete in this aspect.

For how long have I used the solution?

We have been using the solution for approximately eight weeks.

What do I think about the stability of the solution?

From a stability point of view, we know that there is a new product coming out called Unity — or that is the proposed name of the product that merges Cloudera and Hortonworks. We know that this means that some changes will be happening within the environment. We don't believe that they will be radical changes that will affect existing software that we have. It should just be added functionality of Hortonworks integration. But we know at the same time that Cloudera support will be available if we need it.

What do I think about the scalability of the solution?

While we have not yet done a lot to scale the solution, we think it is going to be quite scalable because it works on a distributed architecture.

We will probably start with 10 or 15 users once we roll the solution out into production, which will probably be at the end of this week. Afterward, the user base will be growing quite large by double digits in percentage. But that is just to start with. Over a few years, we plan to start thinking about rolling out our experiences to our international businesses as well. This would be a substantial increase in user base.

How are customer service and technical support?

At the moment and for what we have been able to experience, technical support seems to be fine. I would rate it at between seven to eight out of ten.

Which solution did I use previously and why did I switch?

We did not consider other solutions.

How was the initial setup?

The initial setup was difficult and we didn't like it. That is only because we implemented it with other software solutions outside Cloudera and needed to do the integrations. 

We are still battling with working out problems with some integrations after eight weeks. It's up and running, but we're optimizing, so that is why I'm saying it's probably medium to complex. But that was the situation for us and our particular needs. It may not be as complex for other businesses at all.

What about the implementation team?

We have been working through the implementation with our own team.

Which other solutions did I evaluate?

We did consider other opportunities. Although we are quite comfortable with our current solution we may look at Hortonworks again, but that is not yet confirmed. We believe, from what we have read and what has been advertised, that Hortonworks and Cloudera are going to eventually merge and become one product. According to some sources, it has already happened.

We're simply trying to get the best of both worlds.

What other advice do I have?

I would say that the product as it currently is should rate at an eight out of ten. The reason that score is not higher is because of the workarounds that we have to do when it comes to certain models that do not support using multiple programming languages. For example, in a single notebook, it is inflexible if you want to use other programming languages.

As far as other advice for people considering this solution, I would say take a good look at your business need before you decide on this technology and which solution to choose. Make sure that you are not already able to solve for your particular, identified needs using your existing technology before even considering a change.

You want to be sure you're applying the technology to the right business case because of actual need and not just change for change's sake.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Coordinator at a manufacturing company with 1,001-5,000 employees
Real User
Good search functionality but the user interface needs improvement

What is our primary use case?

We primarily use the solution for external storage.

What is most valuable?

The search function is the most valuable aspect of the solution.

What needs improvement?

The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better.

For how long have I used the solution?

I've been using the solution for 1 year.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The initial setup was complex, due to the user interface. We were doing a POC, so we're still doing the deployment.

What other advice do I have?

I would rate this solution seven out of 10. There's tons of room for improvement.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
The valuable combination of all the tools enables me to solve use cases I'm working on
Pros and Cons
  • "We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
  • "We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."

What is our primary use case?

I've been working on the software installation from the beginning, and we have a client for global supply chain, so we get information from Telefonica's sales and distributions. Getting all that information into this system allows us to process it, get KPIs, and create outgoing information for business intelligence tools.

In the cloud provider enterprise we get all the information from the gamers, like delays, response, and information from the games. It allows us to see if gamers are having trouble, high latency or any other kind of issue. They test that and get information about the issues in order to solve them.

What is most valuable?

I like the combination of all the tools that allow me to provide solutions and enable me to solve the use cases I'm working on. You need tools or components to foresee everything, and they are all in our emails. Sometimes you try several of them, and sometimes one will work better than the other. So you have to test the tools to see what works for you. 

What needs improvement?

We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that. 

For how long have I used the solution?

I've been using this solution for about a year and a half now.

How was the initial setup?

It's been quite easy to install. We only had to follow the instructions and there weren't many problems. That's important for us.

What other advice do I have?

I will rate this solution a nine out of ten because nothing is ever perfect. You will always face problems, but I'm quite happy with Cloudera. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Doron Sela - PeerSpot reviewer
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
Helpful to build infrastructure for advanced analytics and is easy to install
Pros and Cons
  • "The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized."
  • "I would like to see an improvement in how the solution helps me to handle the whole cluster."

What is our primary use case?

I'm part of the IT team at my company, and our primary use case of this solution is building infrastructure for advanced analytics, where we copy data from our data warehouse that is now our relational database. We copy it to the Cloudera Distribution for Hadoop and then analyze it with Python and machine learning. 

What is most valuable?

The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized.

What needs improvement?

I would like to see an improvement in how the solution helps me to handle the whole cluster. For example, when I go down to a specific tool, like Kafka, Cloudera Manager doesn't really help me. Then I have to search Google for other Kafka knowledge and tools.

For how long have I used the solution?

I've been using this solution for about three years now.

What do I think about the stability of the solution?

It is a very stable solution.

What do I think about the scalability of the solution?

Not many people are currently using this solution at my organization, but I do believe it is scalable. I don't, however, have experience with upgrading or adding users. 

How are customer service and technical support?

My problem is that I started using Cloudera Express without technical support and then I purchased the Enterprise edition through another company. So now I don't really have access to Cloudera support, even though I hardly need to use it. 

How was the initial setup?

The initial setup was simple, but we had trouble implementing the cables in the Hadoop solution.

What other advice do I have?

I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to my other resources in the company, like Active Directory or the firewall. I would like the outside environment to be easier to handle. I will rate this eight out of ten because the solution doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sumit Chaudhuri - PeerSpot reviewer
Lead Consultant - Product Development at FIS (http://www.fisglobal.com/)
Consultant
We use this solution to use big data for our analyses

What is our primary use case?

Our core product is an insurance product and the actuarial module is quite complex. So far, SMEs have collected data from various sources into Excel sheets and done the analytics through macros, which is a very crude form of analysis. So we decided to use big data for such analysis.

How has it helped my organization?

It is still in the POC stage. As I have mentioned, our analysts used to do the actuarial analysis on a spreadsheet, but after the Hadoop implementation they are gaining confidence that the analysis is now more appropriate and faster. We are now exploring a cloud implementation as well.

What is most valuable?

Keeping multiple copies of each file, and the MapReduce-based tools such as Pig and Hive: thanks to their flexibility, it is easy to develop an application with little or almost no knowledge of Java and SQL. Also the capability to handle huge data sizes.

What needs improvement?

On the product side, I don't have much to comment on. However, other emerging technologies such as RPA, AI, and Go have ample training materials with a variety of use cases that users can understand and align with their current requirements. I have not seen comparable training materials from Cloudera.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

It seems quite stable; we did not face any issues.

What do I think about the scalability of the solution?

It is very stable; we did not face any performance issues.

Which solution did I use previously and why did I switch?

No. When we heard of Hadoop, we tried only that; that is, we tried to migrate from spreadsheets to Hadoop.

How was the initial setup?

Very straightforward. A typical Windows-style installation: Next, Next, Next clicks.

What about the implementation team?

In-house.

What was our ROI?

Another department handles all of this, so I can't comment on it.

What's my experience with pricing, setup cost, and licensing?

 

Which other solutions did I evaluate?

Not really.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user374058 - PeerSpot reviewer
Vice President - Big Data and Delivery at a computer software company with 51-200 employees
Vendor
Cloudera Manager is a good tool to administer. Sometimes it gets confusing to follow a single path for installation.

What is most valuable?

  • Cloudera Manager for administering the Hadoop cluster
  • Cloudera specific solutions like Impala
  • Extensive documentation
  • Good user community

How has it helped my organization?

Implementing a Hadoop cluster has become relatively straightforward using CDH. Administering it is also less complex. As a result, the effort spent in these areas is less than anticipated.

What needs improvement?

  • Some of the UI features seem confusing e.g. charts on the CM Services page
  • Sometimes it gets confusing to follow a single path for installation due to multiple recommended approaches e.g. parcels vs packages

For how long have I used the solution?

We have been using it for the last two years.

What was my experience with deployment of the solution?

Following a single path for installation becomes confusing due to multiple recommended approaches e.g. parcels vs packages.

What do I think about the stability of the solution?

Flume seems unstable and has to be restarted quite often.

What do I think about the scalability of the solution?

None as such

How are customer service and technical support?

We are mostly using Cloudera Express so we did not use their technical support. However, the Cloudera community is an active place and Cloudera representatives participate actively in understanding and resolving issues.

Which solution did I use previously and why did I switch?

Cloudera is a prominent player in the Hadoop space and we did not have a need to adopt a different solution. However, we are also looking to work on Hadoop and MapR.

How was the initial setup?

Following a single path for installation was initially confusing due to multiple recommended approaches, e.g. parcels vs. packages. After a while, we managed to master it. However, knowledge of Cloudera Manager and Hadoop architecture is a must.

What about the implementation team?

We have our own team of consultants who are proficient in implementing it. The high-level implementation steps remain the same; however, it is the environment-specific issues that are challenging.

What was our ROI?

We haven't really measured ROI.

What's my experience with pricing, setup cost, and licensing?

The per-node licensing price for Cloudera seems to be pretty steep (based on the inputs we have received from Cloudera).

What other advice do I have?

It is user friendly and installation is pretty straightforward. Cloudera Manager is a good tool to administer it. However, configuration for specific requirements is sometimes pretty complex.

You should have a team that is knowledgeable in Hadoop. Do keep in mind that the product is still maturing, so there is a good chance that you will come across unexpected issues now and then.

Disclosure: My company has a business relationship with this vendor other than being a customer: We're Cloudera partners and regularly install CDH
it_user374703 - PeerSpot reviewer
Data Consultant with 10,001+ employees
Vendor
Features like Hive, Pig, Impala, Flume and Spark are valuable to us.

Valuable Features

Cloudera Manager is the most valuable feature for its ease of use, its features, and the ease of upgrading and installing components. CM can also be used to set up high availability within minutes. Other features like Hive, Pig, Impala, Flume and Spark are also valuable.

Improvements to My Organization

It has improved our storage, and the availability of analytics tools such as Hive, Pig, Impala, and Spark helps us tremendously.

Room for Improvement

I'd like to see improvements to Impala. Also, it needs a more integrated environment with Spark, data warehouses, storage systems, and the cloud. Additionally, I'd want more UIs for the components of the ecosystem, preferably centralized in a gateway.

Use of Solution

I've used it for 3.5 years.

Deployment Issues

For experimental and production clusters alike, use Cloudera Manager right from the beginning. RPM installation is good for learning.

Stability Issues

It has compatibility issues if installed on specialized hardware such as EMC Isilon or if node managers and data nodes are not co-located. For production, draw out a detailed plan on how to manage a local repo for installation and upgrades. Never install from the internet for production clusters.

Customer Service and Technical Support

Most of the clusters are for experimentation and don’t require support. For production clusters, implementations go through major vendors, who also handle support.

Initial Setup

It depends on the mode of installation. Cloudera Manager is always more straightforward and manageable. Avoid RPM installation as much as possible. Lay out plans with the system admin for upgrades and for commissioning and decommissioning nodes. Investigate the impact and consequences of having HBase and Hadoop in the same cluster or in separate clusters: what are the impacts on system administration, cost, upgrades, data migrations, resources, etc.?

The complexity kicks in when performing parameter configurations. Find out what the use cases are: are they disk I/O bound or compute bound, is there a lot of structured data or unstructured data for text analytics, etc.

Implementation Team

Both vendor and in-house teams, depending on the cluster size and use cases. Some customers may require a certain number of certified personnel, which is something to think about when choosing a partner.

Other Advice

Be prepared for a fast-changing landscape in how Hadoop works under the hood and how it is used. Each major release usually involves changes to the file system and data structures, so consider how they would impact data migration. Ask questions such as whether to upgrade or create a new cluster, and plan for training and skill upgrades.

Disclosure: My company has a business relationship with this vendor other than being a customer: We're a system integration partner.
it_user370224 - PeerSpot reviewer
Director of Data Management at a media company with 51-200 employees
Vendor
It gives us improved business intelligence reporting from daily to every two hours.

Valuable Features:

Faster runtime for batch jobs.

Improvements to My Organization:

Business Intelligence reporting improved from daily to every two hours, satisfying the business stakeholders who used to favour transactional systems for drafting reports because those had the latest data.

The issue that arises from using transactional systems is multiple versions of the truth across the enterprise. With the faster turnaround time, business stakeholders are now adopting the BI systems designed to give a cohesive view of the performance metrics important to them.

Room for Improvement:

Full Support for all Spark SQL features, support for SparkR, compatibility with Hive for DataFrame saved tables.

Cloudera CDH5.5.x does not support SparkR. SparkR, the integration of R models in the API, would be a great addition since it would enable fast, near real-time analytical integration of R models with data feeds.

The functionality in SparkSQL to save a DataFrame as a table in Hive produces a table that is not compatible with Hive. The workaround is to create the Hive table first and then do inserts.
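As a rough illustration of the "create the table first, then insert" workaround, the sketch below only builds the two HiveQL statements as strings. The function name, the table and column names, and the choice of Parquet storage are assumptions made for the example and are not from the review; how the statements are executed (for example through Spark's SQL entry point or the Hive CLI) depends on the Spark and Hive versions in use.

```python
def hive_first_statements(table, columns, source_view):
    """Build HiveQL for the 'create the Hive table first, then insert' workaround.

    table       -- target Hive table name (hypothetical)
    columns     -- list of (column_name, hive_type) pairs (hypothetical)
    source_view -- name of a temporary table/view holding the DataFrame rows
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    col_names = ", ".join(name for name, _ in columns)
    # Step 1: let Hive own the table definition (storage format is an assumption).
    create = f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) STORED AS PARQUET"
    # Step 2: insert the DataFrame's rows instead of saving the DataFrame as a table.
    insert = f"INSERT INTO TABLE {table} SELECT {col_names} FROM {source_view}"
    return [create, insert]


if __name__ == "__main__":
    for statement in hive_first_statements(
        "sales", [("id", "INT"), ("amount", "DOUBLE")], "staging"
    ):
        print(statement)
```

Because Hive creates the table itself, its metadata stays in a form Hive can read, and Spark is only used to populate it.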

Cloudera CDH5.5.x is a great product. Adopting the features that are not currently supported would make it even better, but their absence by no means subtracts from its desirability.


Other Advice:

Do thorough research and ensure your use cases or scale do not conflict with the system requirements and that the features that would make a difference are supported.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user363186 - PeerSpot reviewer
Team Lead / Data Architect at a tech services company with 51-200 employees
Consultant
The Cloudera Manager administrator webpage simplifies the administration tasks.

What is most valuable?

The Cloudera Manager administrator webpage simplifies the administration tasks and helps to maintain a global overview of the cluster performance.

How has it helped my organization?

We are moving from a standard SQL environment (Oracle DataWarehouse) to a Big Data environment, and the Hadoop cluster will be the key to our new organization. It will allow us to scale easily.

What needs improvement?

We found some difficulties when importing Hive tables from another cluster.

I want to point out that we encountered many problems related to cloud storage and how resources are managed. Our learning has been that, although it is quite simple to deploy single machines on the cloud, deploying clusters of machines is much more complex because many factors need to be considered: individual machines, connectivity across machines, and storage.

For how long have I used the solution?

I've used it for three months.

What do I think about the stability of the solution?

We found some issues, but they were related to the hardware provider. For the moment, I have not detected any problem from the Cloudera software point of view.

How are customer service and technical support?

Technical support is really efficient.

Which solution did I use previously and why did I switch?

We chose this product as it is considered a market standard and due to its extensive documentation on the web. I evaluated other options, but the fact that it is now becoming a standard for many companies helped me to choose this option.

How was the initial setup?

In the cloud environment where we deployed (Azure Resource Manager) there was a ready-to-deploy template, which greatly simplified the initial setup.

What about the implementation team?

We implemented with an in-house team. Our initial idea was to stop the cluster during the weekends and when there was no usage. However, we ran into serious difficulties and were not able to start the whole cluster programmatically, so in the end we left the cluster running all the time.

These issues were mainly related to the cloud provider and how this provider manages the resources for the cluster machines.
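For readers considering the same weekend shutdown idea, the scheduling decision itself can be sketched as below. This is only an illustration: the working hours, the function names, and the `start`/`stop` callables are hypothetical placeholders, and the sketch does not address the cloud-provider resource issues described above, which were the actual obstacle.

```python
from datetime import datetime


def cluster_should_run(now, start_hour=7, end_hour=20):
    """Return True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < end_hour


def reconcile(now, is_running, start, stop):
    """Call start() or stop() (placeholder callables) only when the state must change."""
    wanted = cluster_should_run(now)
    if wanted and not is_running:
        start()
        return "started"
    if not wanted and is_running:
        stop()
        return "stopped"
    return "unchanged"


if __name__ == "__main__":
    # A Saturday at noon with the cluster still up: the sketch asks for a stop.
    print(reconcile(datetime(2016, 1, 9, 12), True, lambda: None, lambda: None))
```

Running such a check from a scheduler keeps the decision logic separate from whatever mechanism actually starts and stops the machines and services.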

What was our ROI?

From our point of view it is a long-term investment. We hope to get the ROI in the coming years.

What other advice do I have?

I am very comfortable with this product. The combination of the Cloudera Manager administrator server, which allows the management of the Hadoop cluster, and the Hue server, which simplifies its use, makes this product a current standard on the market. Perhaps it lacks full integration of all its components.

Disclosure: My company has a business relationship with this vendor other than being a customer: My company has a partnership relation with the vendor.
it_user364473 - PeerSpot reviewer
R&D Solutions Architect at a tech vendor with 10,001+ employees
Real User
It has good ease of use in terms of integration within the Hadoop ecosystem related products.

Valuable Features

Enterprise resource management, ease of use in terms of integration within the Hadoop ecosystem related products, and security.

Room for Improvement

Mainly they have to continuously evolve following the technology trends and replace or adapt part of their solutions accordingly.

Use of Solution

We've used it since October 2012.

Deployment Issues

No issues encountered.

Stability Issues

No issues encountered.

Scalability Issues

No issues encountered.

Customer Service and Technical Support

Pretty responsive and reactive compared to their competitors in the field.

Initial Setup

It was extremely easy, and allowed less experienced personnel to get into the context pretty fast. Any difficulties or complexities we faced were related not to the product itself but to the cluster infrastructure used.

Implementation Team

In our case it was an in-house team including data scientists and data engineers (management and QA as well). With the appropriate training and the support offered by the vendor, it is not that hard to implement a small to medium scale project solution. However, complexity and size vary significantly between projects; therefore, it really depends.

ROI

That is not easy to answer since Huawei has several divisions using the product in different ways. Pricing and licensing, again, depend heavily on the context and the aims of the given organization: for instance, the level of support they are going to need, the type of services they are going to provide, or even the business domain they are targeting.

Other Solutions Considered

Two other providers' solutions were evaluated. However, the level of customer service and technical support from Cloudera was better than the first one's, and the second solution's licence pricing was higher than Cloudera’s pricing scheme.

Other Advice

Cloudera is doing a great job in the field offering an enterprise ready data platform. Based on my experiences I would definitely recommend it.

Disclosure: My company has a business relationship with this vendor other than being a customer: We do have a partnership with Cloudera.
it_user364431 - PeerSpot reviewer
Consultant at a tech consulting company with 51-200 employees
Consultant
The Cloudera Hadoop manager eased the work of orchestrating scripts.

Valuable Features

Very solid. Excellent user experience. Good documentation. The Cloudera Manager is definitely a deciding factor. Packaging for Ubuntu is great for all the components.

Improvements to My Organization

Before the introduction of Cloudera Manager (that actually works), all the orchestration was done with scripts and Chef, and inexperienced team members had difficulty participating in maintenance. The Cloudera Hadoop manager eased the work.

Room for Improvement

More customization, better documentation for the API (basically it's the same for all Cloudera Hadoop components).

Use of Solution

I've used it for two years.

Deployment Issues

No issues encountered.

Stability Issues

No issues encountered.

Scalability Issues

No issues encountered.

Customer Service and Technical Support

Didn't use dedicated service or support. The documentation is a bit of a mess, but it is decent and sufficient.

Initial Setup

Straightforward. The CDH VirtualBox image with a preconfigured environment helps for demonstration purposes.

Implementation Team

We did it in-house.

Other Solutions Considered

We also looked at Hortonworks, but chose Cloudera because of my familiarity with it.

Other Advice

Do a comparison with Hortonworks, as it's always good to compare with another major vendor.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user357645 - PeerSpot reviewer
Data/Big Data Architect at a healthcare company with 1,001-5,000 employees
Vendor
We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions. At times, heavy queries do not finish at all.

What is most valuable?

Mostly HUE, Impala, Sqoop, and Hive. The impala-shell command is number one.

How has it helped my organization?

We are working on research for genomic data, looking for specific genes and variances. Even Hive was not good enough to process it correctly; only with Impala are we getting results quicker.

What needs improvement?

Sometimes the heavy queries do not finish at all. It would be good to see the progress of a heavy script in the impala-shell or to have some other way to access it.

For how long have I used the solution?

We started to use Cloudera about one-and-a-half years ago.

What do I think about the stability of the solution?

We are having some issues with stability and are speaking to Cloudera support.

How are customer service and technical support?

Customer Service:

It's acceptable.

Technical Support:

It's acceptable.

Which solution did I use previously and why did I switch?

We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions.

How was the initial setup?

We have struggled a bit in installing and configuring Cloudera Manager on the AWS cluster. For now, it is good.

What about the implementation team?

We did the implementation only using our team and resources. It was a hard start, but an easy landing.

What other advice do I have?

Cloudera is good for mid-size to big companies, but small ones can use AWS Impala/HUE. Go to training, or you are going to spend many hours finding short answers. The Cloudera solution is big with good documentation, but you need to know what and where to read first.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user356769 - PeerSpot reviewer
Director of Data Architecture at a financial services firm with 501-1,000 employees
Vendor
It has enabled us to move BI out of our OLTP database and build a data warehouse, but Spark, although under rapid development, needs improvement.

What is most valuable?

  • Cloudera Manager
  • Impala
  • Sentry

How has it helped my organization?

It has enabled us to move BI out of our OLTP database and build a data warehouse.

What needs improvement?

Some areas are under rapid development, like Spark.

For how long have I used the solution?

I've used it for three years.

What was my experience with deployment of the solution?

No issues with the current version.

What do I think about the stability of the solution?

No issues with the current version.

What do I think about the scalability of the solution?

No issues with the current version.

How are customer service and technical support?

Customer Service:

It's excellent.

Technical Support:

It's excellent.

Which solution did I use previously and why did I switch?

We switched because Cloudera just works.

How was the initial setup?

Cloudera Manager greatly simplifies initial setup.

What about the implementation team?

In-house.

What other advice do I have?

Make sure you have clearly articulated, doable use cases before you start.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347787 - PeerSpot reviewer
Lead Instructor at a tech company with 501-1,000 employees
Vendor
It has fairly mature tools like Cloudera Navigator and Cloudera Manager, but it is lacking Spark SQL support.

Valuable Features:

The features I find most valuable are:

  • Enterprise security features (authentication, authorization, data governance, and data protection)
  • Proactive support 
  • Training

Improvements to My Organization:

  • Providing robust infrastructure
  • Fairly mature tools like Cloudera Navigator, Cloudera Manager, etc.
  • Professional support enabled us to provide great customer service
  • Our clients are able to perform proactive maintenance in an efficient manner

Room for Improvement:

Spark with R integration is missing. Also, it is lacking Spark SQL support.

Use of Solution:

I've used it for over eight months.

Deployment Issues:

We faced issues in deploying Cloudera on Azure. Our machines' hard disks were getting corrupted whenever patches were applied on weekends. These issues have now been resolved.

Customer Service:

They offer excellent support.

Initial Setup:

It was complex because we were doing a first-time deployment of Cloudera on Azure. The complexity was also high due to the large number of security features.

Implementation Team:

We are Big Data consultants, so we implement it.

Other Solutions Considered:

Cloudera is a leader in providing distributions for Hadoop, so it was a no-brainer for us to decide.

Other Advice:

There were initial hiccups when deploying Cloudera on Azure but now this combo is working fine in production, so you can go for it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347592 - PeerSpot reviewer
Senior Analyst - Strategy Analytics at a consultancy with 10,001+ employees
Consultant
We were able to utilize data which was untapped previously, but the documentation on Hive could be more standardized.

What is most valuable?

The features we've found most valuable are:

  • Fast processing of data
  • Easy to manipulate using HiveQL

How has it helped my organization?

We were able to utilize data which was untapped previously. We've got great use cases now to drive business revenue.

What needs improvement?

It needs more standardized documentation on Hive.

For how long have I used the solution?

I've used it for two and a half years.

How are customer service and technical support?

Customer Service:

It's great.

Technical Support:

The level of technical support is great.

Which solution did I use previously and why did I switch?

No previous solution was used, and senior management chose to bring it in.

How was the initial setup?

I was not directly involved in deployment.

What about the implementation team?

It was done by the vendor team, who were great.

What other advice do I have?

It's good for Big Data analytics.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Design Engineer at a marketing services firm with 501-1,000 employees
Vendor
It automates the installation and configuration of Hadoop, but it should not provide generic logs for failed installations.

What is most valuable?

It automates the installation and configuration of Hadoop and different Big Data services.

What needs improvement?

We're currently trying to recover from a failed installation and it's a little bit difficult. It should restart the installation where it left off.

For how long have I used the solution?

I've used it for two years.

What was my experience with deployment of the solution?

  • In some cases, logs are clear about failed services.
  • For some failed deployment steps, however, the logs are generic; they should be more specific.

How are customer service and technical support?

7/10 - they have forums where they will answer your query within a day.

Which solution did I use previously and why did I switch?

We previously used Hortonworks and changed because Cloudera is simpler and more interactive.

How was the initial setup?

It was very straightforward.

What about the implementation team?

We did it in-house. They have good technical support to help with implementation.

What's my experience with pricing, setup cost, and licensing?

We use the free version, and they provide everything we need.

What other advice do I have?

Implement the free version as it provides enough services. If you want a backup service, or any extra service, then you can implement the enterprise version.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347565 - PeerSpot reviewer
Lead Bigdata Developer at a tech services company with 10,001+ employees
Consultant
We used it to build an enterprise data hub, but Apache Kudu needs improvement.

Valuable Features:

The most valuable features for me are:

  • Sentry - provides granular-level security
  • Impala - open-source, MPP database

Improvements to My Organization:

We used it to build an enterprise data hub.

Room for Improvement:

Apache Kudu needs improvement. It's a real-time updatable database.

Implementation Team:

We used a vendor team to implement the solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347535 - PeerSpot reviewer
Software Engineer at a tech services company with 501-1,000 employees
Consultant
It provides the ability to update configuration through the UI. I think licensing by size of data managed would be a useful improvement.

Valuable Features

The features most valuable to me are:

  • Installation (very easy initial setup)
  • Configuration
  • Ability to update configuration through UI

Improvements to My Organization

It made Hadoop easy to use and made it easy to get started.

Room for Improvement

The licensing was by node. I think licensing by size of data managed would be a useful improvement.

Use of Solution

I used Cloudera Manager to evaluate Hadoop and HBase for one year.

Deployment Issues

No issues encountered.

Stability Issues

No issues encountered.

Scalability Issues

No issues encountered.

Customer Service and Technical Support

Customer Service:

It's excellent.

Technical Support:

It's excellent.

Initial Setup

It was very easy.

Implementation Team

It was implemented in-house.

Other Solutions Considered

We compared it to Amazon EMR but found Cloudera Manager to be more functional.

Other Advice

It's a great product and must be evaluated if you are planning to use Hadoop.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347172 - PeerSpot reviewer
System Engineer at a tech company with 10,001+ employees
Vendor
For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. But, it has HBase 1.0 stability issues and processing speed needs improvement.

What is most valuable?

  • Cluster rolling restarts 
  • Cluster wide configuration management

How has it helped my organization?

For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. 

We are currently running six production clusters totaling 900+ nodes, and are building three more clusters. Knowing that if someone has some custom configuration on a node that they haven’t communicated out, and that I can ignore that configuration and bring that node into line with where we’ve decided to run the cluster, is very beneficial.

What needs improvement?

HBase 1.0 stability issues and processing speed are major areas for improvement. Right now, our Cloudera 5 clusters run four to seven times slower than our Cloudera 4 clusters using our Storm and Kafka topologies, which makes real-time processing a major challenge.

CM’s API is very limited and difficult to use with multiple clusters in the same CM instance.

For how long have I used the solution?

We've used it for approximately two years. We also use Cloudera Manager, which is 6/10.

What was my experience with deployment of the solution?

No issues encountered.

What do I think about the stability of the solution?

Cloudera 5 is currently very unstable. Between two Cloudera 5 clusters, we have an incident at least twice a week due to what are now outstanding bugs.

What do I think about the scalability of the solution?

It's very easy to deploy and scale as large as you want. Once created, however, the CM management cluster is difficult to scale up as needed when you add more clusters to the same CM instance.

Which solution did I use previously and why did I switch?

No previous solution was used.

How was the initial setup?

We were already running one production cluster with approximately 75 nodes when I joined, so I’m not familiar with what was needed to get the initial production cluster up. Once I joined, I assisted in standing up the additional nodes and clusters using our Chef automation.

What about the implementation team?

In-house via Chef automation. Chef, or a similar system, makes it much simpler to stand up large-scale clusters. That said, I have not used or evaluated vendor team implementation methods.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user2700 - PeerSpot reviewer
Architect at a marketing services firm with 501-1,000 employees
Vendor
Cloudera Manager Hadoop Cluster Installation Evaluation

I decided to give Cloudera's Manager software a try, and was pleasantly surprised at how simple it becomes to deploy a substantial Hadoop cluster.

I began by creating an automated kickstart installer for RHEL 6.2 (booting off a custom isolinux image created for this purpose), with all of the required packages, so that from server power on to creating a 20+ node cluster takes less than 15 minutes. The limitation for the number of concurrent node installs is based on network and disk i/o bottlenecks on the deployment server. If you wanted to PXE boot the cluster in a production environment, you would want a bank of servers behind a load balancer, optimally.

Once the Manager is installed on the master node, you simply log into the administration webpage, and from there, add all of the hosts to deploy the cluster on. One nice discovery was that it takes advantage of regular expressions for host names or IP addresses, so you can literally create a cluster containing hundreds of nodes with a trivial amount of effort.
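To give a feel for how a single pattern can stand in for many hosts, here is a small sketch that expands bracketed numeric ranges in a host name. The `[start-end]` syntax, the function name, and the example host names are assumptions chosen for illustration; this is not Cloudera Manager's own parser, and the patterns it actually accepts should be checked against its documentation.

```python
import re

# Matches one bracketed numeric range such as [1-20] or [08-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand every [start-end] range in a host pattern into concrete host names."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # Keep zero padding when the range is written with a leading zero, e.g. [08-10].
    width = len(start) if start.startswith("0") else 0
    hosts = []
    for number in range(int(start), int(end) + 1):
        expanded = pattern[:match.start()] + str(number).zfill(width) + pattern[match.end():]
        # Recurse so that patterns with several ranges are fully expanded.
        hosts.extend(expand_hosts(expanded))
    return hosts


if __name__ == "__main__":
    print(expand_hosts("node[08-10].example.com"))
    print(len(expand_hosts("rack[1-4]-node[1-50].example.com")))
```

A pattern with two ranges expands to the product of both, which is how a one-line entry can describe a cluster of a few hundred nodes.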

Once the software is deployed, you can select the roles for each of the servers. It's an incredibly painless deployment. That being said, it is not without its flaws.

One of the primary flaws is that all of the configuration and log files are in non-standard locations, and are split in non-standard ways. It's obvious from the way that the files are arranged that it simplifies programmatic deployment. It also makes it a bit harder for a human who is used to standard Hadoop deployments to figure out where everything is located.

And finally, I discovered a bug with one of the packaged software products, Oozie. One of the resource files, oozie-bundle-0.1.xsd contains an invalid regular expression on line 22. I haven't tracked down the behavior, but for some reason JDK 1.6.30 will parse that invalid regex, but JDK 1.7U2 will exit with errors. Naturally, I was running JDK 1.7U2, so it took me a little extra time to debug the problem.

Overall, I quite liked Cloudera's Manager. It's certainly one of the better cluster deployment products I've seen.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user217290 - PeerSpot reviewer
Senior DBA Consultant at a tech services company with 10,001+ employees
Consultant

Hi

Can I get Cloudera Manager for free, to test and deploy it on a sandbox for POC purposes?
