Senior Software Engineer at a tech services company with 10,001+ employees
Real User
Performs well and the technical support is helpful, but the upgrade process needs to be consolidated
Pros and Cons
  • "The most valuable feature is Impala, the querying engine, which is very fast."
  • "There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."

What is our primary use case?

We are dealing with data from the telecom industry. We were using an Oracle system but our volume has increased. We now have a lot of real-time data that needs to be transformed so that it can be made available and used.

What is most valuable?

The most valuable feature is Impala, the querying engine, which is very fast. We have been able to work with one terabyte of data in less than 20 minutes. The speed makes it easy for us to process all of the data that comes in, in time.
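To put that figure in perspective, a quick back-of-the-envelope calculation gives the sustained scan rate it implies. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats 20 minutes as an upper bound on the runtime, so the result is a lower bound on throughput. The function name is illustrative only.

```python
def min_throughput_mb_per_s(data_bytes: float, max_seconds: float) -> float:
    """Lower bound on sustained throughput, in decimal megabytes per second."""
    return data_bytes / max_seconds / 1e6

ONE_TB = 1e12          # assumption: decimal terabyte
TWENTY_MIN = 20 * 60   # upper bound on runtime, in seconds

rate = min_throughput_mb_per_s(ONE_TB, TWENTY_MIN)
print(f"at least {rate:.0f} MB/s")  # at least 833 MB/s
```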

The support is very good.

All of the data has automatic triple replication in order to secure integrity.

What needs improvement?

There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon.

When we are upgrading CDH, there are many things that need to be upgraded and it would be helpful if it were bundled. As it is now, we have to upgrade many different things separately.

For how long have I used the solution?

I have been working with the Cloudera Distribution for Hadoop for around two years.

Buyer's Guide
Cloudera Distribution for Hadoop
May 2024
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

The scalability is good and it works on commodity hardware. One of the problems we have right now is that there is a lot of data and we're moving it from our Oracle solution. This means that there is a double cost, in terms of storage, during our transition to working with big data.

We are using a data lake that is a store for all of the data in our organization. There are more than 25 projects, with between 25 and 30 people in each one, for a total of almost 1,000 people. All of them are dependent on this solution.

Most of our users are technicians who have problems to solve using the data available to them. A couple of them are data scientists and the remainder are upper management, who do the analysis.

How are customer service and support?

The technical support is very good. Whenever we open a ticket, we get support right away.

Which solution did I use previously and why did I switch?

We did use another solution prior to this one but it could not keep up with our increase in data.

What other advice do I have?

The suitability of this solution depends on the size of the data that you are going to be working with. If you are going to be working with a huge dataset that contains many gigabytes of data, then this is a good solution. For smaller datasets, you should also consider other technologies.

My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user363186 - PeerSpot reviewer
Team Lead / Data Architect at a tech services company with 51-200 employees
Consultant
The Cloudera Manager administrator webpage simplifies the administration tasks.

What is most valuable?

The Cloudera Manager administrator webpage simplifies the administration tasks and helps to maintain a global overview of the cluster performance.

How has it helped my organization?

We are moving from a standard SQL environment (Oracle DataWarehouse) to a Big Data environment, and the Hadoop cluster will be the key to our new organization. It will allow us to scale easily.

What needs improvement?

We had some difficulties when importing Hive tables from another cluster.

I want to point out that we encountered many problems related to cloud storage and how resources are managed. Our lesson has been that, although it is quite simple to deploy single machines in the cloud, deploying clusters of machines is much more complex, as many factors need to be considered: individual machines, connectivity across machines, and storage.

For how long have I used the solution?

I've used it for three months.

What do I think about the stability of the solution?

We found some issues, but they were related to the hardware provider. So far, I have not detected any problems from the Cloudera software point of view.

How are customer service and technical support?

Technical support is really efficient.

Which solution did I use previously and why did I switch?

We chose this product as it is considered a market standard and because of its extensive documentation on the web. I evaluated other options, but the fact that it is now becoming a standard for many companies helped me to choose this option.

How was the initial setup?

In the cloud environment where we deployed (Azure Resource Manager), there was a ready-to-deploy template that greatly simplified the initial setup.

What about the implementation team?

We implemented with an in-house team. Our initial idea was to stop the cluster during the weekends and when there was no usage. However, we ran into serious difficulties and were not able to start the whole cluster programmatically, so in the end we left the cluster running all the time.

These issues were mainly related to the cloud provider and how it manages the resources for the cluster machines.
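For readers considering the same approach, the scheduling decision on its own could look like the sketch below. The working-hours window is an invented assumption, and the function name is hypothetical. Actually starting and stopping the machines would go through the cloud provider's tooling, which is where the difficulties described above arose and which is not shown here.

```python
from datetime import datetime

WORK_START_HOUR = 7    # assumption: cluster needed from 07:00
WORK_END_HOUR = 20     # assumption: cluster idle after 20:00

def cluster_should_run(now: datetime) -> bool:
    """True on weekdays inside the working-hours window, False otherwise."""
    is_weekday = now.weekday() < 5          # Monday=0 ... Sunday=6
    in_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_weekday and in_hours

print(cluster_should_run(datetime(2024, 5, 4, 10, 0)))   # a Saturday morning
print(cluster_should_run(datetime(2024, 5, 6, 10, 0)))   # a Monday morning
```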

What was our ROI?

From our point of view, it is a long-term investment. We hope to see the ROI in the coming years.

What other advice do I have?

I am very comfortable with this product. The combination of the Cloudera Manager administrator server, which allows the management of the Hadoop cluster, and the Hue server, which simplifies its use, makes this product a current standard on the market. Perhaps it lacks full integration of all its components.

Disclosure: My company has a business relationship with this vendor other than being a customer: My company has a partnership relation with the vendor.
PeerSpot user
it_user347172 - PeerSpot reviewer
System Engineer at a tech company with 10,001+ employees
Vendor
For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. But, it has HBase 1.0 stability issues and processing speed needs improvement.

What is most valuable?

  • Cluster rolling restarts 
  • Cluster wide configuration management

How has it helped my organization?

For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. 

We are currently running six production clusters totaling 900+ nodes, and are building three more clusters. Knowing that if someone has some custom configuration on a node that they haven’t communicated out, and that I can ignore that configuration and bring that node into line with where we’ve decided to run the cluster, is very beneficial.
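The idea of bringing a node back in line with the agreed cluster configuration can be illustrated with a small drift check. This is a hypothetical sketch, not Cloudera Manager's API: configurations are modelled as plain dictionaries, and the values shown are made up for the example.

```python
def config_drift(baseline: dict[str, str], node: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (expected, actual)} for every setting where node differs.

    A key missing on either side is reported with None in its place.
    """
    drift = {}
    for key in sorted(baseline.keys() | node.keys()):
        expected, actual = baseline.get(key), node.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift

# Illustrative values only.
baseline = {"dfs.replication": "3", "dfs.blocksize": "134217728"}
node = {"dfs.replication": "2", "dfs.blocksize": "134217728", "custom.flag": "on"}

for key, (expected, actual) in config_drift(baseline, node).items():
    print(f"{key}: expected {expected!r}, found {actual!r}")
```

An empty result means the node already matches the baseline. Anything else is a candidate for being overwritten with the cluster-wide value.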

What needs improvement?

HBase 1.0 stability and processing speed are major areas for improvement. Right now, our Cloudera 5 clusters run four to seven times slower than our Cloudera 4 clusters using our Storm and Kafka topologies, which makes real-time processing a major challenge.

CM’s API is very limited and difficult to use when multiple clusters are in the same CM instance.

For how long have I used the solution?

We've used it for approximately two years. We also use Cloudera Manager, which I would rate 6/10.

What was my experience with deployment of the solution?

No issues encountered.

What do I think about the stability of the solution?

Cloudera 5 is currently very unstable. Between two Cloudera 5 clusters, we have an incident at least twice a week due to what are now outstanding bugs.

What do I think about the scalability of the solution?

It's very easy to deploy and scale as large as you want. However, the CM management cluster, once created, is difficult to scale up as needed when you add more clusters to the same CM instance.

Which solution did I use previously and why did I switch?

No previous solution was used.

How was the initial setup?

We were already running one production cluster with approximately 75 nodes when I joined, so I’m not familiar with what was needed to get the initial production cluster up. Once I joined, I assisted in standing up the additional nodes and clusters using our Chef automation.

What about the implementation team?

In-house, via Chef automation. Chef, or a similar system, makes it much simpler to stand up large-scale clusters. That said, I have not used or evaluated vendor team implementation methods.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user2700 - PeerSpot reviewer
Architect at a marketing services firm with 501-1,000 employees
Vendor
Cloudera Manager Hadoop Cluster Installation Evaluation

I decided to give Cloudera's Manager software a try, and was pleasantly surprised at how simple it becomes to deploy a substantial Hadoop cluster.

I began by creating an automated kickstart installer for RHEL 6.2 (booting off a custom isolinux image created for this purpose), with all of the required packages, so that from server power on to creating a 20+ node cluster takes less than 15 minutes. The limitation for the number of concurrent node installs is based on network and disk i/o bottlenecks on the deployment server. If you wanted to PXE boot the cluster in a production environment, you would want a bank of servers behind a load balancer, optimally.

Once the Manager is installed on the master node, you simply log into the administration webpage, and from there, add all of the hosts to deploy the cluster on. One nice discovery was that it takes advantage of regular expressions for host names or IP addresses, so you can literally create a cluster containing hundreds of nodes with a trivial amount of effort.
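As an illustration of the idea (not Cloudera Manager's actual parser), here is a minimal Python sketch that expands a bracketed numeric range in a host name into the full host list. The pattern syntax `node[001-200].example.com`, the function name, and the host names are all assumptions made for this example.

```python
import re

# Matches one bracketed numeric range, e.g. "[01-20]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")

def expand_hosts(pattern: str) -> list[str]:
    """Expand the first "[start-end]" range in pattern into host names.

    Zero padding is preserved: "[01-03]" yields 01, 02, 03.
    A pattern without a range is returned unchanged as a single host.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start_text, end_text = match.groups()
    width = len(start_text)
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"empty range in {pattern!r}")
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(start, end + 1)]

hosts = expand_hosts("node[001-200].example.com")
print(len(hosts), hosts[0], hosts[-1])
```

A single short pattern stands in for two hundred host entries, which is what makes large clusters quick to define.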

Once the software is deployed, you can select the roles for each of the servers. It's an incredibly painless deployment. That being said, it is not without its flaws.

One of the primary flaws is that all of the configuration and log files are in non-standard locations, and are split in non-standard ways. It's obvious from the way that the files are arranged that it simplifies programmatic deployment. It also makes it a bit harder for a human who is used to standard Hadoop deployments to figure out where everything is located.

And finally, I discovered a bug with one of the packaged software products, Oozie. One of the resource files, oozie-bundle-0.1.xsd contains an invalid regular expression on line 22. I haven't tracked down the behavior, but for some reason JDK 1.6.30 will parse that invalid regex, but JDK 1.7U2 will exit with errors. Naturally, I was running JDK 1.7U2, so it took me a little extra time to debug the problem.

Overall, I quite liked Cloudera's Manager. It's certainly one of the better cluster deployment products I've seen.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user217290 - PeerSpot reviewer
Senior DBA Consultant at a tech services company with 10,001+ employees
Consultant

Hi

Can I get Cloudera's Manager software for free to test and deploy in a sandbox for POC purposes?

it_user370224 - PeerSpot reviewer
Director of Data Management at a media company with 51-200 employees
Vendor
It gives us improved business intelligence reporting from daily to every two hours.

Valuable Features:

Faster runtime for batch jobs.

Improvements to My Organization:

Business Intelligence reporting improved from daily to every two hours, satisfying the business stakeholders who had favoured transactional systems for drafting reports because those systems had the latest data.

The issue that arises when using transactional systems is multiple versions of the truth across the enterprise. With faster turnaround times, business stakeholders are now adopting the BI systems designed to give a cohesive view of the performance metrics important to them.

Room for Improvement:

Full Support for all Spark SQL features, support for SparkR, compatibility with Hive for DataFrame saved tables.

Cloudera CDH5.5.x does not support SparkR. SparkR, which integrates R models into the API, would be a great addition, since it would enable fast, near-real-time analytical integration of R models with the data feed.

The functionality in Spark SQL to save a DataFrame as a table in Hive produces a table that is not compatible with Hive. The workaround is to create the Hive table first and then do inserts.
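The workaround can be sketched as two statements: define the table in Hive first, then insert into it. The Python below only builds the statements as strings. The table, column, and staging-view names are hypothetical, and how the statements are submitted (for example, through a Hive-enabled SQL context in Spark) depends on the versions in use.

```python
def hive_first_statements(table: str, columns: dict[str, str], source_view: str) -> list[str]:
    """Build DDL-then-insert statements for the 'create in Hive first' workaround.

    columns maps column name -> Hive type; source_view is a temporary
    table or view exposing the DataFrame's rows under the same column names.
    """
    col_defs = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    col_list = ", ".join(f"`{name}`" for name in columns)
    create = f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) STORED AS PARQUET"
    insert = f"INSERT INTO TABLE {table} SELECT {col_list} FROM {source_view}"
    return [create, insert]

# Hypothetical table and columns, for illustration only.
for stmt in hive_first_statements(
    "reports.daily_metrics",
    {"metric": "STRING", "value": "DOUBLE"},
    "staged_metrics",
):
    print(stmt)
```

Because Hive owns the table definition in this flow, the schema and storage format are ones Hive itself understands, and Spark only supplies the rows.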

Cloudera CDH5.5.x is a great product. Adopting the features not currently supported would make it even better, but their absence by no means subtracts from its desirability.


Other Advice:

Do thorough research and ensure that your use cases and scale do not conflict with the system requirements, and that the features that would make a difference are supported.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user374058 - PeerSpot reviewer
Vice President - Big Data and Delivery at a computer software company with 51-200 employees
Vendor
Cloudera Manager is a good tool to administer. Sometimes it gets confusing to follow a single path for installation.

What is most valuable?

  • Cloudera Manager for administering the Hadoop cluster
  • Cloudera specific solutions like Impala
  • Extensive documentation
  • Good user community

How has it helped my organization?

Implementing a Hadoop cluster has become relatively straight-forward using CDH. Administering it is also less complex. As a result, efforts spent in these areas are less than anticipated.

What needs improvement?

  • Some of the UI features seem confusing e.g. charts on the CM Services page
  • Sometimes it gets confusing to follow a single path for installation due to multiple recommended approaches e.g. parcels vs packages

For how long have I used the solution?

We have been using it for the last two years.

What was my experience with deployment of the solution?

Following a single path for installation becomes confusing due to multiple recommended approaches e.g. parcels vs packages.

What do I think about the stability of the solution?

Flume seems unstable and has to be restarted quite often.

What do I think about the scalability of the solution?

None as such

How are customer service and technical support?

We are mostly using Cloudera Express so we did not use their technical support. However, the Cloudera community is an active place and Cloudera representatives participate actively in understanding and resolving issues.

Which solution did I use previously and why did I switch?

Cloudera is a prominent player in the Hadoop space and we did not have a need to adopt a different solution. However, we are also looking to work on Hadoop and MapR.

How was the initial setup?

Following a single path for installation was initially confusing due to multiple recommended approaches, e.g. parcels vs. packages. After a while, though, we managed to master it. Knowledge of Cloudera Manager and Hadoop architecture is a must.

What about the implementation team?

We have our own team of consultants who are proficient in implementing it. The high-level steps of the implementation remain the same; however, it is the environment-specific issues that are challenging.

What was our ROI?

We haven't really measured ROI.

What's my experience with pricing, setup cost, and licensing?

The per-node licensing price for Cloudera seems to be pretty steep (based on the inputs we have received from Cloudera).

What other advice do I have?

It is user friendly and installation is pretty straightforward. Cloudera Manager is a good tool to administer it. However, configuration for specific requirements is sometimes pretty complex.

You should have a team that is knowledgeable in Hadoop. Do keep in mind that the product is still maturing, so there is a good chance that you will come across unexpected issues now and then.

Disclosure: My company has a business relationship with this vendor other than being a customer: We're Cloudera partners and regularly install CDH
PeerSpot user
Engineering Manager/Solution architect at a computer software company with 201-500 employees
Real User
Preferred solution for on-prem
Pros and Cons
  • "Cloudera is a very manageable solution with good support."
  • "The initial setup of Cloudera is difficult."

What is our primary use case?

We are a distributor for Hadoop. Our customers choose whether they would like to use Cloudera or another product.

Cloudera Distribution is deployed on-premise as well as on bare metal servers in AWS.

What is most valuable?

Cloudera is a very manageable solution with good support.

What needs improvement?

When you compare Cloudera with EMR, EMR has a lot of administrative features, so you don't need to manage the solution. Cloudera is not as easy, as it requires more DevOps resources than other solutions.

For how long have I used the solution?

We have been offering this solution for five years.

What do I think about the stability of the solution?

Cloudera Distribution is stable.

What do I think about the scalability of the solution?

This is a scalable solution. We have clients that have a large installation of Cloudera.

How are customer service and support?

Technical support from Cloudera is fine.

How was the initial setup?

The initial setup of Cloudera is difficult. After you have installed it once, it is not difficult to reproduce.

What about the implementation team?

For a POC deployment, we required only one DevOps engineer. On larger-scale implementations, we also require a data engineer.

What's my experience with pricing, setup cost, and licensing?

Cloudera requires a license to use.

Which other solutions did I evaluate?

We looked at EMR; however, Cloudera is better for on-premises use.

What other advice do I have?

Cloudera is one of the best solutions for on-prem. 

I would rate this solution an 8 out of 10.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
it_user364473 - PeerSpot reviewer
R&D Solutions Architect at a tech vendor with 10,001+ employees
Real User
It has good ease of use in terms of integration within the Hadoop ecosystem related products.

Valuable Features

Enterprise resource management, ease of use in terms of integration within the Hadoop ecosystem related products, and security.

Room for Improvement

Mainly they have to continuously evolve following the technology trends and replace or adapt part of their solutions accordingly.

Use of Solution

We've used it since October 2012.

Deployment Issues

No issues encountered.

Stability Issues

No issues encountered.

Scalability Issues

No issues encountered.

Customer Service and Technical Support

Pretty responsive and reactive compared to their competitors in the field.

Initial Setup

It was extremely easy, and allowed less experienced personnel to get into the context pretty fast. Any difficulties or complexities we faced were related not to the product itself but to the cluster infrastructure used.

Implementation Team

In our case it was an in-house team including data scientists and data engineers (management & QA as well). With the appropriate training and the support offered by the vendor, it is not that hard to implement a small to medium scale project solution. However, complexity and size varies significantly between projects; therefore, it really depends.

ROI

That is not easy to answer since Huawei has several divisions using the product in different ways. Again, pricing and licensing depend heavily on the context and the aims of the given organization: for instance, the level of support they are going to need, the type of services they are going to provide, or even the business domain they are targeting.

Other Solutions Considered

Two other providers' solutions were evaluated. However, the level of customer service and technical support from Cloudera was better than the first one's, and the second solution's licence pricing was higher than Cloudera’s.

Other Advice

Cloudera is doing a great job in the field offering an enterprise ready data platform. Based on my experiences I would definitely recommend it.

Disclosure: My company has a business relationship with this vendor other than being a customer: We do have a partnership with Cloudera.
PeerSpot user
Buyer's Guide
Download our free Cloudera Distribution for Hadoop Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024
Product Categories
Hadoop NoSQL Databases