Cloudera Distribution for Hadoop Overview

Cloudera Distribution for Hadoop is the #2 ranked solution in top Hadoop tools and top NoSQL Databases. PeerSpot users give Cloudera Distribution for Hadoop an average rating of 7.4 out of 10. Cloudera Distribution for Hadoop is most commonly compared to Amazon EMR. Cloudera Distribution for Hadoop is popular among the large enterprise segment, which accounts for 72% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 24% of all views.
Buyer's Guide

Download the Hadoop Buyer's Guide including reviews and more. Updated: July 2022

What is Cloudera Distribution for Hadoop?
Cloudera Distribution for Hadoop is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Cloudera Distribution for Hadoop Customers
37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Cloudera Distribution for Hadoop Pricing Advice

What users are saying about Cloudera Distribution for Hadoop pricing:
  • "Cloudera Distribution for Hadoop is expensive, with support costs involved."
  • "When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive."
  • "I haven't bought a license for this solution. I'm only using the Apache license version."
  • "Cloudera requires a license to use."
Cloudera Distribution for Hadoop Reviews

    Vice President at a financial services firm with 10,001+ employees
    Real User
    Top 20
    Stores large volumes of data and makes log analytics, monitoring, and management easier, but its feature list is limited
    Pros and Cons
    • "We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
    • "Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."

    What is our primary use case?

    In my previous organization, we used Cloudera Distribution for Hadoop for compiling website logs and application logs. We used it for log analytics.

    How has it helped my organization?

    We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization.

    What is most valuable?

    The feature I found most valuable in Cloudera Distribution for Hadoop is the Cloudera Manager. It's a good component because it makes log management easy. It's really useful as a management and monitoring console.

    What needs improvement?

    The setup and administration were not easy with Cloudera Distribution for Hadoop. They could be improved.

    The solution has a limited feature list, so having more features is something I'd like to see in the next release of Cloudera Distribution for Hadoop.

    Buyer's Guide
    Hadoop
    July 2022
    Find out what your peers are saying about Cloudera, IBM, Amazon and others in Hadoop. Updated: July 2022.
    621,703 professionals have used our research since 2012.

    For how long have I used the solution?

    I've been using Cloudera Distribution for Hadoop for two years. I'm still using it.

    What do I think about the stability of the solution?

    Cloudera Distribution for Hadoop seems to be a stable product.

    What do I think about the scalability of the solution?

    Cloudera Distribution for Hadoop is really easy to scale. We can add more servers to it, so it's scalable.

    How are customer service and support?

    I don't have experience contacting the technical support team of Cloudera Distribution for Hadoop.

    How was the initial setup?

    The initial setup for Cloudera Distribution for Hadoop was easy for us because we outsourced the work to the vendor. All the nitty-gritty was taken care of by them.

    What about the implementation team?

    We implemented Cloudera Distribution for Hadoop through the vendor. Deployment was done by an integrator. It usually doesn't take a lot of time. It usually takes just a day to deploy the solution.

    Our implementation strategy for Cloudera Distribution for Hadoop was more into outsourcing. For example, the hardware, including its management, was outsourced, so the admin, data management, support, etc., were also outsourced. We were looking into having the application done in-house, with the team. We were looking at a one-year implementation plan to move more and more governance and data sets into Cloudera Distribution for Hadoop. Every quarter, we planned to have other features reintroduced into the platform.

    Two people did the installation and two people did the deployment. It was deployed in a single location, and we initially had ten users of Cloudera Distribution for Hadoop.

    What was our ROI?

    It's tricky to derive the ROI from Cloudera Distribution for Hadoop, because in analytics, it's a little difficult to determine that this is the investment, and we're increasing the footprints and the revenue. It's very difficult to evaluate.

    What's my experience with pricing, setup cost, and licensing?

    Cloudera Distribution for Hadoop is expensive. There are a lot of costs involved. For example: apart from the standard licensing fees, there are support costs involved, and support could be for three years, five years, etc., so support is a pretty large part of the contract.

    Which other solutions did I evaluate?

    We didn't evaluate other options before choosing Cloudera Distribution for Hadoop.

    What other advice do I have?

    I'm using Cloudera Distribution for Hadoop.

    The advice I would give to others looking into implementing or using Cloudera Distribution for Hadoop is for them to opt for a cloud variant, particularly something scalable for Azure, because of the ease of deployment and ease of setup. Procuring Cloudera Distribution for Hadoop is also a challenge unless the customer goes for its cloud version.

    I would rate Cloudera Distribution for Hadoop six out of ten because of its limited features. If they can enhance their feature list, that would improve their score.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    AD - Associate Director at a financial services firm with 10,001+ employees
    Real User
    Top 10
    Feature rich and scalable with good support, but there are performance issues and the security could be improved
    Pros and Cons
    • "The main advantage is the storage is less expensive."
    • "Currently, we are using many other tools such as Spark and Blade Job to improve the performance."

    What is our primary use case?

    We are using this solution for storing Big Data in one centralized location.

    How has it helped my organization?

    It has been helpful in allowing data storage in one centralized location with data lakes and all of the surrounding applications.

    All of the data processes are being stored into the Big Data Lake.

    What is most valuable?

    It allows us to store huge amounts of data, which is an advantage.

    They have BI (Business Intelligence) tools. There are many AI tools.

    We are able to connect and analyze the data to get reports. The reports are very good.

    The main advantage is the storage is less expensive.

    What needs improvement?

    The performance can be improved. We have experienced some performance issues. It is not as sophisticated as Oracle Sybase.

    Currently, we are using many other tools such as Spark and Blade Job to improve the performance.

    The setup is complex and could be simplified.

    The security needs to be improved.

    For how long have I used the solution?

    I have been using this solution since 2015.

    What do I think about the stability of the solution?

    It's a stable solution.

    What do I think about the scalability of the solution?

    Scalability is good. The data is replicated: by default, with Big Data there is a replication factor.

    We have grown over the years. When we started, we had 10 nodes; now we have increased to a large number of nodes.
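
As a rough illustration of what the replication factor mentioned above means for capacity planning: HDFS keeps several copies of every block (the stock default for `dfs.replication` is 3), so raw disk usage is the logical data size multiplied by that factor. The sketch below is only an assumption-laden estimate; the function names and the per-node capacity figure are hypothetical and do not come from this review.

```python
import math


def raw_storage_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw disk consumed when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_tb * replication_factor


def nodes_needed(logical_tb: float, usable_tb_per_node: float,
                 replication_factor: int = 3) -> int:
    """Smallest node count whose combined usable disk holds all replicas.

    usable_tb_per_node is a hypothetical planning figure; real clusters also
    reserve headroom for temporary data and growth, which is ignored here.
    """
    raw = raw_storage_tb(logical_tb, replication_factor)
    return math.ceil(raw / usable_tb_per_node)


if __name__ == "__main__":
    # 100 TB of logical data on nodes with 40 TB of usable disk each.
    print(raw_storage_tb(100))      # raw terabytes across the cluster
    print(nodes_needed(100, 40))    # data nodes required for that volume
```

Adding nodes raises the total raw capacity, which is why growing from a small cluster to a large one is mostly a matter of adding servers.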

    How are customer service and technical support?

    Technical support is good. I have been able to learn from them. As a developer, I am learning every day.

    I would rate the technical support a ten out of ten.

    Which solution did I use previously and why did I switch?

    Previously, we were using Oracle Sybase SQL. We switched because we have now introduced Big Data.

    How was the initial setup?

    The initial setup was complex.

    It's not as simple as Oracle Sybase.

    It's a complex architecture because you have raw data and many engines.

    What's my experience with pricing, setup cost, and licensing?

    When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive.

    What other advice do I have?

    I am a part of security and software development. 

    We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component.

    I would rate this solution a five out of ten.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Senior Data Architect Manager at Unifonic
    Real User
    Top 5
    Great being able to manage the security layer using the shared SDX which provides flexibility
    Pros and Cons
    • "With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
    • "This is a very expensive solution."

    What is our primary use case?

    This product is a framework for edge AI, and it comes with multiple ecosystems as a project. I'm a senior data architect manager, and we are consultants. We offer Cloudera to our customers, but we don't have a partnership with them.

    What is most valuable?

    The best feature is the shared experience layer. If you have a cluster available on-prem or in the cloud, you can manage that security layer using the shared SDX, and it provides flexibility. New features are constantly being added.

    What needs improvement?

    The only thing that needs improvement is the cost. It's a very expensive solution, and that is one of the main reasons companies are not attracted to the product.

    What do I think about the stability of the solution?

    This product has been around for a long time so it's very mature and stable.

    What do I think about the scalability of the solution?

    The scalability is very good. 

    How was the initial setup?

    The initial setup has become easier, although you need a dedicated admin to maintain and manage the solution because it's a framework and not a single product. Deployment nowadays is much smoother with the PaaS offering in the public cloud, so you can carry out the deployment with an in-house team. The deployment only takes a day, but a company is unlikely to go with the defaults, so the solution needs fine-tuning, which can take a couple of weeks.

    What's my experience with pricing, setup cost, and licensing?

    For enterprise organizations that can bear the cost, it's a good solution. A smaller company wouldn't be able to afford the licensing fees. You can get a free trial for 60 days. They'll never have a community version because they're the only ones in the market offering this kind of framework. 

    What other advice do I have?

    I rate this solution nine out of 10. 

    Disclosure: My company has a business relationship with this vendor other than being a customer: Consultant
    Associate Manager at a consultancy with 501-1,000 employees
    Real User
    Top 5 Leaderboard
    Easy to install, good technical support, and with a single script we can run jobs within minutes

    What is our primary use case?

    We use this solution to process data.

    When using an SQL Server you have to build indexes and you need to fine-tune the data.

    We import the data that is in the SQL Source.

    With a single script, we are able to run the jobs within minutes, which is an advantage.

    We are using the Power BI model for the business convention. The performance in Power BI will be reduced if you incorporate more calculations. Those calculations are captured in the Hadoop layer and processed.
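
A minimal sketch of the idea described above, in which heavy calculations are done once upstream so the reporting tool only reads a small summary. The record fields (`region`, `amount`) and the function name are hypothetical; in a real deployment this step would more likely run as a Hive, Impala, or Spark job in the Hadoop layer than as plain Python.

```python
from collections import defaultdict


def preaggregate(rows):
    """Collapse detail rows into one summary row per region.

    The BI layer can then load these few rows instead of recomputing
    counts, totals, and averages over the full detail data on every refresh.
    """
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["count"] += 1
        bucket["total"] += row["amount"]

    # Emit summary rows in a stable (alphabetical) order.
    return [
        {
            "region": region,
            "count": agg["count"],
            "total": agg["total"],
            "average": agg["total"] / agg["count"],
        }
        for region, agg in sorted(totals.items())
    ]


if __name__ == "__main__":
    detail = [
        {"region": "east", "amount": 10.0},
        {"region": "west", "amount": 5.0},
        {"region": "east", "amount": 30.0},
    ]
    for summary_row in preaggregate(detail):
        print(summary_row)
```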

    What needs improvement?

    It could be faster and more user-friendly.

    For how long have I used the solution?

    I have been using this solution for seven months.

    What do I think about the stability of the solution?

    It's a stable product. I don't see any performance issues.

    What do I think about the scalability of the solution?

    This solution is scalable. We have 40 users for different projects in our organization.

    We will continue to use this solution.

    How are customer service and technical support?

    Technical support is good.

    Which solution did I use previously and why did I switch?

    I didn't use any other product.

    How was the initial setup?

    The installation is straightforward.

    Our clients provide us with access to use it directly.

    Once we have been given access to the edge nodes, we are able to run the scripts in the Hadoop layer.

    What's my experience with pricing, setup cost, and licensing?

    We do not pay for licensing because our customers forward it, so there is no need to purchase the license for the project.

    What other advice do I have?

    I would recommend this solution.

    I would rate Cloudera Distribution for Hadoop a nine out of ten.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Co-Founder at a tech vendor with 11-50 employees
    Real User
    Top 5 Leaderboard
    Has a useful file system and is scalable
    Pros and Cons
    • "The file system is a valuable feature."
    • "The security of this solution could be improved. There should also be a way to basically have a blockchain enabled storage with the HDFS."

    What is our primary use case?

    We use Cloudera Distribution for file storage. 

    This solution is deployed on-premise. 

    What is most valuable?

    The file system is a valuable feature. 

    What needs improvement?

    The security of this solution could be improved. There should also be a way to basically have a blockchain enabled storage with the HDFS. 

    For how long have I used the solution?

    I have been working with Cloudera Distribution for Hadoop for 11 years. 

    What do I think about the stability of the solution?

    This solution is stable. 

    What do I think about the scalability of the solution?

    This solution is scalable enough for us. 

    We have created a product using HDFS, and when our engineers install it for themselves or for customers, we use this solution. There are about 15 to 20 people using it at any point in time.

    How was the initial setup?

    The installation is straightforward. We use command-line-based installation and we have created our own way of installing with our product. 

    Depending on the customer or depending on internal usage, our DevOps engineer will install it or my development team will install it. 

    What about the implementation team?

    We are very well-versed in these tools, so we implemented it ourselves.

    What's my experience with pricing, setup cost, and licensing?

    I haven't bought a license for this solution. I'm only using the Apache license version. 

    What other advice do I have?

    I rate this solution an eight out of ten. Cloudera is a great product and, overall, there are many features. 

    We actually use Cloudera HDFS underneath, and we build our product on top of it. So, we don't use the Cloudera versions of all the other products, we just use the Cloudera HDFS, nothing else.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    BI Manager at an insurance company with 10,001+ employees
    Real User
    Includes several useful proprietary tools
    Pros and Cons
    • "CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
    • "It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."

    How has it helped my organization?

    CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools. 

    What needs improvement?

    Integration is one of the main things we struggle with because we're working with several other environments. For example, we've got an MPP environment outside the Hadoop environment. Many cloud-based platforms like Azure are fully integrated with technology that gives you MPP machine learning and data lakes all in one environment. We've got on-premises IBM solutions and Cloudera, so it isn't easy to integrate. It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform. And ideally, we should get as much raw data as possible into the platform before we can do the engineering, so we have machine learning and model training.

    For how long have I used the solution?

    I've been using CDH for about two years, or rather, I manage the team that uses it.

    What do I think about the stability of the solution?

    We haven't had any issues with Cloudera. It's a solid product. 

    What do I think about the scalability of the solution?

    Cloudera is dependable, and it's completely scalable.

    How are customer service and support?

    We have engaged the technical support based in the UK. My team hasn't worked with them directly, but the administration team has. To my knowledge, they're fairly responsive. 

    What other advice do I have?

    I rate Cloudera Distribution for Hadoop eight out of 10.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Engineering Manager/Solution architect at a computer software company with 201-500 employees
    Real User
    Top 5 Leaderboard
    Preferred solution for on-prem

    What is our primary use case?

    We are a distributor for Hadoop. Our customers choose whether they would like to use Cloudera or another product.

    Cloudera Distribution is deployed on-premise as well as on bare metal servers in AWS.

    What is most valuable?

    Cloudera is a very manageable solution with good support.

    What needs improvement?

    When you compare Cloudera with EMR, EMR has a lot of administrative features, so you don't need to manage the solution. Cloudera is not as easy, as it requires more DevOps resources than other solutions.

    For how long have I used the solution?

    We have been offering this solution for five years.

    What do I think about the stability of the solution?

    Cloudera Distribution is stable.

    What do I think about the scalability of the solution?

    This is a scalable solution. We have clients that have a large installation of Cloudera.

    How are customer service and support?

    Technical support from Cloudera is fine.

    How was the initial setup?

    The initial setup of Cloudera is difficult. After you have installed it once, it is not difficult to reproduce.

    What about the implementation team?

    For a POC deployment, we required only one DevOps engineer. For larger-scale implementations, we also require a data engineer.

    What's my experience with pricing, setup cost, and licensing?

    Cloudera requires a license to use.

    Which other solutions did I evaluate?

    We looked at EMR; however, Cloudera is better for on-prem use.

    What other advice do I have?

    Cloudera is one of the best solutions for on-prem. 

    I would rate this solution an 8 out of 10.

    Which deployment model are you using for this solution?

    Hybrid Cloud
    Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
    IT expert at a comms service provider with 201-500 employees
    Real User
    Reliable, stable, but difficult to use

    What is our primary use case?

    We are in the testing phase of Cloudera Distribution for Hadoop, and we will be in production soon.

    What needs improvement?

    The procedure for operations could be simplified.

    For how long have I used the solution?

    I have used Cloudera Distribution for Hadoop within the past 12 months.

    What do I think about the stability of the solution?

    The solution is reliable and stable, and it fits our requirements.

    How was the initial setup?

    The implementation of Cloudera Distribution for Hadoop is not easy. It works on multiple nodes and can be complex for testing. The whole process took us one and a half days.

    What about the implementation team?

    We used a local system integrator for the implementation. We had approximately five people for the implementation.

    We have not had to do maintenance of the solution because we are still in the testing phase.

    What other advice do I have?

    My advice to others is that this solution can be complex.

    I rate Cloudera Distribution for Hadoop a seven out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.