Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies
Real User
Top 10
Has a useful file system and is scalable
Pros and Cons
  • "The file system is a valuable feature."
  • "The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS."

What is our primary use case?

We use Cloudera Distribution for file storage. 

This solution is deployed on-premises. 

What is most valuable?

The file system is a valuable feature. 

What needs improvement?

The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS. 

For how long have I used the solution?

I have been working with Cloudera Distribution for Hadoop for 11 years. 

Buyer's Guide
Cloudera Distribution for Hadoop
May 2024
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,630 professionals have used our research since 2012.

What do I think about the stability of the solution?

This solution is stable. 

What do I think about the scalability of the solution?

This solution is scalable enough for us. 

We have created a product using HDFS, and when our engineers install it for themselves or for customers, they use this solution. About 15 to 20 people are using it at any point in time. 

How was the initial setup?

The installation is straightforward. We use a command-line-based installation, and we have created our own installation process for our product. 

Depending on the customer or depending on internal usage, our DevOps engineer will install it or my development team will install it. 

What about the implementation team?

We are very well-versed in these tools, so we implemented it ourselves. 

What's my experience with pricing, setup cost, and licensing?

I haven't bought a license for this solution. I'm only using the Apache-licensed version. 

What other advice do I have?

I rate this solution an eight out of ten. Cloudera is a great product with many features overall. 

We actually use Cloudera HDFS underneath and build our product on top of it. We don't use the Cloudera versions of the other products; we use only Cloudera HDFS, nothing else.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
EricLin - PeerSpot reviewer
Chairman at Athemaster co.,ltd.
Real User
Top 10
Provides excellent data processing features and enables users to connect with other applications
Pros and Cons
  • "The product provides better data processing features than other tools."
  • "The dashboard could be improved."

What is our primary use case?

I use the solution because my data is very large: it is almost 100 TB.

What is most valuable?

The product provides many APIs to connect with other applications. The product provides better data processing features than other tools.

What needs improvement?

The dashboard could be improved.

For how long have I used the solution?

I have been using the solution for seven years.

What do I think about the stability of the solution?

The tool is stable. I rate the stability an eight out of ten.

What do I think about the scalability of the solution?

The tool is scalable. I rate the scalability an eight out of ten. It is easy to scale the product. About 20 to 25 people use the tool in our organization. We maintain the solution ourselves, with nine engineers on our maintenance team.

How are customer service and support?

The support is very, very helpful.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I previously worked with Oracle, but Oracle is too expensive.

How was the initial setup?

It was pretty easy to install the product. It took us 20 minutes.

What's my experience with pricing, setup cost, and licensing?

The product’s cost is higher than that of other tools. The pricing should be improved.

What other advice do I have?

I recommend the solution to others. Overall, I rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Mohammed Hamad - PeerSpot reviewer
AI & Data Engineering Lead at a tech services company with 10,001+ employees
Real User
Top 5
Flexible and comprehensive solution
Pros and Cons
  • "The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on."
  • "Cloudera's support is extremely bad and cannot be relied on."

What is our primary use case?

I primarily use CDH for data storage and regular dashboard reports.

What is most valuable?

The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on.

What needs improvement?

Cloudera's prices are too high and are not competitive with other solutions. They could also improve the Data Science Workbench and add some more features, like wizard activities.

What do I think about the stability of the solution?

CDH is stable.

What do I think about the scalability of the solution?

CDH is scalable, but scaling it is expensive.

How are customer service and support?

Cloudera's support is extremely bad and cannot be relied on.

What's my experience with pricing, setup cost, and licensing?

I wouldn't recommend CDH to others because of its high cost.

What other advice do I have?

I would rate CDH an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
Helpful to build infrastructure for advanced analytics and is easy to install
Pros and Cons
  • "The features I find most valuable are that the solution is easy to install and work with. It starts with the installation, and from there on, management is very simple and centralized."
  • "I would like to see an improvement in how the solution helps me to handle the whole cluster."

What is our primary use case?

I'm part of the IT team at my company, and our primary use case for this solution is building infrastructure for advanced analytics. We copy data from our data warehouse, which is currently a relational database, to Cloudera Distribution for Hadoop and then analyze it with Python and machine learning. 

What is most valuable?

The features I find most valuable are that the solution is easy to install and work with. It starts with the installation, and from there on, management is very simple and centralized.

What needs improvement?

I would like to see an improvement in how the solution helps me handle the whole cluster. For example, when I drill down into a specific tool such as Kafka, Cloudera Manager doesn't really help me, and I have to rely on Google and other Kafka knowledge and tools. 

For how long have I used the solution?

I've been using this solution for about three years now.

What do I think about the stability of the solution?

It is a very stable solution.

What do I think about the scalability of the solution?

Not many people are currently using this solution at my organization, but I do believe it is scalable. I don't, however, have experience with upgrading or adding users. 

How are customer service and technical support?

My problem is that I started using Cloudera Express without technical support and then I purchased the Enterprise edition through another company. So now I don't really have access to Cloudera support, even though I hardly need to use it. 

How was the initial setup?

The initial setup was simple, but we had trouble implementing the cables in the Hadoop solution.

What other advice do I have?

I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to other resources in the company, such as Active Directory or the firewall. I would like integration with the outside environment to be easier to handle. I rate this an eight out of ten because the solution doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Associate Manager at a consultancy with 501-1,000 employees
Real User
Easy to install, good technical support, and with a single script we can run jobs within minutes
Pros and Cons
  • "I don't see any performance issues."
  • "It could be faster and more user-friendly."

What is our primary use case?

We use this solution to process data.

With SQL Server, you have to build indexes and fine-tune the data.

We import the data from the SQL source.

With a single script, we are able to run the jobs within minutes, which is an advantage.

We are using the Power BI model for the business convention. Performance in Power BI drops as you add more calculations, so those calculations are handled and processed in the Hadoop layer instead.

What needs improvement?

It could be faster and more user-friendly.

For how long have I used the solution?

I have been using this solution for seven months.

What do I think about the stability of the solution?

It's a stable product. I don't see any performance issues.

What do I think about the scalability of the solution?

This solution is scalable. We have 40 users for different projects in our organization.

We will continue to use this solution.

How are customer service and technical support?

Technical support is good.

Which solution did I use previously and why did I switch?

I didn't use any other product.

How was the initial setup?

The installation is straightforward.

Our clients provide us with access to use it directly.

Once we have been given access to the edge nodes, we are able to run the scripts in the Hadoop layer.

What's my experience with pricing, setup cost, and licensing?

We do not pay for licensing because our customers provide it, so there is no need to purchase a license for the project.

What other advice do I have?

I would recommend this solution.

I would rate Cloudera Distribution for Hadoop a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user900987 - PeerSpot reviewer
Data Management at BCX
Real User
Offers big data support for analytical applications but the technical support needs improvement
Pros and Cons
  • "In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
  • "The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."

What is our primary use case?

We primarily use it only for big data support for analytical applications.

What is most valuable?

The feature that we've used most intensively is Spark, specifically how it can speed up data processing.

What needs improvement?

The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it.

In the next release, I think it would be helpful to have easier integration with all the other existing data back corners. That would be a big plus, as it's a favorite capability. We had to use a third-party application to achieve that.

For how long have I used the solution?

I've been using the solution since 2016.

What do I think about the stability of the solution?

The stability is problematic. We did encounter quite a lot of issues with the cluster going down quite frequently.

What do I think about the scalability of the solution?

In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues. Currently, only about 10 people in total are using the solution: about four business users and four technical people. It's limited to two environments.

How are customer service and technical support?

I think there's a lot of room for improvement on the technical support side. Mostly because we don't have a lot of local skills in South Africa that could have supported the solution. It was an issue.

Which solution did I use previously and why did I switch?

This is our first solution. We tested a bunch of other technologies, but that was our first one and we're still using it.

How was the initial setup?

The initial implementation was straightforward from an application side. There weren't any hiccups. In terms of deployment time, it's going to be difficult to say, because most of it was related to hardware problems. Software took about two months to deploy. We required four people for deployment.

What's my experience with pricing, setup cost, and licensing?

The pricing is very competitive. It's not bad.

Which other solutions did I evaluate?

We considered working with a few other companies, including IBM Bluemix.

What other advice do I have?

I would recommend the solution, provided you have proven the business case and the technology. We have found that if you don't address the right business case, you end up buying a technology that doesn't necessarily solve your business problems.

I would rate the solution seven out of ten. The main reasons for not rating it higher are the overall support, which is not great, and some limitations in functionality that we found. It wasn't mature when we started, but it's getting better.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Lead Consultant - Product Development at FIS (http://www.fisglobal.com/)
Consultant
We use this solution to use big data for our analyses

What is our primary use case?

Our core product is an insurance product, and the actuarial module is quite complex. Until now, SMEs have collected data from various sources into Excel sheets and performed the analytics through macros, which is a very crude form of analysis. So we decided to use big data for this analysis.

How has it helped my organization?

That is still at the PoC stage. As I mentioned, our analysts used to do the actuarial work in a spreadsheet, but after the Hadoop implementation they are gaining confidence that the analysis is now more appropriate and faster. We are now exploring a cloud implementation as well.

What is most valuable?

Keeping multiple copies of files; MapReduce tools like Pig and Hive, whose flexibility makes it easy to develop applications with little or almost no knowledge of Java and SQL; and the capability to handle huge data volumes.

What needs improvement?

On the product side, I don't have much to comment on. However, other upcoming technologies such as RPA, AI, and GO have ample training materials with a variety of use cases that users can understand and align with their current requirements. I didn't see as much training material from Cloudera.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

It seems quite stable; we didn't face any issues.

What do I think about the scalability of the solution?

It is very stable; we didn't face any performance issues.

Which solution did I use previously and why did I switch?

No. When we heard of Hadoop, we tried only that, meaning we tried to migrate from spreadsheets to Hadoop.

How was the initial setup?

Very straightforward. It's a typical Windows-style installation: Next, Next, Next.

What about the implementation team?

In-house.

What was our ROI?

Another department handles all of this, so I can't comment on that.

What's my experience with pricing, setup cost, and licensing?

 

Which other solutions did I evaluate?

Not really.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user357645 - PeerSpot reviewer
Data/Big Data Architect at a healthcare company with 1,001-5,000 employees
Real User
We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions. At times, heavy queries do not finish at all.

What is most valuable?

Mostly HUE, Impala, Sqoop, and Hive. The impala-shell command is number one.

How has it helped my organization?

We are working on research for genomic data, looking for specific genes and variants. Even Hive was not good enough to process it correctly; only with Impala are we getting results more quickly.

What needs improvement?

Sometimes heavy queries do not finish at all. It would be good to see the progress of a heavy script in impala-shell or have some other way to track it.

For how long have I used the solution?

We started to use Cloudera about one-and-a-half years ago.

What do I think about the stability of the solution?

We are having some issues with stability and are speaking to Cloudera support.

How are customer service and technical support?

Customer Service:

It's acceptable.

Technical Support:

It's acceptable.

Which solution did I use previously and why did I switch?

We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions.

How was the initial setup?

We struggled a bit with installing and configuring Cloudera Manager on the AWS cluster. For now, it is good.

What about the implementation team?

We did the implementation only using our team and resources. It was a hard start, but an easy landing.

What other advice do I have?

Cloudera is good for mid-sized to large companies, but small ones can use AWS Impala/HUE. Go to training, or you are going to spend many hours finding short answers. The Cloudera solution is big and has good documentation, but you need to know what to read first and where to find it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Cloudera Distribution for Hadoop Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024
Product Categories
Hadoop NoSQL Databases