We use Cloudera Distribution for file storage.
This solution is deployed on-premise.
The file system is a valuable feature.
The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS.
I have been working with Cloudera Distribution for Hadoop for 11 years.
This solution is stable.
This solution is scalable enough for us.
We have created a product using HDFS, and when our engineers install it for themselves or for customers, we use this solution. About 15 to 20 people are using it at any point in time.
The installation is straightforward. We use a command-line-based installation, and we have created our own way of installing it with our product.
Depending on the customer or on internal usage, either our DevOps engineer or my development team will install it.
We are very well-versed on these tools, so we implemented it ourselves.
I haven't bought a license for this solution. I'm only using the Apache-licensed version.
I rate this solution an eight out of ten. Cloudera is a great product with many features overall.
We actually use Cloudera HDFS underneath and build our product on top of it. We don't use the Cloudera versions of the other products; we use only Cloudera HDFS.
I use the solution because my data is very large: almost 100 TB.
The product provides many APIs to connect with other applications. The product provides better data processing features than other tools.
The dashboard could be improved.
I have been using the solution for seven years.
The tool is stable. I rate the stability an eight out of ten.
The tool is scalable. I rate the scalability an eight out of ten. It is easy to scale the product. About 20 to 25 people use the tool in our organization. We maintain the solution ourselves, with nine engineers on our maintenance team.
The support is very, very helpful, and my experience with it has been positive.
I have worked with Oracle. Oracle is too expensive.
It was pretty easy to install the product. It took us 20 minutes.
The product costs more than other tools. The pricing needs to improve.
I recommend the solution to others. Overall, I rate the solution an eight out of ten.
I primarily use CDH for data storage and regular dashboard reports.
The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on.
Cloudera's prices are too high and are not competitive with other solutions. They could also improve the Data Science Workbench and add more features, such as wizard activities.
CDH is stable.
CDH is scalable, but scaling it is expensive.
Cloudera's support is extremely bad and cannot be relied on.
I wouldn't recommend CDH to others because of its high cost.
I would rate CDH an eight out of ten.
I'm part of the IT team at my company. Our primary use case for this solution is building infrastructure for advanced analytics: we copy data from our data warehouse, which is currently a relational database, into Cloudera Distribution for Hadoop and then analyze it with Python and machine learning.
The most valuable features are that the solution is easy to install and to work with. It starts with the installation, and from there on, management is very simple and centralized.
I would like to see improvement in how the solution helps me manage the whole cluster. For example, when I drill down into a specific tool, like Kafka, Cloudera Manager doesn't really help me. Then I have to turn to Google and to other Kafka-specific knowledge and tools.
It is a very stable solution.
Not many people are currently using this solution at my organization, but I do believe it is scalable. I don't, however, have experience with upgrading or adding users.
My problem is that I started using Cloudera Express without technical support and then I purchased the Enterprise edition through another company. So now I don't really have access to Cloudera support, even though I hardly need to use it.
The initial setup was simple, but we had trouble implementing the cables in the Hadoop solution.
I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to other resources in the company, like Active Directory or the firewall. I would like the outside environment to be easier to handle. I rate this solution an eight out of ten because it doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.
We use this solution to process data.
When using SQL Server, you have to build indexes and fine-tune the data.
We import the data from the SQL source.
With a single script, we are able to run the jobs within minutes, which is an advantage.
We are using the Power BI model for the business convention. Power BI performance drops as you incorporate more calculations, so those calculations are captured and processed in the Hadoop layer.
It could be faster and more user-friendly.
I have been using this solution for seven months.
It's a stable product. I don't see any performance issues.
This solution is scalable. We have 40 users for different projects in our organization.
We will continue to use this solution.
Technical support is good.
I didn't use any other product.
The installation is straightforward.
Our clients provide us with direct access to it.
Once we have been given access to the edge nodes, we are able to run the scripts in the Hadoop layer.
We do not pay for licensing because our customers provide it, so there is no need to purchase a license for the project.
I would recommend this solution.
I would rate Cloudera Distribution for Hadoop a nine out of ten.
We use it primarily for big data support for analytical applications.
The feature we've used most intensively is Spark, specifically for how it speeds up data processing.
The one thing we struggled with predominantly was support. Because the solution was relatively new, support was always a big issue, and I think it's still somewhat of an ongoing concern for the team currently managing it.
In the next release, I think easier integration with all the other existing data back corners would be helpful. It would be a big plus, as it's a favored capability. We had to use a third-party application to achieve that.
The stability is problematic. We encountered a lot of issues with the cluster going down frequently.
In terms of scalability, if you have enough hardware, you can scale out, so scalability doesn't have any issues. Currently, only about 10 people in total are using the solution: about four business users and four technical people. It's limited to two environments.
I think there's a lot of room for improvement on the technical support side, mostly because there aren't many local skills in South Africa that could have supported the solution. It was an issue.
This is our first solution. We tested a bunch of other technologies, but that was our first one and we're still using it.
The initial implementation was straightforward from an application side. There weren't any hiccups. In terms of deployment time, it's going to be difficult to say, because most of it was related to hardware problems. Software took about two months to deploy. We required four people for deployment.
The pricing is very competitive. It's not bad.
We considered working with a few other companies, including IBM Bluemix.
I would recommend the solution, given that they've proven both the business case and the technology. We have found that if you don't address the right business case, you end up buying a technology that doesn't necessarily solve your business problems.
I would rate the solution a seven out of ten. The main reasons for not rating it higher are that the overall support is not great and that we've found some limitations in its functionality. It wasn't mature when we started, but it's getting better.
Our core product is an insurance product, and the actuarial module is quite complex. Until now, SMEs have collected data from various sources into Excel sheets and done the analytics through macros, which is a very crude form of analysis. So we decided to use big data for such analysis.
It is still in the PoC stage. As I mentioned, our analysts used to do the actuarial analysis in spreadsheets, but after the Hadoop implementation, they are gaining confidence that the analysis is now more appropriate and faster. We are now exploring a cloud implementation as well.
The valuable features are keeping multiple copies of each file and the MapReduce tools like Pig and Hive, whose flexibility makes it easy to develop applications with little or almost no knowledge of Java and SQL. Another is the capability to handle huge data sizes.
On the product side, I don't have much to comment on. However, other emerging technologies, such as RPA, AI, and GO, have ample training materials with a variety of use cases that users can understand and align with their current requirements. By comparison, I didn't see much training material from Cloudera.
It seems quite stable; we haven't faced any issues.
It is very stable; we haven't faced any performance issues.
No. When we heard of Hadoop, we tried only that. I mean, we tried to migrate from spreadsheets to Hadoop.
Very straightforward. It's a typical Windows-style installation: next, next, next clicks.
In-house.
Another department handles all of this, so I can't comment on that.
Not really.
Mostly HUE, Impala, Sqoop, and Hive. The impala-shell command is number one.
We are doing research on genomic data, looking for specific genes and variants. Even Hive was not good enough to process it correctly; only with Impala are we getting results faster.
Sometimes heavy queries do not finish at all. It would be good to see the progress of a heavy script in impala-shell, or to have some other way to access it.
We started to use Cloudera about one-and-a-half years ago.
We are having some issues with stability and are speaking to Cloudera support.
It's acceptable.
Technical support: It's acceptable.
We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions.
We struggled a bit with installing and configuring Cloudera Manager on the AWS cluster. For now, it is good.
We did the implementation only using our team and resources. It was a hard start, but an easy landing.
Cloudera is good for mid-sized to large companies, but small ones can use AWS Impala/HUE. Go to training, or you are going to spend many hours finding short answers. The Cloudera solution is big and has good documentation, but you need to know what to read first and where to find it.