What is our primary use case?
My main use case for Cloudera Data Platform is data analytics and AI.
In my day-to-day work, data keeps arriving from multiple source systems: tabular data from RDBMS, semi-structured data, and streaming data from Kafka. We process the data and store it in ADLS on the backend, then apply business rules to create a golden table, which is published for the business and end users who consume it for analytics. Some AI engineers also develop and run Python code or LLMs against that data to gain insights.
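A minimal sketch of the golden-table step described above, written in plain Python rather than the Spark jobs such a pipeline would normally run. The record fields, the email rule, and the "latest update wins" rule are all hypothetical and only illustrate the idea of applying business rules to multi-source data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    customer_id: str   # hypothetical key shared across source systems
    email: str
    source: str        # e.g. "rdbms" or "kafka"
    updated_at: int    # hypothetical event timestamp (epoch seconds)


def build_golden_table(records):
    """Keep one cleaned record per customer_id, preferring the latest update."""
    golden = {}
    for rec in records:
        # Example business rule (assumed): normalize emails, drop invalid ones.
        email = rec.email.strip().lower()
        if "@" not in email:
            continue
        cleaned = Record(rec.customer_id, email, rec.source, rec.updated_at)
        current = golden.get(rec.customer_id)
        # Example business rule (assumed): the most recent update wins.
        if current is None or cleaned.updated_at > current.updated_at:
            golden[rec.customer_id] = cleaned
    return sorted(golden.values(), key=lambda r: r.customer_id)


raw = [
    Record("c1", "Ann@Example.com ", "rdbms", 100),
    Record("c1", "ann@example.com", "kafka", 200),
    Record("c2", "not-an-email", "rdbms", 150),
    Record("c3", "bob@example.com", "kafka", 120),
]
golden_table = build_golden_table(raw)
```

Here `golden_table` holds one deduplicated, validated record per customer, which is the kind of table that would then be published to end users.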
What is most valuable?
The feature I value most in Cloudera Data Platform is its integration with Ranger services. Ranger is more flexible than Sentry, the authorization component in Cloudera's previous distribution, and it is more reliable and allows access control at a more granular level.
With the Ranger integration I can control data access by specifying who can access what and at which level, such as the table level, the data layer level, or with masking applied. This is crucial for managing all the data inside the cluster.
Integration is very easy with Cloudera Data Platform. Ranger comes with the package when we install the CDP runtime, so we simply enable it and select the ecosystem components we want in our cluster depending on our use cases. It does not require a standalone installation, so it is an easy job. Scalability and flexibility are very good.
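To illustrate what table-level access with masking means in practice, here is a toy model in plain Python. It does not use Ranger's API or policy format; the `POLICIES` structure, the user names, and the `read_row` helper are assumptions made up for this sketch.

```python
# Toy policy store keyed by (user, table). This is NOT Ranger's real policy
# format, only an illustration of table-level access plus column masking.
POLICIES = {
    ("analyst", "customers"): {"masked_columns": {"email"}},
    ("admin", "customers"): {"masked_columns": set()},
}


def read_row(user, table, row):
    """Return the row as the given user is allowed to see it."""
    policy = POLICIES.get((user, table))
    if policy is None:
        # No policy means no table-level access at all.
        raise PermissionError(f"{user} has no access to {table}")
    # Replace masked column values; pass everything else through unchanged.
    return {
        col: ("****" if col in policy["masked_columns"] else val)
        for col, val in row.items()
    }


row = {"id": 1, "email": "ann@example.com"}
analyst_view = read_row("analyst", "customers", row)  # email is masked
admin_view = read_row("admin", "customers", row)      # full row
```

A user without any policy for the table, such as a hypothetical `"guest"`, gets a `PermissionError` instead of data.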
What needs improvement?
Looking at the market as a whole, I have not seen enough innovation in Cloudera Data Platform, particularly in its support for machine learning. It supports machine learning, but not as robustly as other providers such as Databricks, which are more flexible and more in tune with current trends in AI and machine learning. I wish Cloudera would innovate and keep pace with market demands.
Regarding the user interface of Cloudera Data Platform, I have not faced any challenges, though we definitely look forward to innovation to support varied data models and scalability.
For how long have I used the solution?
I have been using Cloudera Data Platform for almost four years.
What do I think about the stability of the solution?
Cloudera Data Platform is generally stable; however, we occasionally face minor network connectivity issues as confirmed by the vendor. Sometimes a node goes down, but it automatically returns to a healthy state.
What do I think about the scalability of the solution?
Cloudera Data Platform has positively impacted my organization by eliminating challenges we faced with CDH, which did not support a move to the cloud. With CDH, scaling our existing cluster horizontally was time-consuming and required upfront costs for procuring servers. In contrast, CDP allows for easy, mostly automated scaling: I can schedule job workflows, fine-tune system resource metrics, and add nodes with just a click.
How are customer service and support?
Customer support depends on the case severity, but from my experience, it is great. Cloudera support is timely and responsive, adhering to the SLAs they provide.
Which solution did I use previously and why did I switch?
Previously, we used Cloudera's earlier distribution, CDH, which was on-premises and required more manual effort across multiple teams; setting up a cluster took almost a month. We switched primarily for cost-effectiveness, flexibility, and the reduced setup time.
What about the implementation team?
A solution architect from the vendor helps us resolve any ongoing issues such as bugs or vulnerabilities, and we appreciate the flexibility of the cloud journey.
What was our ROI?
In terms of return on investment, I see a notable improvement in operational effectiveness, measured by RTO, when comparing the cloud solution with the on-premises one.
What's my experience with pricing, setup cost, and licensing?
I have not been directly involved in cost negotiations, but we find Cloudera Data Platform to be cost-effective. We work with Cloudera to secure one- or two-year licenses upfront for discounts.
Which other solutions did I evaluate?
We evaluated Databricks three years ago, but it was not up to market standards in feature support at that time, particularly lacking an account console, which was introduced afterward. We have seen clients migrating from Cloudera to Databricks since the rollout of that console.
What other advice do I have?
My advice for those considering Cloudera Data Platform is to evaluate their business use case and budget, as these two factors are crucial. If the organization does not need advanced features such as LLMs or machine learning, Cloudera Data Platform may be suitable. However, based on the current market, if I had to rank Databricks and Cloudera, I would put Databricks first and Cloudera second.
I do face challenges while using Cloudera Data Platform. Vulnerabilities sometimes depend on which version of the CDP runtime I am using, so we work with Cloudera to remediate them for that version. We also use the platform for data auditing, collecting information such as how data is being used, who has access, and how many times it is accessed.
In terms of cost savings, moving from on-premises to the cloud with Cloudera Data Platform is very cost-effective. We can use bare metal servers or spot instances, which keeps it economical. Performance is better than in previous versions because the CDP runtime includes multiple Spark versions, which improves data distribution, handling, and fault tolerance without requiring code fine-tuning.
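The audit questions mentioned above (who accessed which data, and how many times) can be sketched as a simple aggregation over access events. The event fields and sample values below are hypothetical and are not the format of any real CDP or Ranger audit log.

```python
from collections import Counter

# Hypothetical audit events; real audit logs carry many more fields.
audit_events = [
    {"user": "analyst", "table": "customers", "action": "SELECT"},
    {"user": "analyst", "table": "customers", "action": "SELECT"},
    {"user": "admin", "table": "customers", "action": "UPDATE"},
    {"user": "analyst", "table": "orders", "action": "SELECT"},
]


def summarize_access(events):
    """Count how many times each user touched each table."""
    return Counter((event["user"], event["table"]) for event in events)


access_counts = summarize_access(audit_events)
# Distinct users seen per table, i.e. "who has access" in practice.
users_per_table = {}
for user, table in access_counts:
    users_per_table.setdefault(table, set()).add(user)
```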
I rate Cloudera Data Platform six out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: My company does not have a business relationship with this vendor other than being a customer.