What is our primary use case?
We have used it across multiple domains, including oil & gas, finance (Morgan Stanley), and healthcare. We process around 186 TB of data per day for analytics purposes.
Currently, we use it for the healthcare domain.
What is most valuable?
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
What needs improvement?
Since Cloudera acquired Hortonworks, HDP has been bundled alongside CDH. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer their own storage solutions, such as Gen2, Gen1, Bigtable, BigQuery, Lightstore, and S3 buckets, which compete strongly with HDP.
For how long have I used the solution?
I have experience with this product, HDP for short, at version 2.7. I have been using it since 2011.
It was on-premises and hybrid for the first three months, then we migrated it to AWS and Azure.
What do I think about the stability of the solution?
In terms of storing data in different formats, it's been somewhat unstable. But when compared to Azure Gen2 and its support and features, it's much more advanced. The suitability depends on specific use cases, but overall, HDP seems more mature than it was in the past.
What do I think about the scalability of the solution?
From my experience with both HDP and CDH, they are both scalable. Currently, most people in my company have shifted to Azure, so they are using Gen2 primarily and discarding Gen1.
How are customer service and support?
I have frequently contacted technical support for both Cloudera and Hortonworks.
We have an IT system for raising issues with their team. Issues are usually handled by someone at the L1, L2, or L3 support level, who connects with us directly.
Which solution did I use previously and why did I switch?
Previously, we used Cloudera Data Platform (CDP) on cloud-based Azure infrastructure and implemented metadata solutions such as Hive.
How was the initial setup?
The setup was very difficult on non-cloud platforms. We had to implement a version-based approach. However, it became simpler with the use of Docker. In the early days, we used HDP sandboxes and VM boxes and then created clusters. Now, on cloud platforms, it's much easier, just a matter of a few clicks. That's another approach we can take.
What's my experience with pricing, setup cost, and licensing?
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high.
It was particularly expensive for Cloudera and Hortonworks Data Platform. Both options were quite resource-intensive.
So, seven to ten years ago, it was quite expensive.
What other advice do I have?
I recommend a mature decision-making model. Assess your specific needs and use cases. If HDP suits your requirements, use it. Otherwise, there are many advanced options available. Review and choose the best one for your use case.
Overall, I would rate the solution a nine out of ten.
I simply love this technology when it comes to new developments, and I've been working with it for the past twelve to thirteen years. However, with the emergence of newer technologies, I might take off a point because there's room for improvement.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
*Disclosure: My company does not have a business relationship with this vendor other than being a customer.