We use it for multiple domains, including oil & gas, finance (Morgan Stanley), and healthcare. We process around 186 TB of data per day for analytics purposes.
Currently, we use it for the healthcare domain.
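To put the 186 TB per day figure in perspective, here is a back-of-the-envelope conversion to an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and an even load over 24 hours, neither of which the review states; the function name is purely illustrative.

```python
# Rough sustained throughput implied by a daily data volume.
# Assumptions (not from the review): decimal terabytes, load spread evenly over 24 h.
TB = 10**12                      # bytes per terabyte (decimal convention)
GB = 10**9                       # bytes per gigabyte (decimal convention)
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def sustained_rate_gb_per_s(tb_per_day: float) -> float:
    """Average rate in GB/s needed to move tb_per_day within one day."""
    return tb_per_day * TB / SECONDS_PER_DAY / GB


rate = sustained_rate_gb_per_s(186)
print(f"{rate:.2f} GB/s")  # prints "2.15 GB/s"
```

Real workloads are bursty, so peak rates would likely be higher than this average.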
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
Since Cloudera acquired HDP, it has been bundled alongside CDH. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solutions like Gen2, Gen1, Bigtable, BigQuery, Lightstore, S3 buckets, etc., which pose significant competition to HDP.
I have experience with this product. The short form is HDP 2.7. I have been using it since 2011.
It was on-premises and hybrid for the first three months, then we migrated it to AWS and Azure.
In terms of storing data in different formats, it's been somewhat unstable. But when compared to Azure Gen2 and its support and features, it's much more advanced. The suitability depends on specific use cases, but overall, HDP seems more mature than it was in the past.
From my experience with both HDP and CDH, they are both scalable. Currently, most people in my company have shifted to Azure, so they are using Gen2 primarily and discarding Gen1.
I have frequently contacted technical support for both Cloudera and Hortonworks.
We have an IT system to raise issues against their team. Issues usually get attended by someone at an L1, L2, or L3 support level. They connect with us directly.
Previously, we used Cloudera Data Platform (CDP), which turned out to be a cloud-based Azure infrastructure, and implemented metadata solutions like Hive and others.
The setup was very difficult on non-cloud platforms. We had to implement a version-based approach. However, it became simpler with the use of Docker. In the old days, we used to do it with HDP sandboxes and VM boxes and then create clusters. Now, on cloud platforms, it's much easier, just a matter of a few clicks. That's another approach we can take.
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high.
It was particularly expensive for Cloudera and Hortonworks Data Platform. Both options were quite resource-intensive.
So, seven, or even nine or ten years ago, it was quite expensive.
I recommend a mature decision-making model. Assess your specific needs and use cases. If HDP suits your requirements, use it. Otherwise, there are many advanced options available. Review and choose the best one for your use case.
Overall, I would rate the solution a nine out of ten.
I simply love this technology when it comes to new developments. And I've been working with it for the past twelve to thirteen years. However, with the emergence of new technologies, there might be a chance that I would reduce one point because there's room for improvement.
The main use case for Cloudera Data Platform is to support a multi-source system with a multi-data structure. We have streaming services, Kafka services, RDBMS systems, and semi-structured data in the form of CSV and JSON files where we used to have everything in place and centralized.
Cloudera Data Platform also supports a hybrid data warehouse, which is similar to a relational database management system where business users can run query analytics, such as a select star. Cloudera Data Platform also supports PySpark, where a user can create a data frame and then transform and load the data to get insights.
The best features of Cloudera Data Platform are that it supports hybrid types of environments, real-time streaming analytics, secure data and governance, machine learning and AI workloads, data warehousing and BI, and edge-to-edge AI use cases.
In the hybrid environment, we can have a private cloud as well as a public cloud, which lets us run both types of workloads. Data keeps arriving through a pipeline, and we ingest it. The data engineer transforms and loads it into a data lake, which is Amazon S3. Once the data is ready, it moves downstream and is available for consumers.
The most important feature of Cloudera Data Platform is Ranger, which provides granular security. It allows you to apply column-level security and decide which columns to expose to the consumer, rather than securing only at the table level.
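The idea of column-level security described above can be illustrated with a small conceptual sketch. This is not the platform's policy API; the policy table, role names, and column names are all hypothetical, and the filtering is done in plain Python only to show the difference between exposing a whole table and exposing selected columns.

```python
from typing import Dict, List, Set

# Hypothetical policy: the columns each role may read.
# Illustrative only; real policies are defined in the platform's security tooling.
COLUMN_POLICY: Dict[str, Set[str]] = {
    "analyst": {"patient_id", "diagnosis"},
    "auditor": {"patient_id", "diagnosis", "billing_code"},
}


def project_allowed(rows: List[dict], role: str) -> List[dict]:
    """Return each row restricted to the columns the role is allowed to see."""
    allowed = COLUMN_POLICY.get(role, set())  # unknown roles see no columns
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"patient_id": 1, "diagnosis": "flu", "billing_code": "B12"}]
print(project_allowed(rows, "analyst"))  # billing_code is withheld
```

With table-level security alone, the analyst would either see every column or none of them.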
Cloudera Data Platform has a great impact on my organization as it supports the business demand and business requirements, making me happy with the business use case. It depends on what the business demands and the business use case, which allows for an evaluation of what the business wants. Based on that, they can make a decision on where to go and where to migrate a workload.
I would definitely want to see more innovation in Cloudera Data Platform toward full-fledged AI and ML workloads. AI is supported currently, but I'm interested in having ML and LLMs supported in a full-fledged manner as well.
I have been working in the current field for almost six to eight years.
Cloudera Data Platform is stable.
Cloudera Data Platform's scalability is very nice, as you can have multiple workloads and even have multiple clusters with different CDP runtimes. You just have to define the business requirement in the configuration, and based on usage, it automatically scales up and scales down.
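A usage-based scale-up and scale-down rule of the kind described above might look like the following sketch. It is a generic threshold rule, not the platform's actual autoscaling logic; the function name, thresholds, and node limits are assumptions chosen for illustration.

```python
def target_nodes(current: int, utilization: float,
                 low: float = 0.30, high: float = 0.80,
                 min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Pick the next cluster size from current utilization (0.0 to 1.0).

    Adds one node above the high threshold, removes one below the low
    threshold, otherwise holds steady. The result is clamped to the
    configured minimum and maximum. All defaults are hypothetical.
    """
    if utilization > high:
        desired = current + 1
    elif utilization < low:
        desired = current - 1
    else:
        desired = current
    return max(min_nodes, min(max_nodes, desired))


print(target_nodes(4, 0.90))  # busy cluster: grows to 5
print(target_nodes(4, 0.10))  # idle cluster: shrinks to 3
```

The configured limits play the role of the "business requirement": they bound how far the cluster can grow or shrink regardless of load.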
Customer support for Cloudera Data Platform is very good.
Positive
We had been using Cloudera's Distribution for Hadoop (CDH). CDH was provided on-premises only, so we migrated from on-premises to the cloud to take advantage of cloud compute.
The experience with pricing, setup cost, and licensing is very good. The cloud service provider has a built-in tool to analyze which zone and region to use, as service costs vary by location, allowing us to determine which region is most suitable and cheapest.
In terms of ROI, we definitely have seen a return on investment. Due to security, we cannot disclose the value, but we have definitely seen an ROI.
I did not evaluate other options before choosing Cloudera Data Platform.
I would rate Cloudera Data Platform an eight out of ten because it's excellent in terms of the product, its deliverability, its support, and its use cases. It might differ for different industries depending on what each industry wants, but overall, it has a good impression, and I'm happy with the work relationship with Cloudera technical support.
If someone is looking for a hybrid environment or a cloud environment, they can definitely consider reviewing Cloudera Data Platform. They can look at all the aspects, as the Cloudera Data Platform ecosystem provides Apache Hive, HBase, Kafka, NiFi, Solr, and Knox, which they can review based on their business use case.
We primarily use the solution for data storage and processing.
It is one of the better technologies in terms of Hadoop.
The product offers a fairly easy setup process.
It is quite stable.
The scalability is good.
We'd like the solution to be easier.
There used to be a free version on offer. Now, everything is paid.
It's not quite as easy to install as EMR.
It's at end of life, and there will be no further improvements.
I've been using the solution for more than two years.
The solution has been stable for the most part. There aren't issues with bugs or glitches, and it doesn't crash or freeze. It's reliable.
The product is scalable. It's not a problem if you would like to expand it.
Now, the customer support is Cloudera. We are in discussions about migrating it to Cloudera. As we are still in the discussion phase, we have yet to work with technical support.
We need to upgrade and are considering Cloudera. It also used to be free, and now that we have to pay for this product, we are looking to move off of it and onto potentially Cloudera.
The solution is fairly simple to set up. It's not too complex or difficult. If you know the solution, it's easy. However, there is a learning curve. If you don't know anything about it, it can be more complex.
You can typically deploy it within a week. We have five or six people capable of handling a deployment.
We have our own team that is capable of handling deployments. We did not need to have an integrator or consultant come in.
The solution used to have a free tier. They have since taken that away, which is disappointing.
We are constantly evaluating other options. We're always curious to see what could help our business. Currently, we've looked into EMR and Cloudera.
We're a customer and end-user.
We are not using the latest version. We have to upgrade it too, which is why we are looking at Cloudera now.
Hortonworks is at end of life. They are not adding new things to this, so everything will be Cloudera now.
I'd rate it seven out of ten.
We use Hortonworks Data Platform for data management, significant data ingestion, and analytics.
Hortonworks Data Platform has a limited user community. I haven't seen much discussion about user experiences. More information could be there to simplify the process of running the product.
We have been using Hortonworks Data Platform for a couple of months.
I rate the product's stability an eight out of ten.
We have five Hortonworks Data Platform users in our organization. It is a scalable platform.
The initial setup could be more straightforward. It would help if you are technically inclined to follow the necessary steps. There could be easy ways to set it up. It takes 45 minutes to complete and requires a team of five people to execute the process.
We implement the product in-house.
Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results.
I recommend Hortonworks Data Platform to others and rate it an eight out of ten.
There are a lot of use cases for the Hortonworks Data Platform. We use it alongside GPFS, so most of the information we use for operational analytics is primarily on the Hortonworks Data Platform.
The upgrades and patches must come from Hortonworks. Therefore, if we encounter any problems, they will be responsible for addressing them. This is one of the instances where we have to rely on them for all the upgrades.
The cost of the solution is high and there is room for improvement.
I have been using the Hortonworks Data Platform for two years.
Hortonworks Data Platform is scalable, but it lacks the capability for horizontal scaling. Therefore, we need to add more servers to increase its capacity.
I am responsible for setting up the infrastructure, but I don't handle the engineering work.
I would rate Hortonworks Data Platform an eight out of ten. The solution delivers on its promises, and Hortonworks provides a sandbox for testing before making a purchase.
The maintenance requires a lot of people, including the DRE and IRE teams.
It is not practical for most organizations that lack large amounts of resources to maintain their own data platform. The Hortonworks Data Platform makes it easier for such organizations.
We use Hortonworks as a storage platform and then we create machine learning models and do the execution using Cloudera Data Science Workbench. (Cloudera and Hortonworks merged in January of 2019.)
The most valuable part of this product is what Cloudera Data Science Workbench can do as a whole for modeling and analysis.
We are happy with the platform but we are also looking at what else is out there. We are comparing what other teams are using in our company to the solution we have adopted.
The main difference, and what seems better with other tools, is the deployment. That part is not entirely clear to us. We can create models, but once we create the models we are not sure as to how to deploy them.
The version control of the software is also an issue.
Maybe these improvements are already included in the newest release but we are not aware of it because nobody on our team has had the opportunity to try it. I do not know how well this product is supported.
We have been using Hortonworks Data Platform for almost a year.
We do not maintain it in our department. We just use it. It has been mostly available other than when they had to upgrade. The IT department did run into some upgrade issues. But as users, it has been stable for us.
We expect to have a lot more data. The scalability is the key reason why we are on this platform.
My IT team is the user group when it comes to installation and setup. They are the ones who do the product installation and management.
I think it is priced well and it is affordable. Hadoop, which we use with the solution, is open-source. That part is free. We pay only for whatever wrappers Cloudera provides on top of the open-source product, Hadoop. I do not know about the actual pricing in total. The whole point of Hadoop is that it is open-source and they have created their own cluster. Cloudera is just the vendor that they are using.
My guess is Hortonworks should not be expensive at all to those looking into using it.
It is important to note that the IT team has to support the product. We are not the IT team so if we have to scale it, someone has to be able to do that administrative job of adding another server and managing the distribution of the data across all the servers. To work with this, you need to have that skill set within the IT department.
On a scale from one to ten (where one is the worst and ten is the best), I would rate Hortonworks Data Platform an eight or nine out of ten. It is meeting a need.
We use this solution for the hospitality industry.
It was for end-to-end data processing and data manipulation.
The data platform is pretty neat. The workflow is also really good.
The NiFi platform could be enhanced. This refers to the data ingestion in a workflow.
It would also be nice if there was less coding involved.
I have been using this solution for six years.
The technical support is okay, but not excellent. They can take a while to respond.
If you wish to use this solution, make sure you compare it with some other solutions first to make sure it's right for your needs.
Overall, on a scale from one to ten, I would give this solution a rating of nine.
We use this solution to look at and manage big data. It's mostly historical data that we offload from our data warehouse, as well as from other databases in other platforms.
We have two different installations. The first one is based on IBM POWER CPUs, and the other one is based on Intel CPUs. Our data center is on-premises. There is some thought of moving to a private cloud, or a private IBM cloud, but we have not proceeded with that as of yet.
This solution is a cheaper way for us to offload the otherwise expensive data. We can move data from outdated database versions, such as Oracle 10. It is now out of support, but still hosts some of our historical data. This solution has helped us move our data to the current version.
Previously, we had our data on more expensive platforms. Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request.
We have had problems with the backup and with services that require a disaster site. We are still struggling with some of these issues.
We are having trouble with Active Directory and Hive integration.
I would like to see more support for containers such as Docker and OpenShift.
We have had some issues with the code, but it's mostly from the developers. From our side, we don't see any issues with stability, although it may be that we have a lot of unused CPU capacity.
We have not acquired any additional hardware since our initial purchase. However, we expect more use cases to be added, at which point we may have performance or scalability problems.
The initial setup is not very difficult. The configuration is not easy, but somebody with some experience is able to set it up. We had users for which we had to set up quotas and queues. For us, the basic installation was completed within a matter of a week.
We had IBM set up both of our installations.
This is a good product, but we still have some issues with backup, and the performance monitors that we install on every system. There may be solutions, but we're struggling to integrate them.
This is a product that I recommend. It's a solution that comes at a lower price, and it works well if you don't have expectations that it will behave like a much more expensive system.
I would rate this solution an eight out of ten.
