Data engineer at a tech vendor with 10,001+ employees
Real User
Top 20
Nov 12, 2025
My main use case for Cloudera Data Platform is handling large volumes of data, primarily unstructured data, and combining it with structured data on the platform. I use it in a healthcare company where there are many research notes, often handwritten. One use case is PDF extraction: we store the PDF files on the platform and then extract their contents for analysis. The second use case involves call center voice files: we store the recordings, convert voice to text, and then perform text analytics on the transcripts.
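As a rough sketch of what such a PDF-extraction step might look like on the platform (assuming Spark 3+ with the binaryFile source and the pypdf library on the cluster; the paths and table names are illustrative, not the reviewer's actual pipeline):

    # Hypothetical sketch: extract text from PDFs stored in HDFS with PySpark.
    # Assumes Spark 3+ (binaryFile source) and pypdf on the executors; the
    # HDFS path and output table name are illustrative.
    import io
    from pypdf import PdfReader
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("pdf-extraction-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Each PDF is read as a row of (path, modificationTime, length, content).
    pdfs = spark.read.format("binaryFile").load("hdfs:///data/research_notes/*.pdf")

    def extract_text(content):
        # Parse one PDF's raw bytes and concatenate the text of its pages.
        reader = PdfReader(io.BytesIO(bytes(content)))
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    extract_udf = udf(extract_text, StringType())

    # Keep the source path alongside the extracted text and persist it as a
    # Hive table so it can be queried for downstream text analytics.
    (pdfs.select("path", extract_udf("content").alias("text"))
         .write.mode("overwrite")
         .saveAsTable("research_notes_text"))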
My main use case for Cloudera Data Platform is monitoring HDFS and the SQL queries in Impala, troubleshooting errors in Spark-based YARN applications, and controlling the reporting data flows between Informatica and Cloudera, transporting data from Oracle and MongoDB into CDP through Impala and HDFS. For HDFS, I use Cloudera Data Platform, specifically Cloudera Manager, to analyze small files in HDFS by partition date so we can reduce their number and shorten the duration of the jobs that read them. I mainly use Cloudera Data Platform as part of a large-scale data processing and analytics pipeline in a hybrid cloud environment, primarily on Azure, which involves managing the YARN cluster, monitoring workloads, troubleshooting performance issues, and integrating data ingestion and transformation processes from various enterprise systems. We leverage CDP for its scalability, security, and strong integration with Looker, Informatica, Hive, and Spark.
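One way to approximate the small-files analysis described above with PySpark (a hedged sketch, not the reviewer's Cloudera Manager workflow; the warehouse path, the dt=YYYY-MM-DD partition layout, and the 16 MB threshold are all assumptions):

    # Hypothetical sketch: count files below a size threshold per partition date.
    # The warehouse path, dt=YYYY-MM-DD layout, and 16 MB cut-off are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_extract, col, count

    spark = SparkSession.builder.appName("small-files-sketch").getOrCreate()

    # binaryFile exposes file metadata (path, length); selecting only those
    # columns lets Spark skip reading the file contents.
    files = (spark.read.format("binaryFile")
             .option("recursiveFileLookup", "true")
             .load("hdfs:///warehouse/reporting/")
             .select("path", "length"))

    small = (files
             .withColumn("dt", regexp_extract("path", r"dt=(\d{4}-\d{2}-\d{2})", 1))
             .where(col("length") < 16 * 1024 * 1024)
             .groupBy("dt")
             .agg(count("*").alias("small_file_count"))
             .orderBy("dt"))

    small.show(truncate=False)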
ML Engineer - Director at a financial services firm with 10,001+ employees
Real User
Top 20
Sep 30, 2025
Handling and processing big volumes of data is my main use case for Cloudera Data Platform. We get instrument data from various providers, process and reconcile it, and use Cloudera Data Platform to ingest it in a structured manner, which is then used by our downstream consumers. One unique aspect of my use case is that multiple application teams build their workflows on the platform. I don't have all the insights into other aspects.
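A simplified illustration of the kind of reconciliation check described here, assuming PySpark; the table names, the instrument_id key, and the dt filter are invented for the example:

    # Hypothetical sketch: reconcile an instrument feed against the curated table
    # before it is released to downstream consumers. Table names, the
    # instrument_id key, and the dt filter are invented for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("reconciliation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    source = spark.read.parquet("hdfs:///landing/provider_a/instruments/dt=2025-09-30/")
    target = spark.table("curated.instruments").where(col("dt") == "2025-09-30")

    source_count = source.count()
    target_count = target.count()
    if source_count != target_count:
        # Fail loudly instead of silently publishing an incomplete load.
        raise ValueError(f"Reconciliation failed: source={source_count}, target={target_count}")

    # Instruments present in the provider feed but missing from the curated table.
    missing = source.join(target, on="instrument_id", how="left_anti")
    missing.show(truncate=False)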
The primary usage of Cloudera Data Platform is to offload ETL processes because it's cheaper compared to data warehouse solutions like Teradata or Oracle. Furthermore, basic reporting can be done, and some real-time processes can be managed.
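As an illustrative sketch of offloading a warehouse table into Hive on CDP via Spark's JDBC source (the connection details, table names, and the presence of the Oracle JDBC driver on the cluster are assumptions, not the reviewer's setup):

    # Hypothetical sketch: offload a warehouse table from Oracle into Hive on CDP
    # via Spark's JDBC source. The URL, credentials, and table names are
    # placeholders, and the Oracle JDBC driver is assumed to be on the classpath.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("etl-offload-sketch")
             .enableHiveSupport()
             .getOrCreate())

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "********")
              .option("fetchsize", "10000")
              .load())

    # Land the data as a Hive table so reporting queries (Hive/Impala) no longer
    # have to hit the source warehouse.
    orders.write.mode("overwrite").saveAsTable("offload.sales_orders")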
We heavily use Cloudera Data Platform for data science activities. Various departments in the company utilize it as a sandbox for data discovery. We have multiple data pipelines running on a daily and hourly basis, along with some real-time data pipelines.
Sr Manager at a transportation company with 10,001+ employees
Real User
Dec 6, 2023
We use it for multiple domains, including oil & gas, finance (Morgan Stanley), and healthcare. We process around 186 TB of data per day for analytics purposes. Currently, we use it for the healthcare domain.
There are a lot of use cases for the Hortonworks Data Platform. We use it alongside GPFS, and most of the information we use for operational analytics sits on the Hortonworks Data Platform.
Data Science and Data Engineering Leader | Senior Principal Data Scientist at a healthcare company with 10,001+ employees
Real User
Oct 6, 2020
We use Hortonworks as a storage platform and then we create machine learning models and do the execution using Cloudera Data Science Workbench. (Cloudera and Hortonworks merged in January of 2019.)
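A minimal sketch of what a training run in a Cloudera Data Science Workbench session might look like with PySpark MLlib; the Hive table, feature columns, and binary label are invented for illustration:

    # Hypothetical sketch: train a simple model in a CDSW session against data
    # stored on the cluster. The Hive table, feature columns, and binary label
    # are invented for illustration.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = (SparkSession.builder
             .appName("cdsw-training-sketch")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.table("analytics.patient_readmissions")

    # Assemble the numeric feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(
        inputCols=["age", "length_of_stay", "prior_visits"],
        outputCol="features",
    )
    train = assembler.transform(df).select("features", "label")

    model = LogisticRegression(maxIter=20).fit(train)
    print("Training AUC:", model.summary.areaUnderROC)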
Senior IT Officer- Head of Administration, System Administration Division for Unix and Linux Servers at a financial services firm with 10,001+ employees
Real User
Jul 28, 2019
We use this solution to look at and manage big data. It's mostly historical data that we offload from our data warehouse, as well as from databases on other platforms. We have two different installations: the first is based on IBM POWER CPUs, and the other is based on Intel CPUs. Our data center is on-premises. There is some thought of moving to a private cloud, or a private IBM cloud, but we have not proceeded with that as of yet.
Hortonworks provides a complete solution with just one user interface that can manage all the packages. It can monitor all the requirements, all the versions, and additionally all the queues and all the hardware-dependent services. What I want is a useful user interface, which is why I currently prefer to use Hortonworks.
We use Hortonworks Data Platform for data management, significant data ingestion, and analytics.
We primarily use the solution for data storage and processing.
We use this solution for the hospitality industry.
We use it for data science activities.