Data aggregation for KPIs. The data comes from many sources in many forms, so it is largely unstructured. We needed large-scale storage and background aggregation of that data.
Software Architect at a tech services company with 10,001+ employees
Gives us high throughput and low latency for KPI visualization
Pros and Cons
- "High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."
What is our primary use case?
How has it helped my organization?
We start with data mashing on Hive and finally use the result for KPI visualization. This intermediate step not only shapes the data into the form we want through data cube slicing, but also lets us save states as snapshots for multiple time frames.
Without this, we would have had to plan another data source solely for this purpose. Moving this step closer to processing worked better than keeping it at the visualization layer. Although we can't completely avoid using data stores and snapshots at the visualization layer, this step proved promising for getting data ready for better analytics and insights.
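To make the cube-and-snapshot idea concrete, here is a minimal sketch in plain Python rather than Hive. It is an illustration only, not our actual pipeline: the field names (month, region, product, revenue), the sample rows, and the helper functions are all hypothetical. In Hive itself, a comparable aggregation would typically be expressed with GROUP BY ... WITH CUBE.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every combination of `dims` (all cells of a small data cube)."""
    totals = defaultdict(float)
    for row in rows:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                # Sort the (dimension, value) pairs so lookups do not depend on argument order.
                key = tuple(sorted((d, row[d]) for d in subset))
                totals[key] += row[measure]
    return dict(totals)


def snapshot_by_period(rows, period_field, dims, measure):
    """Build one cube per time frame, so each period is kept as its own snapshot."""
    by_period = defaultdict(list)
    for row in rows:
        by_period[row[period_field]].append(row)
    return {period: cube(group, dims, measure) for period, group in by_period.items()}


def slice_total(cells, **fixed):
    """Read one slice of a cube: the total for the given fixed dimension values."""
    return cells.get(tuple(sorted(fixed.items())), 0.0)


# Hypothetical sample data, for illustration only.
rows = [
    {"month": "2024-01", "region": "EU", "product": "A", "revenue": 100.0},
    {"month": "2024-01", "region": "EU", "product": "B", "revenue": 50.0},
    {"month": "2024-01", "region": "US", "product": "A", "revenue": 70.0},
    {"month": "2024-02", "region": "EU", "product": "A", "revenue": 120.0},
]

snapshots = snapshot_by_period(rows, "month", ["region", "product"], "revenue")
eu_january = slice_total(snapshots["2024-01"], region="EU")
all_january = slice_total(snapshots["2024-01"])
```

Because every period keeps its own precomputed cube, the visualization side only needs cheap lookups such as slice_total(snapshots["2024-02"], region="EU", product="A") instead of re-aggregating raw rows.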
What is most valuable?
High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization.
What needs improvement?
At the beginning, the MapReduce (MR) jobs on Hive made me think we should drop down to raw Hadoop MR jobs to have better control of the data. Later, however, Hive matured considerably as a platform. I still think a Spark-type layer on top gives you an edge over having only Hive.
Buyer's Guide
Apache Hadoop
May 2024
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
771,063 professionals have used our research since 2012.
For how long have I used the solution?
Less than one year.
What other advice do I have?
I rate it an eight out of 10. It's huge, complex, and slow, but it does what it is meant for.
Disclosure: I am a real user, and this review is based on my own experience and opinions.