When evaluating Data Warehouse solutions, what aspect do you think is the most important to look for?


Ariel Lindenfeld - PeerSpot reviewer
Director of Community at PeerSpot
12 Answers
Mar 9, 2017

It depends on the company's requirements. Database performance when loading and retrieving data is very important for any data warehouse.

it_user198933 - PeerSpot reviewer
Chief Data Architect at Lucid Technologies & Solutions
Mar 4, 2017

I would look at how the solution addresses two types of requirements: call them Functional and Non-functional, or Business and Technical, requirements.
This classification becomes critical because data warehouse solutions have evolved from mere high-power databases with enhanced storage and performance optimization features into a key information management solution in the enterprise.

From a Business perspective, look at:
- Analytical capability it can support (the likes of SAP HANA have all the analytical layers built into the data warehouse, so no high-end analytical tool is needed!)
- Variety of data it can support (structured and unstructured, and how they can co-exist with big data solutions)
- Capability to support or integrate with other data management solutions such as Master Data Management, Data Governance and Data Quality solutions (an isolated DWH solution does not show any business value and will die a slow death)
- Ability to provide or support accelerators such as out-of-the-box industry data models and Agile development methods (this avoids DWH projects becoming multi-year projects delivering no ROI)

From a Technical perspective, look at:
- Total cost of ownership: cloud-based vs appliance vs on-prem database platform
- Scalability and performance features: in-memory architecture, row-based vs columnar storage, massively parallel processing (MPP) architecture, etc.
- Storage optimization: data compression capability, caching, etc.
- Security (critical for regulated industries), such as support for masking, advanced audit techniques and access control

The key point I want to bring up again is that it is not just the technical perspective that should drive the selection.

it_user429198 - PeerSpot reviewer
IT Specialist at a tech services company
Real User
Feb 28, 2017

Ease of use in finding and retrieving data. Storing the data without providing the metadata to the user is of no value. It needs an interface with access to data that provides more than keyword search.

it_user550089 - PeerSpot reviewer
Vertica Support Engineer at a media company with 10,001+ employees
Feb 27, 2017

A very open-ended question.
When evaluating a DW initiative, the most important thing is to start off with the right questions, many of which have already been touched on directly or indirectly by others here.

Know your data and what you want to do with it; how much do you already have, and for how long do you want to keep it in the DW?
How many people will be querying it at the same time?
What kind of response time do they expect?
Which DW platforms are compatible with your existing IT staff skillset? What skillsets would you have to hire?
Is the data ready to be loaded into the DW? It seldom is - will additional staff be needed for this?
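
The volume and retention questions above can be turned into a quick back-of-the-envelope estimate. This is only a sketch: the function name, the linear-growth assumption and every number in it are hypothetical placeholders, not figures from any vendor.

```python
def estimate_storage_gb(current_gb, daily_growth_gb, keep_days, compression_ratio=1.0):
    """Rough stored size after `keep_days`, assuming linear growth and no purging.

    compression_ratio is raw size divided by stored size (2.0 means the data
    shrinks to half). All inputs are placeholders to replace with your own.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_gb = current_gb + daily_growth_gb * keep_days
    return raw_gb / compression_ratio


if __name__ == "__main__":
    # Hypothetical figures: 500 GB today, 2 GB/day, keep 3 years, 2x compression.
    size = estimate_storage_gb(500, 2, 3 * 365, compression_ratio=2.0)
    print(f"Estimated stored size: {size:.0f} GB")
```

An estimate like this will not pick a platform for you, but it quickly shows whether you are in "almost anything will do" territory or need to plan for scale.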

it_user109182 - PeerSpot reviewer
BI Consultant at a consultancy
Feb 23, 2017

From my experience, and seeing how ICT is moving today, I think there are two principal factors to consider when one wants to invest in a DW platform: innovation and support... The first assures you of a continuously modern platform over the years, and the second assures you that the platform runs at the level you want.

it_user339138 - PeerSpot reviewer
Senior Solutions Architect at Mirantis
Feb 23, 2017

It actually boils down to the amount of sequential I/O that can be pushed and massaged. Each physical core of a data warehousing host can consume about 1 gbps of data... In order to keep all the cores working on the data, it has to be moved from storage into RAM... You need to measure the maximum consumption rates and the maximum throughput rates for sequential I/O and determine whether there are any bottlenecks to the performance of queries acting against the sequential data.
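
As a rough illustration of that comparison, the sketch below multiplies the per-core consumption rate by the core count and checks it against the slowest stage on the path from storage to RAM. The function name, the stage names and the sample rates are hypothetical; plug in the rates you actually measure, all in the same unit.

```python
def find_bottleneck(cores, per_core_rate, stage_rates):
    """Compare what the cores could consume with what each stage can deliver.

    All rates must use the same unit (for example MB/s).
    stage_rates maps a stage name (disks, controller, network...) to its
    measured maximum sequential throughput.
    Returns (max_consumption, slowest_stage, core_utilization), where
    core_utilization is the fraction of the cores' appetite that the
    slowest stage can feed (capped at 1.0).
    """
    max_consumption = cores * per_core_rate
    slowest_stage = min(stage_rates, key=stage_rates.get)
    delivered = min(stage_rates[slowest_stage], max_consumption)
    return max_consumption, slowest_stage, delivered / max_consumption


if __name__ == "__main__":
    # Hypothetical host: 16 cores at 125 units each, two measured stages.
    demand, stage, utilization = find_bottleneck(
        16, 125, {"disks": 2400, "controller": 1600}
    )
    print(f"Cores can consume {demand}; slowest stage is {stage}; "
          f"cores are fed at {utilization:.0%} of capacity")
```

If the utilization comes out at 1.0, every stage can deliver more than the cores consume, so the cores themselves are the limit rather than the I/O path.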

Learn what your peers think about Oracle Exadata. Get advice and tips from experienced pros sharing their opinions. Updated: May 2023.
708,544 professionals have used our research since 2012.
it_user158343 - PeerSpot reviewer
Software Architect at a tech consulting company with 51-200 employees
Real User
Feb 23, 2017

IMHO, it is the wrong approach to consider one single aspect as the driver for comparing DW solutions from vendors.

Over the years, many aspects have become important for a DW solution:

- ETL tools included in the solution; in particular, the tools must include the required transformations for Dimensions and Fact Tables, such as Slowly Changing Dimension, to name just one
- Data Quality tools included in the solution
- Master Data Services tools included in the solution
- Reporting tools included in the solution
- Level of integration between the ETL tools, Data Quality tools and Master Data Services tools included in the solution
- Proper extensibility support in the ETL tools included in the solution
- Proper table partitioning (data or horizontal partitioning) support included in the solution
- Proper columnstore index support included in the solution

These are the important aspects that a competitive DW solution should include.
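
For readers unfamiliar with the Slowly Changing Dimension transformation mentioned in the list, here is a minimal sketch of the common "Type 2" variant, where a changed attribute closes the old row and adds a new version instead of overwriting it. It works on plain Python dictionaries; the column names (`valid_from`, `valid_to`, `is_current`) and the sample data are assumptions for illustration and do not come from any particular ETL tool.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Apply a Type 2 slowly changing dimension update in place.

    dimension: list of dict rows with `valid_from`, `valid_to`, `is_current`.
    incoming:  list of dict rows from the source system.
    key:       name of the business key column.
    tracked:   columns whose change should create a new version.
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            # Close the old version instead of overwriting it.
            old["valid_to"] = today
            old["is_current"] = False
        version = {key: new[key], **{c: new[c] for c in tracked},
                   "valid_from": today, "valid_to": None, "is_current": True}
        dimension.append(version)
        current[new[key]] = version
    return dimension


if __name__ == "__main__":
    # Hypothetical customer dimension with one existing row.
    dim = [{"customer_id": 1, "city": "Lyon", "valid_from": date(2016, 1, 1),
            "valid_to": None, "is_current": True}]
    source = [{"customer_id": 1, "city": "Paris"},
              {"customer_id": 2, "city": "Rome"}]
    for row in apply_scd2(dim, source, "customer_id", ["city"], date(2017, 2, 23)):
        print(row)
```

A DW solution's ETL tooling would normally provide this as a built-in component; the point of the sketch is only to show what that component has to do, so you know what to look for.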

Data Architect at World Vision
Real User
Top 5 Leaderboard
Feb 23, 2017

It 100% depends on your needs in terms of data volume, availability, # of potential concurrent users, # of data sources, complexity of data sources. I'm assuming by platform you mean tools/software/hardware? Are you thinking BI as well or just backend?

Cloud-based solutions should be part of the consideration, but that again is based on your organization's needs and abilities: data center capacity and support. Cost always matters no matter where you work, and we all have budgets to work with. So saying that scalability (or anything else) is most important doesn't make sense if you don't need scalability or can't afford it.

In other words, if your need is just solving reporting problems for a single source system (not really a DW), your source is only, say, 100 GB and you have a dozen total users, then buying a 10-node Teradata MPP means you have lost your marbles. Scalability in that situation is irrelevant because almost anything will satisfy your needs.

On the other hand, if you have a petabyte of total data and a terabyte-scale DW database, then get out your wallet and buy top of the line... Informatica or Datastage are probably the best bets for ETL, and Teradata or maybe Azure data warehouse would be at the top of my list for the platform.

it_user433215 - PeerSpot reviewer
Senior Architect at an agriculture company with 1,001-5,000 employees
Feb 23, 2017

That very much depends on your requirements. What’s the usage profile of your planned data warehouse solution? Your use cases drive the evaluation and the criteria used to weight and rate aspects of potential solutions.
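
One simple way to do such a weighted rating is a scoring matrix: give each criterion a weight derived from your use cases, rate every candidate on each criterion, and compare the weighted averages. The sketch below assumes a 1 to 5 rating scale; the criteria, weights, ratings and solution names are all made up for illustration.

```python
def weighted_score(ratings, weights):
    """Weighted average of criterion ratings; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[c] * w for c, w in weights.items()) / total_weight


def rank_solutions(candidates, weights):
    """Return (name, score) pairs sorted from best to worst score."""
    scores = {name: weighted_score(r, weights) for name, r in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights and 1-5 ratings; replace with your own criteria.
    weights = {"performance": 3, "cost": 2, "ease_of_use": 1}
    candidates = {
        "Solution A": {"performance": 4, "cost": 2, "ease_of_use": 5},
        "Solution B": {"performance": 3, "cost": 5, "ease_of_use": 3},
    }
    for name, score in rank_solutions(candidates, weights):
        print(f"{name}: {score:.2f}")
```

Changing the weights to reflect a different usage profile can reverse the ranking, which is exactly why the use cases have to come first.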

- Stefan

it_user343617 - PeerSpot reviewer
Executive Vice President - Sales at a tech vendor with 51-200 employees
Feb 23, 2017

Scalability, flexibility, performance, and ease of management are key. While a traditional RDBMS (originally designed for OLTP) can perform data warehouse tasks, you will always be better off with a purpose-built platform that was designed from the ground up specifically for data warehouse loading/ingest, report processing, and ideally in-place analytics. Highly flexible, cloud-based solutions have reached a level of maturity and a price point that are extremely compelling - take a look at Snowflake.

it_user240024 - PeerSpot reviewer
Managing Director at Accenture
Real User
Feb 23, 2017

The type of data warehouse platform!

it_user396969 - PeerSpot reviewer
Intern at Pitney Bowes
Real User
Feb 29, 2016

The scalability of the data warehouse and its ability to handle multiple queries at the same time without overloading the servers.

Related Questions
Content Manager at PeerSpot (formerly IT Central Station)
May 17, 2022
Why would you choose that one?
Tech blogger
May 17, 2022
When I compared various data warehouse tools and solutions, I found Snowflake’s software as a service (SaaS) platform and Oracle Exadata to be the most effective data warehouse solutions currently available on the market.

One of the things that I initially noticed about Snowflake’s SaaS platform was how it made my operations more efficient by enabling me to search for and find relevant data more quickly than had previously been possible. Snowflake allows me to create a custom storage unit for all of my critical data. Part of this customization includes the ability to make any and all data searchable. As soon as I started to use it, it began to show its value. All of the data that I have stored in Snowflake’s virtual warehouse becomes easy for me to locate. Instead of spending long periods of time seeking the particular piece of data that I need, all I have to do is type in a search term. This will immediately call up the information that I want to find. This aspect of the solution makes it immensely valuable. It enables me to save time that I can then devote to other critical tasks.

A major advantage that Snowflake offers me is that it gives me the ability to perform a number of different functions with a single solution. It is a highly flexible solution and allows me to store organized processed data, centrally store raw data that has yet to be processed, process data through data engineering, examine data using data science, develop data applications, and securely share and take in real-time or shared data. It empowers me to make my data really and truly my own. I am able to make full use of my data and shape it in whatever way my needs dictate.

Two aspects of the Oracle Exadata Database Machine that I really appreciate are its scalability and its ease of use. This solution makes it so that I can expand my digital warehouses virtually limitlessly. If I needed to, the Oracle Exadata Database Machine would enable me to scale my data warehouses to hold 31 petabytes of data. I can easily meet my data needs without having to worry about running out of space for my data.

Additionally, every component of this tool, including its database servers, storage servers, and network, is ready to use straight out of the box. Everything about this solution was pre-configured, pre-tuned and pre-tested before we ever received it from Oracle. All of the components work in perfect harmony without outside intervention. This means that I don’t have to struggle to deploy it or do very much to get all of the features to work in tandem.

This solution also offers me the ability to easily move workloads from a data center to the cloud. I am able to migrate my data without having to worry about availability, scaling, or performance. My workloads lose nothing in the transfer. Additionally, the same database options that are available to me on my physical systems are available on the cloud. This allows me to continue using the solution the same way that I had been using it up to this point.

Ultimately, either of these two solutions will empower you to take full control of every stage of your data and its lifecycle from its initial storage to its final use.
Director of Community at PeerSpot (formerly IT Central Station)
Sep 9, 2022
What are the relations between them? What are their use cases?
2 out of 7 answers
Consulting Practice Partner - Data, Analytics & AI at FH
Oct 10, 2021
Hi @Evgeny Belenky - great question. Here is the best answer crafted by Talend:
- Data Structure: Raw (Data Lake) vs Processed (Data Warehouse)
- Purpose of Data: Not Yet Determined (Data Lake) vs Currently In Use (Data Warehouse)
- Users: Data Scientists (Data Lake) vs Business Professionals (Data Warehouse)
- Accessibility: Highly accessible and quick to update (Data Lake) vs More complicated and costly to make changes (Data Warehouse)
Please read more here https://www.talend.com/resourc...
CEO at WinterCorp LLC
Oct 11, 2021
Many of the comparisons of data lake and data warehouse that you see (such as the one below from Talend) are based on an out-of-date or dumbed-down idea of the data warehouse. The more advanced data warehouse engines:
- support a wide range of data types and formats
- can access external data (e.g., in object storage) that has never been ingested
- support data scientists as well as business users (e.g., with an ability to run Python, R, SAS routines and data science libraries on data in place in parallel in the data warehouse)
- support operational query on live, rapidly changing data
They do so while also providing capabilities and services never provided on data lakes or their cloud-based equivalents. Data warehouses, properly operated and housing data that is properly curated, are much more efficient, cost-effective and performant for data that is intensively shared and widely used. Data lakes are good repositories for data that is more lightly or locally used and does not merit the level of curation usually desired in a data warehouse.
Related Articles
Content Manager at PeerSpot (formerly IT Central Station)
Apr 26, 2022
Top 5 Data Warehouse Tools 2022
PeerSpot’s crowdsourced user review platform helps technology decision-makers around the world to better connect with peers and other independent experts who provide advice without vendor bias. Our users have ranked these solutions according to their valuable features, and discuss which features they like most and why. You can read user reviews for the Top 5 Data Warehouse Tools to help you d...