5 Tips for Selecting the Right Data Warehouse

Posted by gilad on March 6, 2018 Data, Data Stack, Database, Data Governance

A fundamental aspect of any Business Intelligence strategy for modern enterprises is data-driven decision making. Organizations know that gut instinct and observation alone do not make for adequate business decisions, and there is a real need now to ensure data lies at the heart of key enterprise decisions.

However, there is a problem that needs solving with data-driven decision-making: how can organizations collate all their various sources of data in one centralized repository for use with BI tools? After all, data sources, formats and systems in modern enterprises are disparate, meaning there is a real need to integrate all this data in one place so it can be properly analyzed. This is where a data warehouse comes in useful.

In this article, you’ll find out what exactly a data warehouse is, information on data warehouse architecture, and most importantly, tips on how to select the right data warehouse for your company.

##What Is A Data Warehouse?

A data warehouse is any system used as a central repository to house integrated data derived from multiple data sources at an organization. The data warehouse supports reporting, analysis, and decision-making by consolidating all this data at an aggregate level.

The American computer scientist Bill Inmon, often regarded as the father of the data warehouse, defined the data warehouse as subject-oriented, time-variant and non-volatile.

  • Subject-oriented means analysts from a particular subject area, such as finance, can access their subject area data for analysis in the data warehouse.

  • Time-variant means that the data warehouse contains historical data, which contrasts with transactional systems that only contain the most recent data.

  • Non-volatile means that once the data is stored in a data warehouse it won’t and shouldn’t change.

##Data Warehouse Architecture

The main talking point in data warehouse architecture boils down to the Inmon vs Kimball approach. Bill Inmon, the pioneer of data warehousing, suggested a top-down approach in which enterprises build a large centralized data repository where all sources of data are consolidated. Ralph Kimball, a BI expert, offered an alternative bottom-up approach in which the enterprise begins with dimensional data marts serving the analytic needs of specific business departments.

Modern discussions on data warehouse architecture have moved beyond the Inmon vs Kimball debate on to a discussion about the virtues of on-premise versus cloud-based data warehouses.

##Modern Data Warehousing

Cloud computing has led to a paradigm-shift in almost every aspect of IT architecture in recent years. In terms of data warehouses, there’s been a gradual shift from enterprises using traditional on-premise systems towards cloud-based data warehouse services.

The cloud provides a convenient and cost-effective way to access computing services because there is no need for costly investments in hardware. In the cloud, the organization that consumes cloud-based services simply pays for the service provided and, in some cases, the use of computing resources that are used to deliver those services.

As research company Gartner noted in 2016, more organizations are considering cloud-based deployments for their analytics environments, disrupting the traditional on-premise data warehouse market.

##Data Warehouse Use Cases and Vendors

Regardless of the infrastructures used to deploy a data warehouse, enterprises in many different industries can benefit from their deployment. Beyond the broad notion that data warehouses help to support business decision-making, some particular use cases include:

  • Reviewing and analyzing logistics and operations, using the results of these analyses to improve and optimize how the business runs as a unit.

  • Improve relationships with customers, both prospective and current.

  • Track, analyze and improve company performance/profitability.

  • Analyze sales processes and make them more efficient.

Vendors providing on-premise data warehouse systems include Microsoft, SAP, HP and IBM. Such systems are often database software optimized for analytics and data warehouse workloads, and the company still has to purchase the appropriate hardware to support their software.

Teradata, IBM, and Oracle can provide a combination of hardware and software in a full-fledged on-premise data warehouse package.

In terms of cloud offerings, remember that these systems are data warehouses as a service, meaning no investment in physical hardware is needed beyond the computers used to access and analyze the data over an internet connection. Amazon Redshift, Google BigQuery, Panoply and Azure SQL Data Warehouses are some of the big players in the cloud-based data warehouse market.

##Tips For Selecting Your Data Warehouse

The various data warehouse architectures and slew of products and services offered by many vendors can make the choice of data warehouse a touch dizzying. Here are some tips on how to get your selection right by focusing in on the most important aspects of any data warehouse.

1. Network Latency Is Not The Main Factor

There is much confusion about speed in cloud-based services versus on-premise deployments. The truth is that any speed constraints imposed by having to access your data warehouse over a network do not have as much of an effect on performance as the decision-makers at organizations often assume.

With local servers, latency is of course less an issue with on-premise systems than cloud servers located far from an enterprise’s operations, however, the speed difference is often negligible. Performance with a cloud-based system can be as good, if not superior to an on-premise data warehouse in some cases.

2. Opt For Cost Transparency

Cost is another imperative consideration. For an on-premise deployment, the costs of just building a data warehouse can run into tens of thousands of dollars. Such costs don’t even take into account the extensive administration and management costs of keeping these systems functioning.

With regards to the cloud, costs can vary extensively across the different vendors. The confusion over costs arises because the different vendors offer markedly different pricing structures.

Amazon Redshift, for example, charges you based on the type of computing instances you use to house your data while Google BigQuery charges for each query in addition to charging for storage. The latter stricture, employed based on queries, can lead to challenges in estimating costs and problems arising from that uncertainty, including limiting queries because costs are unpredictable. It is best to opt for the most transparent cost structure that fits your company’s budget.

3. Does The Product Meet Compliance Rules?

This question is particularly relevant for cloud-based data warehouse services. When choosing a data warehouse product, you must map your company’s own compliance requirements against the standards enforced by data warehouse service providers.

For example, HIPAA compliance laws strictly govern security standards for patient data and other healthcare information. Any organization in the healthcare sector would need to ensure their data warehouse complies with HIPAA regulations. Other rules and regulations exist for other industries governing how sensitive data should be handled.

4. Ensure High Availability

Regardless of whether your chosen data warehouse is an on-premise system or a cloud service, high availability should be a prime concern. The move towards real-time analytics and the increased reliance on data to make decisions means that these systems must offer a high level of availability.

In the cloud, many of the major vendors offer impressive availability with a high uptime percentage. Furthermore, top cloud vendors typically offer data replication and backup with their data warehouse services. However, outages can still happen in the cloud, so these services aren’t immune from downtime.

5. Look For Easy Scalability

Scalability is an area in which cloud-based data warehouse services really excel. As an enterprise grows, it accumulates more data, meaning more computer power is eventually needed to help analyze all this data efficiently.

For on-premise systems, scaling the data warehouse up requires expensive investments in new hardware, high costs of setting the systems up, and increased power consumption. In the cloud, scaling up is usually as simple as requesting more resources from the provider, meaning no time-consuming and costly investments in extra IT infrastructure.

##Closing Thoughts

Now that you understand data warehouses, modern data warehouse architecture, and what to look out for in any data warehouse system or service, you can move on to individually examining the products and services available from some of the main vendors. Always remember to opt for the system that best meets your company’s requirements, and don’t invest in unnecessary computing power that you may not utilize; you can always scale upwards.