Calculating Your Data Warehouse Cost

Posted by tina on December 13, 2016 Data Governance

[Editor’s Note: This is our seventh installment in our “Data Warehouse Blog Series.” In our previous installment, we analyzed how Amazon Redshift and Google BigQuery handles security. Click here to read our previous post An Overview on Amazon Redshift and Google BigQuery Security.]

As data becomes more voluminous, accessible and cost efficient—every business must leverage data for their advantage. Of the data warehouses on the market, Amazon Redshift and Google BigQuery are the most cost effective solutions without compromising efficacy.

Amazon Redshift

Having been on the market for some time, AWS has a large and mature ecosystem with more than 10 years of experience that offers a full suite of cloud computing services (EMR, RDS, Kinesis, S3, EC2) that has years over the Google Cloud Platform ecosystem. If you’re already invested in the AWS ecosystem, it’s cost effective to continue to purchase their services.

Amazon Redshift offers on-demand pricing (see below) and several Reserved Instance (RI) options of up to 75% discounted over on-demand pricing. This pricing model leaves users with the ability to construct their data warehouse according to their business needs.

With on-demand pricing, Amazon Redshift charges per-hour per-node, which includes both compute and storage. This pricing model is predictable as users are able to run as many queries as necessary without being penalized by a high cost.

Pricing starts at $0.25 per hour for 160GB of data. A cluster of 8 X dc1.large nodes costs $1,440 for a 30-day month. Additionally, users can purchase two nodes (users get the leader node for free, so in total there are three nodes) for $360 for a 30-day month.

However, it’s important to note that since Amazon Redshift has fixed compute (SSD) and storage (HDD), scaling one requires scaling the other and therefore attributes to overall cost.

Google BigQuery

Since Google BigQuery is able to separate compute and storage, it allows for an extremely flexible pay-as-you-use pricing model (it charges by GB usage, starting at $0.02 per GB per month) This has allowed companies with smaller data sets to experiment with a data warehouse without running up a large purchase order. However, it’s critical to note that running queries is a separate cost from storing data.

For Google BigQuery, query pricing is the cost of running your SQL commands and user-defined functions and changes by the number of bytes processed. Query pricing has three tiers: 1TB for free, $5 per TB for on-demand tier and $40,000 for 2,000 queries for flat-rate pricing.

Though Google has been transparent that flat-rate pricing is aimed for the enterprise, Google BigQuery does offer a handful of free capabilities including: loading data, copying data, exporting data and metadata operations.

A downside to the pay-as-you-use model is that pricing is less transparent and hard to predict for long-term budgeting. Users have stated that complex analytics and query errors can end up in an unexpected cost. However, BigQuery now provides a Cost Control feature that enables users to set a cap to daily costs. As the Google BigQuery administrator, you’ll proactively have to manage the daily quota querying and once the cost effective query limit has been reached, they’ll have to communicate it with their team and cease performing queries until the cost has rebalanced.

Conclusion

For scaling startups to large enterprise companies, cost can greatly fluctuate. For companies that want more predictable pricing, Amazon Redshift is ideal with pricing starting at $0.25 with the up to a 70% discount through Reserved Instance.

On the other hand, Google BigQuery now has a predictable flat-rate pricing of $40,000 per month. However, this is targeted for enterprise companies as smaller companies may not be able to rationalize the high cost. In calculating Google BigQuery’s monthly cost, it would require more strict query monitoring and storage.

To learn how innovative companies like Reddit and Everlane are using Amazon Redshift and Google BigQuery, download our white paper What to Consider When Choosing Between Amazon Redshift and Google BigQuery now.