Amazon Redshift: Distribution Style and Query Performance

Posted by natasha on October 23, 2014 Data, Data Governance, Data Analytics

In preparation for AWS re:Invent, we’ll be posting weekly tips on optimizing queries, designing your Amazon Redshift schema, and managing workloads. Download our Amazon Redshift white paper below.

Optimizing your Amazon Redshift Schema

Chartio gives you a lot of leverage over large databases stored in Amazon Redshift. But because you’re executing complex queries against extremely large datasets, planning ahead will yield big benefits in both performance and cost.

Choose your distribution style carefully

When you load data into a table, Amazon Redshift distributes the rows of the table to compute nodes. This is done according to the distribution style you choose for your table. The distribution style determines the balance of parallel processing across the compute nodes as well as the amount of redistribution needed for joins and aggregations.
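To make the idea concrete, here is a minimal sketch that simulates how rows might be placed under an even (round-robin) distribution versus a key-based one. It is an illustration only: the slice count, the sample rows, and the function names are hypothetical, and CRC32 stands in for whatever hash Amazon Redshift uses internally.

```python
import zlib
from collections import Counter

NUM_SLICES = 4  # hypothetical number of slices in the cluster


def slice_for_key(value, num_slices=NUM_SLICES):
    """Map a distribution-key value to a slice.

    CRC32 is only a deterministic stand-in for Redshift's internal hash.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_slices


def distribute_even(rows, num_slices=NUM_SLICES):
    """Round-robin placement: row i goes to slice i mod num_slices."""
    return [i % num_slices for i, _ in enumerate(rows)]


def distribute_by_key(rows, key, num_slices=NUM_SLICES):
    """Key-based placement: rows with the same key value share a slice."""
    return [slice_for_key(row[key], num_slices) for row in rows]


# Hypothetical sample data: 12 orders spread over 3 customers.
rows = [{"order_id": i, "customer_id": i % 3} for i in range(12)]

even_counts = Counter(distribute_even(rows))
key_counts = Counter(distribute_by_key(rows, "customer_id"))

print("even:", dict(even_counts))
print("key: ", dict(key_counts))
```

With only three distinct customer_id values, the key-based placement can occupy at most three of the four slices, which is the kind of imbalance a low-cardinality distribution key tends to produce.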

An even distribution of data on the cluster gives you the most parallel processing power. Choose a distribution style that ensures each compute node is processing a portion of the work in parallel.
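One rough way to reason about how even a distribution is: compare the busiest slice with the average slice. The sketch below computes that ratio from per-slice row counts. The function name and the sample counts are hypothetical; in practice the counts would come from your own cluster.

```python
def skew_ratio(rows_per_slice):
    """Return max rows on any slice divided by the mean rows per slice.

    1.0 means perfectly even; larger values mean one slice does more
    of the work while the others wait.
    """
    if not rows_per_slice:
        raise ValueError("rows_per_slice must not be empty")
    total = sum(rows_per_slice)
    if total == 0:
        return 1.0  # an empty table is trivially balanced
    mean = total / len(rows_per_slice)
    return max(rows_per_slice) / mean


balanced = skew_ratio([3, 3, 3, 3])   # every slice holds the same share
lopsided = skew_ratio([12, 0, 0, 0])  # one slice holds everything

print(balanced, lopsided)
```

A ratio close to 1.0 suggests each slice carries a similar share of the work. A ratio near the number of slices suggests most of the table sits on a single slice.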

Joins and grouped aggregations also benefit from planning ahead. Joins are generally faster when the tables being joined are distributed on the same key, because all of the rows that need to be joined are already collocated and do not have to be moved between nodes. Likewise, grouped aggregates are generally faster when each group’s rows are collocated.
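The effect of collocation can be sketched by counting how many rows would have to move before a join. The example below assumes a hypothetical orders table joined to a customers table that is already distributed on customer_id. CRC32 again stands in for the real hash, and all names are illustrative.

```python
import zlib

NUM_SLICES = 4  # hypothetical number of slices in the cluster


def slice_for_key(value, num_slices=NUM_SLICES):
    """Deterministic stand-in for the cluster's hash distribution."""
    return zlib.crc32(str(value).encode("utf-8")) % num_slices


def rows_to_redistribute(rows, dist_key, join_key, num_slices=NUM_SLICES):
    """Count rows whose current slice differs from the slice the join needs.

    dist_key is the column the table is distributed on; join_key is the
    column used in the join (and the other table's distribution key).
    """
    moved = 0
    for row in rows:
        current = slice_for_key(row[dist_key], num_slices)
        needed = slice_for_key(row[join_key], num_slices)
        if current != needed:
            moved += 1
    return moved


# Hypothetical orders table: 100 orders spread over 5 customers.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(100)]

# Distributed on the join key: every row is already where the join needs it.
collocated = rows_to_redistribute(orders, "customer_id", "customer_id")

# Distributed on a different column: some rows must move before joining.
not_collocated = rows_to_redistribute(orders, "order_id", "customer_id")

print("rows moved (same key):     ", collocated)
print("rows moved (different key):", not_collocated)
```

When the distribution key matches the join key, the count is zero by construction. With a different key, the count depends on the data and the hash, so treat the second number as illustrative rather than predictive.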

Select a distribution style that maintains an even distribution of data while minimizing redistribution as much as possible.

Want to Learn More?

Download our white paper on optimizing query performance inside your Amazon Redshift cluster to learn more about optimizing queries with common best practices, designing your Amazon Redshift schema, and defining query queues in workload management to increase performance and lower costs.
