Amazon Redshift: Distribution Style and Query Performance
Posted by Data, Data Governance, Data Analytics
on October 23, 2014
In preparation for AWS re:Invent, we’ll be posting weekly tips on optimizing queries, optimizing your Amazon Redshift schema, and managing workloads. Download our Amazon Redshift white paper below.
Optimizing your Amazon Redshift Schema
Chartio gives you a lot of leverage over large databases stored in Amazon Redshift. But because you’re executing complex queries against extremely large datasets, planning ahead will yield big benefits in both performance and cost.
Choose your distribution style carefully
When you load data into a table, Amazon Redshift distributes the rows of the table to compute nodes. This is done according to the distribution style you choose for your table. The distribution style determines the balance of parallel processing across the compute nodes as well as the amount of redistribution needed for joins and aggregations.
An even distribution of data on the cluster gives you the most parallel processing power. Choose a distribution style that ensures each compute node is processing a portion of the work in parallel.
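To make the idea of balance concrete, here is a minimal Python sketch of a toy cluster. It is loosely modeled on Redshift's EVEN (round-robin) and KEY (hash on a column) distribution styles, but the hash function, the four-node cluster, the column names, and the sample data are all assumptions made for illustration, not Redshift's actual internals.

```python
import hashlib
from collections import Counter

def node_for(value, num_nodes):
    """Map a value to a node with a stable hash (a stand-in, not Redshift's hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def distribute_even(rows, num_nodes):
    """Round-robin placement: row i goes to node i % num_nodes."""
    return Counter(i % num_nodes for i in range(len(rows)))

def distribute_by_key(rows, key, num_nodes):
    """Hash placement: rows sharing a key value always land on the same node."""
    return Counter(node_for(row[key], num_nodes) for row in rows)

def skew(counts, num_nodes):
    """Busiest node's row count divided by the mean; 1.0 means perfectly even."""
    mean = sum(counts.values()) / num_nodes
    return max(counts.values()) / mean

NUM_NODES = 4  # hypothetical cluster size

# Hypothetical table in which 90% of the orders belong to a single customer.
rows = [{"order_id": i, "customer_id": 1 if i % 10 else i} for i in range(10_000)]

skew_even = skew(distribute_even(rows, NUM_NODES), NUM_NODES)
skew_by_order = skew(distribute_by_key(rows, "order_id", NUM_NODES), NUM_NODES)
skew_by_customer = skew(distribute_by_key(rows, "customer_id", NUM_NODES), NUM_NODES)

print(f"even: {skew_even:.2f}  order_id: {skew_by_order:.2f}  customer_id: {skew_by_customer:.2f}")
```

In this toy model, distributing on the heavily repeated customer_id column piles most rows onto one node, so that node does most of the work while the others sit idle. A high-cardinality column such as order_id, or round-robin placement, keeps the nodes close to balanced.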
Joins and grouped aggregations will also benefit from planning ahead. Joins are generally faster when the tables being joined are distributed on the same key, because all of the rows that need to be joined are already collocated on the same node and do not have to be redistributed at query time. For the same reason, grouped aggregates are generally faster when each group’s rows are collocated.
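The cost of a join on a non-distribution key can be sketched by counting how many rows would have to change nodes before the join could run locally. This is a simplified model under stated assumptions: the hash function, the four-node cluster, and the orders table are hypothetical, and a real query planner has other options (such as copying a small table to every node) that this sketch ignores.

```python
import hashlib

def node_for(value, num_nodes):
    """Map a value to a node with a stable hash (a stand-in, not Redshift's hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rows_to_move(rows, dist_key, join_key, num_nodes):
    """Count rows stored on a different node than the one their join key hashes to."""
    return sum(
        1
        for row in rows
        if node_for(row[dist_key], num_nodes) != node_for(row[join_key], num_nodes)
    )

NUM_NODES = 4  # hypothetical cluster size

# Hypothetical orders table that will be joined to a customers table on customer_id.
orders = [{"order_id": i, "customer_id": i % 500} for i in range(5_000)]

# Distributed on the join key: every row is already where the join needs it.
moved_same_key = rows_to_move(orders, "customer_id", "customer_id", NUM_NODES)

# Distributed on a different column: many rows must cross the network first.
moved_other_key = rows_to_move(orders, "order_id", "customer_id", NUM_NODES)

print(f"rows moved (distributed on join key): {moved_same_key}")
print(f"rows moved (distributed on order_id): {moved_other_key}")
```

The same reasoning applies to grouped aggregates: if the table is distributed on the GROUP BY column, each node can finish its groups without exchanging rows with the others.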
Select a distribution style that maintains an even distribution of data while minimizing redistribution as much as possible.
Want to Learn More?
Download our white paper on optimizing query performance inside your Amazon Redshift cluster to learn more about optimizing queries with common best practices, designing your Amazon Redshift schema and defining query queues in workload management to increase performance and lower costs.