How to Use Amazon Redshift Diagnostic Queries for Determining Queue Times

Data Tutorial

Use these queries to determine your WLM queue and execution times, which can help tune your Amazon Redshift Cluster.

Utilizing an Amazon Redshift data source in Chartio is quite popular, we currently show over 2,000 unique Redshift Source connections and our support team has answered almost 700 tickets regarding Amazon Redshift sources. These questions vary greatly, but a theme that is often discussed is query tuning. Time, size, speed, these are all concerns that live at or near the top of the list of concerns for any computer user especially in this day and age where the difference between winning and losing in business could be, in extreme cases, milliseconds or nanoseconds.

Things may not be that dramatic every day, but when you are experiencing slow queries, or even queries failing to load you will want to do some analysis of your Amazon Redshift instances. This analysis can help you determine if some of your queries can be eliminated due to redundancy or if your queries can be tuned to increase performance.

Luckily Amazon Redshift shares many insights into query tuning and also provides us with diagnostic queries here:

How to Use Amazon Redshift Diagnostic Queries Determining Queue Times

In my previous life as a Customer Success Engineer that site was very helpful getting our clients the answers they needed when they wrote into us regarding Redshift Performance issues. The query I used most was this one:

SELECT TRIM(DATABASE) AS DB,
       w.query,
       SUBSTRING(q.querytxt,1,50) AS querytxt,
       w.queue_start_time,
       w.service_class AS class,
       w.slot_count AS slots,
       w.total_queue_time / 1000000 AS queue_seconds,
       w.total_exec_time / 1000000 exec_seconds,
       (w.total_queue_time + w.total_exec_time) / 1000000 AS total_seconds,
       (w.total_queue_time::float / (w.total_queue_time + w.total_exec_time)::float)*100 AS percent_wlm_queue_time
FROM stl_wlm_query w
  LEFT JOIN stl_query q
         ON q.query = w.query
        AND q.userid = w.userid
WHERE w.queue_start_time >= DATEADD (day,-7,CURRENT_DATE)
AND   w.total_queue_time > 0
AND   w.userid > 1
AND   q.starttime >= DATEADD (day,-7,CURRENT_DATE)
ORDER BY w.total_queue_time DESC,
         w.queue_start_time DESC;

We got this query directly from the webpage I mention above and it tells us quite a bit about this Redshift instance’s performance. It tells us:

⋅⋅* The name of the Database ⋅⋅* The query ID ⋅⋅* Some of the queries syntax so we can tell what query this is ⋅⋅* The service class and slot class which refer to the Workload Management (WLM) configuration (for more on that please read here ⋅⋅* It also tells us the total time this query stayed in the WLM’s queue and how long it took to actually execute, which are then also used to determine the entire time this query took to get from Chartio to the Amazon Redshift Instance and return the data you were looking for.

We added in one additional column into the table and that was percent_wlm_queue_time which we use to determine how much of the time this query spent executing was spent waiting in line in the WLM Queue. This column in particular is useful in diagnosing whether or not the query is the problem or the WLM Queue could use some review.

All of this information in a vacuum isn’t likely enough to do a full diagnosis of your Amazon Redshift WLM Queue Performance but it will help you analyze the queries being sent to your cluster.