Amazon Redshift WLM Queue Time and Execution Time Breakdown

Data Tutorial Amazon Redshift

Determining how much time your queries are spending either in the Workload Management (WLM) Queue or executing on your Amazon Redshift source can go a long way to improving your cluster’s performance. For more on what a WLM queue is read here.

In this tutorial we will show you a fairly simple query that can be run against your cluster’s STL table analyzing the amount of time queries spend in your WLM’s queue and how long they execute against the source. This query will have an output of five columns, and they are:

  • Date
  • Total Queue Time
  • Total Exec Time
  • Total Time
  • Percent WLM Queue Time

The Breakdown

  1. Date: This column is the date on which the queries being analyzed were run.

  2. Total Queue Time: This column shows the total amount of time queries on the given day spent waiting for an available connection on the source being analyzed.

  3. Total Exec Time: This column shows the total amount of time queries on the given day spent executing against the data source.

  4. Total Time: This column sums the previous two columns which will indicate how long it took for the queries on this source on the given day to return results to you.

  5. Percent WLM Queue Time: This columns breaks down how long your queries were spending in the WLM Queue on the given day.

The Query

SELECT 
	trunc(w.service_class_start_time) AS "Day", 
SUM(w.total_queue_time) / 1000000 AS "Total Queue Time", 
SUM(w.total_exec_time) / 1000000 AS "Total Exec Time", 
SUM(w.total_queue_time + w.total_exec_time) / 1000000 AS "Total Time", 
(SUM(w.total_queue_time)::FLOAT/ SUM(w.total_queue_time + w.total_exec_time)::FLOAT) AS "Percent WLM Queue Time" 
FROM 
stl_wlm_query w 
WHERE 
w.service_class = {SERVICE_CLASS}
  AND w.service_class_start_time BETWEEN {CALENDAR.START} AND {CALENDAR.END}
GROUP BY 
trunc(w.service_class_start_time)
ORDER BY 
trunc(w.service_class_start_time) desc

A Few Extra Notes

In the where clause, the date span is narrowed and that is to keep the load on the STL limited. In analyzing a particular source, keeping the analysis to around a week’s time is usually enough to make some diagnosis of a problem.

We also limit the service class in the where clause and this is because Amazon Redshift creates WLM query queues based on the service class. Amazon’s docs describe it this way:

“Amazon Redshift WLM creates query queues at runtime according to service classes, which define the configuration parameters for various types of queues, including internal system queues and user-accessible queues. From a user perspective, a user-accessible service class and a queue are functionally equivalent. For consistency, this documentation uses the term queue to mean a user-accessible service class as well as a runtime queue.”

Thus, when we limit the service class to a specific one, we can isolate that particular queue and leave out the other queues being created on this source as they do not necessarily impact each other here.