Detecting queries that are taking unusually long or are run on a higher frequency interval are good candidates for query tuning. In this tutorial we will look at a diagnostic query designed to help you do just that.
Why is the amount of time a query takes to run important?
During its entire time spent querying against the database that particular query is using up one of your cluster’s concurrent connections which are limited by Amazon Redshift.
If a query is sent to the Amazon Redshift instance while all concurrent connections are currently being used it will wait in the queue until there is an available connection. This sort of traffic jam will increase exponentially over time as more and more users are querying this connection. Long running queries are the rubberneckers of the database world. They use up resources that could be made available for follow other queries and can adversely impact your datasource performance.
Using the query below, you will be able to analyze your Amazon Redshift Instance’s STL tables to provide you with information regarding a specific table and expose the performance information:
- Average Run Time
- Quickest Run Time
- Slowest Run Time
- Last time it was run
- Times aborted
Run times are important because, as we discussed earlier, queries with long run times are using up concurrent connections which is a resource drain. A large maximum run time isn’t all that telling but a large average run time is. An average and minimum run time at or near the max query duration is a big issue and those queries are leading candidates for query tuning. Queries with high average or minimum run times along with a number of aborted attempts should be the first queries you investigate to tune.
The Query
SELECT
trim(database) as db, count(query) AS n_qry,
max(substring (qrytext,1,80)) AS qrytext,
min(run_minutes) AS "min",
max(run_minutes) AS "max",
avg(run_minutes) AS "avg",
sum(run_minutes) AS total,
max(query) AS max_query_id,
max(starttime)::DATE AS last_run,
sum(alerts) AS alerts,
aborted
FROM
(SELECT
userid,
label,
stl_query.query,
trim(database) AS database,
trim(querytxt) AS qrytext,
md5(trim(querytxt)) AS qry_md5,
starttime,
endtime,
(datediff(seconds, starttime,endtime)::numeric(12,2))/60 AS run_minutes,
alrt.num_events AS alerts,
aborted
FROM
stl_query
LEFT OUTER JOIN
(SELECT
query,
1 as num_events
FROM
stl_alert_event_log
GROUP BY
query
) AS alrt
on alrt.query = stl_query.query
WHERE userid <> 1 and starttime >= dateadd(day, -7, current_date)
)
GROUP BY
database,
label,
qry_md5,
aborted
ORDER BY total desc limit 50;
Some very useful columns that are going to come out of this query are:
- avg(run_minutes) as “avg” - the average amount of time this query took to run in the last 7 days
- aborted - The count of times this query was aborted in the last 7 days
- max(substring (qrytext,1,80)) as qrytext - this column will give you the first 80 characters of the queries actual syntax to help identify it in your logs.
This query is sourced from the Amazon Docs website.