While analyzing the Amazon Redshift data source performance there may be some evidence of error messages or alerts that redshift will present to you. In that case, you can create a query against the STL Tables that will not only show the event, but also a potential solution based on Amazon Redshift’s own documentation.
In this tutorial we will show you a fairly simple query that can be run against your cluster’s STL table analyzing these alerts, the solutions as well as counting the number of times that alert has occurred in a given period. This query will have an output of 3 columns, and they are:
- Event
- Solution
- Times Occurred
The Breakdown
- Event
- This column is the actual event text that Amazon Redshift returns to you upon the event of an error.
- Solution
- This column shows the potential solution from the Redshift documentation.
- Times Occurred
- Using count (*) this column will show the number of times this error has occurred in the time period you specify in the WHERE clause.
The Query
SELECT
trim(split_part(l.event,':',1)) AS event,
trim(l.solution) AS solution,
count(*) AS "times occurred"
FROM
stl_alert_event_log AS l
LEFT JOIN stl_scan AS s
ON s.query = l.query AND s.slice = l.slice AND s.segment = l.segment AND s.step = l.step
WHERE l.event_time >= dateadd(day, -7, current_date)
GROUP BY 1,2
ORDER BY 2 DESC
A Few Extra Notes
In the where clause the date span is narrowed and that is to keep the load on the STL limited. In analyzing a particular source, keeping the analysis to around a week’s time is usually enough to make some diagnosis of a problem.
As there will be additional information pertaining to the specific instance of an event’s occurrence, we are using the TRIM function to remove everything that appears after the “:” to show the count of an event instead of the count of the specific event’s occurrence which would likely always be 1. This is done to give a more general view of an error’s occurrence count as information on a specific occurrence is not relevant.