Close
Data TutorialsAmazon Redshift

Common Amazon Redshift Alerts

Posted by Tim Miller

While analyzing the Amazon Redshift data source performance there may be some evidence of error messages or alerts that redshift will present to you. In that case, you can create a query against the STL Tables that will not only show the event, but also a potential solution based on Amazon Redshift’s own documentation.

In this tutorial we will show you a fairly simple query that can be run against your cluster’s STL table analyzing these alerts, the solutions as well as counting the number of times that alert has occurred in a given period. This query will have an output of 3 columns, and they are:

  • Event
  • Solution
  • Times Occurred

The Breakdown

  1. Event
    • This column is the actual event text that Amazon Redshift returns to you upon the event of an error.
  2. Solution
    • This column shows the potential solution from the Redshift documentation.
  3. Times Occurred
    • Using count (*) this column will show the number of times this error has occurred in the time period you specify in the WHERE clause.

The Query

SELECT

	trim(split_part(l.event,':',1)) AS event,  

	trim(l.solution) AS solution,

	count(*) AS "times occurred"

FROM

	stl_alert_event_log AS l

	LEFT JOIN stl_scan AS s

	ON s.query = l.query AND s.slice = l.slice AND s.segment = l.segment AND s.step = l.step

WHERE l.event_time >=  dateadd(day, -7, current_date)

GROUP BY 1,2

ORDER BY 2 DESC

A Few Extra Notes

In the where clause the date span is narrowed and that is to keep the load on the STL limited. In analyzing a particular source, keeping the analysis to around a week’s time is usually enough to make some diagnosis of a problem.

As there will be additional information pertaining to the specific instance of an event’s occurrence, we are using the TRIM function to remove everything that appears after the “:” to show the count of an event instead of the count of the specific event’s occurrence which would likely always be 1. This is done to give a more general view of an error’s occurrence count as information on a specific occurrence is not relevant.