Data TutorialsAmazon Redshift

How to Find the Size of Tables, Schemas and Databases in Amazon Redshift

Posted by AJ Welch

Finding the size of everything in your Redshift cluster – including tables, schemas, and even entire databases – is a fairly simply procedure when using a few handy scripts provided by Amazon support. We’ll explore all three methods below.

Getting Table Sizes

The simplest method for listing the size of your database tables is to use a handy and official script provided by Amazon Redshift Utilities, table_info.sql.

This script looks (and, in fact, is) rather complex, but in essence it can be executed to quickly return a list of all the user tables in the system, along with various information about each table. The crucial column for our purposes is mbytes, which indicates the total size, in megabytes, of that particular table.

The results table will look something like this, which is automatically sorted from largest to smallest table:

schema  table     tableid distkey skew    sortkey #sks  rows    mbytes  enc pct_of_total    pct_stats_off   pct_unsorted
public  users     100074  userid  1.00    userid  1     49990   84      Y   0.02            0.00            0.00
public  sales     100087  listid  1.00    dateid  1     172456  52      Y   0.01            0.00            0.00
public  date      100080  dateid  1.00    dateid  1     365     44      N   0.01            0.00            0.00
public  listing   100085  listid  1.00    dateid  1     192497  44      Y   0.01            0.00            0.00
public  event     100083  eventid 1.00    dateid  1     8798    36      Y   0.00            0.00            0.00
public  venue     100076  venueid 1.00    venueid 1     202     32      N   0.00            0.00            0.00
public  category  100078  catid   1.00    catid   1     11      28      N   0.00            0.00            0.00

Getting Schema Sizes

While Amazon hasn’t created a specific script for this purpose, we can easily repurpose an existing script for the job of finding the total schema size. First, copy and execute the code for the v_space_used_per_tbl.sql view script to create a copy of that view in your Redshift cluster. Note: If you receive an error about the admin schema, you likely are missing that schema from the system. Try changing the schema in the first line from admin to something that exists (usually public).

Now we can use this view to grab just the information we need. In this case, we want to group the results by the schema and total the size of all tables therein to find the actual size of our schemas. This is accomplished with the following execution of the installed view:

  SUM(megabytes) as total_mb

The results will look something like this, displaying the list of schemas in the system, along with the total megabytes of disk space used by that schema:

schemaname   total_mb
public       320

Getting Database Sizes

Finding the size of your databases is nearly identical to the method used above for the schema sizes. Again we’ll use the already installed v_space_used_per_tbl view, but this time group by the database instead of schema:

  SUM(megabytes) as total_mb

The results should look familiar to you:

dbase_name   total_mb
public       320