Amazon S3 CSV integration

Creating an S3 bucket

  1. Create an S3 bucket on your Redshift Cluster. If your bucket and cluster aren’t in the same region, you may need to run additional charges for data transfer.

    a. Create a new user. Click on IAM in the AWS services dashboard. Click Users and then Add User.

    AWS services dashboard view

  2. If your cluster has been used by your organization before, you’ll likely see a list of users that have already been added on the IAM home screen. You should also see a button that says Add user.

    a. Create a unique username.

    b. Enable programmatic access so your user has an access key ID and secret access key.

    c. Save the Access key ID and Secret access key.

    View your Secret and Access ID keys

  3. Add an inline policy using the policy generator. Use the dropdowns to select the correct action then add in the name of the S3 bucket you created.

    Using the policy generator creates a JSON Blob that looks something like this:

     "Statement": [
    
         {
    
             "Sid": "Stmt1484751442000",
    
             "Effect": "Allow",
    
             "Action": ["s3:*" ],
    
             "Resource": ["arn:aws:s3:::chartio-tim-test/"]
    
         }
    
     ]
    

Copying data to Redshift

  1. Create the table in your Redshift instance. This will connect to Chartio and will be what you query from.

    An example of the CREATE TABLE command:

     CREATE TABLE demo_data
    
     (
    
         id INTEGER NOT NULL,
    
         campaign_id CHAR(256) NOT NULL,
    
         month DATE NOT NULL,
    
         city CHAR(256) NOT NULL,
    
         state CHAR(256) NOT NULL,
    
         amount CHAR(256) NOT NULL
    
     );
    
  2. Access your Redshift Instance through a SQL client in order to execute a create table statement. Good examples to do this are DataGrip, SQL Workbench (free), re:dash, SQuirreL, and Windows SQL Client.

    a. Download and install Amazon’s Redshift JDBC or ODBC driver.

  3. Get data into your Redshift Instance using the COPY command.

    a. Create a manifest file to make sure that the COPY command loads all the fields properly. An example:

     {
    
         "entries": [
    
             {"url":"s3://chartio-tim-test/s3+upload.csv", "mandatory":true}
    
         ]
    
     }
    

    b. Write a COPY command to load your CSV’s data into your table that you created on your Redshift Instance.

    c. Parameters of COPY command:

    Parameter of COPY command Description
    COPY Where you input your table’s schema name and table name
    FROM Where you direct your table to look in the manifest file you created
    WITH CREDENTIALS '<access key id; secret key>' When you created your S3 bucket, we mentioned the importance of the access key and secret key and this is where it comes into play
    REGION 'us-east-1' The AWS region of your S3 bucket
    MANIFEST This tells your COPY command that the path in the FROM clause is to a manifest file
    TRUNCATECOLUMNS  
    ACCEPTINVCHARS Forces Redshift to accept invalid characters by replacing them with a “?”
    FORMATTING Helpful in ensuring your data is successfully input into your table
    OPTIONAL Helpful in ensuring your data is successfully input into your table

    An example:

     COPY public.table_name_s3_demo
    
     FROM 's3://data-source-domain/domain.manifest'
    
     WITH CREDENTIALS 'aws_access_key_id=;aws_secret_access_key='
    
     REGION 'us-east-1'
    
     MANIFEST
    
     DELIMITER ','
    
     IGNOREHEADER 1
    
     DATEFORMAT AS 'MM/DD/YYYY'
    
     TRUNCATECOLUMNS
    
     ACCEPTINVCHARS
    
     FILLRECORD
    

Troubleshooting

  • Datetimes can be difficult to work with; if you run into errors while loading a field that contains dates or date times:
    • Use ACCEPTANYDATE to force Redshift to accept your date no matter what.
    • Or, use DATEFORMAT AS "MM/DD/YYYY" to dictate a specific formatting.
  • ACCEPTINVCHARS will be very helpful as it forces Redshift to accept invalid characters by replacing them with a “?”.
  • DELIMITER ',' tells Redshift that your delimiter is a comma since this is a CSV.
  • IGNOREHEADER 1 tells Redshift to ignore your CSVs header line.

Additional resources