The simplest option for storing data for later analysis is a CSV. While Chartio can connect to your CSVs directly, we think it is better to have more options to get your data on Chartio. One way to do that is to upload your CSV to your Amazon Redshift cluster and connect Chartio through Redshift, and you can do that by uploading your CSV to Amazon S3.
Creating an S3 Bucket
Create an S3 bucket on your Redshift Cluster. If your bucket and cluster are not in the same region, you may need to run additional charges for data transfer.
a. Create a new user. Click on IAM in the AWS services dashboard. Click Users and then Add User.
If your cluster has been used by your organization before, you will likely see a list of users that have already been added on the IAM home screen. You should also see a button that says Add user.
a. Create a unique username.
b. Enable programmatic access so your user has an access key ID and secret access key.
c. Save the Access key ID and Secret access key.
Add an inline policy using the policy generator. Use the dropdowns to select the correct action, and then add in the name of the S3 bucket that you created.
a. Using the policy generator will create a JSON Blob that looks something like this:
“Action”: [“s3:*” ],
Copying Data to Redshift
Create the table in your Redshift instance. This will connect to Chartio and will be what you query from.
An example of the CREATE TABLE command:
CREATE TABLE demo_data
id INTEGER NOT NULL,
campaign_id CHAR(256) NOT NULL,
month DATE NOT NULL,
city CHAR(256) NOT NULL,
state CHAR(256) NOT NULL,
amount CHAR(256) NOT NULL
Access your Redshift Instance through a SQL client in order to execute a create table statement. Good examples to do this are DataGrip, SQL Workbench (free), re:dash, SQuirreL, and Windows SQL Client.
a. Download and install Amazon’s Redshift JDBC or ODBC driver.
Get data into your Redshift Instance using the COPY command. Create a manifest file to make sure that the COPY command loads all the fields properly. An example:
b. Write a COPY command to load your CSV’s data into your table that you created on your Redshift Instance.
Sections of COPY command:
COPY: Where you input your table’s schema name and table name.
FROM: Where you direct your table to look in the manifest file you created.
WITH CREDENTIALS <access key id; secret key>: When you created your S3 bucket, we mentioned the importance of the access key and secret key and this is where it comes into play.
REGION ‘us-east-1’: The AWS region of your S3 bucket.
MANIFEST: This tells your COPY command that the path in the FROM clause is to a manifest file.
The FORMATTING and OPTIONAL parameters will be helpful in ensuring your data is successfully input into your table.
WITH CREDENTIALS ‘aws_access_key_id=;aws_secret_access_key=’
DATEFORMAT AS ‘MM/DD/YYYY’
- Datetimes can be difficult to work with; if you run into errors while loading a field that contains dates or date times:
- Use ACCEPTANYDATE to force Redshift to accept your date no matter what.
- Or, use DATEFORMAT AS “MM/DD/YYYY” to dictate a specific formatting.
- ACCEPTINVCHARS will be very helpful as it will force Redshift to accept invalid characters by replacing then with a “?”.
- DELIMITER ‘,’ will tell Redshift that your delimiter is a comma since this is a CSV.
IGNOREHEADER 1 will tell Redshift to ignore your CSVs header line.
Supported data types in Amazon Redshift: click here
Amazon’s steps for creating an S3 bucket and uploading a CSV: click here
Data sources in Chartio: click here
CSVs in Chartio: click here
Amazon Redshift: click here
Loading data into Amazon: click here
Using a manifest file on AWS: click here
AWS’ COPY command documents: click here
AWS optional parameters: click here
Connecting Chartio to a Redshift Instance: click here
AWS documentation on STL load errors: click here