What is a Cohort Analysis?
A cohort analysis is a powerful and insightful method to analyze a specific metric by comparing its behavior between different groups of users, called cohorts. This type of data analysis is most often segmented by user acquisition date, and can help businesses understand customer lifecycle and the health of your business and seasonality.
Use Cases for a Cohort Analysis
Since cohorts are most commonly segmented by a time-dependent grouping, such as user acquisition or user start date, it can be used to glean a number of insights on key metrics for your business:
-
Monthly revenue growth based on time of year - allows you to see if new cohorts are more or less valuable than previous cohorts
-
Determines if your business gets impacted by seasonality
-
How updates to your website impact user acquisition
-
Customer Lifetime Value
-
Customer Churn
-
Customer Retention Rate
How to Perform a Cohort Analysis to Track Customer Retention Rate
Customer retention rate can be measured by using a cohort analysis in Chartio. The idea is that you’ll track the customer retention rate by grouping users by their sign up date (such as daily, weekly, or monthly, depending on your business objectives and pricing model).
To perform a cohort analysis in Chartio to track customer retention rate, there are three pieces of data needed:
- Who you are tracking
- Sign up date
- Metric value on all subsequent dates after sign up
All of this data can usually be found in an event table with a user ID and event timestamp. We’ll bring this together in a SQL query and add it to Chartio as a custom table in the data source. The assumption for this is your data source has the ability to do common table expressions (CTEs). If not, you’ll need to do this as subqueries.
To begin forming your query you’ll want to answer two questions: “who are we tracking?” and “when did they first sign up?”.
You might have a users table which shows you both of those items. Often it will be a user ID
and signup date
. If not, you can generally look at an events table and get the user id with the minimum or MIN
event date as their signup. We’re using the Chartio Demo Data Source which runs on PostgreSQL as our example:
SELECT "Activity"."user_id" AS "User_Id",
DATE_TRUNC('day', MIN("Activity"."created_date"))::DATE AS "First_Event_Date"
FROM "public"."activity" AS "Activity"
GROUP BY "Activity"."user_id";
Next, we just select the user and their event dates to compare to their signup event.
SELECT "Activity"."user_id" AS "User_Id",
DATE_TRUNC('day', "Activity"."created_date")::DATE AS "User_Event_Date"
FROM "public"."activity" AS "Activity"
Finally, we pull the two queries together and then calculate the time between the user’s sign up date and their event date. We’ll do this in the form of a common table expression:
WITH "USER_FIRST_EVENT" as (
SELECT "Activity"."user_id" AS "User_Id",
DATE_TRUNC('day', MIN("Activity"."created_date"))::DATE AS "First_Event_Date"
FROM "public"."activity" AS "Activity"
GROUP BY "Activity"."user_id"),
"EVENT" as (SELECT "Activity"."user_id" AS "User_Id",
DATE_TRUNC('day', "Activity"."created_date")::DATE AS "User_Event_Date"
FROM "public"."activity" AS "Activity")
select "USER_FIRST_EVENT"."User_Id",
"USER_FIRST_EVENT"."First_Event_Date",
"EVENT"."User_Event_Date",
"EVENT"."User_Event_Date"- "USER_FIRST_EVENT"."First_Event_Date" as "Days_Since_Signup",
("EVENT"."User_Event_Date"- "USER_FIRST_EVENT"."First_Event_Date")/7 as "Weeks_Since_Signup"
FROM "USER_FIRST_EVENT"
JOIN "EVENT" on "USER_FIRST_EVENT"."User_Id" = "EVENT"."User_Id"
We included a calculation to determine how long it has been for each event from the signup date in both a day and week calculation. We’ll use these in the tables and charts.
Now that we’ve written our SQL query, we can move on to performing the cohort analysis in Chartio by following these steps:
- Add the above to Chartio as a custom table for that data source
- Once this custom table has been created in the schema, go to the Chartio Data Explorer. (Documentation: Using Your Custom Table)
- From the new ‘Cohort’ table add
User_ID
to ‘Measures’ and set the aggregation to Count Distinct - Add the First Event Date to ‘Dimensions’. At this point you can cohort at the Date level or you can change the data bucketing to Week or Month to set the cohorts at higher levels of aggregation
- Add in the Weeks Since Sign Up to the ‘Dimensions’
- Run the Query and then add a Pivot Step
- The Pivot Step will pivot the Weeks Since Signup which will line up your individual weeks in a table
- Each cohort is now a row in the table, with columns indicating activity in each period after their signup; customer retention rate can be observed from how quickly activity changes over time.
Conclusion
A cohort analysis gives you a deeper understanding of your users and can glean insights on the health of your business. In measuring the customer retention rate metric as a cohort analysis, you can see the value of each customer cohort and understand more about how to retain your customers for long-term growth.