How to group email addresses by their domain

Data Pipeline

Grouping a list of email addresses by domain is super easy in the Data Pipeline.

  1. In our example, we want to get the count of events grouped by email domain. So we’ll drag email into Dimensions, and set count of event_id as our Measure.

  2. Next, we’ll add a column in the Data Pipeline called “Domain” that extracts the domain from the email address. This is possible with SQLite’s substr() function. This is our formula:

substr(“Email Column Name”,instr(“Email Column Name”, ‘@’) + 1)

Extract the domain from the emails

  1. Next, we’ll hide the “Email” column, as we don’t need it anymore. Add a Hide Columns step in the Pipeline and select the “Email” column from the dropdown.

Hide Columns in the Pipeline to hide the emails

  1. Then, add a Reorder Columns step to place Domain first, and Count second.

Reorder Columns to order Domain first

  1. Finally, we want to group our values by email domain. Add a Group step in the Pipeline. You’ll want to group by the “Domain” column, and select the SUM aggregation for your count column.

Add a Group step in Pipeline to group by Domain, with SUM as the aggregation

Here’s what our query and final table results look like:

Final result

Click to expand.


Related Data Pipeline Help Articles

See more