Email addresses are commonly used as identifiers in databases and knowing how to manipulate and group by email address domains can be very helpful in day to day analysis operations. Grouping a list of email addresses by domain is super easy in the Data Pipeline.
- In our example, we want to get the count of events grouped by email domain. So we’ll drag email into the Dimensions field, and set count of event_id as our Measure. Now, click run query to populate the data in your table.
Next, we’ll add a column in the Data Pipeline called “Domain” that will be used to extract the domain name from the email address. The best way to achieve this is to use SQLite’s
substr()function. Since the domain portion of an email address always comes after the
@symbol, we can use the
@character to differentiate between the username and the domain name part of the email address. Here is the
substr()formula that we will use to extract our new domain values into our new column:
substr("Email Column Name",instr("Email Column Name", '@') + 1)
Next, we’ll hide the “Email” column, as we don’t to display it in our final chart. Add a Hide Columns step in the Pipeline and select the “Email” column from the list of available options in the dropdown menu and then click apply and close.
Then, add a Reorder Columns step to place Domain first, and Count second.
Finally, we want to group our values by the email domain. Add a Group step in the Pipeline. You’ll want to group by the “Domain” column, and select the SUM aggregation for your count column. This will return a total sum of each event grouped by the email domain.
Here’s what our query and final table results look like: