Email addresses are commonly used as identifiers in databases and knowing how to manipulate and group by email address domains can be very helpful in day to day analysis operations. Grouping a list of email addresses by domain is super easy in the Data Pipeline.
For this example, we use our Dundersign Demo Data data source and add the Email column from the Contact table to the Measures field and the Id column from the Email Event table to the Dimensions field. Make sure to use Count of distinct for the Email aggregation and click Run Query to populate the data.
Once we have our initial query, we add an Add Column step to our Pipeline. We give it the name
Domain, select Custom formula as the Formula Type, and use SQLite’s
substrfunction to extract the domain name from each email address:
substr("Email Column Name",instr("Email Column Name", '@') + 1)
Since the domain of an email address always follows the
@symbol, we can use the
@character to differentiate between the username and the domain of the email address.
Next, we hide the “Email” column since we don’t to display it in our final chart.
We want our “Domain” column to be the leftmost column in our result set, so we use a Reorder Columns step to do that.
Finally, we want to group our values by the email domain, so we add a Group step in the Pipeline. We select GROUP for the “Domain” column and SUM for the “Email Events” column, which returns a total sum of events grouped by the email domain.
For a cleaner look, we rename the “SUM(Email Events)” column to
Here’s what our query and final table results look like: