Documentation Data Pipeline FAQs

Group email addresses by domain in Data Explorer

Email addresses are commonly used as identifiers in databases and knowing how to manipulate and group by email address domains can be very helpful in day to day analysis operations. Grouping a list of email addresses by domain is super easy in the Data Pipeline.

  1. For this example, we use our Dundersign Demo Data data source and add the Email column from the Contact table to the Measures field and the Id column from the Email Event table to the Dimensions field. Make sure to use Count of distinct for the Email aggregation and click Run Query to populate the data.

  2. Once we have our initial query, we add an Add Column step to our Pipeline. We give it the name Domain, select Custom formula as the Formula Type, and use SQLite’s substr function to extract the domain name from each email address:

     substr("Email Column Name",instr("Email Column Name", '@') + 1)
    

    Since the domain of an email address always follows the @ symbol, we can use the @ character to differentiate between the username and the domain of the email address.

    Extract the domain from the emails

  3. Next, we hide the “Email” column since we don’t to display it in our final chart.

    Hide Columns in the Pipeline to hide the emails

  4. We want our “Domain” column to be the leftmost column in our result set, so we use a Reorder Columns step to do that.

    Reorder Columns to order Domain first

  5. Finally, we want to group our values by the email domain, so we add a Group step in the Pipeline. We select GROUP for the “Domain” column and SUM for the “Email Events” column, which returns a total sum of events grouped by the email domain.

    Add a Group step in Pipeline to group by Domain, with SUM as the aggregation

  6. For a cleaner look, we rename the “SUM(Email Events)” column to Total Events.

Here’s what our query and final table results look like:

Final Result