Documentation Data Pipeline FAQs

Group email addresses by domain in Data Explorer

Email addresses are commonly used as identifiers in databases and knowing how to manipulate and group by email address domains can be very helpful in day to day analysis operations. Grouping a list of email addresses by domain is super easy in the Data Pipeline.

  1. In our example, we want to get the count of events grouped by email domain. So we’ll drag email into the Dimensions field, and set count of event_id as our Measure. Now, click run query to populate the data in your table.
  2. Next, we’ll add a column in the Data Pipeline called “Domain” that will be used to extract the domain name from the email address. The best way to achieve this is to use SQLite’s substr() function. Since the domain portion of an email address always comes after the @ symbol, we can use the @ character to differentiate between the username and the domain name part of the email address. Here is the substr() formula that we will use to extract our new domain values into our new column:

    substr("Email Column Name",instr("Email Column Name", '@') + 1)

    Extract the domain from the emails

  3. Next, we’ll hide the “Email” column, as we don’t to display it in our final chart. Add a Hide Columns step in the Pipeline and select the “Email” column from the list of available options in the dropdown menu and then click apply and close.

    Hide Columns in the Pipeline to hide the emails

  4. Then, add a Reorder Columns step to place Domain first, and Count second.

    Reorder Columns to order Domain first

  5. Finally, we want to group our values by the email domain. Add a Group step in the Pipeline. You’ll want to group by the “Domain” column, and select the SUM aggregation for your count column. This will return a total sum of each event grouped by the email domain.

    Add a Group step in Pipeline to group by Domain, with SUM as the aggregation

Here’s what our query and final table results look like:

Final Result