Documentation Data Pipeline FAQs

Omit data using an anti-join in Data Explorer

At times, you may want to exclude rows where two joining datasets have matching values, only keeping the rows that didn’t match for a particular dataset. In this case, we’ll be interested in keeping the rows from the dataset on the left where they did not match with the dataset on the right. This is called an anti-join, and the way to carry this out in Chartio is in the Data Pipeline.

In the following example, we used our Dundersign Demo Data to create a custom column that includes eight cities we’d like to exclude from our results. We performed the anti-join by doing the following:

  1. For the first dataset, add the column with the rows you’d like to exclude in your final result set. You need to add the same column twice since you’ll need to use the second column as a means to filter after joining the two datasets—if it’s not clear now, you’ll see what we mean a bit later.

    Add a column

  2. Click the plus button to the left of Dataset 1 to add a second dataset and include the column you’d like filtered, where you’ll only include the rows that don’t match with the city values from Dataset 1. You also need to add this column twice in the second dataset.

    For our example, we want the list of cities for contacts created after January 1, 2018.

    Add a second dataset with the new column

  3. Next, click Merge Datasets in the Data Pipeline. By default, datasets are merged with an Outer Join on the first column, which is what we want in this case.

    This is where pulling in the same column twice helps us see where we have a match with the other dataset’s result set. In our output, the second column includes the results we’re looking to exclude.

    Select Outer Join as the Merge type

  4. Add a Filter Rows Pipeline step to exclude any rows with results in the second column.

    For our example, we excluded any rows where city_2 is not null. Notice how our results return eight less rows now!

    Add a Filter step in the Pipeline

  5. Since they’re no longer needed, you can hide the duplicate columns.

    Here, we hid city_2 and Mailing City_2 so only “city_1” remained in our result set.

    Hide columns in the Pipeline

We now have successfully removed the cities in the custom column and have 7,217 rows instead of 7,209 rows!

Anti-Join final result