Documentation Data Pipeline FAQs

Omit data using an anti-join in Data Explorer

At times, you may want to exclude rows where two joining datasets have matching values, only keeping the rows that did not match for a particular dataset. In this case, we’ll be interested in keeping the rows from the dataset on the left where they did not match with the dataset on the right. This is called an anti-join, and the way to carry this out in Chartio is in the Data Pipeline.

In the following example, using the Chartio Demo Source, we have created a custom column including eight cities we would like to exclude from our results. Our anti-join example will be carried out in 5 steps below:

  1. Add the column with the results you would like to exclude in your first dataset. You will need to add the same column twice as we’ll be using the second column as a way to filter after joining these two datasets.

    Add a column

  2. Add a second dataset with the column you would like filtered, where we’ll only include those rows that do not match with the city values from the dataset in step 1. We’ll also need to add this column twice in the second dataset.

    Add a second dataset with the new column

  3. Next, we’ll select an Outer Join, joining on the first column. This is where pulling in the same column twice will help us see where we have a match with the other dataset. In our output, it is the second column that includes the results you’re looking to exclude.

    Select Outer Join as the Merge type

  4. Add a Filter Pipeline step to exclude any rows with results in the second column. Here, we will exclude any rows where the result “is not null” in our column ‘City_2’.

    Add a Filter step in the Pipeline

  5. You can now hide the columns ‘City_2’ and ‘City_2:1’ in the data pipeline to only keep your ‘City_1’ column.

Hide columns in the Pipeline

We now have successfully removed the cities in the custom column and have 1,205 rows instead of 1,213 rows!

Anti-Join final result