At times, you may want to exclude rows where two joining datasets having matching values, only keeping the rows that did not match for a particular dataset. In this case, we’ll be interested in keeping the rows from the dataset on the left where they did not match with the dataset on the right. This is called an anti-join, and the way to carry this out in Chartio is in the Data Pipeline.
In the following example, using the Chartio Demo Source, we have created a custom column including eight cities we would like to exclude from our results. Our anti-join example we’ll be carried out in 5 steps.
- Add the column with the results you would like to exclude in your first dataset. You will need to add the same column twice as we’ll be using the second column as a way to filter after joining these two datasets.
- Add a second dataset with the column you would like filtered, where we’ll only include those rows that do not match with the city values from the dataset in step 1. We’ll also need to add this column twice in the second dataset.
- Next, we’ll select an Outer Join, joining on the first column. This is where pulling in the same column twice will help us see where we have a match with the other dataset. In our output, it is the second column that includes the results you’re looking to exclude.
- Add a Filter Pipeline step to exclude any rows with results in the second column. Here, we will exclude any rows where the result “is not null” in our column ‘City_2’. 5. You can now hide the columns ‘City_2’ and ‘City_2:1’ in the data pipeline to only keep your ‘City_1’ column.
We now have successfully removed the cities in the custom column and have 1,205 rows instead of 1,213 rows!