Datasets & Joins: Combining and Comparing Data from Different Tables and Databases
Previously, when discussing Data Sources and Schemas we mentioned the notion of Joins and Join paths. In this tutorial we will go into more detail on what that means and how Visual SQL treats joins. Most importantly we will discuss how you can use Visual SQL to Join and compare disparate data sources to complete complex data investigations.
The term Join describes how two different tables (or subsets of tables) are merged into one table for the end goal of manipulation and/or visualization of complex data sets. These two tables can come from two different databases or they can be from two different tables in the same database. The thing to keep in mind is that we are always after creating one single table of data in order to create a chart and that Joins simply merge two or more tables from anywhere into one table that can be manipulated and/or visualized.
There are also multiple different types of joins that can be used to merge your different tables.
Combines the columns from all datasets on one or more common dimension when possible, and includes all data from both datasets.
Combines the columns on a common dimension (the first N columns) when possible, and only includes data for the columns that share the same values in the common N column(s).
Combines the columns on a common dimension (the first N columns) when possible, returning all rows from the first dataset with the matching rows in the second dataset. The result is NULL in the second dataset when there is no match.
A Union merge will stack the dataset results on top of each other without grouping or combining the data. Unions can be used to generate lists of data to be printed or viewed in table format. To remove duplicate rows, check the Distinct checkbox.
The result of the Cross Join will be a table with all possible combinations of your datasets together. This can result in enormous tables and should be used with caution. Cross Joins will likely only be used when your datasets are returning single values.
So how can you use this information and Visual SQL to accomplish your goal of comparing two data sets against each other in one table? It’s actually very easy. In an example where you want to compare two different databases that do not already have a Join path set up in the Schema that we discussed earlier, you can use the Merging function of Visual SQL in Chartio to join (or merge) these two tables into one for further manipulation or visualization.
In our example fictitious company we have been analyzing MRR over time. Let’s take a look at how our user sign up rates compare with the web traffic we are experiencing.
These two metrics come from two totally different data sources that can not be joined in the schema the way we discussed before. Joining two disparate databases is the truly powerful aspect of Merging in Visual SQL.
We will be comparing a Google Analytics Database, that tracks website activity, with our Company’s internal database that holds our company’s customer and performance information.
First we need to get the user sign up rate metric, we will call it USR, from the Digisign internal database from the past week.
Now to add in the website traffic over the same 1 week time period we need to add a dataset. In Visual SQL clicking the “+” or ‘plus’ symbol in the Editor will add in a second dataset, that we can edit in the Visual SQL Editor the same way we did the USR dataset.
These two Visual SQL queries have now create two different tables that have been merged into one table and visualized as a line graph. You might notice that when creating the second dataset the Merge step was created for us and the join type was defaulted to an Outer Join and the tables were joined on the first column, in our case it was the Month column. Clicking into the Merge Datasets step you will be able to select from a dropdown of the join types we outlined above and select the appropriate join type for your use case.