You might run into a scenario where you have a column that has pieces of information you’d like to extract separated by a character of some sort. In the Visual SQL Pipeline, you have the ability to extract the sub-string to the left or right of the delimiting character.
If your start and end positions are fixed, you could instead use the Extract Text Guided Formula.
In our example, we have a column called “State/City” that has state and city information, where the state abbreviation and the city name are separated by a hyphen:
Let’s say we want to parse the string to only have the city name in a separate column. To do this, we turn to the Pipeline to help transform the initial string. First, we need to use a Formula Column, choose Custom formula, then use the following formula:
substr("State/City", instr("State/City", '-')+1, length("State/City"))
Let’s try to breakdown what’s happening in this formula. First, we’ll look at the
substr function. This function extracts a substring from a string. There are three parameters for the function,
substr(1, 2, 3).
- Parameter 1 is the column name containing the initial string.
- Parameter 2 is an integer representing the starting position of the substring to extract. This can be at the beginning of the string, the middle, or even towards the end of the string. To set the position, you just need to know the numerical position of where you’d like to start the extraction.
- Parameter 3 is also an integer, and it represents the position of where our extracted substring ends in the initial string.
In parameter 2 of our
substr function, we use the
instr() returns the position of a character in a string. There are two parameters for the instr() function,
- Parameter A is the column containing the string which will be searched.
- Parameter B is the character(s) you’re searching for. The output is the position of the first instance of the character in the string, which is why the
instrfunction is in parameter 2 of the
substrfunction. In our example, the function
instr("State/City", '-')+1searches for the position of the hyphen character,
'-', within the strings in our “State/City” column. The
+1is to return the position to the right of the hyphen, in order to not include the hyphen in our new column.
Finally, for parameter 3, we use the
length function, which just calculates the length of a string. Since we want to get the entire city name for each row, and each city name length varies, we can account for this by using the
length function. This returns the last character’s position in the string for each varying city name.
Running the full formula (and renaming the Formula Column to “City Extracted”) outputs the following:
We can also extract the state abbreviation using the following formula:
substr("State/City", 1, instr("State/City", '-')-1)
We’ll quickly go through this example.
- Parameter 1 remains the same as our first example, since we’re extracting a substring from the same initial string.
- Parameter 2 is simply a
1to indicate our substring starts at the first position of our initial string.
- We use the
instrfunction again, but this time we use it in parameter 3. We want to get the position of the character just before the hyphen, so we search for the hyphen character, get its position from the
instrfunction, then subtract
1from that position to get the end position of our substring.
Running the full formula (and renaming the Formula Column to “State Extracted”) outputs the following: