Documentation Data Pipeline FAQs

Extract parts of a string in Data Explorer

If you joined after January 6, 2020, you might be using our new Visual SQL interface. Check out Extract parts of a string in Visual SQL for more relevant content.

You might run into a scenario where you have a column that has pieces of information you’d like to extract separated by a character of some sort. In the Data Pipeline, you have the ability to extract the sub-string to the left or right of the delimiting character.

If your start and end positions are fixed, you could instead use the Extract Text Pre-built Formula.

In our example, we have a column called “State/City” that has state and city information, where the state abbreviation and the city name are separated by a hyphen:

Example state and city column

Let’s say we want to parse the string to only have the city name in a separate column. To do this, we turn to the Data Pipeline to help transform the initial string. First, we need to use an Add Column step, choose Custom formula, then use the following formula:

substr("State/City", instr("State/City", '-')+1, length("State/City"))

Let’s try to breakdown what’s happening in this formula. First, we’ll look at the substr function. This function extracts a substring from a string. There are three parameters for the function, substr(1, 2, 3).

Parameter 1 is the column name containing the initial string.
Parameter 2 is an integer representing the starting position of the substring to extract. This can be at the beginning of the string, the middle, or even towards the end of the string. To set the position, you just need to know the numerical position of where you’d like to start the extraction.
Parameter 3 is also an integer, and it represents the position of where our extracted substring ends in the initial string.

In parameter 2 of our substr function, we use the instr function. instr() returns the position of a character in a string. There are two parameters for the instr() function, instr(A, B).

Parameter A is the column containing the string which will be searched.
Parameter B is the character(s) you’re searching for. The output is the position of the first instance of the character in the string, which is why the instr function is in parameter 2 of the substr function. In our example, the function instr("State/City", '-')+1 searches for the position of the hyphen character, '-', within the strings in our “State/City” column. The +1 is to return the position to the right of the hyphen, in order to not include the hyphen in our new column.

Finally, for parameter 3, we use the length function, which just calculates the length of a string. Since we want to get the entire city name for each row, and each city name length varies, we can account for this by using the length function. This returns the last character’s position in the string for each varying city name.

Running the full formula outputs the following:

Extracted city name from the original string

We can also extract the state abbreviation using the following formula:

substr("State/City", 1, instr("State/City", '-')-1)

We’ll quickly go through this example.

Parameter 1 remains the same as our first example, since we’re extracting a substring from the same initial string.
Parameter 2 is simply a 1 to indicate our substring starts at the first position of our initial string.
We use the instr function again, but this time we use it in parameter 3. We want to get the position of the character just before the hyphen, so we search for the hyphen character, get its position from the instr function, then subtract 1 from that position to get the end position of our substring.

Running the full formula outputs the following:

Extracted state from the original string