Close
Data TutorialsGoogle BigQuery

How to Use Google BigQuery's Wildcard Functions in Legacy SQL vs. Standard SQL

Posted by AJ Welch

BigQuery offers users a number of powerful methods to allow searching and filtering based on the names of tables within a particular dataset using wildcard functions or the asterisk * character.

Below we’ll explore methods of table wildcard filtering for both Legacy SQL and Standard SQL solutions.

Using TABLE_QUERY with Legacy SQL

The TABLE_QUERY function is a powerful method that effectively allows you to generate a secondary sub-query based on the name of the table (table_id) to further hone your results.

For example, we’re using the [gdelt-bq:hathitrustbooks] dataset, which contains a table for each year of publications ranging from 1800 to 2012. Therefore, if we want to query just the table range of the 1920s (1920 - 1929), we can use the TABLE_QUERY function and within that expression use the REGEXP_MATCH function to ensure we only query tables from the 1920s:

SELECT
  BookMeta_Identifier,
  BookMeta_Date
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[\d]")')
ORDER BY
  BookMeta_Identifier DESC
LIMIT
  1000

And the results are as expected – a range of publications from the 1920s only.

[
  {
    "BookMeta_Identifier": "yul.11995746_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yul.11729715_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yul.11368340_000_00",
    "BookMeta_Date": "1920"
  },
  {
    "BookMeta_Identifier": "yul.11165435_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yale.39002088672077",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002084611509",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002056071179",
    "BookMeta_Date": "1929"
  },
  {
    "BookMeta_Identifier": "yale.39002044555168",
    "BookMeta_Date": "1921"
  },
  {
    "BookMeta_Identifier": "yale.39002040679681",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002040679657",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002040679632",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002040330061",
    "BookMeta_Date": "1925"
  },
  ...
]

Using TABLE_DATE_RANGE with Legacy SQL

In cases where you have a series of daily tables (perhaps from partitioning them within BigQuery) which have a suffix of the date (in the required YYYMMDD format), you can utilize the TABLE_DATE_RANGE function to query a range of those daily tables only.

For example, if our bookstore dataset has a series of daily tables with names in the format: bookstore.booksYYYYMMDD, we can query specific daily tables from January 1st, 1920 to December 31st, 1929 with this query:

SELECT
  title,
  author
FROM
  TABLE_DATE_RANGE(bookstore.books,
                   TIMESTAMP('1920-01-01'),
                   TIMESTAMP('1929-12-31'))
LIMIT
  1000

BigQuery will automatically infer and generate the dated table names based on the prefix we provided as well as the TIMESTAMP range, then it will query the data accordingly.

Using _TABLE_SUFFIX with Standard SQL

If you’re explicitly using Standard SQL with BigQuery, you’ll need an alternative to functions like TABLE_QUERY and TABLE_DATE_RANGE. Thankfully, you can take advantage of the _TABLE_SUFFIX psuedo column, which automatically contains matched values when you use the * wildcard symbol within your table name in the query.

For example, if we wanted to replicate the same query as above using the Legacy SQL TABLE_QUERY function – where we retrieve results only from tables between 1920 and 1929 – we’d use the following query in Standard SQL:

SELECT
  BookMeta_Identifier,
  BookMeta_Date
FROM
  `gdelt-bq.hathitrustbooks.*`
WHERE
  _TABLE_SUFFIX BETWEEN '1920' AND '1929'
ORDER BY
  BookMeta_Identifier DESC
LIMIT
  1000

The key is the * wildcard character we placed within our table name in the FROM clause. While _TABLE_SUFFIX is (by definition) intended to represent the suffix (or final portion) of the full table name (as inmy_table*), we can use it as in the above example to represent the entire table name and filter those names in the WHERE clause.

Sure enough, the returned results are identical to the Legacy SQL example above:

[
  {
    "BookMeta_Identifier": "yul.11995746_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yul.11729715_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yul.11368340_000_00",
    "BookMeta_Date": "1920"
  },
  {
    "BookMeta_Identifier": "yul.11165435_000_00",
    "BookMeta_Date": "1922"
  },
  {
    "BookMeta_Identifier": "yale.39002088672077",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002084611509",
    "BookMeta_Date": "1925"
  },
  {
    "BookMeta_Identifier": "yale.39002056071179",
    "BookMeta_Date": "1929"
  },
  {
    "BookMeta_Identifier": "yale.39002044555168",
    "BookMeta_Date": "1921"
  },
  {
    "BookMeta_Identifier": "yale.39002040679681",
    "BookMeta_Date": "1925"
  }
]