
How to export data to a file in Google BigQuery

Posted by: AJ Welch

As of the time of writing, exporting to a file from BigQuery requires the use of Google Cloud Storage to receive that exported file. After the file is stored in Google Cloud Storage you may, of course, download or export it elsewhere as needed.

Once you have Cloud Storage ready, you’ll also need to create a bucket to hold the exported file, which is easily done by following the official quickstart guide.


Cloud Storage URI format


The Cloud Storage URI, which tells BigQuery where to export the file, has a simple format: gs://<bucket>/<file-name>.

If you wish to place the file in a series of directories, simply add those to the URI path: gs://<bucket>/<directory>/<subdirectory>/<file-name>.
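
As a rough illustration of the URI format described above, the sketch below assembles a URI from its parts. The helper name build_gcs_uri and the directory names are hypothetical and used only for this example; they are not part of any Google library.

```python
def build_gcs_uri(bucket: str, file_name: str, *directories: str) -> str:
    """Join a bucket, optional directories, and a file name into a gs:// URI.

    Hypothetical helper for illustration only.
    """
    parts = [bucket, *directories, file_name]
    # Strip stray slashes so path segments are not doubled up.
    cleaned = [part.strip("/") for part in parts]
    if any(not part for part in cleaned):
        raise ValueError("bucket, directories, and file name must be non-empty")
    return "gs://" + "/".join(cleaned)


# File directly in the bucket.
print(build_gcs_uri("bookstore", "melville.json"))
# File nested inside two (assumed) directories.
print(build_gcs_uri("bookstore", "melville.json", "exports", "2016"))
```

The first call yields a bucket-level path and the second a nested one, matching the two formats shown above.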

Exporting via the WebUI


Exporting a BigQuery table to a file via the WebUI is straightforward:

  • Go to the BigQuery WebUI.
  • Select the table you wish to export.
  • Click on Export Table in the top-right.
  • Select the Export format and Compression, if necessary.
  • Alter the Google Cloud Storage URI as necessary to match the bucket, optional directories, and file-name you wish to export to.
  • Click OK and wait for the job to complete.

Exporting via the API


To export a BigQuery table using the BigQuery API, you’ll need to make a call to the Jobs.insert method with the appropriate configuration. The basic configuration structure is given below:

{
  'jobReference': {
    'projectId': projectId,
    'jobId': uniqueIdentifier
  },
  'configuration': {
    'extract': {
      'sourceTable': {
        'projectId': projectId,
        'datasetId': datasetId,
        'tableId': tableId
      },
      'destinationUris': [cloudStorageURI],
      'destinationFormat': 'CSV',
      'compression': 'NONE'
    }
  }
}
  • uniqueIdentifier is simply a unique string that identifies this particular job, so there won’t be any duplication of data if the job fails during processing and must be retried.
  • projectId is the BigQuery project ID.
  • datasetId is the BigQuery dataset ID.
  • tableId is, of course, the BigQuery table ID.
  • destinationFormat defaults to CSV but can also be NEWLINE_DELIMITED_JSON or AVRO.
  • compression defaults to NONE but can be GZIP as well.
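
One way to assemble this configuration programmatically is sketched below, using only the Python standard library. The function name build_extract_job and the placeholder values (my-project, my_dataset, my_table, my-bucket) are assumptions made for illustration; the sketch only builds the request body and does not submit it to the API.

```python
import json
import uuid


def build_extract_job(project_id, dataset_id, table_id, destination_uri,
                      destination_format="CSV", compression="NONE",
                      job_id=None):
    """Return a request body shaped like the configuration structure above.

    Hypothetical helper; it does not call the BigQuery API.
    """
    return {
        "jobReference": {
            "projectId": project_id,
            # A random UUID serves as the unique job identifier unless
            # the caller supplies one.
            "jobId": job_id or str(uuid.uuid4()),
        },
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "destinationUris": [destination_uri],
                "destinationFormat": destination_format,
                "compression": compression,
            }
        },
    }


# Placeholder values, for illustration only.
body = build_extract_job("my-project", "my_dataset", "my_table",
                         "gs://my-bucket/my_table.csv")
print(json.dumps(body, indent=2))
```

Generating the job ID once and passing it back in through job_id on a retry keeps the identifier stable across attempts, which is the point of the uniqueIdentifier field described above.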

As an example, if we want to export the melville table in our exports dataset, which is part of the bookstore-1382 project, we might use a configuration something like this:

{
  'jobReference': {
    'projectId': 'bookstore-1382',
    'jobId': 'bcd56153-b882-4f78-8a30-f509b583a568'
  },
  'configuration': {
    'extract': {
      'sourceTable': {
        'projectId': 'bookstore-1382',
        'datasetId': 'exports',
        'tableId': 'melville'
      },
      'destinationUris': ['gs://bookstore/melville.json'],
      'destinationFormat': 'NEWLINE_DELIMITED_JSON',
      'compression': 'NONE'
    }
  }
}

After a few moments for the job to process, refreshing the bookstore bucket in Cloud Storage reveals the melville.json file, as expected:

{"BookMeta_Title":"Typee, a Romance of the South Seas","BookMeta_Date":"1920","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"Harcourt, Brace and Howe"}
{"BookMeta_Title":"Typee: A Real Romance of the South Seas","BookMeta_Date":"1904","BookMeta_Creator":"Herman Melville ,  William Clark Russell ,  Marie Clothilde Balfour","BookMeta_Language":"English","BookMeta_Publisher":"John Lane, the BodleyHead"}
{"BookMeta_Title":"Typee: A Narrative of a Four Months' Residence Among the Natives of a Valley of the Marquesas ...","BookMeta_Date":"1893","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"J. Murray"}
...
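
Because the NEWLINE_DELIMITED_JSON format puts one JSON object on each line, a downloaded export can be read back one line at a time. The following is a minimal sketch using Python's standard json module; the function name read_ndjson is hypothetical, and the sample text simply reuses two of the records shown above rather than reading from Cloud Storage.

```python
import json


def read_ndjson(lines):
    """Parse an iterable of newline-delimited JSON lines into dicts."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, such as a trailing newline at the end of a file.
        if line:
            records.append(json.loads(line))
    return records


# Two records copied from the exported output above.
sample = """\
{"BookMeta_Title":"Typee, a Romance of the South Seas","BookMeta_Date":"1920","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"Harcourt, Brace and Howe"}
{"BookMeta_Title":"Typee: A Narrative of a Four Months' Residence Among the Natives of a Valley of the Marquesas ...","BookMeta_Date":"1893","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"J. Murray"}
"""

for record in read_ndjson(sample.splitlines()):
    print(record["BookMeta_Date"], record["BookMeta_Title"])
```

For a file downloaded to disk, the same function can be given an open file object, since iterating over a text file yields its lines.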

Using a wildcard URI for multiple file output


In some cases you may be exporting a table that exceeds the maximum output size of 1 GB per file. In such cases, you should take advantage of the wildcard URI option by adding an asterisk * somewhere in the file-name portion of your URI.

For example, a Cloud Storage URI of gs://bookstore/melville-*.json in the configuration will expand into a series of sequentially numbered file names, like so:

gs://bookstore/melville-000000000000.json
gs://bookstore/melville-000000000001.json
gs://bookstore/melville-000000000002.json
...
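
To make the naming pattern concrete, the sketch below reproduces it locally by replacing the asterisk with a zero-padded 12-digit counter, matching the file names listed above. The function expand_wildcard_uri and its file_count parameter are hypothetical: in a real export BigQuery decides how many files to write, so this is only useful for anticipating or matching the resulting names.

```python
from typing import List


def expand_wildcard_uri(uri: str, file_count: int) -> List[str]:
    """List the file names a single-wildcard URI would produce.

    Hypothetical helper; file_count is an assumption supplied by the caller.
    """
    if uri.count("*") != 1:
        raise ValueError("expected exactly one '*' in the URI")
    # Each file gets a 12-digit, zero-padded index starting at 0.
    return [uri.replace("*", f"{index:012d}") for index in range(file_count)]


for name in expand_wildcard_uri("gs://bookstore/melville-*.json", 3):
    print(name)
```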