Keep track of your Bigquery costs by exporting usage logs. | Dataform
Guide

Keep track of your Bigquery costs by exporting usage logs.

Turn on BigQuery audit log exports to start analysing your BigQuery usage. You can use this to keep track of Bigquery costs.

Data modeling
picture of author

Dan Lee on March 31, 2020

BigQuery makes detailed usage logs available through Cloud Logging exports, but before you can start analysing them you probably want to start exporting them to BigQuery. Once you have turned on the export, it's very easy to keep track of things like Bigquery costs using a simple query. It's really simple to turn the export on but unfortunately Google's documentation is pretty vague, so I thought I'd write this blog post to help.

Setting up the export

Open the Logs Viewer and make sure you've chosen the correct GCP project in the drop down at the top left.

Expand the advanced filter box.

Add the following advanced search into the filter box:

resource.type="bigquery_project" OR resource.type="bigquery_dataset"

You should see something like this:

Click Create Sink, enter a "Sink Name", choose BigQuery as the "Sink Service" and select a BigQuery dataset to write the logs into (you can also create a new dataset through the dropdown interface).

That's it! All future logs relating to BigQuery usage in your project will be written to this dataset. Note, it's not possible to backfill historical data.

Modelling the data

You now have a table in BigQuery with details of every single activity relating to your BigQuery project. However, the table isn't particularly user friendly. Answering simple questions like "which user spent the most on BigQuery yesterday" requires complex JSON transformations of difficult to remember fields:

select
  protopayload_auditlog.authenticationInfo.principalEmail
  sum(5.0 * cast(
    json_extract_scalar(
      protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.totalBilledBytes"
    ) as int64
  )) as spend
from
  bigquery_logs.cloudaudit_googleapis_com_data_access_*
where
  resource.type = 'bigquery_project'
  and date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day))
group by 1
order by 2 desc
limit 1

To help with this, we've created a Dataform package (documentation here) that creates a clean and easy-to-use table with all the fields you'd expect. Adding a Dataform package to your project is as simple as adding a few lines of code.

Keeping track of your Bigquery costs and answering questions like "How much did we spend yesterday?" can be done with the much simpler query:

select
  principal_email,
  sum(cost_usd) as spend
from
  dataform.bigquery_compute_logs
where
  date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day))
group by 1
order by 2 desc
limit 1

If you'd like to learn more about Dataform, check out our documentation and sign up for a free account.

More content from Dataform

What can data teams learn from Google’s State of DevOps report? illustration
Opinion

What can data teams learn from Google’s State of DevOps report?

Google's research shows us that elite engineering teams have a lot in common when it comes to DevOps. We think similar principles apply to data teams or, more precisely, DataOps.
Learn more
Cut data warehouse costs with run caching illustration
Product Update

Cut data warehouse costs with run caching

How to save time and money by using our run caching feature
Learn more
Building the Dataform VS Code extension illustration
Guide

Building the Dataform VS Code extension

How we made our own extension for Visual Studio Code.
Learn more

Learn more about the Dataform data modeling platform

Dataform brings open source tooling, best practices, and software engineering inspired workflows to advanced data teams that are seeking to scale, enabling you to deliver reliable data to your entire organization.