BigQuery makes detailed usage logs available through Cloud Logging exports, but before you can start analysing them you probably want to start exporting them to BigQuery. It's really simple to turn the export on but unfortunately Google's documentation is pretty vague, so I thought I'd write this blog post to help.
Setting up the export
Open the Logs Viewer and make sure you've chosen the correct GCP project in the drop down at the top left.
Expand the advanced filter box.
Add the following advanced search into the filter box:
resource.type="bigquery_project" OR resource.type="bigquery_dataset"
You should see something like this:
Create Sink, enter a "Sink Name", choose BigQuery as the "Sink Service" and select a BigQuery dataset to write the logs into (you can also create a new dataset through the dropdown interface).
That's it! All future logs relating to BigQuery usage in your project will be written to this dataset. Note, it's not possible to backfill historical data.
Modelling the data
You now have a table in BigQuery with details of every single activity relating to your BigQuery project. However, the table isn't particularly user friendly. Answering simple questions like "which user spent the most on BigQuery yesterday" requires complex JSON transformations of difficult to remember fields:
select protopayload_auditlog.authenticationInfo.principalEmail sum(5.0 * cast( json_extract_scalar( protopayload_auditlog.metadataJson, "$.jobChange.job.jobStats.queryStats.totalBilledBytes" ) as int64 )) as spend from bigquery_logs.cloudaudit_googleapis_com_data_access_* where resource.type = 'bigquery_project' and date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day)) group by 1 order by 2 desc limit 1
To help with this, we've created a Dataform package (documentation here) that creates a clean and easy-to-use table with all the fields you'd expect. Adding a Dataform package to your project is as simple as adding a few lines of code.
"How much did we spend yesterday" can then be answered with the much simpler query:
select principal_email, sum(cost_usd) as spend from dataform.bigquery_compute_logs where date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day)) group by 1 order by 2 desc limit 1