Summary of Impact:
At approximately 10:14 UTC on Friday 6th May 2022, we detected issues communicating with Google BigQuery. This caused delays in refreshing segments and running programs in our North America region. The issue was resolved around 14:46 UTC on Friday 6th May.
Root Cause:
This was caused by an issue with multiple Google Cloud infrastructure components. The impact of the affected Google services can be seen here.
Mitigation:
Our timeline for events for this this issue was (times stated in UTC):
10:24: We were alerted to a problem in our Big Query data importer
10:31: We identified problems with syncing, causing delays in refreshing segments
11:00: We raised a support ticket with Google
11:04: We updated our status page to reflect delays in refreshing segments
11:44: Google acknowledged the issue affecting BigQuery and other GCP products
11:49: Google updated their status page to show it was affecting Google Compute Engine, Persistent Disk
13:09: Queue backlogs started processing
13:39: We changed our status page to ‘monitoring’ the issue
13:40: All transactional data queues were empty and all backlogs had been processed
14:46: After further checks, we declared the issue resolved
Next steps:
We depend on Google’s BigQuery service for some aspects of our functionality and have configured multi region support to increase availability. In this case a failure in a single region caused BigQuery to fail despite other US regions being online. We’re working with Google to understand why this occurred.