Web Behaviour Tracking Delays - Europe
Incident Report for dotdigital
Postmortem

Summary of impact:

On Wednesday 29th September 2021 at 12:20 UTC, we stopped processing web behavior session data in our European region (region 1). We continued to accept new session data and none was lost. We resumed normal processing at 15:12 UTC.

Customers using web behavior data may have experienced the following problems:

  • Programs which rely on this data may not have executed as expected
  • Segments which rely on this data may not have returned all expected contacts
  • Abandoned Browse may not have worked as expected
  • Product Recommendations may not have worked as expected

Root Cause:

Our web behavior tracking service relies on “indexes”, structures that many database systems use to speed up the processing of large volumes of data. Periodically, the web behavior tracking service checks these indexes to see whether they need rebuilding. In this case, the index rebuilds took far longer than expected. While indexes are rebuilding, the service doesn’t process new information, so a backlog of session data developed.

Mitigation:

During this incident, we prepared a software fix to temporarily prevent indexes from being rebuilt. However, before this was rolled out, the indexes finished rebuilding and service resumed with sessions processing again, so we didn’t complete the rollout.

Next Steps:

We apologize for any inconvenience caused during this incident. As a follow-up, we have an action to change our web behavior tracking service so that it can rebuild indexes and process new data simultaneously, rather than pausing the processing of new data while waiting for indexes to finish rebuilding.
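
For readers curious what rebuilding and processing "simultaneously" can look like, here is a minimal sketch of one common approach: build a shadow index in the background, then swap it in. It is an illustration only, not our service's implementation. The `SessionStore` class, its method names and the in-memory data are all hypothetical.

```python
import threading
from collections import defaultdict


class SessionStore:
    """Hypothetical in-memory stand-in for a tracking database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = []               # append-only log of (contact_id, page)
        self._index = defaultdict(list)   # live index: contact_id -> pages

    def ingest(self, contact_id, page):
        # Holds the lock only briefly, so it never waits for a whole rebuild.
        with self._lock:
            self._sessions.append((contact_id, page))
            self._index[contact_id].append(page)

    def rebuild_index(self):
        # Phase 1: copy the log under the lock, then build a shadow index
        # from that copy while ingestion carries on.
        with self._lock:
            snapshot = list(self._sessions)
        shadow = defaultdict(list)
        for contact_id, page in snapshot:
            shadow[contact_id].append(page)
        # Phase 2: lock briefly, replay anything ingested in the meantime,
        # then swap the shadow index in as the live one.
        with self._lock:
            for contact_id, page in self._sessions[len(snapshot):]:
                shadow[contact_id].append(page)
            self._index = shadow

    def pages_for(self, contact_id):
        with self._lock:
            return list(self._index.get(contact_id, []))


store = SessionStore()
store.ingest("c1", "/home")
rebuild = threading.Thread(target=store.rebuild_index)
rebuild.start()
store.ingest("c1", "/pricing")  # accepted even if the rebuild is still running
rebuild.join()
print(store.pages_for("c1"))  # ['/home', '/pricing']
```

The trade-off in this sketch is extra memory for the shadow copy in exchange for only short pauses on the ingestion path. A real database would more likely use its own online index rebuild feature than application code like this.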

Posted Sep 30, 2021 - 15:45 BST

Resolved
This incident has been resolved and web behaviour tracking data is no longer delayed.
Posted Sep 29, 2021 - 17:13 BST
Monitoring
We have identified a problem which was preventing access to the web behaviour tracking database due to an index rebuild. The index rebuild is complete and full service has resumed. No data has been lost, and we are now processing queued data, which we expect to clear in approximately 1.5 hours.
Posted Sep 29, 2021 - 16:35 BST
Investigating
Customers in our European region will experience some delays with web behaviour tracking data appearing in their accounts. As a consequence, features such as Abandoned Basket, Programs and Product Recommendations may not have access to up-to-date web tracking data.
Posted Sep 29, 2021 - 16:09 BST