Delayed Imports (Region 2)
Incident Report for dotdigital
Postmortem

Summary of impact:

At approximately 07:38 UTC on 9th September 2020, contact imports in our US region (Region 2) began experiencing delays. We restored service at 08:37 UTC the same day.

Root Cause:

Our contact import system loads a large suppression list file on startup. This file is stored in a cloud storage service, and in this instance the cloud storage service responded unexpectedly slowly. Once the file had loaded successfully, we were able to process customer contact imports again.

Mitigation:

The timeline for resolving this issue was (all times UTC):

  • 07:38: We were alerted to delays in contact importing by our monitoring system and began to triage the issue.
  • 07:50: We began restarting the related services and found that they were not starting as promptly as expected. We began investigating what was causing the delayed service start.
  • 08:27: We updated our status page while we continued to investigate the cause of the issue.
  • 08:30: The services started processing and we monitored to ensure imports were being processed correctly.
  • 08:37: We updated the status page and continued to monitor for stability.

Next Steps:

The mechanism we use for loading the suppression file has been in production for many months without previously causing an issue. However, we will alter it to handle slow loading and to make it easier for multiple services to access cloud storage concurrently without incident.
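
As an illustration only, one common way to handle a slow load is to put a deadline on the fetch and fall back to a last known-good copy if the deadline passes. The sketch below is not our actual implementation: the names load_with_deadline and slow_fetch, the timeout values, and the idea of a cached fallback copy are all assumptions made for the example. Whether a slightly stale suppression list is acceptable is a separate policy decision.

```python
import concurrent.futures
import time
from typing import Callable, Optional, Set


def load_with_deadline(
    fetch: Callable[[], Set[str]],
    timeout_s: float,
    fallback: Optional[Set[str]] = None,
) -> Set[str]:
    """Run fetch() in a worker thread and wait at most timeout_s seconds.

    If the deadline passes, return the fallback (for example a cached,
    last known-good copy). With no fallback, the timeout is re-raised so
    the caller can decide what to do instead of blocking indefinitely.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        if fallback is None:
            raise
        return fallback
    finally:
        # Do not wait for a slow fetch to finish before returning.
        executor.shutdown(wait=False)


def slow_fetch() -> Set[str]:
    """Hypothetical stand-in for a cloud storage read that responds slowly."""
    time.sleep(0.5)
    return {"fresh@example.com"}


if __name__ == "__main__":
    cached = {"cached@example.com"}  # assumed last known-good copy
    entries = load_with_deadline(slow_fetch, timeout_s=0.05, fallback=cached)
    print(sorted(entries))
```

In this sketch the service can start with the cached copy while the slow fetch finishes in the background; a fuller version would swap in the fresh list once it arrives.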

Posted Sep 10, 2020 - 15:04 BST

Resolved
Everything is back on track now.
We’ll write a detailed report on the issue we’ve experienced today. It’ll be posted here as soon as it’s ready (in a day or two).
Apologies about today’s mishap.
Posted Sep 09, 2020 - 10:25 BST
Monitoring
All functionality is now restored.
We're keeping a watchful eye on things to make sure it stays that way. We'll let you know when we’re 100% confident everything is fully back to normal.
Posted Sep 09, 2020 - 09:37 BST
Update
We are continuing to investigate this issue.
Posted Sep 09, 2020 - 09:31 BST
Investigating
We've discovered an issue causing delayed imports in R2, and our tech team are working flat out to fix it. Sorry if this is affecting you, but things should be back to normal very soon. Look out for more news from us shortly.
Posted Sep 09, 2020 - 09:27 BST
This incident affected: North America - Engagement Cloud r2 (North America - Contact Imports).