Delays processing SMS delivery receipts - all regions
Incident Report for Dotdigital
Postmortem

Summary of impact:

At approximately 08:08 UTC on 12 November 2022, our platform stopped processing SMS delivery receipts. We fully restored receipt processing at 17:01 UTC on the same day.

During this time, customers may have experienced some of the following issues:

  • Webhooks carrying SMS receipts would have been delayed.
  • Any SMS reports in our platform would have been delayed.

This issue only impacted SMS delivery receipts and some SMS reporting. SMS sending and delivery to handsets were unaffected by this issue and continued as usual.

Root Cause:

Although receipts continued to flow into our database, our processing pipeline had unexpectedly stalled. This meant we weren’t forwarding receipts to the various parts of our platform that rely on them.

Mitigation:

The timeline (in UTC) for resolving this issue was:

  • 14:10: We noticed the issue with SMS receipts and began investigating.
  • 16:00: We determined that the issue related to job scheduling and to the programming unit responsible for processing receipts.
  • 16:30: We resolved the issue with job scheduling. Following this, we needed to fine-tune the process so it could handle the backlog of receipts that had accumulated.
  • 16:50: We completed the process tuning work, and the backlog of receipts began to clear.
  • 17:00: The backlog had been cleared and receipts were being processed as expected. The issue was fully resolved at this point.

Next Steps:

To prevent this issue from recurring, we’ll:

  • Implement additional monitoring on the existing receipt processing pipeline to detect if this particular failure mode happens again
  • Prepare a plan to re-architect the processing pipeline, assessing whether a better model is available.
Posted Nov 14, 2022 - 15:50 GMT

Resolved
We’ve monitored the situation for some time now and we’re confident this issue is fully resolved.
We’re sorry this happened and for the interruption it caused. We’re going to write a report to share the specific details of today’s issue, and we’ll attach it here when it’s ready.
Posted Nov 12, 2022 - 17:28 GMT
Monitoring
We applied a fix a few minutes ago. The good news is that we're now seeing immediate and sustained improvements with SMS delivery receipts.
Thanks for your patience while we worked on this problem.
We'll monitor the situation from here and post a final update once we're totally satisfied it's resolved.
Posted Nov 12, 2022 - 17:09 GMT
Identified
Our teams have isolated the issue on our side, and we’re now fixing the problem.
Thanks for your patience, everyone. Normal service will resume as soon as possible.
Posted Nov 12, 2022 - 16:49 GMT
Investigating
We are currently experiencing delays processing delivery receipts for SMS sends. Although our platform is delivering SMS quickly and as normal, there is a backlog in marking those deliveries as complete. We are investigating this issue.
Posted Nov 12, 2022 - 16:26 GMT