Summary of impact:
At approximately 08:08 UTC on 12 November 2022, our platform stopped processing SMS delivery receipts. We fully restored receipt processing at 17:01 UTC on the same day.
During this time, customers may have experienced some of the following issues:
- Webhooks carrying SMS delivery receipts would have been delayed.
- SMS reports in our platform would have been delayed.
This issue only impacted SMS delivery receipts and some SMS reporting. SMS sending and delivery to handsets were unaffected by this issue and continued as usual.
Although receipts flowed into our database, our processing pipeline had unexpectedly stalled. This meant we weren’t forwarding receipts to the various parts of our platform which rely on them.
The timeline (in UTC) for resolving this issue was:
- 14:10: We noticed the issue with SMS receipts and began investigations
- 16:00: We determined that the issue was related to job scheduling and to the component responsible for processing receipts
- 16:30: We resolved the issue with job scheduling. Following this, we needed to fine-tune the process to handle the backlog of receipts that had accumulated
- 16:50: We completed the process tuning work and the backlog of receipts began to clear
- 17:00: The backlog had been cleared and receipts were being processed as normal. The issue was fully resolved at this point.
To prevent this issue from recurring, we’ll:
- Implement additional monitoring on the existing receipt processing pipeline to detect if this particular failure mode happens again
- Assess whether a better model is available for the processing pipeline and prepare a plan to re-architect it if so.
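As a minimal sketch of the kind of check the monitoring item describes, the Python below flags a stall when receipts are still arriving in the database but the oldest unprocessed one has waited longer than a threshold. It is illustrative only: `PipelineSnapshot`, `is_stalled`, the field names and the 15-minute threshold are hypothetical and are not taken from the actual platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PipelineSnapshot:
    # Hypothetical fields; real values would be queried from the receipts database.
    oldest_unprocessed_received_at: Optional[datetime]  # None when there is no backlog
    checked_at: datetime


def is_stalled(snapshot: PipelineSnapshot, max_lag: timedelta) -> bool:
    """Return True when the oldest unprocessed receipt has waited longer than max_lag."""
    if snapshot.oldest_unprocessed_received_at is None:
        return False  # nothing is waiting, so the pipeline is keeping up
    lag = snapshot.checked_at - snapshot.oldest_unprocessed_received_at
    return lag > max_lag


# Example: a receipt stored at 08:08 UTC is still unprocessed at 08:30 UTC.
snapshot = PipelineSnapshot(
    oldest_unprocessed_received_at=datetime(2022, 11, 12, 8, 8, tzinfo=timezone.utc),
    checked_at=datetime(2022, 11, 12, 8, 30, tzinfo=timezone.utc),
)
stalled = is_stalled(snapshot, max_lag=timedelta(minutes=15))  # assumed threshold
```

Measuring the age of the oldest unprocessed receipt, rather than only checking that the processing job is running, would catch the case where the job is alive but not making progress.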