Summary of impact:
At approximately 11:03 UTC on Tuesday 24th December 2019, we experienced delays in sending SMS. We restored services to normal at 11:59 UTC the same day. Customers may have experienced delays in:
The delays happened because our API servers experienced a sudden surge in requests. As a result, high CPU usage on all our API servers led to intermittent errors when connecting to our database servers. These factors combined to reduce our throughput and led to messages being delayed.
Our team were proactively alerted to the incident when the initial failure occurred at 11:03 UTC. After an initial assessment, we posted an incident notification to our customers through our status page. We allocated increased resources to our API cluster based on our expected load, plus additional resources for any overhead. After identifying the issue, our team took action to deploy additional resources to the cluster to handle the load and increased the flow of SMS sends.
Our team continued to monitor the issue and at 11:59 UTC we were confident there were no delays in SMS sends and services were running as normal. Shortly after, we closed our status page.
We’re really sorry this incident occurred and for any disruption it caused. We’re continually reviewing our services and making adjustments to ensure our infrastructure can handle an increased workload to prevent further incidents. With that in mind, we’ll: