From approximately 11:00 UTC on 29 July 2021, we had reports of intermittent user login failures in our North America (R2) and Asia Pacific (R3) regions. We were able to replicate this and could confirm our Europe (R1) region was unaffected.
As a precautionary measure, we paused message sending and services in all regions until we were able to confirm no data was being lost and could pinpoint the exact issue.
After investigation, we found that our partner Cloudflare (CF) had begun a rollout of Edge Side Code across their platform at approximately 08:00 UTC. As it propagated globally, it affected the routing of our platform and white-label domain addresses throughout their network. Cloudflare later found that many of their global Edge nodes were no longer able to run the required Edge Side Code. Cloudflare posted information on their status page.
Our timeline for rectifying this issue was (times stated in UTC):
17:30:
18:30:
21:00:
00:00: Cloudflare confirms the issue has been resolved at their end.
Cloudflare allows our Branded and Custom From Addresses (CFA) to each have a different origin server based on the region they are created in. This information is stored as part of the metadata of the CFA record in Cloudflare.
Cloudflare use Edge Side Code (ESC) on their global network to intercept traffic from these CFA domains and route it to the origin server recorded in that metadata. If a CFA record's metadata does not contain an origin server, or the ESC does not execute, a fallback option is used, which in our case directs traffic to the R1 origin servers.
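As a rough illustration of the routing decision described above, the following Python sketch models the fallback behaviour. It is not Cloudflare's code or our production configuration: the function name `resolve_origin`, the `origin_server` metadata key, the `esc_ran` flag and the example hostnames are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical hostname, for illustration only.
FALLBACK_ORIGIN = "origin.r1.example.com"


def resolve_origin(metadata: Optional[dict], esc_ran: bool) -> str:
    """Return the origin server that traffic for a CFA domain is sent to.

    metadata: the CFA record's metadata (assumed shape: a dict that may
              contain an "origin_server" entry).
    esc_ran:  whether the Edge Side Code executed on the edge node.
    """
    # If the ESC did not execute, the per-region metadata is never consulted.
    if not esc_ran:
        return FALLBACK_ORIGIN

    # If the metadata is missing or has no origin server, fall back as well.
    origin = (metadata or {}).get("origin_server")
    return origin or FALLBACK_ORIGIN


# A domain created in R2 is routed to R2 only while the ESC runs.
r2_domain = {"origin_server": "origin.r2.example.com"}
print(resolve_origin(r2_domain, esc_ran=True))
print(resolve_origin(r2_domain, esc_ran=False))
```

Under these assumptions, the second call shows how a domain configured for R2 would be sent to the R1 origin servers whenever the ESC fails to run, which is consistent with R1 being unaffected during the incident.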
We understand from Cloudflare that, as part of upgrades to their services, new metadata fields are available that are better served by their ESC and would remove our dependency on them to maintain our existing configuration. We’ve already engaged with their solution team on how best to migrate our existing CFA origin servers to these fields in a way that’s transparent to our customers, and we will continue to deploy those changes.
We’re sorry this incident occurred and we’re grateful for your patience while we worked on resolving it.