Intermittent Access to Engagement Cloud - All regions
Incident Report for dotdigital
Postmortem

RCA: Intermittent Access to Engagement Cloud

Summary of impact:

From approximately 21:14 UTC on Friday 17th July 2020, customers experienced intermittent issues when accessing the Engagement Cloud products and our APIs. In addition, some customers may have received HTTP 500 error messages.

Root Cause:

We isolated the problem to a third-party provider, Cloudflare. Cloudflare reported that networking hardware on their global network was announcing bad routes, which caused portions of their network to become unavailable. You can find the full details of the Cloudflare incident at https://blog.cloudflare.com/cloudflare-outage-on-july-17-2020/

Mitigation:

At 22:28 UTC, traffic to Cloudflare and access to our platform and APIs were restored after Cloudflare resolved the issues with their networking hardware.

Next Steps:

We’ve asked Cloudflare to continue their investigation into this incident and to identify any mitigating steps they can take to prevent future issues.

Posted Jul 21, 2020 - 12:09 BST

Resolved
Everything is back on track now.
We’ll write a detailed report for the issue we’ve experienced today. It’ll be posted on here as soon as it’s ready (early next week).
Apologies about today’s mishap.
Posted Jul 17, 2020 - 23:28 BST
Monitoring
Cloudflare have resolved their issue and are monitoring their network for stability. The issue was caused by faulty hardware in their global network and this affected many of their customers around the world.
We apologise for the inconvenience this has caused.
Posted Jul 17, 2020 - 23:16 BST
Update
We are continuing to work on a fix for this issue.
Posted Jul 17, 2020 - 23:11 BST
Identified
Cloudflare have identified an issue within their network and are working on a fix.
Further updates will be provided when details are available from Cloudflare.
We apologise for the inconvenience caused.
Cloudflare status page updates can be found at https://www.cloudflarestatus.com/
Posted Jul 17, 2020 - 23:09 BST
Investigating
We are currently experiencing issues with access to Engagement Cloud in all regions. One of our third-party providers is currently experiencing issues with its network and is investigating.

Further updates will be provided in due course.
Posted Jul 17, 2020 - 22:47 BST
This incident affected: North America - Engagement Cloud r2 (North America - Web Application, North America - API, North America - Open and Link Tracking, North America - Surveys and Forms, North America - Pages and Forms), Global CPaaS (API, Portal, SMS API, SMS Portal, Inbox Portal), Global - Website, Global - Login Page, Global - Image Hosting, Asia Pacific - Engagement Cloud r3 (Asia Pacific - Web Application, Asia Pacific - API, Asia Pacific - Open and Link Tracking, Asia Pacific - Surveys and Forms, Asia Pacific - Pages and Forms), and Europe - Engagement Cloud r1 (Europe - Web Application, Europe - API, Europe - Open and Link Tracking, Europe - Surveys and Forms, Europe - Pages and Forms).