Summary of impact:
Between approximately 11:05 UTC and 21:28 UTC on Saturday 25th April 2020, some email sends from our North American region failed.
A small number of users will have noticed that some campaign sends failed to complete and remained in their Outbox. We’ve identified the affected customers and proactively reached out to help resolve the issue.
Our email sending infrastructure was severely restricted after disks failed simultaneously in multiple servers. The disk failures were the result of a vendor firmware bug which causes disks to fail after a fixed period of time.
The failed servers could no longer accept new email, which caused application errors that affected campaign email sends.
Engagement Cloud compiles campaign email sends into batches of emails, and these batches are distributed over multiple email sending servers. During this period, some batches hit failed servers while others were sent by unaffected servers. As a result, some sends were unaffected, but others may have partially or completely failed to send.
We removed the faulty servers from duty and campaign email sends continued using the remaining healthy machines.
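The batching behaviour described above can be illustrated with a minimal sketch. It is not Engagement Cloud's actual implementation: the round-robin assignment, the server names (`mta-1` and so on), the batch size and the function names are all assumptions made for illustration. It shows why a single send could partially fail while a server was down, and why sends completed once the faulty server was removed from rotation.

```python
from itertools import cycle


def make_batches(recipients, batch_size):
    """Split a recipient list into consecutive batches of at most batch_size."""
    return [recipients[i:i + batch_size]
            for i in range(0, len(recipients), batch_size)]


def distribute(batches, servers, healthy):
    """Assign batches to servers round-robin (an assumed strategy).

    A batch assigned to a server that is not in `healthy` is recorded
    as failed instead of sent.
    """
    sent, failed = [], []
    for batch, server in zip(batches, cycle(servers)):
        target = sent if server in healthy else failed
        target.append((server, batch))
    return sent, failed


# Hypothetical campaign of 10 recipients, sent in batches of 2.
recipients = [f"user{i}@example.com" for i in range(10)]
batches = make_batches(recipients, batch_size=2)

# During the incident: mta-2 is still in rotation but cannot accept email.
servers = ["mta-1", "mta-2", "mta-3"]
sent, failed = distribute(batches, servers, healthy={"mta-1", "mta-3"})

# After the faulty server is removed from duty, every batch lands on a healthy machine.
remaining = ["mta-1", "mta-3"]
sent_after, failed_after = distribute(batches, remaining, healthy=set(remaining))

print(len(sent), len(failed))              # batches sent / failed during the incident
print(len(sent_after), len(failed_after))  # batches sent / failed after removal
```

In this sketch only the batches routed to the failed server are lost, so the same campaign ends up partially delivered, which matches the partial failures some customers saw.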
We’ve identified 3 follow-on work items: