Sending Delays - Engagement Cloud Europe
Incident Report for dotdigital
Postmortem

Summary of impact:

Starting at approximately 20:15 UTC on Monday 9th March 2020, we experienced significant delays sending email messages for customers hosted in our European region of Engagement Cloud. The root cause was discovered at 23:20 UTC and the backlog was fully cleared by 02:00 UTC on Tuesday 10th March 2020.

Root Cause:

This incident was caused by a large-scale spam attack on signup forms. Our signup forms allow contacts to opt in to email communications. Our newer signup forms include spam protection, but some customers still use older types of signup form that don’t have this protection built in. In this incident, a spam attack aggressively targeted those older forms, sending hundreds of thousands of individual opt-in requests within a short time period.

Our signup forms are configured to send an email campaign in response to new signups. The size of the campaign, combined with the volume of requests being received, overwhelmed the email sending resources configured in our European region.
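
To illustrate the failure mode described above, here is a minimal sketch of one common way to cap how many signups a single form can trigger in a short period: a sliding-window rate limiter. It is not a description of our actual spam protection, which this report does not detail. The class name `SignupRateLimiter`, the `form_id` key, and the limit and window values are all hypothetical.

```python
from collections import defaultdict, deque


class SignupRateLimiter:
    """Allow at most `limit` signups per form within a sliding window.

    Hypothetical illustration only: names and thresholds are assumptions.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # Per-form queue of timestamps for recently accepted signups.
        self._events = defaultdict(deque)

    def allow(self, form_id, now):
        """Return True if a signup for `form_id` at time `now` is accepted."""
        events = self._events[form_id]
        # Discard timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.limit:
            # Over the limit: reject, so no triggered email is queued.
            return False
        events.append(now)
        return True


limiter = SignupRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("form-a", t) for t in (0, 1, 2, 3)]
```

Rejected requests never reach the step that queues the triggered campaign, so a burst against one form cannot grow the sending backlog beyond the configured limit.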

Mitigation:

Our team were alerted to the incident at 22:00 UTC. After an initial assessment, we posted an incident notification to our customers through our status page. We increased the resources available to our email sending infrastructure, which had some positive impact but did not provide enough capacity to reduce the backlog.

After analyzing application logs, we spotted the ongoing spambot attack and immediately converted the affected signup forms to a newer version with spam protection enabled. This prevented any additional spam signups and allowed the email backlog to clear quickly.
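
As a rough illustration of the kind of log analysis described above, the sketch below counts signup requests per form per minute and flags any bucket above a threshold. The record shape (an ISO 8601 timestamp paired with a form identifier), the function name `find_signup_spikes`, and the threshold are assumptions made for the example; they do not reflect our actual log format or tooling.

```python
from collections import Counter
from datetime import datetime


def find_signup_spikes(records, threshold):
    """Return {(form_id, minute): count} for buckets exceeding `threshold`.

    `records` is an iterable of (iso_timestamp, form_id) pairs.
    Hypothetical record shape, chosen for illustration only.
    """
    counts = Counter()
    for timestamp, form_id in records:
        # Truncate each timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[(form_id, minute)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


sample = [
    ("2020-03-09T21:05:10", "form-a"),
    ("2020-03-09T21:05:20", "form-a"),
    ("2020-03-09T21:05:45", "form-a"),
    ("2020-03-09T21:05:50", "form-b"),
]
spikes = find_signup_spikes(sample, threshold=2)
```

A form whose per-minute count is far above its normal level is a candidate for conversion to a protected form type.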

Next Steps:

We’re really sorry this incident occurred and for any disruption it caused. From here, we’ll:

  • Review our web application firewall blocklists.
  • Review which customers are still using unprotected signup forms.
Posted Mar 20, 2020 - 15:36 GMT

Resolved
We’ve monitored the situation for some time now and we’re confident this issue is fully resolved. Sorry for the interruption it caused. We’ll write a detailed description of what caused the issue. Check back here in a day or two for the report.
Posted Mar 10, 2020 - 02:24 GMT
Monitoring
We applied a fix a few minutes ago. The great news is we're now seeing immediate and sustained improvements with the sending delays. Thanks for your patience while we worked on this problem. We'll monitor the situation from here and post a final update once we're satisfied it's resolved.
Posted Mar 09, 2020 - 23:36 GMT
Identified
Our teams have identified the issue and we’re now taking care of the problem. Thanks for your patience, everyone.
Posted Mar 09, 2020 - 23:24 GMT
Investigating
We're investigating an issue with email and SMS sending delays in our European instance. Sorry if you're affected, but our tech team are working as quickly as possible to resolve the issue and get things back to normal. We'll share another update shortly.
Posted Mar 09, 2020 - 23:16 GMT
This incident affected: Europe - Engagement Cloud r1 (Europe - Mail Sending).