Delays sending email in our Asia Pacific (R3) region
Incident Report for Dotdigital
Postmortem

RCA: Delays sending email in our Asia Pacific (R3) region

Summary of impact:

At approximately 00:16 UTC on Tuesday 12th July 2022, customers in our APAC region may have experienced delays in sending campaign emails and test sends, and in receiving verification emails. Any delayed emails were held in a queue and sent successfully once the issue was resolved. We restored services to normal at 03:05 UTC.

Root Cause:

We use a DNS recursor service to look up recipient domains before sending email to them. Due to a large number of requests, the service entered a throttled state. As a result, our sending service couldn’t look up domains in a timely way, which delayed the sending of emails.

Mitigation:

A summarized timeline of events (all times in UTC):

  • 00:20: We were alerted to the issue and began investigating
  • 01:10: We began a rolling restart of some of our backend services
  • 02:20: We identified that the delays were likely due to issues resolving domain names
  • 02:30: We determined our DNS recursor service had become throttled
  • 02:34: We added additional DNS resolvers to our mail servers and removed the underperforming DNS recursor from our configuration
  • 02:40: We observed an improvement in DNS lookups and delayed emails began to send successfully
  • 02:55: We restarted our DNS recursor server
  • 02:58: We added our DNS recursor server back to our configuration
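
The 02:34 step, adding more resolvers so that lookups no longer depend on a single throttled recursor, can be illustrated with a minimal Python sketch. This is not our production configuration. The names `Resolver`, `AllResolversFailed` and `resolve_with_fallback` are hypothetical and chosen for illustration only, and each resolver is assumed to be a callable that raises an exception when it times out or is throttled.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a resolver is any callable that takes a domain and
# returns a list of answers, raising an exception on failure or timeout.
Resolver = Callable[[str], List[str]]


class AllResolversFailed(Exception):
    """Raised when no configured resolver returned an answer."""


def resolve_with_fallback(domain: str, resolvers: Iterable[Resolver]) -> List[str]:
    """Try each resolver in order and return the first non-empty answer."""
    attempts = 0
    for resolver in resolvers:
        attempts += 1
        try:
            answers = resolver(domain)
        except Exception:
            # Assumption: a throttled or unreachable resolver surfaces as an
            # exception (for example a timeout), so move on to the next one.
            continue
        if answers:
            return answers
    raise AllResolversFailed(f"{domain}: no answer from {attempts} resolver(s)")
```

With a single resolver in the list, a throttled resolver stalls every lookup. With several, a slow or failing resolver only costs one failed attempt before the next one is tried.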

Next Steps:

In order to prevent future issues, we’ll:

  • Review DNS lookup configuration and make changes where appropriate to increase resilience.
  • Enhance logging and monitoring to better detect DNS throttling issues and their impact on messaging queues.
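
As a rough illustration of the kind of monitoring described above, the Python sketch below tracks recent DNS lookups in a sliding window and flags degradation when too many are slow or failed. It is an assumption-laden example rather than a description of our tooling: the class name `DnsLatencyMonitor`, the window size and both thresholds are hypothetical values picked for the example.

```python
from collections import deque


class DnsLatencyMonitor:
    """Sliding-window check for slow or failed DNS lookups (illustrative only)."""

    def __init__(self, window: int = 100, slow_ms: float = 500.0,
                 max_slow_ratio: float = 0.2) -> None:
        # Each sample is True if the lookup was slow or failed.
        self.samples = deque(maxlen=window)
        self.slow_ms = slow_ms
        self.max_slow_ratio = max_slow_ratio

    def record(self, latency_ms: float, ok: bool = True) -> None:
        # A failed lookup counts as slow regardless of its latency.
        self.samples.append((not ok) or latency_ms >= self.slow_ms)

    def slow_ratio(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def is_degraded(self) -> bool:
        # Degraded when the share of slow lookups exceeds the threshold.
        return self.slow_ratio() > self.max_slow_ratio
```

An alert tied to `is_degraded()` would point at DNS resolution directly, rather than leaving it to be inferred from growing message queues.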
Posted Jul 12, 2022 - 14:53 BST

Resolved
Email sending is now fully back to normal in our Asia Pacific (R3) region.
Posted Jul 12, 2022 - 04:04 BST
Monitoring
A fix has been implemented and we're seeing delayed mail beginning to flow normally now. We'll continue to monitor to ensure that mail flow continues as expected.
Posted Jul 12, 2022 - 03:53 BST
Investigating
We're investigating delays with sending emails in our Asia Pacific region of Dotdigital (R3). Sorry if you're affected by this issue. Our tech team are working as quickly as possible to resolve it and get things back to normal. We'll share another update shortly.
Posted Jul 12, 2022 - 03:17 BST
This incident affected: Asia Pacific - Dotdigital R3 (Asia Pacific - Mail Sending).