Images in EasyEditor not loading correctly (R1 - Europe region)
Incident Report for Dotdigital
Postmortem

Summary of impact:

At approximately 08:03 UTC on Friday 23rd September 2022, we identified that our image resizing application was slow to respond to requests and would sometimes time out. We restored services at 11:04 UTC on Friday 23rd September 2022.

During this time, customers may have experienced some of the following issues:

  • Images taking a long time to resize and appear in EasyEditor
  • Images not appearing at all in EasyEditor

Root Cause:

Our analysis is still ongoing at this stage. We believe our recent project to move our image resizing application to .NET 6 caused the issue. Although we’ve been running on .NET 6 since July 2022, we believe the problem only manifests itself under increased load.

Mitigation:

The timeline (in UTC) for resolving this issue was:

  • 08:03: We received customer reports of images taking a long time to appear or not appearing at all in EasyEditor
  • 08:10: Our investigation showed this problem was only present in our European region (R1)
  • 08:13: Our image resizing applications were redeployed
  • 08:15: We checked error logs and errors were no longer being observed
  • 08:18: We began to see errors in our logs again
  • 08:35: We restarted our image resizing applications again and increased the number of instances from 2 to 4
  • 09:25: We took a memory dump of the image resizing application for analysis
  • 10:00: We increased our image resizing instances from 4 to 10
  • 10:08: We upgraded and deployed an AWS package which stopped errors occurring
  • 10:36: We began to see errors return, so we prepared a rollback to .NET 5 for our image resizing application
  • 11:04: Our rollback to .NET 5 was deployed which stopped errors occurring
  • 11:30: Our error logs remained clear and we closed the status page
  • 17:21: We restricted our image resizing application permanently to .NET 5 whilst more investigation is done.

Next Steps:

Our work to better understand this issue will continue. We’re going to analyze the memory dumps taken during the incident. We’re focusing on why the .NET upgrade caused this issue and why we began to see the problem after weeks of good initial performance and stability.

Posted Sep 26, 2022 - 11:19 BST

Resolved
Following the fix we released earlier, everything is back to normal now. We’ll write a detailed description of what caused the issue. Check back here in a day or two for the report. We’re sorry for the interruption to your day.
Posted Sep 23, 2022 - 12:30 BST
Monitoring
We applied a fix a few moments ago. The great news is we're seeing immediate and sustained improvements with images loading in EasyEditor. Thanks for your patience while we worked on this problem. We'll monitor the situation from here and post a final update once we're totally satisfied it's resolved.
Posted Sep 23, 2022 - 12:20 BST
Identified
Thanks for your patience. We’re making good progress on resolving the issue with EasyEditor images loading. We’ll be back with another update very soon.
Posted Sep 23, 2022 - 11:13 BST
Investigating
We're investigating an issue with images either being slow to load in EasyEditor or not loading at all in our European region (R1). Sorry if you're affected. We're working as quickly as possible to resolve the issue and get things back to normal. We'll share another update shortly.
Posted Sep 23, 2022 - 10:46 BST
This incident affected: Global - Image Hosting.