Issue with loading images in Easy Editor - EU Region
Incident Report for Dotdigital
Postmortem

Summary of impact:

At approximately 08:43 UTC on 10th August 2022, we were alerted to an issue with images not loading within Easy Editor in our EU Region. We restored services at 10:39 UTC on 10th August 2022.

Root Cause:

Images within Easy Editor are served via a dedicated website. This website acts as a proxy: it fetches the images to be displayed from our image store and also provides image resizing functionality. The website recently went through a .NET version upgrade and had been running successfully in production for 6 days prior to this incident. However, on 10th August the website periodically became unable to fetch images from our image store, and errors occurred. After analysis we could see that application threads were being blocked by a third-party component used to record application errors. This starved the website of resources and prevented it from relaying images. Investigation continues, but it seems likely that this situation was created as a result of the .NET upgrade.
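The blocking behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than .NET and does not reproduce the actual website or the third-party component: `slow_error_sink`, the pool size, the request count and the 20 ms delay are all hypothetical stand-ins. It compares request threads that wait on an error recorder with request threads that hand the error to a queue and return.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: the error recorder serialises writes behind one lock.
SINK_LOCK = threading.Lock()


def slow_error_sink(message: str) -> None:
    """Stand-in for an error recorder that blocks its caller."""
    with SINK_LOCK:
        time.sleep(0.02)  # assumed cost of recording one error


def handle_request_blocking(image_id: int) -> str:
    # The request thread waits for the recorder before it can relay the image.
    slow_error_sink(f"fetch failed for {image_id}")
    return f"image-{image_id}"


def make_offloaded_handler(pending: queue.Queue):
    def handle_request_offloaded(image_id: int) -> str:
        # The request thread only enqueues the error and carries on.
        pending.put(f"fetch failed for {image_id}")
        return f"image-{image_id}"
    return handle_request_offloaded


def drain(pending: queue.Queue) -> None:
    """Background thread that records queued errors until it sees None."""
    while True:
        message = pending.get()
        if message is None:
            break
        slow_error_sink(message)


def timed_run(handler, requests: int = 20, workers: int = 4) -> float:
    """Serve `requests` calls on a fixed-size pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handler, range(requests)))
    assert len(results) == requests
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    blocking = timed_run(handle_request_blocking)
    pending: queue.Queue = queue.Queue()
    drainer = threading.Thread(target=drain, args=(pending,))
    drainer.start()
    offloaded = timed_run(make_offloaded_handler(pending))
    pending.put(None)  # tell the drainer to stop once the queue is empty
    drainer.join()
    return blocking, offloaded


if __name__ == "__main__":
    blocking, offloaded = compare()
    print(f"blocking: {blocking:.3f}s, offloaded: {offloaded:.3f}s")
```

In the blocking case every worker in the pool queues up behind the recorder, so request throughput is bounded by how fast errors can be recorded. This is only an analogy for the starvation described in the root cause, not a statement of how the production fix was or will be made.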

Mitigation:

The timeline for resolving this issue (all times in UTC):

  • 08:32 Multiple customer reports of images not working in Easy Editor
  • 09:05 Investigation showed that the image loading issue was intermittent
  • 09:29 Status Page created
  • 09:30 Restart of the service fixed the issue
  • 09:34 Monitoring of the restarted service showed errors beginning to rise again and images loading intermittently
  • 09:50 The affected service was restarted again as errors recurred, while investigation continued
  • 10:38 A decision was made to roll back the code
  • 10:39 Rolled back code to a previous release in our EU region and monitored for errors
  • 15:15 Errors detected in our US region
  • 15:32 Rollback of the affected service was completed in our US and AU regions

Next Steps:

We are currently investigating thread exhaustion issues that have occurred since the .NET upgrade. These could be related to an incompatibility with our error logging component, or possibly to a combination of this and a change in the behaviour of HTTP connection pooling.

Posted Aug 12, 2022 - 11:20 BST

Resolved
The rollback continues to provide stable service and so this incident is now closed. Investigations continue on the root cause of the issue.
Posted Aug 10, 2022 - 12:17 BST
Update
We have rolled back to a previous version of the Easy Editor image website and the error rate has reduced. We continue to investigate the root cause of the problems seen earlier.
Posted Aug 10, 2022 - 11:50 BST
Investigating
We're investigating an issue with images not loading within Easy Editor in our EU Region. Sorry if you're affected, but our tech team are working as quickly as possible to resolve the issue and get things back to normal. We'll share another update shortly.
Posted Aug 10, 2022 - 10:29 BST