On Monday 13th September 2021, we experienced an issue with campaign reports which meant some customers who sent campaigns from the web interface were unable to view their campaign reports. This issue started in our:
We reverted the problem code at 16:07 UTC on 13th September. Campaigns sent through the web interface after this time were not affected.
Campaigns created inside the incident window required further intervention to make the campaign reports visible. We resolved this for the majority of customers at 17:11 UTC on 13th September. However, a small number of customers wouldn’t have been able to access their reports until 17:21 UTC on 14th September (at which point campaign reports had been fully recovered for all affected customers).
During the affected period, customers clicking on a sent campaign would’ve been redirected to the dashboard instead of seeing the campaign report. It’s important to note that all emails were still sent and we were able to recover all missing reporting data.
This issue only affected email campaigns (SMS campaigns were unaffected).
We applied a patch to an internal monitoring tool to add metadata to each email send, which helped us track its progress through our sending infrastructure. This clashed with a change deployed to the sending pipeline at the same time and had the unexpected effect of failing to initialize campaign reports.
Once the issue had been identified, we reverted the change to the internal monitoring tool at 16:07 UTC on 13th September. We then continued to work on mitigating the immediate effect so that most customers could access their campaign reports. This was completed at 17:11 UTC on 13th September.
The next day (14th September), we continued our work on making reports more accurate. We finished updating affected reports at 14:17 UTC on 14th September.
Also on 14th September, we identified a small number of affected campaigns that had been scheduled for sending during the incident. We rectified these reports at 17:21 UTC on 14th September.
We will analyse why our QA process did not detect this fault prior to release and make subsequent improvements to our test cases.