At approximately 06:45 UTC on 9 December 2019, Insight Data imports started queuing and users experienced delays in submitting their data.
We were alerted to this problem at 09:06 UTC on 10 December 2019 and began taking corrective measures immediately. We added server capacity, and by 11:58 UTC the import backlog had been cleared and normal import speeds were restored.
A legacy component responsible for enqueuing Insight Data imports stopped working. This happened because a locking mechanism, which prevents this process from running in parallel, failed to operate as expected. As a result, the platform stopped processing imports and a backlog built up. We have comprehensive monitoring on our message queues, but because this component used an old proprietary queuing mechanism, it was not covered by that monitoring.
We restarted the enqueuing process, after which imports began processing again. To clear the backlog faster, we scaled up the virtual machines responsible for processing imports and increased import parallelism.
We’re adding the missing monitoring to this legacy enqueuing component. We’ll also work towards removing the component and replacing it with our standard message queuing solution.