At approximately 16:21 BST on Monday 16th September 2019, the Engagement Cloud API in our European region went offline. All API requests failed until service was restored at 17:19 BST.
The outage was caused by a server configuration change that was rolled out to our API servers. Whilst the update made the desired change, it also removed web server settings which were thought to be redundant; it was the removal of these settings which took the API offline.
After a review of the current and previous configuration files, the missing settings were restored, which brought the API back online.
The configuration update had been successfully applied to our staging environments and to our production Australian and US instances. However, our production European instance had a unique configuration, originally added to support a requirement which has since become redundant. The update removed this legacy configuration, but the European instance still needed it in order to receive traffic from our load balancer. This dependency was difficult to identify and is unique to our European region.
The update was applied following our standard procedures. However, the testing was invalid because our staging environments do not replicate the configuration used in production. This weakness has already been identified, and plans have been made to use the same cloud provider throughout our production and staging instances.
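
As an illustration of the kind of check that can surface a region-specific dependency before settings are removed, the sketch below reports setting keys that exist in only one environment. It is a minimal sketch under stated assumptions rather than a description of any real tooling: the environment names, the setting keys and the `unique_settings` function are all hypothetical and were invented for this example.

```python
def unique_settings(configs):
    """Return, for each environment, the setting keys that no other environment defines."""
    unique = {}
    for env, settings in configs.items():
        # Collect every key defined by the other environments.
        keys_elsewhere = set()
        for other_env, other_settings in configs.items():
            if other_env != env:
                keys_elsewhere.update(other_settings)
        # Whatever remains is defined in this environment only.
        unique[env] = sorted(set(settings) - keys_elsewhere)
    return unique


# Hypothetical settings, invented purely for illustration.
example_configs = {
    "staging": {"listen_port": "443", "worker_count": "4"},
    "production-us": {"listen_port": "443", "worker_count": "8"},
    "production-au": {"listen_port": "443", "worker_count": "8"},
    "production-eu": {
        "listen_port": "443",
        "worker_count": "8",
        "legacy_lb_binding": "on",  # hypothetical region-only setting
    },
}

report = unique_settings(example_configs)
for env, keys in report.items():
    print(env, keys)
```

Any key listed against a single environment is a candidate for closer review before an update removes it, since staging cannot exercise a setting it does not have.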