Summary of impact:
At approximately 07:30 UTC on Wednesday 12th February 2020, we deployed an update which contained a change to a frequently used regular expression to validate email addresses. On specific data sets, this new regular expression caused performance degradation in many Engagement Cloud systems.
Most significantly, we saw service disruption and delays in contact imports, SMS sends and reporting. The impact and resolution times varied slightly depending on region:
Services returned to normal at 13:20 UTC apart from reporting data which was delayed for a subset of customers until approximately 22:00 UTC.
North American region:
Email sending was delayed between 23:00 UTC and 01:00 UTC on Thursday 13th February.
The original regular expression to validate email addresses was found to let a small number of invalid, non RFC compliant addresses enter the Engagement Cloud contact database. This bug was addressed and a new regular expression was developed to prevent more malformed addresses. However, the new regular expression executed slower than the previous when certain inputs were used.
At approximately 11:00 UTC, we discovered the poorly performing regular expression and work began to restore it to its previous version.
We updated the most affected services first, which reduced CPU load across our systems and restored normal operation.
The regular expression was logically correct and we used defensive techniques such as unit tests to manage the change. But, the change only suffered performance problems on certain inputs. In future, we’ll run through a more rigorous stress test when amending critical regular expressions and use a far larger set of test data.