Delayed Missing Alerts
Incident Report for Dead Man's Snitch
Resolved
At 12:49:15 UTC the process responsible for periodically checking Snitch histories was shut down during a deploy but was not restarted as part of the deploy process. Around 20:10 UTC a customer notified us that a Snitch wasn't failing when it was expected to, and during our investigation we discovered that the process was offline. At 20:22 UTC we restarted the process, which then kicked off checks for all potentially missing Snitches.

Unfortunately, we did not receive an alert that the process was offline because, as we should have known, that alert could only fire while the process itself was running. We're discussing fixes to ensure this kind of error doesn't happen again.
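
One common way to avoid this class of failure is to have the checking process record a heartbeat and to let a separate, independently running watchdog alert when that heartbeat goes stale. The sketch below is illustrative only and does not describe how Dead Man's Snitch is implemented or what fix was chosen. The function name `heartbeat_is_stale`, the 5-minute threshold, and the calendar date are assumptions; only the times of day come from the report above.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(last_heartbeat, now, max_age):
    """Return True when the checker has not reported within max_age.

    A missing heartbeat (None) counts as stale, so a process that never
    started is treated the same as one that stopped.
    """
    if last_heartbeat is None:
        return True
    return (now - last_heartbeat) > max_age


# Placeholder date; only the times of day are taken from the report.
stopped = datetime(2024, 1, 1, 12, 49, 15, tzinfo=timezone.utc)
noticed = datetime(2024, 1, 1, 20, 10, 0, tzinfo=timezone.utc)
threshold = timedelta(minutes=5)  # illustrative value, not from the report

# One minute after the last heartbeat the checker still looks healthy.
print(heartbeat_is_stale(stopped, stopped + timedelta(minutes=1), threshold))  # False
# By the time the customer report came in, the heartbeat is long overdue.
print(heartbeat_is_stale(stopped, noticed, threshold))  # True
```

The important design point is that this check would have to run outside the process it monitors, for example from a separate scheduler or host, so that it keeps working when the monitored process is down.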
Posted Apr 16, 2024 - 08:30 EDT