Small number of false alerts
Incident Report for Dead Man's Snitch
At 12:14AM UTC, our check-in processor slowed down while processing queued check-ins. As a result, some 15-minute jobs that check in at the tail end of their interval were reported as missing. An automated runbook forced a restart at 12:20AM, and the processor caught back up at 12:21AM UTC, at which point we reported the jobs as healthy again.
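
For readers curious about the mechanics, here is a minimal sketch of how a processing delay can turn an on-time check-in into a false "missing" report. It is an illustration only, not our actual implementation: the function name `reported_missing`, the data layout, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # hypothetical 15-minute job


def reported_missing(checkins, now, interval=INTERVAL):
    """Return True if no *processed* check-in falls inside the last interval.

    checkins is a list of (received_at, processed_at) tuples. Only check-ins
    the processor has already handled are visible to the evaluation.
    """
    visible = [received for received, processed in checkins if processed <= now]
    if not visible:
        return True
    return now - max(visible) > interval


# Illustrative timeline on an arbitrary day (not the real incident data).
day = datetime(2020, 1, 1)
previous = (day + timedelta(minutes=-1), day + timedelta(minutes=-1))

# The job checks in at minute 13, near the tail end of its interval.
on_time = (day + timedelta(minutes=13), day + timedelta(minutes=13))
# Same check-in, but it sits in the queue until minute 21.
delayed = (day + timedelta(minutes=13), day + timedelta(minutes=21))

evaluation_time = day + timedelta(minutes=15)

healthy_case = reported_missing([previous, on_time], evaluation_time)
false_alert = reported_missing([previous, delayed], evaluation_time)
after_catch_up = reported_missing([previous, delayed], day + timedelta(minutes=21))

print(healthy_case, false_alert, after_catch_up)
```

In this sketch the delayed check-in arrived on time but was not yet visible when the job was evaluated, so the job looks missing until the processor catches up.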

This caused a small number of false alerts. We know all too well the stress of being paged in the middle of the night, and we are working to make our alerting backend more resilient in situations like this so that these false alerts do not happen again.
Posted Mar 02, 2020 - 00:30 EST