Heartbeat Monitoring 101: Stop Silent Failures in Cron Jobs
A practical guide to heartbeat monitoring (healthchecks) and why a Dead Man’s Switch catches overdue jobs when logs can’t.
What you’ll learn
- What heartbeat monitoring is
- Why it’s effectively a Dead Man’s Switch for scheduled jobs
- How to implement it with a single curl
- How to think about Overdue, Grace Period, and payloads
What is Heartbeat Monitoring?
Heartbeat Monitoring means: your job sends a small “I’m alive” ping on success (and optionally on start/failure). If the ping doesn’t arrive in time, your monitor turns Overdue and you get an alert.
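This is more reliable than checking logs because logs live inside the system that runs the job and often fail together with it: if the host is down or the scheduler never fires, nothing gets written at all.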
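The start/success/failure pattern described above can be sketched as a small wrapper. This is a minimal illustration, not any particular monitor's client: `run_with_heartbeat`, the `send` callback, and the event names `"start"`, `"success"`, and `"failure"` are all assumptions made for this example.

```python
from typing import Callable


def run_with_heartbeat(job: Callable[[], None], send: Callable[[str], None]) -> None:
    """Run `job`, reporting 'start', then 'success' or 'failure' through `send`.

    `send` is a hypothetical callback that delivers one event to your monitor.
    """
    send("start")
    try:
        job()
    except Exception:
        # Report the failure, then re-raise so the scheduler still sees the error.
        send("failure")
        raise
    send("success")


# Usage: record events in a list instead of calling a real monitor.
events = []
run_with_heartbeat(lambda: None, events.append)
```

Keeping the transport behind `send` means the same wrapper works whether the events go out over HTTP, a queue, or a test double. If the job never starts at all, no event is sent, and that silence is what the monitor detects.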
Why it’s better than logs (Dead Man’s Switch)
With logs, you need the system to be healthy to tell you it’s unhealthy. A heartbeat monitor flips that:
- If the job runs, it pings.
- If the ping doesn’t arrive on time, for any reason, you get alerted.
That’s a Dead Man’s Switch.
Quickstart: one-liner (GET)
See the Heartbeat Monitoring API documentation: /api/heartbeat/.
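For jobs written in Python, the same GET ping can be sent without shelling out to curl. The sketch below is an illustration under stated assumptions: `PING_URL` is a placeholder rather than a real endpoint, and the `ping` helper and its `opener` parameter are names invented for this example. Check the API documentation for the actual ping URL format.

```python
import urllib.request

# Hypothetical ping URL (assumption); replace with the URL your monitor gives you.
PING_URL = "https://monitor.example.com/ping/your-check-id"


def ping(url: str = PING_URL, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Send a GET ping. Return True on a 2xx response, False on a network or HTTP error."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Call `ping()` as the last step of the job, after the real work has succeeded. It returns `False` instead of raising so that a monitoring outage cannot turn a successful job into a failed one.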
POST with interval + payload inspection
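A POST request can carry the job’s interval together with a payload. The payload enables inspection of custom metrics, so you can alert on conditions such as files_processed = 0 even when the job itself exits cleanly.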
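A POST of this kind might be built as follows. Treat it as a sketch only: the JSON field names `interval` and `payload`, the URL, and the `build_heartbeat_request` helper are assumptions for illustration, so consult the API documentation for the real request schema.

```python
import json
import urllib.request


def build_heartbeat_request(url: str, interval_seconds: int, payload: dict) -> urllib.request.Request:
    """Build a JSON POST carrying the expected interval and custom metrics.

    The body layout {"interval": ..., "payload": ...} is hypothetical.
    """
    body = json.dumps({"interval": interval_seconds, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a daily job (86400 s) that reports it processed no files.
req = build_heartbeat_request(
    "https://monitor.example.com/ping/your-check-id",  # placeholder URL
    86400,
    {"files_processed": 0},
)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the payload easy to inspect and test without network access.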
Overdue and Grace Period
A good default strategy:
- Interval: expected run cadence (e.g. 24h)
- Grace Period: a buffer for retries, queue delays, cold starts
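To make the relationship between the two settings concrete, here is one plausible way a monitor could decide that a check is Overdue. It assumes the deadline is the last ping time plus the interval plus the grace period. Real monitors may compute this differently, and `is_overdue` is a name chosen for this sketch.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping: datetime, interval: timedelta, grace: timedelta, now: datetime) -> bool:
    """Return True once `now` is past last_ping + interval + grace (assumed rule)."""
    return now > last_ping + interval + grace


# A daily job with a 30-minute grace period, last seen at 02:00 UTC on 1 January.
last_ping = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
interval = timedelta(hours=24)
grace = timedelta(minutes=30)

within_grace = is_overdue(last_ping, interval, grace, datetime(2024, 1, 2, 2, 20, tzinfo=timezone.utc))
past_grace = is_overdue(last_ping, interval, grace, datetime(2024, 1, 2, 2, 31, tzinfo=timezone.utc))
```

Here `within_grace` is `False` because 02:20 the next day is still inside the 30-minute buffer, while `past_grace` is `True` because 02:31 is past it. A grace period that is too short produces false alarms on slow runs, and one that is too long delays real alerts.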