Monitoring Guide
Version: 1.0.0 | Last Updated: 2026-03-22 | Status: Active
Monitoring Objectives
- Detect outages quickly
- Detect degraded behavior before customer impact
- Distinguish liveness and readiness probe failures from latency in deep diagnostic checks
Core Health Endpoints
| Endpoint | Use Case | Notes |
|---|---|---|
| `/api/v1/health/ping` | network/process reachability | fastest signal |
| `/api/v1/health/live` | liveness checks | process health only |
| `/api/v1/health/ready` | readiness gate | validates critical dependencies |
| `/api/v1/health/celery` | Celery operational summary | expected mode: summary |
| `/api/v1/health/celery/deep` | operator diagnostics | use on demand, can be slower |
| `/api/v1/health` | full stack diagnostics | comprehensive, not for tight probe intervals |
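
To make the table concrete, here is a minimal sketch of probing a single endpoint and recording its latency. The names `probe` and `ProbeResult`, the 5-second default timeout, and the rule that only a 2xx response counts as healthy are illustrative assumptions, not part of the service's documented contract.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    path: str
    status: Optional[int]  # HTTP status code, or None if the request itself failed
    latency_s: float       # wall-clock time spent on the request
    ok: bool               # assumption: only a 2xx response counts as healthy


def probe(base_url, path, timeout_s=5.0, opener=urllib.request.urlopen):
    """Request one health endpoint and report status and latency.

    `opener` is injectable so the function can be exercised without a network.
    """
    url = base_url.rstrip("/") + path
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status.
        status = exc.code
    except OSError:
        # Covers URLError, connection errors, and timeouts.
        status = None
    latency_s = time.monotonic() - start
    ok = status is not None and 200 <= status < 300
    return ProbeResult(path, status, latency_s, ok)
```

A call such as `probe("https://dev-backend.mightybox.site", "/api/v1/health/ready")` would return one result per request. Unlike the `curl -k` calls in the triage commands, `urlopen` verifies TLS certificates by default.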
Recommended Probe Strategy
- Fast checks (high frequency): `/health/ping`, `/health/live`
- Readiness checks (medium frequency): `/health/ready`
- Operator diagnostics (manual/low frequency): `/health/celery/deep`, `/health`
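
The three tiers above could be expressed as a small schedule table. This is a sketch only: the 10-second and 30-second intervals are placeholder assumptions (the guide specifies only "high", "medium", and "manual/low" frequency), and `PROBE_TIERS` and `paths_due` are hypothetical names.

```python
# Interval values are illustrative assumptions, not recommendations from this guide.
PROBE_TIERS = {
    "fast": {"interval_s": 10, "paths": ["/health/ping", "/health/live"]},
    "readiness": {"interval_s": 30, "paths": ["/health/ready"]},
    # None means "never scheduled": run these on demand only.
    "diagnostic": {"interval_s": None, "paths": ["/health/celery/deep", "/health"]},
}


def paths_due(tick_s):
    """Return the paths to probe at a scheduler tick (whole seconds since start)."""
    due = []
    for tier in PROBE_TIERS.values():
        interval = tier["interval_s"]
        if interval is not None and tick_s % interval == 0:
            due.extend(tier["paths"])
    return due
```

Keeping the deep endpoints out of the automatic schedule means a slow diagnostic call cannot delay or mask the fast liveness signal.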
What to Watch
Apache / mod_wsgi
- `AH10159` or `AH00484` signals worker pressure
- active daemon process/thread settings in `wsgi.conf`
- response latency spikes on probe endpoints
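
As a rough illustration of watching for the two log codes above, the following sketch counts their occurrences in an iterable of log lines. The function name `count_worker_pressure` is hypothetical, and the regular expression assumes that Apache message codes appear as `AH` followed by five digits.

```python
import re
from collections import Counter

# The two codes this guide associates with worker pressure.
WORKER_PRESSURE_CODES = ("AH10159", "AH00484")

# Assumption: message codes look like "AH" plus exactly five digits.
_CODE_RE = re.compile(r"\b(AH\d{5})\b")


def count_worker_pressure(lines):
    """Count worker-pressure codes in an iterable of error-log lines."""
    counts = Counter()
    for line in lines:
        for code in _CODE_RE.findall(line):
            if code in WORKER_PRESSURE_CODES:
                counts[code] += 1
    return counts
```

Because the function accepts any iterable of strings, an open file handle for `/var/log/httpd/error_log` can be passed directly. What count should trigger an alert is left to the operator.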
Celery
- online worker count
- queue consumers and pending message count
- repeated worker exits or SIGKILLs in logs
Dependencies
- Postgres availability
- Redis ping/connectivity
- RabbitMQ connectivity and consumer health
Log Locations
- Apache: `/var/log/httpd/error_log`, `/var/log/httpd/access_log`
- Celery (systemd): `journalctl -u celery`
Quick Triage Commands
systemctl is-active httpd celery
curl -sk https://dev-backend.mightybox.site/api/v1/health/ready
curl -sk https://dev-backend.mightybox.site/api/v1/health/celery
tail -n 100 /var/log/httpd/error_log
journalctl -u celery --since "30 min ago" --no-pager
Escalation Guidance
- If liveness fails: treat as immediate service incident.
- If readiness fails: treat as dependency/system incident.
- If only deep checks are slow: treat as diagnostic overhead/performance issue.
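
The escalation rules above can be sketched as a small decision function. The evaluation order (liveness first, then readiness, then deep-check latency), the `"no escalation"` fallback, and the name `classify_incident` are assumptions added for illustration. How "slow" is defined for deep checks is left to the caller.

```python
def classify_incident(live_ok, ready_ok, deep_slow):
    """Map probe outcomes to the escalation categories in this guide.

    Assumption: liveness takes precedence over readiness, which takes
    precedence over slow deep checks.
    """
    if not live_ok:
        return "immediate service incident"
    if not ready_ok:
        return "dependency/system incident"
    if deep_slow:
        return "diagnostic overhead/performance issue"
    return "no escalation"  # hypothetical label for the healthy case
```

For example, `classify_incident(True, False, True)` returns `"dependency/system incident"`, because a readiness failure outranks slow deep checks under the assumed ordering.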