
Dev Server Runbook (Backend + Infra)

Version: 1.0.0 | Last Updated: 2026-03-22 | Status: Active


Purpose

This runbook is the current operational reference for developers, system administrators, and DevOps engineers working on the MBPanel development backend stack.

Use this for:

  • live server configuration verification
  • service restarts and health validation
  • incident triage and recovery
  • safe rollback of infra-level changes


1) Current Runtime Topology (verified 2026-03-22)

1.1 Hosts

| Component | Host/IP | Access User | Notes |
|---|---|---|---|
| Backend app node | dev-backend.mightybox.site / 15.204.15.210 | apache | Apache + mod_wsgi + FastAPI |
| Frontend node | dev-frontend.mightybox.site / 15.204.15.169 | nodejs | Next.js + PM2 + Nginx |

1.2 External dependencies

| Service | Host | Port | Usage |
|---|---|---|---|
| PostgreSQL | 172.16.4.139 | 5432 | app primary database |
| Redis | 172.16.3.213 | 6379 | cache + distributed state |
| RabbitMQ | 172.16.3.83 | 5672 | Celery + SSE broker transport |

1.3 Backend capacity settings

Apache MPM (/etc/httpd/conf/httpd.conf):

  • MaxRequestWorkers 150 (worker/event blocks)
  • ThreadsPerChild 25

mod_wsgi daemon (/etc/httpd/conf.d/wsgi.conf):

  • WSGIDaemonProcess ... processes=3 threads=15 ...

Celery worker config (/etc/conf.d/celery):

  • CELERYD_NODES="w1 w2"
  • CELERYD_OPTS="--time-limit=300 --concurrency=1"

Operational result:

  • 2 Celery nodes online (w1, w2)
  • queue jobs.env.create has 2 consumers under normal healthy state
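The settings above imply a fixed serving budget worth keeping in mind during capacity changes. A quick sanity check of the arithmetic (values copied from the settings listed here):

```shell
# Back-of-envelope check: concurrent requests the mod_wsgi daemon can serve
# versus the connection slots Apache will accept.
processes=3            # WSGIDaemonProcess processes=3
threads=15             # WSGIDaemonProcess threads=15
max_request_workers=150

app_threads=$((processes * threads))
echo "mod_wsgi app threads:    $app_threads"
echo "Apache connection slots: $max_request_workers"
# Apache accepts up to 150 connections, but only 45 requests run in the app
# at once; the surplus queues inside mod_wsgi until a thread frees up.
```

Raising MaxRequestWorkers alone does not raise app throughput; the mod_wsgi processes/threads values are the actual ceiling.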


2) How It Works (Runtime Flow)

2.1 Request path

  1. Client sends request to https://dev-backend.mightybox.site
  2. Apache (httpd) terminates TLS and forwards to the mod_wsgi daemon group
  3. mod_wsgi executes wsgi.py and serves the FastAPI app
  4. App uses:
     • PostgreSQL for persistence
     • Redis for cache/state
     • RabbitMQ for Celery jobs and SSE fan-out messaging
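The whole path can be exercised with a single timed probe; the `-w` fields are standard curl timing variables, so this shows roughly where latency accrues (TLS handshake vs. time to first byte from the app):

```shell
# One timed request through Apache -> mod_wsgi -> FastAPI.
# time_appconnect covers TLS setup; time_starttransfer is first byte from the app.
curl -sk -o /dev/null --max-time 5 \
  -w 'code=%{http_code} tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://dev-backend.mightybox.site/api/v1/health/ping
```

A healthy run shows code=200 with total well under a second; a large gap between tls and ttfb points at queueing or the app rather than the network.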

2.2 Async jobs and events

  • API enqueues background work to Celery
  • Celery workers (w1, w2) consume jobs from RabbitMQ
  • SSE broker consumes RabbitMQ status events and pushes to connected clients

2.3 Health endpoints and intended use

| Endpoint | Purpose | Expected behavior |
|---|---|---|
| /api/v1/health/ping | process/network reachability | very fast, always 200 if app reachable |
| /api/v1/health/live | liveness probe | lightweight process-alive check |
| /api/v1/health/ready | readiness probe | verifies critical dependencies |
| /api/v1/health/celery | Celery summary health | fast summary (mode: summary) |
| /api/v1/health/celery/deep | operator diagnostics | slower deep worker inspection |
| /api/v1/health | full composite check | comprehensive, can be slower |
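For automation, the four lightweight probes can be swept in one pass. A sketch that assumes plain HTTP status codes are sufficient (no body parsing):

```shell
# Sweep the lightweight probes and flag anything that is not HTTP 200.
BASE="${BASE:-https://dev-backend.mightybox.site}"

probe() {
  # $1 = endpoint path; prints "<path> -> <code>", non-zero status unless 200
  code=$(curl -sk -o /dev/null --max-time 5 -w '%{http_code}' "$BASE$1")
  echo "$1 -> $code"
  [ "$code" = "200" ]
}

fail=0
for ep in /api/v1/health/ping /api/v1/health/live /api/v1/health/ready /api/v1/health/celery; do
  probe "$ep" || fail=$((fail + 1))
done
echo "failing probes: $fail"
```

An unreachable host reports code 000, so network failures and app failures are distinguishable at a glance.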

3) Standard Operating Commands

3.1 Service status

ssh apache@15.204.15.210
systemctl is-active httpd celery
systemctl status httpd --no-pager -l
systemctl status celery --no-pager -l

3.2 Quick health verification

curl -sk https://dev-backend.mightybox.site/api/v1/health/ping
curl -sk https://dev-backend.mightybox.site/api/v1/health/live
curl -sk https://dev-backend.mightybox.site/api/v1/health/ready
curl -sk https://dev-backend.mightybox.site/api/v1/health/celery

3.3 Process checks

pgrep -af httpd
pgrep -af "celery.*worker"

3.4 Config inspection

grep -n "WSGIDaemonProcess" /etc/httpd/conf.d/wsgi.conf
grep -nE "MaxRequestWorkers|ThreadsPerChild|ServerLimit" /etc/httpd/conf/httpd.conf
grep -nE "CELERYD_NODES|CELERYD_OPTS" /etc/conf.d/celery

4) Troubleshooting Playbooks

4.1 Apache worker pressure (AH10159 / MaxRequestWorkers)

Symptom: error log includes AH10159: server is within MinSpareThreads of MaxRequestWorkers.

Checks:

tail -n 200 /var/log/httpd/error_log | grep -E "AH10159|AH00484|MaxRequestWorkers"

Actions:

  1. Confirm MaxRequestWorkers and the mod_wsgi process/thread settings.
  2. Confirm available memory before increasing capacity.
  3. Apply small, reversible increments only.
  4. Validate with a burst probe and a review of recent error logs.
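The memory check in step 2 can be made concrete. A rough sketch, assuming Linux /proc/meminfo and that it runs on the backend node where httpd is up:

```shell
# Rough headroom estimate: how many more httpd-sized processes fit in RAM.
avg_rss_kb=$(ps -o rss= -C httpd 2>/dev/null | awk '{s+=$1; n++} END {print (n ? int(s/n) : 0)}')
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

echo "avg httpd RSS: ${avg_rss_kb} kB"
echo "MemAvailable:  ${avail_kb} kB"
if [ "${avg_rss_kb:-0}" -gt 0 ]; then
  # With the event/worker MPM, capacity grows per child process
  # (ThreadsPerChild threads each), not per individual request slot.
  echo "extra httpd-sized processes that fit: $((avail_kb / avg_rss_kb))"
else
  echo "httpd not running here; run this on the backend node"
fi
```

If the headroom number is small, fix the saturation cause instead of raising MaxRequestWorkers.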

4.2 Celery workers unhealthy or missing

Symptom: /health/celery shows online_workers: 0 or queue not bound.

Checks:

systemctl status celery --no-pager -l
pgrep -af "celery.*worker"
curl -sk https://dev-backend.mightybox.site/api/v1/health/celery

Actions:

  1. Restart Celery: sudo systemctl restart celery
  2. Re-check the health endpoint and consumer count.
  3. Inspect logs: journalctl -u celery --since "30 min ago" --no-pager
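Step 2 can be scripted. A sketch that assumes the summary payload exposes the online_workers field named in the symptom above, extracted with plain sed so no JSON tooling is required:

```shell
# Pull the Celery summary and warn when no workers are reported online.
summary=$(curl -sk --max-time 5 https://dev-backend.mightybox.site/api/v1/health/celery)
workers=$(printf '%s' "$summary" | sed -n 's/.*"online_workers":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
workers=${workers:-0}

echo "online workers: $workers"
if [ "$workers" -ge 1 ] 2>/dev/null; then
  echo "OK: workers online"
else
  echo "WARN: no Celery workers reported"
fi
```

Under the normal healthy state described in section 1.3, this should report 2 workers (w1 and w2).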

4.3 SSE issues (connected but no events)

Symptom: SSE endpoint connects but client receives no status updates.

Checks:

  1. Verify broker dependency health in the /health response (rabbitmq, celery_workers).
  2. Confirm RabbitMQ is reachable from the app node.
  3. Validate the authenticated SSE path (cookie-based auth is required).

Actions:

  • Verify the event publish path and routing keys.
  • Check SSE broker metrics and recent app logs.

4.4 Full health endpoint slow

Context: /health is comprehensive and can be slower than probe endpoints.

Guidance:

  • Use /health/ping, /health/live, /health/ready, and /health/celery for probes/automation.
  • Reserve /health/celery/deep and /health for diagnostics.


5) Safe Change Procedure (Infra)

  1. Backup first:
     • /etc/httpd/conf/httpd.conf
     • /etc/httpd/conf.d/wsgi.conf
     • /etc/conf.d/celery
  2. Validate syntax before restart: sudo /usr/sbin/httpd -t
  3. Restart only the needed services (httpd, celery)
  4. Post-change verification:
     • service status
     • key health endpoints
     • recent error logs
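The backup step is easiest to keep consistent with one helper; this sketch produces the .bak.<timestamp> names that the rollback commands in section 6 expect:

```shell
# Copy a config aside with a timestamp suffix and print the backup path.
backup() {
  ts=$(date +%Y%m%d-%H%M%S)
  cp -p "$1" "$1.bak.$ts" && echo "$1.bak.$ts"
}

# On the backend node, before editing (prefix with sudo for root-owned files):
#   backup /etc/httpd/conf/httpd.conf
#   backup /etc/httpd/conf.d/wsgi.conf
#   backup /etc/conf.d/celery
```

cp -p preserves ownership and mode, so a restored backup drops straight back in without permission fixes.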

6) Rollback

Use the latest backup copies and restart services.

# Example pattern (replace timestamps with latest backup files)
sudo cp /etc/httpd/conf/httpd.conf.bak.<timestamp> /etc/httpd/conf/httpd.conf
sudo cp /etc/httpd/conf.d/wsgi.conf.bak.<timestamp> /etc/httpd/conf.d/wsgi.conf
sudo cp /etc/conf.d/celery.bak.<timestamp> /etc/conf.d/celery

sudo /usr/sbin/httpd -t
sudo systemctl restart httpd
sudo systemctl restart celery

7) Developer Notes

  • The canonical favorites endpoints are under environments domain routes (/api/v1/environments/favorites and /favorites/toggle).
  • Legacy /api/v1/favourites route path is retired and should not be reintroduced.
  • For backend reliability checks, always separate lightweight probe checks from deep diagnostics.