# AI Agent Guide — Logging, Error Handling & OTEL (SigNoz Aligned)

Active implementation snapshot: December 18, 2025

Grounded in the live code under backend/app/core/*, backend/app/infrastructure/messaging/sse.py, and the authoritative observability backlog in docs/development/tasks/phase1_observability/logging_debugging_system.md. External recommendations cite the official Structlog manual, OpenTelemetry Python docs, and SigNoz instrumentation guides.[^structlog][^otel-python][^signoz-python][^signoz-manual][^signoz-troubleshooting]


## 0. Purpose & Alignment

1. **Confirm what already ships.** The FastAPI app initializes observability in `lifespan()`, which calls `setup_logging()` and starts OpenTelemetry, Sentry, and the SSE broker.[^code-lifespan]
2. **Document how to extend it.** Every new router, background worker, or infrastructure client must emit structured logs, consistent error telemetry, and OTEL spans that SigNoz can ingest.
3. **Give AI agents an execution playbook.** Follow the workflow below whenever you touch backend code so you never have to guess about logging, tracing, or error handling.

## 1. Baseline Observability Map

| Concern | Implementation (Code) | Tests / Docs |
| --- | --- | --- |
| Structured logging + PII filtering | `backend/app/core/logging.py` (`setup_logging`, `filter_pii`, OTEL log exporter) | Phase‑1 doc §§0‑5 |
| Correlation + auth context | `backend/app/core/middleware.py` (`CorrelationIdMiddleware`, `AuthContextMiddleware`, `RequestLoggingMiddleware`) | `backend/tests/integration/api/test_correlation_id.py`, `test_context_leakage.py` |
| Global exception policy | `backend/app/core/app_factory.py::_register_exception_handlers` | Phase‑1 doc §1, integration tests |
| Security events | `backend/app/core/security_events.py::log_auth_violation` | Phase‑7 doc, SSE auth tests |
| SSE/RabbitMQ visibility | `backend/app/infrastructure/messaging/sse.py` (metrics, manual spans) | `docs/architecture/WEBSOCKET/SSE-Notif-Phases.md`, smoke scripts |
| OpenTelemetry wiring | `backend/app/core/telemetry.py` + `OpenTelemetryRequestSpanMiddleware` | Phase‑4 doc, SigNoz dashboards |
| SigNoz log export | `backend/app/core/logging.py::_configure_otel_log_handler` (OTLP gRPC) | SigNoz ingest pipelines |

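## 2. Implementation Workflow (Follow in Order)

### Step 1 — Boot the Observability Stack

* `setup_logging()` must run before any user code. `lifespan()` already guarantees this, so never bypass `create_app()`.[^code-lifespan]
* Middleware order matters: Correlation ID → CORS → OTEL span middleware → Auth context → Request logging.[^code-middleware] Starlette treats the most recently added middleware as the outermost layer, so the registration order in `create_app()` is the reverse of the execution order.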

```68:91:backend/app/core/app_factory.py
app.add_middleware(RequestLoggingMiddleware)
if getattr(settings, "ENABLE_OPENTELEMETRY", False):
    app.add_middleware(OpenTelemetryRequestSpanMiddleware)
app.add_middleware(AuthContextMiddleware)
app.add_middleware(CORSMiddleware, ...)
app.add_middleware(CorrelationIdMiddleware, header_name="X-Request-ID")
```

### Step 2 — Emit Structured Logs Everywhere

1. **Get the logger once per module.**
   ```python
   from app.core.logging import get_logger
   logger = get_logger(__name__)
   ```
2. **Log events, not sentences.** Use `logger.info("job_dispatched", job_id=job.id, ...)`.
3. **PII guard rails:** `filter_pii()` redacts keys listed in `SENSITIVE_KEYS`, and production mode allow-lists `SAFE_FIELDS`, so stick to those field names unless you add new ones upstream.[^code-safe-fields]
4. **Respect request context.** `add_correlation_id` and `add_request_context` inject `correlation_id`, `team_id`, and `user_id` automatically; never log those manually unless you are outside request scope.
5. **Prefer domain helpers:**
   * For external HTTP clients use `create_external_async_client()` which already logs sanitized request/response metadata.
   * For auth/security misuse, call `log_auth_violation()` so abuse counters stay accurate.
   * For SSE fan-out rely on `SSEBroker` logging hooks instead of ad-hoc prints.
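
To make the PII guard rails in item 3 concrete: a structlog processor is just a callable that takes `(logger, method_name, event_dict)` and returns the event dict. The sketch below is not the project's `filter_pii`. The key sets, the `production` flag, and `make_pii_filter` are hypothetical stand-ins for the real `SENSITIVE_KEYS` and `SAFE_FIELDS` logic, and the redact-then-allow-list ordering is an assumption.

```python
from typing import Any, Callable, Dict, MutableMapping

# Hypothetical stand-ins; the real sets live in backend/app/core/logging.py.
SENSITIVE_KEYS = frozenset({"password", "token", "authorization"})
SAFE_FIELDS = frozenset(
    {"event", "level", "timestamp", "job_id", "env_name", "error_type"}
)

REDACTED = "[REDACTED]"


def make_pii_filter(production: bool) -> Callable[..., Dict[str, Any]]:
    """Build a structlog-style processor for the given mode."""

    def pii_filter(
        logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
    ) -> Dict[str, Any]:
        cleaned: Dict[str, Any] = {}
        for key, value in event_dict.items():
            if key.lower() in SENSITIVE_KEYS:
                # Known-sensitive keys are always redacted (assumed behavior).
                cleaned[key] = REDACTED
            elif production and key not in SAFE_FIELDS:
                # Production mode: drop anything that is not allow-listed.
                continue
            else:
                cleaned[key] = value
        return cleaned

    return pii_filter
```

Under these assumptions, a field that is neither sensitive nor allow-listed survives in development but is dropped in production, which is why new field names should be added to the allow-list upstream first.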

```208:236:backend/app/core/logging.py
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.filter_by_level,
            ...
            add_correlation_id,
            add_request_context,
            filter_pii,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
        ],
        ...
    )
```
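
The chain above runs its processors in order, so request context is attached before the PII filter sees the event. A stdlib-only sketch of that idea follows. `correlation_id_var`, `run_chain`, `redact_secrets`, and `log_in_request` are hypothetical names for illustration, and the simplified processors do not reproduce the project's implementations.

```python
import contextvars
from typing import Any, Callable, Dict, List

# Hypothetical context variable; in a real app, middleware would set it per request.
correlation_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "correlation_id", default=None
)

Processor = Callable[[Any, str, Dict[str, Any]], Dict[str, Any]]


def add_correlation_id(
    logger: Any, method_name: str, event_dict: Dict[str, Any]
) -> Dict[str, Any]:
    """Attach the current correlation id, if one is bound in this context."""
    cid = correlation_id_var.get()
    if cid is not None:
        event_dict.setdefault("correlation_id", cid)
    return event_dict


def redact_secrets(
    logger: Any, method_name: str, event_dict: Dict[str, Any]
) -> Dict[str, Any]:
    """Simplified stand-in for a PII filter placed after the context processors."""
    return {k: ("[REDACTED]" if k == "password" else v) for k, v in event_dict.items()}


def run_chain(
    processors: List[Processor], method_name: str, event_dict: Dict[str, Any]
) -> Dict[str, Any]:
    """Apply processors left to right, feeding each result into the next."""
    for processor in processors:
        event_dict = processor(None, method_name, event_dict)
    return event_dict


def log_in_request(cid: str) -> Dict[str, Any]:
    """Simulate one request: bind the id, emit an event, then restore the context."""
    token = correlation_id_var.set(cid)
    try:
        return run_chain(
            [add_correlation_id, redact_secrets],
            "info",
            {"event": "job_dispatched", "password": "hunter2"},
        )
    finally:
        correlation_id_var.reset(token)
```

Because the id comes from the context rather than from the call site, code inside request scope never needs to pass `correlation_id` by hand, and the value is gone once the context is reset.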

**Sample pattern for new code**

```python
from app.core.logging import get_logger

logger = get_logger(__name__)


# JobContext and ExternalAPIError are provided by the surrounding codebase.
async def provision_environment(job_ctx: JobContext) -> None:
    logger.info(
        "env_provision_started",
        job_id=str(job_ctx.job_id),
        env_name=job_ctx.env_name,
    )
    try:
        ...
    except ExternalAPIError as exc:
        logger.error(
            "env_provision_failed",
            job_id=str(job_ctx.job_id),
            env_name=job_ctx.env_name,
            error_type=type(exc).__name__,
            exc_info=True,
        )
        raise
```

### Step 3 — Domain-Specific Logging Hooks

| Scenario | Required action |
| --- | --- |
| FastAPI endpoints | Let `RequestLoggingMiddleware` emit the request lifecycle log; within handlers only log meaningful domain transitions (authorization decision, cross-tenant attempt, etc.). |
| Background jobs / scripts | Call `setup_logging()` once (scripts already do) and log start/end plus key waypoints. Reuse `JobContext` so logs include `team_id`/`env_name`. |
| External HTTP calls | Instantiate clients via `create_external_async_client(service="virtuozzo", ...)` to inherit consistent logging and redaction. |
| Security events | Use `log_auth_violation(...)` instead of ad-hoc warnings so Redis/in-memory abuse counters stay in sync. |
| SSE / RabbitMQ | Wrap fan-out batches in `_telemetry_span()` and rely on `SSEBroker.subscribe`/`unsubscribe` logs for subscriber counts. No payloads, only metadata. |

### Step 4 — Centralized Error Handling

* The FastAPI exception handlers already log once per failure; do not duplicate stack traces.
* Raise `HTTPException` for expected user errors (401–404) and let `_register_exception_handlers` log them at DEBUG.
* For 5xx scenarios set `exc_info=True` and re-raise so the global handler emits `unhandled_exception`.
* Authentication/rate limit denials must flow through `log_auth_violation()` so SigNoz dashboards can detect abuse spikes.

```108:174:backend/app/core/app_factory.py
@app.exception_handler(Exception)
async def unhandled_exception_handler(request, exc):
    logger.error(
        "unhandled_exception",
        error_type=type(exc).__name__,
        error_msg=str(exc),
        method=request.method,
        path=request.url.path,
        exc_info=True,
    )
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})
```

### Step 5 — OpenTelemetry & SigNoz Integration

1. **Enable tracing/log export by configuration:**
   * `ENABLE_OPENTELEMETRY=true`
   * `OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.<region>.signoz.cloud:443` (or your self-hosted collector)
   * `OTEL_EXPORTER_OTLP_INSECURE=false` in production, unless the SigNoz endpoint is plaintext.
   * `OTEL_LOGS_ENABLED=true` to push Structlog output through the OTLP log exporter.
2. **Automatic server spans** are generated by `OpenTelemetryRequestSpanMiddleware`, which extracts W3C headers, sets HTTP attributes, and forwards correlation/team/user ids to SigNoz.[^code-otel-middleware]
3. **Manual spans**: use the `_telemetry_span()` helper or `get_tracer("package.path")` when wrapping background work (DB migrations, RabbitMQ handlers, Virtuozzo client operations). Record exceptions on the span so SigNoz surfaces them as errors.

```247:293:backend/app/core/telemetry.py
        with self._tracer.start_as_current_span(span_name, context=ctx, kind=SpanKind.SERVER) as span:
            set_attr("http.method", method)
            set_attr("url.path", path)
            ...
            if sc >= 500:
                span.set_status(Status(StatusCode.ERROR))
```

4. **SigNoz Cloud headers:** When using SigNoz Cloud, set `OTEL_EXPORTER_OTLP_HEADERS="signoz-ingestion-key=<KEY>"` per their ingestion guide.[^signoz-python]
5. **Logs to SigNoz:** `_configure_otel_log_handler()` attaches `opentelemetry.sdk._logs.LoggingHandler` so structured logs flow over OTLP. Keep log volume manageable by respecting `SAFE_FIELDS`.
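
For orientation on the "W3C headers" mentioned in item 2, the sketch below parses a `traceparent` header by hand, following the version `00` layout from the W3C Trace Context specification. It only illustrates the wire format. The real middleware presumably relies on OpenTelemetry's propagators rather than code like this, and `parse_traceparent` and `TraceParent` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context, version 00).
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: Optional[str]) -> Optional[TraceParent]:
    """Return the parsed header, or None when it is absent or invalid."""
    if not header:
        return None
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

An invalid or missing header yields `None`, in which case a server would typically start a fresh trace instead of continuing the caller's.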

### Step 6 — Validation Before You Ship

1. **Unit / integration coverage:** run `pytest backend/tests/integration/api/test_correlation_id.py backend/tests/integration/api/test_context_leakage.py`.
2. **Smoke logging sink:** run `python backend/tests/smoke/logging/log_sink_probe.py --log-dir logs`.
3. **SSE smoke:** follow `backend/tests/smoke/sse/README` (ensures broker metrics/logs stay sane).
4. **SigNoz verification:** trigger a request locally with OTEL enabled and confirm the span/log shows up in the SigNoz service view within 1–2 minutes.[^signoz-troubleshooting]

## 3. SigNoz Configuration Quickstart

| Variable | Purpose | Example value |
| --- | --- | --- |
| `ENABLE_OPENTELEMETRY` | Turns on tracing middleware + exporter | `true` |
| `OTEL_SERVICE_NAME` | Service tag shown in SigNoz | `mbpanel-api` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP gRPC endpoint | `https://ingest.us-east-2.signoz.cloud:443` |
| `OTEL_EXPORTER_OTLP_HEADERS` | Auth header for SigNoz Cloud | `signoz-ingestion-key=...` |
| `OTEL_LOGS_ENABLED` | Routes structlog output to OTLP | `true` |
| `OTEL_METRICS_ENABLED` | Enables OTLP metric exporter | `true` (if collector supports it) |
| `LOG_TO_FILE`, `LOG_DIR` | Optional rotating JSONL sink for Virtuozzo | `true`, `/var/www/error` |

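Per SigNoz's OTLP instructions, the OTEL exporters speak standard OTLP over gRPC, so no custom protocol adapters are needed. A self-hosted collector listens for gRPC on port 4317 by default, while the SigNoz Cloud ingest endpoint shown above uses port 443.[^signoz-python]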
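
Before enabling export, it can help to sanity-check these variables. The helper below is a hypothetical sketch: `check_otel_env` and `parse_otlp_headers` are not part of the codebase. It flags a missing endpoint, a malformed `OTEL_EXPORTER_OTLP_HEADERS` value, and a SigNoz Cloud endpoint without an ingestion key. It assumes the comma-separated `key=value` header format used by the OpenTelemetry exporters and that SigNoz Cloud hosts end in `signoz.cloud`.

```python
from typing import Dict, List, Mapping
from urllib.parse import urlparse


def parse_otlp_headers(raw: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict; raise ValueError on malformed pairs."""
    headers: Dict[str, str] = {}
    for pair in filter(None, (p.strip() for p in raw.split(","))):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed header pair: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


def check_otel_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if env.get("ENABLE_OPENTELEMETRY", "").lower() != "true":
        return problems  # tracing disabled, nothing else to validate

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    # Endpoints without a scheme (e.g. "localhost:4317") have no hostname here.
    host = urlparse(endpoint).hostname or ""
    if not host:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is missing or not a URL")

    try:
        headers = parse_otlp_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", ""))
    except ValueError as exc:
        problems.append(f"OTEL_EXPORTER_OTLP_HEADERS: {exc}")
        headers = {}

    # Assumption: SigNoz Cloud hosts end in 'signoz.cloud' and need an ingestion key.
    if host.endswith("signoz.cloud") and "signoz-ingestion-key" not in headers:
        problems.append("SigNoz Cloud endpoint set without signoz-ingestion-key header")
    return problems
```

A startup script could call `check_otel_env(os.environ)` and log each returned problem as a warning, so misconfiguration is visible before any span is exported.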


## 4. Pull Request Checklist (Copy Into Descriptions)

1. Every new module calls `get_logger(__name__)` once and contains no bare `print` calls.
2. Logs only include `SAFE_FIELDS` or documented additions; sensitive data is never logged.
3. Error paths rely on central handlers (`HTTPException`, `log_auth_violation`, or `capture_exception`) instead of ad-hoc prints.
4. OTEL spans wrap any new external calls / background loops; errors mark the span status.
5. SigNoz-specific env vars are documented in the README or `.env.example` when new ones are required.
6. `pytest backend/tests/integration/api/test_correlation_id.py backend/tests/integration/api/test_context_leakage.py` succeeds locally.

## References

[^structlog]: Structlog best practices (contextvars, processor chains) — https://www.structlog.org/en/stable/best_practices.html
[^otel-python]: OpenTelemetry Python manual instrumentation guide — https://opentelemetry.io/docs/languages/python/instrumentation/
[^signoz-python]: SigNoz — Auto-instrument Python apps with OpenTelemetry & OTLP ingest headers — https://signoz.io/docs/instrumentation/opentelemetry-python/
[^signoz-manual]: SigNoz — Manual spans in Python applications — https://signoz.io/opentelemetry/manual-spans-in-python-application/
[^signoz-troubleshooting]: SigNoz — Troubleshooting Python with OpenTelemetry tracing — https://signoz.io/blog/troubleshooting-python-with-opentelemetry-tracing/
[^code-lifespan]: `backend/app/core/app_factory.py::lifespan` — initializes logging, Sentry, OTEL, SSE broker.
[^code-middleware]: `backend/app/core/app_factory.py::create_app` — documents middleware order for observability.
[^code-safe-fields]: `backend/app/core/logging.py::SAFE_FIELDS` — allow-listed log attributes enforced in production.
[^code-otel-middleware]: `backend/app/core/telemetry.py::OpenTelemetryRequestSpanMiddleware` — extracts trace context, sets HTTP + tenant attributes.