# AI Agent Guide — Logging, Error Handling & OTEL (SigNoz Aligned)

Active implementation snapshot: December 18, 2025

Grounded in the live code under backend/app/core/*, backend/app/infrastructure/messaging/sse.py, and the authoritative observability backlog in docs/development/tasks/phase1_observability/logging_debugging_system.md. External recommendations cite the official Structlog manual, OpenTelemetry Python docs, and SigNoz instrumentation guides.[^structlog][^otel-python][^signoz-python][^signoz-manual][^signoz-troubleshooting]

## 0. Purpose & Alignment

* **Confirm what already ships.** The FastAPI app initializes observability in `lifespan()` by calling `setup_logging()` and initializing OpenTelemetry, Sentry, and the SSE broker.[^code-lifespan]
* **Document how to extend it.** Every new router, background worker, or infrastructure client must emit structured logs, consistent error telemetry, and OTEL spans that SigNoz can ingest.
* **Give AI agents an execution playbook.** Follow the workflow below whenever you touch backend code so you never guess about logging, tracing, or error handling.

## 1. Baseline Observability Map

| Concern | Implementation (Code) | Tests / Docs |
| --- | --- | --- |
| Structured logging + PII filtering | `backend/app/core/logging.py` (`setup_logging`, `filter_pii`, OTEL log exporter) | Phase‑1 doc §§0‑5 |
| Correlation + auth context | `backend/app/core/middleware.py` (`CorrelationIdMiddleware`, `AuthContextMiddleware`, `RequestLoggingMiddleware`) | `backend/tests/integration/api/test_correlation_id.py`, `test_context_leakage.py` |
| Global exception policy | `backend/app/core/app_factory.py::_register_exception_handlers` | Phase‑1 doc §1, integration tests |
| Security events | `backend/app/core/security_events.py::log_auth_violation` | Phase‑7 doc, SSE auth tests |
| SSE/RabbitMQ visibility | `backend/app/infrastructure/messaging/sse.py` (metrics, manual spans) | `docs/architecture/WEBSOCKET/SSE-Notif-Phases.md`, smoke scripts |
| OpenTelemetry wiring | `backend/app/core/telemetry.py` + `OpenTelemetryRequestSpanMiddleware` | Phase‑4 doc, SigNoz dashboards |
| SigNoz log export | `backend/app/core/logging.py::_configure_otel_log_handler` (OTLP gRPC) | SigNoz ingest pipelines |

## 2. Implementation Workflow (Follow in Order)

### Step 1 — Boot the Observability Stack

* `setup_logging()` must run before any user code; `lifespan()` already guarantees this, so never bypass `create_app()`.[^code-lifespan]
* Middleware order matters: Correlation ID → CORS → OTEL span middleware → Auth context → Request logging.[^code-middleware]
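
An ordering rule like this can look backwards next to the registration code, because Starlette's `add_middleware()` makes the most recently added middleware the outermost layer. The sketch below imitates that wrapping in plain Python with no Starlette import. The `make_layer` and `build_stack` helpers and the three layer names are illustrative stand-ins, not code from `app_factory.py`.

```python
from typing import Callable, List

Handler = Callable[[List[str]], None]


def make_layer(name: str, inner: Handler) -> Handler:
    """Return a handler that records its name, then calls the layer it wraps."""
    def layer(trace: List[str]) -> None:
        trace.append(name)
        inner(trace)
    return layer


def build_stack(registration_order: List[str]) -> Handler:
    """Wrap layers so that the last registered name ends up outermost."""
    def endpoint(trace: List[str]) -> None:
        trace.append("endpoint")

    app: Handler = endpoint
    for name in registration_order:  # each later registration wraps the earlier ones
        app = make_layer(name, app)
    return app


# Hypothetical registration order: innermost first, outermost last.
registered = ["request_logging", "cors", "correlation_id"]
trace: List[str] = []
build_stack(registered)(trace)
print(" -> ".join(trace))
```

The layer registered last (`correlation_id`) sees the request first, which is why a correlation-id middleware is typically registered at the end.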

```68:91:backend/app/core/app_factory.py
app.add_middleware(RequestLoggingMiddleware)
if getattr(settings, "ENABLE_OPENTELEMETRY", False):
    app.add_middleware(OpenTelemetryRequestSpanMiddleware)
app.add_middleware(AuthContextMiddleware)
app.add_middleware(CORSMiddleware, ...)
app.add_middleware(CorrelationIdMiddleware, header_name="X-Request-ID")
```

### Step 2 — Emit Structured Logs Everywhere

1. **Get the logger once per module.**
   ```python
   from app.core.logging import get_logger
   logger = get_logger(__name__)
   ```
2. **Log events, not sentences.** Use `logger.info("job_dispatched", job_id=job.id, ...)`.
3. **PII guard rails:** `filter_pii()` redacts keys listed in `SENSITIVE_KEYS`, and production mode allow-lists `SAFE_FIELDS`, so stick to those field names unless you add new ones upstream.[^code-safe-fields]
4. **Respect request context.** `add_correlation_id` and `add_request_context` inject `correlation_id`, `team_id`, and `user_id` automatically; never log those manually unless you are outside request scope.
5. **Prefer domain helpers:**
   * For external HTTP clients, use `create_external_async_client()`, which already logs sanitized request/response metadata.
   * For auth/security misuse, call `log_auth_violation()` so abuse counters stay accurate.
   * For SSE fan-out rely on `SSEBroker` logging hooks instead of ad-hoc prints.
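
To make the PII guard rails concrete, here is a minimal sketch of a structlog-style processor (a callable taking `logger`, `method_name`, and `event_dict`). It is not the project's `filter_pii`: the key sets, the `production` flag, and the `[REDACTED]` placeholder are all assumptions chosen for illustration, and the real lists live in `backend/app/core/logging.py`.

```python
from typing import Any, Dict

# Hypothetical key sets; the real SENSITIVE_KEYS and SAFE_FIELDS may differ.
SENSITIVE_KEYS = {"password", "token", "authorization"}
SAFE_FIELDS = {"event", "level", "job_id", "env_name", "error_type"}


def filter_pii_sketch(
    logger: Any,
    method_name: str,
    event_dict: Dict[str, Any],
    *,
    production: bool = True,
) -> Dict[str, Any]:
    """Redact sensitive keys; in production also drop keys outside the allow-list."""
    cleaned: Dict[str, Any] = {}
    for key, value in event_dict.items():
        if key.lower() in SENSITIVE_KEYS:
            cleaned[key] = "[REDACTED]"
        elif production and key not in SAFE_FIELDS:
            continue  # not allow-listed, so it never reaches the sink
        else:
            cleaned[key] = value
    return cleaned


event = {"event": "job_dispatched", "job_id": "42", "password": "hunter2", "debug_blob": "x"}
result = filter_pii_sketch(None, "info", event)
print(result)
```

Under these assumptions an unknown field such as `debug_blob` silently disappears in production, which is the practical reason to stick to allow-listed field names.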

```208:236:backend/app/core/logging.py
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.stdlib.filter_by_level,
            ...
            add_correlation_id,
            add_request_context,
            filter_pii,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
        ],
        ...
    )
```

#### Sample pattern for new code

```python
from app.core.logging import get_logger

# JobContext and ExternalAPIError are assumed to come from the project's own modules.
logger = get_logger(__name__)

async def provision_environment(job_ctx: JobContext) -> None:
    logger.info(
        "env_provision_started",
        job_id=str(job_ctx.job_id),
        env_name=job_ctx.env_name,
    )
    try:
        ...
    except ExternalAPIError as exc:
        logger.error(
            "env_provision_failed",
            job_id=str(job_ctx.job_id),
            env_name=job_ctx.env_name,
            error_type=type(exc).__name__,
            exc_info=True,
        )
        raise
```

### Step 3 — Domain-Specific Logging Hooks

| Scenario | Required action |
| --- | --- |
| FastAPI endpoints | Let `RequestLoggingMiddleware` emit the request lifecycle log; within handlers only log meaningful domain transitions (authorization decision, cross-tenant attempt, etc.). |
| Background jobs / scripts | Call `setup_logging()` once (scripts already do) and log start/end plus key waypoints. Reuse `JobContext` so logs include `team_id`/`env_name`. |
| External HTTP calls | Instantiate clients via `create_external_async_client(service="virtuozzo", ...)` to inherit consistent logging and redaction. |
| Security events | Use `log_auth_violation(...)` instead of ad-hoc warnings so Redis/in-memory abuse counters stay in sync. |
| SSE / RabbitMQ | Wrap fan-out batches in `_telemetry_span()` and rely on `SSEBroker.subscribe/unsubscribe` logs for subscriber counts. No payloads, only metadata. |

### Step 4 — Centralized Error Handling

* The FastAPI exception handlers already log once per failure; do not duplicate stack traces.
* Raise `HTTPException` for expected user errors (401–404) and let `_register_exception_handlers` log them at DEBUG.
* For 5xx scenarios, set `exc_info=True` and re-raise so the global handler emits `unhandled_exception`.
* Authentication/rate limit denials must flow through `log_auth_violation()` so SigNoz dashboards can detect abuse spikes.
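
The level rules above can be summarized as a small decision function. This is only a sketch of the policy, not the logic inside `_register_exception_handlers`: the function name `log_policy_for_status` is hypothetical, and treating every 4xx like the documented 401–404 range (and everything else as INFO) is an assumption.

```python
import logging
from typing import Tuple


def log_policy_for_status(status_code: int) -> Tuple[int, bool]:
    """Return (log level, include stack trace) for an HTTP status code."""
    if 400 <= status_code < 500:
        return logging.DEBUG, False  # expected client error, no stack trace
    if status_code >= 500:
        return logging.ERROR, True   # server fault, keep the traceback
    return logging.INFO, False       # success and redirects


for code in (200, 404, 503):
    level, with_trace = log_policy_for_status(code)
    print(code, logging.getLevelName(level), with_trace)
```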

```108:174:backend/app/core/app_factory.py
@app.exception_handler(Exception)
async def unhandled_exception_handler(request, exc):
    logger.error(
        "unhandled_exception",
        error_type=type(exc).__name__,
        error_msg=str(exc),
        method=request.method,
        path=request.url.path,
        exc_info=True,
    )
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})
```

### Step 5 — OpenTelemetry & SigNoz Integration

1. **Enable tracing/log export by configuration:**
   * `ENABLE_OPENTELEMETRY=true`
   * `OTEL_EXPORTER_OTLP_ENDPOINT=https://ingest.<region>.signoz.cloud:443` (or your self-hosted collector)
   * `OTEL_EXPORTER_OTLP_INSECURE=false` in production, unless the SigNoz endpoint is plaintext
   * `OTEL_LOGS_ENABLED=true` to push Structlog output through the OTLP log exporter.
2. **Automatic server spans** are generated by `OpenTelemetryRequestSpanMiddleware`, which extracts W3C headers, sets HTTP attributes, and forwards correlation/team/user ids to SigNoz.[^code-otel-middleware]
3. **Manual spans**: use `_telemetry_span()` helper or `get_tracer("package.path")` when wrapping background work (DB migrations, RabbitMQ handlers, Virtuozzo client operations). Record exceptions so SigNoz surfaces them as errors.

```247:293:backend/app/core/telemetry.py
        with self._tracer.start_as_current_span(span_name, context=ctx, kind=SpanKind.SERVER) as span:
            set_attr("http.method", method)
            set_attr("url.path", path)
            ...
            if sc >= 500:
                span.set_status(Status(StatusCode.ERROR))
```
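
For readers unfamiliar with the W3C headers the middleware extracts, this sketch parses a `traceparent` value with the standard library only. It is meant to show the header's shape (`version-traceid-parentid-flags`). The middleware itself is better served by OpenTelemetry's propagators, and the `TraceParent` type and `parse_traceparent` function are hypothetical names. The strict four-field pattern is a simplification that fits version `00`.

```python
import re
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a W3C traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent ids.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, span_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(parsed)
```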

4. **SigNoz Cloud headers:** When using SigNoz Cloud, set `OTEL_EXPORTER_OTLP_HEADERS="signoz-ingestion-key=<KEY>"` per their ingestion guide.[^signoz-python]
5. **Logs to SigNoz:** `_configure_otel_log_handler()` attaches `opentelemetry.sdk._logs.LoggingHandler` so structured logs flow over OTLP. Keep log volume manageable by respecting `SAFE_FIELDS`.

### Step 6 — Validation Before You Ship

1. **Unit / integration coverage:** run `pytest backend/tests/integration/api/test_correlation_id.py backend/tests/integration/api/test_context_leakage.py`.
2. **Smoke logging sink:** `python backend/tests/smoke/logging/log_sink_probe.py --log-dir logs`.
3. **SSE smoke:** follow `backend/tests/smoke/sse/README` (ensures broker metrics/logs stay sane).
4. **SigNoz verification:** trigger a request locally with OTEL enabled and confirm the span and log show up in the SigNoz service view within 1–2 minutes.[^signoz-troubleshooting]

## 3. SigNoz Configuration Quickstart

| Variable | Purpose | Example value |
| --- | --- | --- |
| `ENABLE_OPENTELEMETRY` | Turns on tracing middleware + exporter | `true` |
| `OTEL_SERVICE_NAME` | Service tag shown in SigNoz | `mbpanel-api` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP gRPC endpoint | `https://ingest.us-east-2.signoz.cloud:443` |
| `OTEL_EXPORTER_OTLP_HEADERS` | Auth header for SigNoz Cloud | `signoz-ingestion-key=...` |
| `OTEL_LOGS_ENABLED` | Routes structlog output to OTLP | `true` |
| `OTEL_METRICS_ENABLED` | Enables OTLP metric exporter | `true` (if collector supports it) |
| `LOG_TO_FILE`, `LOG_DIR` | Optional rotating JSONL sink for Virtuozzo | `true`, `/var/www/error` |

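Per SigNoz's OTLP instructions, the exporters speak OTLP over gRPC, so no custom protocol adapters are needed. A self-hosted collector listens for gRPC on port 4317 by default, while SigNoz Cloud accepts it over TLS on port 443, as in the example endpoint above.[^signoz-python]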
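
A pre-flight check along the following lines could catch the most common misconfiguration (a SigNoz Cloud endpoint without an ingestion key) before the app starts. This is a sketch under stated assumptions: `parse_otlp_headers` and `check_otel_env` are hypothetical helpers, not part of the codebase, and comparing the flag against a lowercase `"true"` is a guess about how the settings are parsed.

```python
from typing import Dict, List, Mapping


def parse_otlp_headers(raw: str) -> Dict[str, str]:
    """Split a comma-separated key=value string into a header dict."""
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            # Do not echo the entry itself: it may contain a secret.
            raise ValueError("malformed OTLP header entry (expected key=value)")
        headers[key.strip()] = value.strip()
    return headers


def check_otel_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems: List[str] = []
    if env.get("ENABLE_OPENTELEMETRY", "").lower() != "true":
        return problems  # tracing is off, nothing else to validate
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    if not endpoint:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    if "signoz.cloud" in endpoint:
        headers = parse_otlp_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", ""))
        if "signoz-ingestion-key" not in headers:
            problems.append("SigNoz Cloud endpoint without signoz-ingestion-key header")
    return problems


sample_env = {
    "ENABLE_OPENTELEMETRY": "true",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://ingest.us-east-2.signoz.cloud:443",
}
print(check_otel_env(sample_env))
```

In practice the mapping passed in would be `os.environ`; a plain dict is used here so the check stays easy to test.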

## 4. Pull Request Checklist (Copy Into Descriptions)

- [ ] Every new module uses `get_logger(__name__)` once, no bare `print`.
- [ ] Logs only include `SAFE_FIELDS` or documented additions; sensitive data is never logged.
- [ ] Error paths rely on central handlers (`HTTPException`, `log_auth_violation`, or `capture_exception`) instead of ad-hoc prints.
- [ ] OTEL spans wrap any new external calls / background loops; errors mark the span status.
- [ ] SigNoz-specific env vars documented in README or `.env.example` when new ones are required.
- [ ] `pytest backend/tests/integration/api/test_correlation_id.py backend/tests/integration/api/test_context_leakage.py` succeeds locally.

## References

[^structlog]: Structlog best practices (contextvars, processor chains) — https://www.structlog.org/en/stable/best_practices.html
[^otel-python]: OpenTelemetry Python manual instrumentation guide — https://opentelemetry.io/docs/languages/python/instrumentation/
[^signoz-python]: SigNoz — Auto-instrument Python apps with OpenTelemetry & OTLP ingest headers — https://signoz.io/docs/instrumentation/opentelemetry-python/
[^signoz-manual]: SigNoz — Manual spans in Python applications — https://signoz.io/opentelemetry/manual-spans-in-python-application/
[^signoz-troubleshooting]: SigNoz — Troubleshooting Python with OpenTelemetry tracing — https://signoz.io/blog/troubleshooting-python-with-opentelemetry-tracing/
[^code-lifespan]: backend/app/core/app_factory.py::lifespan — initializes logging, Sentry, OTEL, SSE broker.
[^code-middleware]: backend/app/core/app_factory.py::create_app — documents middleware order for observability.
[^code-safe-fields]: backend/app/core/logging.py::SAFE_FIELDS — allow-listed log attributes enforced in production.
[^code-otel-middleware]: backend/app/core/telemetry.py::OpenTelemetryRequestSpanMiddleware — extracts trace context, sets HTTP + tenant attributes.