Enterprise Auth & Team Management (Multi-Tenant RBAC): Implementation Guide

Author: Principal Lead Engineer
Scope: Align the FastAPI backend and Next.js frontend with multi-tenant RBAC, team management, and secure auth flows. This plan is constrained to officially documented patterns only.

Authoritative References (latest stable)

FastAPI security (JWT/OAuth2): https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/
FastAPI dependencies/middleware: https://fastapi.tiangolo.com/tutorial/dependencies/
StreamingResponse (for SSE alignment): https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse
Pydantic v2 models/settings: https://docs.pydantic.dev/latest/
SQLAlchemy 2.x asyncio ORM: https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html
bcrypt hashing (official Python package): https://pypi.org/project/bcrypt/
python-jose JWT: https://python-jose.readthedocs.io/en/latest/
Redis (rate limiting / blacklist) client docs: https://redis-py.readthedocs.io/en/stable/
pyotp (TOTP RFC 6238): https://pyauth.github.io/pyotp/
Cryptography Fernet (secret encryption/rotation): https://cryptography.io/en/latest/fernet/
RabbitMQ monitoring & DLQ guidance: https://www.rabbitmq.com/docs/monitoring
Next.js 16 App Router & Server Actions: https://nextjs.org/docs
EventSource (WHATWG SSE browser API, consumed from React 19 client components): https://html.spec.whatwg.org/multipage/server-sent-events.html#the-eventsource-interface

Version Alignment (match compatibility matrix)

Python 3.11, FastAPI 0.124.2, Uvicorn 0.38.0, Pydantic v2, SQLAlchemy 2.x (async), bcrypt, aio-pika 9.4.x (for SSE), Node 22, Next.js 16.0.8, React 19.2.1, TypeScript 5.9.3.

Architecture Overview

Tenancy model: User (global identity) ↔ Team (owns resources) via TeamMember with Role.
RBAC: Permission (static, seeded) ↔ Role (per-team, editable except Owner) ↔ TeamMember.
Auth tokens: JWT (access short-lived, refresh longer), issued via FastAPI; stored as HttpOnly cookies.
Enforcement: FastAPI dependencies + middleware to resolve user_id, team_id, role, and check permissions before handlers.
SSE: Reuse existing events.topic exchange; auth will later scope SSE subscriptions by team/job once auth context is available.
Team creation rules: Only the account that upgrades from mightybox_trial to mightybox_billing may create a team; that user becomes Owner. Ownership transfer must be supported (policy-guarded) to move Owner to another account.
Virtuozzo session constraint: Only the Owner may retrieve/refresh a Virtuozzo session key for a team.

Data Model (to implement/migrate)

users: id, email (unique), hashed_password (bcrypt), is_active, is_verified, is_superuser, timestamps.
teams: id, owner_user_id, team_name, slug, status (TRIALING|ACTIVE|EXPIRED|ARCHIVED), timestamps.
team_members: id, user_id, team_id, role_id, timestamps.
permissions: slug (pk), description (seeded).
roles: id, team_id, name, is_editable, description.
role_permissions: role_id ↔ permission_slug (m2m).
invite_tokens: token, team_id, role_id, created_by, expires_at, used_at, used_by.
trusted_devices: per-user/device_fingerprint trust records with geo/IP/user_agent metadata and trusted/revoked timestamps.
active_sessions: per-user refresh session state (refresh_token_jti, device_fingerprint, ip, geo, last_seen_at, expires_at, revoked_reason).
login_activity: immutable audit log of login/refresh events with ip/ua/device_fingerprint/geo.
email_verification_tokens, password_reset_tokens (per security flows).
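
As a rough illustration of how the RBAC tables above fit together, the sketch below builds a reduced version of the schema in an in-memory SQLite database and runs the team_members → roles → role_permissions join that a permission check needs. SQLite stands in for Postgres, and the column subset, the `has_permission` helper, the `demo_connection` fixture, and the sample ids are assumptions made for the example, not the project's actual models.

```python
import sqlite3

# Reduced schema: only the columns needed for a permission lookup.
SCHEMA = """
CREATE TABLE permissions (slug TEXT PRIMARY KEY, description TEXT);
CREATE TABLE roles (
    id INTEGER PRIMARY KEY,
    team_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    is_editable INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE role_permissions (
    role_id INTEGER NOT NULL REFERENCES roles(id),
    permission_slug TEXT NOT NULL REFERENCES permissions(slug),
    PRIMARY KEY (role_id, permission_slug)
);
CREATE TABLE team_members (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    team_id INTEGER NOT NULL,
    role_id INTEGER NOT NULL REFERENCES roles(id),
    UNIQUE (user_id, team_id)  -- one membership per user per team
);
"""


def has_permission(conn: sqlite3.Connection, user_id: int, team_id: int, slug: str) -> bool:
    """Return True if the user's role in this team grants the permission slug."""
    row = conn.execute(
        """
        SELECT 1
        FROM team_members AS tm
        JOIN roles AS r ON r.id = tm.role_id AND r.team_id = tm.team_id
        JOIN role_permissions AS rp ON rp.role_id = r.id
        WHERE tm.user_id = ? AND tm.team_id = ? AND rp.permission_slug = ?
        LIMIT 1
        """,
        (user_id, team_id, slug),
    ).fetchone()
    return row is not None


def demo_connection() -> sqlite3.Connection:
    """Hypothetical fixture: one team (id 10) with an Owner and a Developer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO permissions VALUES (?, ?)",
        [("team.invite", "Invite members"), ("events:read", "Read team events")],
    )
    conn.executemany(
        "INSERT INTO roles (id, team_id, name, is_editable) VALUES (?, ?, ?, ?)",
        [(1, 10, "Owner", 0), (2, 10, "Developer", 1)],
    )
    conn.executemany(
        "INSERT INTO role_permissions VALUES (?, ?)",
        [(1, "team.invite"), (1, "events:read"), (2, "events:read")],
    )
    conn.executemany(
        "INSERT INTO team_members (user_id, team_id, role_id) VALUES (?, ?, ?)",
        [(100, 10, 1), (200, 10, 2)],
    )
    return conn
```

Scoping the join by both user_id and team_id keeps a user's role in one team from leaking into another, and the unique constraint on (user_id, team_id) doubles as the index for the membership lookup.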

Core Flows (backend)

1) Registration
   - Door A (owner): create user + team + default roles; user becomes Owner only if upgraded to mightybox_billing; mark inactive until email verify; encrypt external creds if any.
   - Door B (invite): validate invite token, create user, attach to team with invited role, mark active/verified.
2) Login (Step 1)
   - Verify credentials (bcrypt hashing); return pre-auth token (scope team_select) + teams list.
3) Session Exchange (Step 2)
   - Input: pre-auth token + team_id; verify membership via indexed team_members (unique user_id+team_id) and ensure team status is allowed (TRIALING|ACTIVE).
   - Load team-scoped VZ session key by team_id: Redis cache (short TTL) → Postgres. If expired/near-expiry, take per-team lock and refresh using Owner’s encrypted VZ creds; others wait/reuse cached key. Enforce Owner-only refresh.
   - Issue access/refresh JWT (claims: sub, team_id, role) as HttpOnly cookies after successful membership/status validation.
4) Refresh / Logout
   - Refresh: validate refresh token, rotate, blacklist old (Redis); reissue cookies.
   - Logout: blacklist access token, clear cookies.
5) Authorization
   - Middleware: decode JWT, set request.state.user_id, team_id, role.
   - Dependency require_permission("perm"): join team_members→roles→role_permissions; raise 403 if missing.
6) Invitations
   - Create invite (team.invite permission), single-use token with expiry; Door B consumes and invalidates.
7) Password & Email
   - Email verification tokens (24h), password reset tokens (1h), both single-use.
8) Ownership & Virtuozzo
   - Team creation allowed only for upgraded (mightybox_billing) accounts; they become Owner.
   - Ownership transfer flow (policy-guarded) to reassign Owner.
   - Only Owner can request/refresh Virtuozzo session keys for the team.
   - VZ email creation (modernized): on email verification or billing upgrade, generate a dedicated VZ email for the Owner (deterministic, collision-resistant, e.g., localpart_userid_timestamp@mightybox.app). Call the env-driven VZ registration endpoint to create the account; store VZ email + VZ UID + encrypted VZ password + the initial team-scoped session key together on teams (not on users). Do not recreate on each login; allow Owner-initiated repair if missing/invalid.
9) Device approval & suspicious login
   - Device approval: for new device fingerprints or untrusted combinations of device/IP/UA, require email-based OTP before issuing long-lived refresh tokens; persist trusted device records and allow later revoke.
   - Suspicious login: use recent login activity (IP/geo/device) plus a geo-IP lookup provider to flag high-risk logins, send email alerts with approve/deny actions, and gate refresh/session usage until the user makes a decision.
10) SSE auth & events
   - SSE /api/v1/events is gated by team membership and an events:read permission; cross-team subscriptions are rejected and auth events are published to a RabbitMQ topic exchange for streaming to clients.
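
The session-exchange step (flow 3) is where the pre-auth token turns into team-scoped claims, so a small sketch of its gating logic may help. The version below is framework-free: it takes already-decoded pre-auth claims and plain dictionaries instead of a JWT library and database queries. `SessionExchangeError`, `Membership`, `exchange_session`, and the error codes are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Team statuses that may receive a session, per the guide.
ALLOWED_TEAM_STATUSES = {"TRIALING", "ACTIVE"}


class SessionExchangeError(Exception):
    """Carries an HTTP-style status code and a machine-readable error code."""

    def __init__(self, status_code: int, code: str) -> None:
        super().__init__(code)
        self.status_code = status_code
        self.code = code


@dataclass(frozen=True)
class Membership:
    user_id: str
    team_id: int
    role: str


def exchange_session(
    preauth_claims: dict,
    team_id: int,
    memberships: Dict[Tuple[str, int], Membership],
    team_status: Dict[int, str],
) -> dict:
    """Validate a team selection and return the claims for the access/refresh JWTs."""
    # 1. The pre-auth token is only valid for choosing a team.
    if preauth_claims.get("scope") != "team_select":
        raise SessionExchangeError(401, "invalid_preauth_scope")

    # 2. Membership lookup keyed by (user_id, team_id), mirroring the unique index.
    user_id = preauth_claims["sub"]
    member = memberships.get((user_id, team_id))
    if member is None:
        raise SessionExchangeError(403, "not_a_member")

    # 3. Team status gate: EXPIRED, ARCHIVED, or unknown teams get no session.
    if team_status.get(team_id) not in ALLOWED_TEAM_STATUSES:
        raise SessionExchangeError(403, "team_not_active")

    # 4. Claims to sign; expiry and signing are left to the JWT utility.
    return {"sub": user_id, "team_id": team_id, "role": member.role}
```

In the real service the same three checks would sit in front of the cookie-issuing code, with the VZ session key lookup happening only after they pass.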

Client Integration (Next.js 16 + React 19)

Use Server Actions / Route Handlers for auth POSTs; avoid pages/api.
Store auth cookies HttpOnly; client reads auth state via server-side checks.
SSE consumption: new EventSource("/api/v1/events?team_id=..."); when auth is wired, pass team filter only if user is member.

Security Controls

Hashing: bcrypt (official doc above); no plaintext storage.
JWT: HS256 via python-jose; short access (~5–15m), refresh (~7d); HttpOnly + Secure + SameSite.
Rate limiting: Redis counters for login/register, enforced in FastAPI middleware; return 429 once a limit is exceeded (401 remains the response for failed credentials).
CSRF: double-submit token for state-changing requests if cookies used cross-site.
Audit logging: log login, logout, invite, role edits with user_id, team_id, ip, ua.
Secrets & networking: all creds from env; service DNS (postgres, redis, rabbitmq) not localhost in containers.
External auth URL (Virtuozzo sign-in legacy): EXTERNAL_AUTH_URL=https://app.mymightybox.io/1.0/users/authentication/rest/signin should be injected via env and consumed only by the Virtuozzo client. This repo also supports VZ_VIRTUOZZO_SIGNIN_URL as the preferred name; EXTERNAL_AUTH_URL is treated as an alias for compatibility.
Session inactivity auto-expiry: enforce idle timeout (30–60 minutes, configurable) by requiring every refresh call to update active_sessions.last_seen_at; deny refresh if now - last_seen_at exceeds the idle threshold, return a session_inactive error payload, blacklist the stale tokens, surface the event via SSE, and emit an audit log entry.
Device approval OTP policy: email OTPs carry a 5‑minute TTL, max 5 verification attempts, and are stored hashed (HMAC-SHA256 using ENCRYPTION_KEY) inside Redis; per-user and per-IP rate limits (Redis counters) throttle issuance and verification; each approval/denial is captured in login_activity with device/IP metadata.
Geolocation/IP/device logging: capture IP, UA, and geo (via IP lookup) on login/refresh; store in audit log; surface recent-login metadata to users for security review (similar to Google’s device activity).
New device authorization: detect new device/fingerprint (IP + UA + optional device hash). If new, require the OTP policy above before issuing refresh/access tokens, persist the trusted device row (fingerprint, ip, ua, geo, approved_by), and expose revoke/deny operations.
SSE authentication & CSRF: /api/v1/events validates the same HttpOnly access token used by other endpoints via a dependency that runs on every connection (StreamingResponse per https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse). Connections require SameSite cookies; on 401/expiry the client replays the session-exchange flow before reconnecting.
Single-session enforcement / double-login block: maintain an active-session record per user (device fingerprint, IP, geo, last_seen). On new login:
If an active session exists, either block issuance of tokens or invalidate the previous session (blacklist prior access/refresh tokens) and notify both sessions.
Notify current active session via frontend channel (e.g., SSE/WS) that another login was attempted with device/IP/geo details; prompt to approve/deny.
If denied, reject new login and keep the existing session; if approved, revoke old session and allow new one.
All session revocations must blacklist existing tokens and clear server-side session state.
Secrets encryption & rotation: all Virtuozzo creds, TOTP secrets (once enabled), refresh session metadata, and backup codes are encrypted using Fernet (ENCRYPTION_KEY, 32-byte urlsafe). Rotate the key at least quarterly: mint new key, re-encrypt stored secrets in a single transaction per team/user, update Redis caches, and alert if any row cannot be re-encrypted (per https://cryptography.io/en/latest/fernet/). Never log plaintext secrets or keys.
MFA/TOTP scope: MFA is deferred until Phase D completes; when enabled, use pyotp (https://pyauth.github.io/pyotp/) with otpauth URI/QR generation, encrypted secret storage, single-use backup codes, and mandatory step-up during new-device approval.
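
To make the device-approval OTP policy above concrete (5-minute TTL, at most 5 verification attempts, hashed storage, single use), here is a minimal standard-library sketch. A dictionary stands in for Redis, and the 6-digit code length, the `OtpStore` class, and binding the digest to the user id are assumptions for the example; rate limiting and audit logging are left out.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict

OTP_TTL_SECONDS = 5 * 60  # 5-minute TTL from the policy
MAX_ATTEMPTS = 5          # maximum verification attempts per code


@dataclass
class OtpRecord:
    digest: bytes      # HMAC-SHA256 of the code; the plaintext is never stored
    expires_at: float
    attempts: int = 0


class OtpStore:
    """In-memory stand-in for the Redis-backed OTP store."""

    def __init__(self, key: bytes, clock: Callable[[], float] = time.time) -> None:
        self._key = key
        self._clock = clock
        self._records: Dict[str, OtpRecord] = {}

    def _digest(self, user_id: str, code: str) -> bytes:
        message = f"{user_id}:{code}".encode()
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def issue(self, user_id: str) -> str:
        """Create a fresh 6-digit code, replacing any pending one for the user."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._records[user_id] = OtpRecord(
            digest=self._digest(user_id, code),
            expires_at=self._clock() + OTP_TTL_SECONDS,
        )
        return code

    def verify(self, user_id: str, code: str) -> bool:
        record = self._records.get(user_id)
        if record is None:
            return False
        # Expired or exhausted codes are dropped and can never succeed.
        if self._clock() >= record.expires_at or record.attempts >= MAX_ATTEMPTS:
            del self._records[user_id]
            return False
        record.attempts += 1
        if hmac.compare_digest(record.digest, self._digest(user_id, code)):
            del self._records[user_id]  # single use
            return True
        return False
```

Counting the attempt before comparing means a correct guess on the sixth try is still rejected, and `hmac.compare_digest` avoids leaking match progress through timing.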

Operational Safeguards (Redis / RabbitMQ)

Redis availability: the /health/redis endpoint issues a PING and attempts a short-lived lock acquisition. Alert if two consecutive checks fail. When Redis is unavailable, reject rate-limited actions with 503 redis_unavailable instead of dropping locks silently.
Redis retries/backoff: lock acquisition and counter increments use capped exponential backoff (max 5 tries, 200 ms cap) to avoid stampedes; failures emit structured logs so SRE can trace contention.
RabbitMQ publishing: use publisher confirms with a 5s timeout (per https://www.rabbitmq.com/docs/monitoring#publishers) and retry at most 3 times with jittered delays to avoid thundering herds. On repeated failure, persist the event to Postgres for later replay.
RabbitMQ DLQs: each topic exchange used for SSE fan-out includes a dead-letter queue; consumers monitor DLQ depth via Prometheus and raise alerts if >100 messages or if redelivered flag spikes, forcing investigation before replay.
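
The retry policy described for Redis (at most 5 tries, delays capped at 200 ms) can be sketched without any client library. In the example below the 25 ms base delay, the use of jitter on the Redis path, treating `ConnectionError` as the retryable failure, and the `call_with_backoff` helper are all assumptions; a real implementation would catch the client's own exception types and emit the structured logs mentioned above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_TRIES = 5          # from the guide
CAP_SECONDS = 0.2      # 200 ms cap from the guide
BASE_SECONDS = 0.025   # assumed starting delay


def backoff_delay(
    attempt: int,
    base: float = BASE_SECONDS,
    cap: float = CAP_SECONDS,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay after failed attempt number `attempt` (0-based): exponential, capped, jittered."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(
    operation: Callable[[], T],
    max_tries: int = MAX_TRIES,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run `operation`, retrying on ConnectionError up to `max_tries` total attempts."""
    for attempt in range(max_tries):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_tries - 1:
                raise  # out of tries: let the caller map this to 503 or a fallback
            sleep(backoff_delay(attempt, rng=rng))
    raise RuntimeError("max_tries must be at least 1")
```

Scaling the capped delay by a random factor spreads concurrent retries apart, which is the stampede-avoidance the policy is after; injecting `sleep` and `rng` keeps the helper testable without real waiting.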

Sequenced Work Plan (phases)

Phase A (Foundation): async DB session, settings for JWT/bcrypt/Redis; models + Alembic for users/teams/roles/permissions/invites.
Phase B (Auth): password hashing, JWT utils, login + session exchange + refresh + logout endpoints; middleware + require_permission.
Phase C (Team/RBAC): seed permissions; seed Owner/Manager/Developer roles on team create; invite create/consume; role/permission edit endpoints.
Phase D (User lifecycle + MFA prep): email verification, password reset, and gating work needed to launch TOTP (schema placeholders, secret storage, backup code UX) while keeping MFA disabled until security sign-off.
Phase E (SSE alignment): apply auth context to /api/v1/events (filter by team_id, reject unauthorized), keep DB as source of truth.
Phase F (Tests/CI): unit (security/jwt), integration (auth flows), E2E (invite/registration), load targets; enforce the constraints-file install in CI.

Env & Config (examples)

JWT_SECRET_KEY, ACCESS_TOKEN_EXPIRE_MINUTES, REFRESH_TOKEN_EXPIRE_DAYS, JWT_ALGORITHM=HS256
BCRYPT_ROUNDS (bcrypt cost factor)
REDIS_URL=redis://redis:6379/0 (session cache, locks, rate limit)
ENCRYPTION_KEY (Fernet, base64 urlsafe 32-byte, for VZ secrets/TOTP)
DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/db
EMAIL_SENDER_CONFIG (SMTP) per provider docs; keep out of code.
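
One way to read the variables above at startup is sketched below using only the standard library. The project would more likely use Pydantic settings; the `AuthSettings` dataclass, the `load_settings` helper, the choice of which variables are required, and the defaults (15-minute access tokens, 7-day refresh tokens, 12 bcrypt rounds) are assumptions for the example.

```python
import base64
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AuthSettings:
    jwt_secret_key: str
    jwt_algorithm: str
    access_token_expire_minutes: int
    refresh_token_expire_days: int
    bcrypt_rounds: int
    redis_url: str
    database_url: str
    encryption_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> AuthSettings:
    """Build settings from environment variables, failing fast on bad input."""
    env = os.environ if env is None else env

    required = ("JWT_SECRET_KEY", "ENCRYPTION_KEY", "DATABASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))

    # A Fernet key is 32 random bytes in url-safe base64; check the shape early.
    try:
        raw_key = base64.urlsafe_b64decode(env["ENCRYPTION_KEY"])
    except ValueError as exc:
        raise RuntimeError("ENCRYPTION_KEY is not valid url-safe base64") from exc
    if len(raw_key) != 32:
        raise RuntimeError("ENCRYPTION_KEY must decode to exactly 32 bytes")

    return AuthSettings(
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "15")),
        refresh_token_expire_days=int(env.get("REFRESH_TOKEN_EXPIRE_DAYS", "7")),
        bcrypt_rounds=int(env.get("BCRYPT_ROUNDS", "12")),
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        database_url=env["DATABASE_URL"],
        encryption_key=env["ENCRYPTION_KEY"],
    )
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to exercise in unit tests, and the error messages name variables without ever echoing their values.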

Deliverables (documentation + code alignment)

Models & migrations for auth/RBAC/teams/invites.
Auth router + service + security utils; middleware + permission dependency.
Role/permission seed + team default roles on create.
Invite endpoints; email verification + password reset endpoints.
Tests: unit (hash/JWT), integration (login/session/refresh), E2E (Door A/B).
Ops: rate limit + blacklist (Redis), audit log events, env docs with official links (above).
Ownership transfer flow + policy checks; Owner-only Virtuozzo session retrieval enforcement.
Session inactivity enforcement and recent-login geo/IP/device logging surfaced to users/admins.
New-device authorization flow with trust/revoke tracking.
Single-session enforcement with notify/approve/deny flow for concurrent login attempts.

Non-Goals (defer)

MFA/OAuth providers remain disabled in production until Phase D completes and security signs off on TOTP rollout.
Multi-region HA specifics and SOC2 audit runbooks (handled separately).

Virtuozzo API placement

Create a dedicated client module under backend/app/infrastructure/external/virtuozzo/ (e.g., client.py, auth.py, accounts.py) to organize multiple endpoints. Keep backend/app/infrastructure/external/virtuozzo.py as a thin facade if desired.
All Virtuozzo endpoints (including https://app.mymightybox.io/1.0/users/authentication/rest/signin) must be env-driven; never hardcode.
Enforce Owner-only access when requesting/refreshing Virtuozzo session keys.

Module placement (per 001-hybrid-modular-ddd)

Interface/application for auth lives under backend/app/modules/auth/ (router, schemas, service, dependencies).
Persistence models remain under backend/app/infrastructure/database/models/.
API wiring uses api/v1 to include modules/auth/router.

Virtuozzo session key management (team-scoped, stable)

Store one encrypted team-scoped Virtuozzo session key with metadata (expires_at/last_refreshed) in Postgres; optionally cache in Redis with shorter TTL.
Refresh lazily only when the key is expired or near-expiry (e.g., within 5–10 minutes), never on every login. Use the Owner’s encrypted VZ creds to refresh; enforce per-team lock to avoid stampede.
Serve all team members with the same valid session key server-side; never expose the key to clients. Members’ requests use the stored key; only Owner can trigger refresh if needed.
Failure handling: on VZ 401/timeout, try one refresh; if still failing, surface a controlled error and prompt Owner to reauthenticate/update VZ creds.
Bug/edge mitigations:
Over-rotation: do not refresh per login; only on expiry/near-expiry with a single team-scoped key.
Stampede: per-team mutex/lock around refresh; others reuse cached key.
Stale key after failure: if refresh fails, invalidate cached key and require Owner reauth/update VZ creds; do not proceed with stale key.
Ownership check: guard refresh path with Owner-only policy; members cannot trigger or view keys.
Credentials drift: if stored encrypted VZ creds are rejected, bubble a clear error and force Owner reauth to replace creds.
TTL mismatch: use configurable TTL/near-expiry window; refresh if expires_at < now + safety_window.
Ownership transfer race: re-validate current Owner before refresh; lock per team to avoid old Owner overwriting after transfer.
Partial write: treat refresh as an atomic update (write new key + metadata together); if the write itself fails, keep the last-known-good key until the replacement is confirmed.
Logging leakage: never log keys or VZ creds; redact sensitive fields in error logs.
Cache/DB mismatch: Redis TTL shorter than VZ TTL; include last_refreshed/version to detect stale cache vs DB.
Repeated 401s: after one failed refresh, drop the key and require Owner action instead of looping.
Multi-team users: always scope lookup/lock by team_id to prevent cross-team key bleed.
Time skew: include safety margin (e.g., 5–10 minutes) when checking expiry to avoid clock drift issues.
VZ email lifecycle: Owner-only (re)creation; write VZ email + VZ UID + encrypted VZ password atomically with initial session key. Never log sensitive values; redact where necessary for support.
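
The lazy-refresh, per-team-lock, and safety-window rules above can be illustrated with a small asyncio sketch. It keeps keys in memory rather than Postgres/Redis and omits encryption and the Owner-only policy check; `TeamKeyManager`, `needs_refresh`, the injected `refresh_fn` callable (standing in for the Virtuozzo sign-in call), and the 10-minute window are assumptions for the example.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, Tuple

SAFETY_WINDOW_SECONDS = 10 * 60  # assumed near-expiry window


@dataclass
class TeamSessionKey:
    value: str
    expires_at: float
    version: int = 1  # bumped on each refresh to help spot stale cache entries


def needs_refresh(
    key: Optional[TeamSessionKey], now: float, safety_window: float = SAFETY_WINDOW_SECONDS
) -> bool:
    """True when there is no key or it expires within the safety window."""
    return key is None or key.expires_at < now + safety_window


class TeamKeyManager:
    def __init__(
        self,
        refresh_fn: Callable[[int], Awaitable[Tuple[str, float]]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._refresh_fn = refresh_fn  # hypothetical: team_id -> (key, expires_at)
        self._clock = clock
        self._keys: Dict[int, TeamSessionKey] = {}
        self._locks: Dict[int, asyncio.Lock] = {}

    async def get_key(self, team_id: int) -> str:
        key = self._keys.get(team_id)
        if not needs_refresh(key, self._clock()):
            return key.value  # fast path: no lock, no refresh

        # One lock per team so a slow refresh never blocks other teams.
        lock = self._locks.setdefault(team_id, asyncio.Lock())
        async with lock:
            # Re-check: another waiter may have refreshed while we queued.
            key = self._keys.get(team_id)
            if not needs_refresh(key, self._clock()):
                return key.value
            value, expires_at = await self._refresh_fn(team_id)
            version = key.version + 1 if key else 1
            # Key and metadata are replaced together in one assignment.
            self._keys[team_id] = TeamSessionKey(value, expires_at, version)
            return value
```

The re-check inside the lock is what turns a burst of concurrent requests into a single upstream refresh, and keying both dictionaries by team_id keeps one team's key from ever being served to another.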

Keep all implementation choices traceable to the official references listed above; do not deviate from documented behaviors or rely on versions not stated in this guide.

Alignment matrix vs `team_management.md` (special rules)

Team creation: Only upgraded (mightybox_billing) account can create a team → becomes Owner; enforced at team creation API and Door A registration.
Ownership transfer: Must exist and be policy-guarded; re-validate Owner before VZ refresh; block old Owner after transfer.
Team status gating: Allow session exchange/refresh only if team status in {TRIALING, ACTIVE}; deny otherwise.
Forked registration: Door A (owner) creates team; Door B (invite) joins existing team with invited role.
Virtuozzo access: Only Owner can retrieve/refresh session keys; members use stored team key server-side.
Billing dependency: Team creation and active usage require billing upgrade; maintain status and enforce on auth/refresh/VZ calls.
SSE gating: After auth integration, /events must filter by membership/team_id; deny expired teams.
Invites: Single-use, expiring tokens; audit create/accept; respect role assignment on accept.
New device/double-login: Require step-up for new device; single-session enforcement with approve/deny and token revocation.
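
The invite rule in the matrix above (single-use, expiring, role respected on accept) maps onto the invite_tokens columns roughly as follows. This is an in-memory sketch: the 7-day default lifetime, the `InviteError` codes, and the `create_invite` / `consume_invite` helpers are assumptions, and a database-backed version would need to mark the token used atomically (for example with an UPDATE guarded by `used_at IS NULL`).

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


class InviteError(Exception):
    """Raised when an invite cannot be accepted."""


@dataclass
class InviteToken:
    token: str
    team_id: int
    role_id: int
    created_by: str
    expires_at: datetime
    used_at: Optional[datetime] = None
    used_by: Optional[str] = None


def create_invite(
    team_id: int,
    role_id: int,
    created_by: str,
    now: datetime,
    ttl: timedelta = timedelta(days=7),  # assumed lifetime
) -> InviteToken:
    """Mint an unguessable token bound to one team and one role."""
    return InviteToken(
        token=secrets.token_urlsafe(32),
        team_id=team_id,
        role_id=role_id,
        created_by=created_by,
        expires_at=now + ttl,
    )


def consume_invite(invite: InviteToken, user_id: str, now: datetime) -> Tuple[int, int]:
    """Accept an invite once; return the (team_id, role_id) to attach the user to."""
    if invite.used_at is not None:
        raise InviteError("invite_already_used")
    if now >= invite.expires_at:
        raise InviteError("invite_expired")
    # Record who used it and when, which also feeds the audit trail.
    invite.used_at = now
    invite.used_by = user_id
    return invite.team_id, invite.role_id
```

Returning the role stored on the invite, rather than anything supplied by the accepting client, is what keeps the "respect role assignment on accept" rule enforceable.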

Coverage / CI gates (minimum)

Unit tests: security/JWT/crypto/helpers ≥95% coverage for core; fail CI under threshold.
Integration tests: login → team select → refresh → logout; Door A/B flows; invite acceptance; SSE auth gate smoke; target ≥85% api/module coverage.
E2E: Door A (with VZ session refresh), Door B invite accept, ownership transfer, new-device approval, single-session enforcement.
Performance smoke: VZ refresh under contention (lock correctness) and auth rate-limit behavior.
CI install: backend pip install --no-deps -c backend/constraints.txt -e .; frontend npm ci; run lint/type/tests; enforce coverage threshold.
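
For the rate-limit portion of the performance smoke test, it can help to have a reference model of the expected behavior to compare the Redis-backed limiter against. The sketch below is an in-memory fixed-window counter, loosely mirroring the common Redis INCR-plus-expiry pattern; `FixedWindowLimiter` and the example limit of 5 attempts per 60 seconds are assumptions, since the guide does not fix concrete thresholds.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` hits per key within each `window_seconds` window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # key -> (window start time, hits counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self._window:
            # The previous window has elapsed: start a new one.
            start, count = now, 0
        count += 1
        self._windows[key] = (start, count)
        return count <= self._limit


def login_key(ip: str, email: str) -> str:
    """Hypothetical key shape: limit per client IP and per target account."""
    return f"login:{ip}:{email.lower()}"
```

A smoke test can drive both this model and the real endpoint with the same request sequence and assert that the endpoint starts returning 429 at the point where `allow` first returns False.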