Enterprise Auth & Team Management (Multi-Tenant RBAC): Implementation Guide
Author: Principal Lead Engineer
Scope: Align FastAPI backend + Next.js frontend with multi-tenant RBAC, team management, and secure auth flows. This plan is constrained to officially documented patterns only.
Authoritative References (latest stable)
- FastAPI security (JWT/OAuth2): https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/
- FastAPI dependencies/middleware: https://fastapi.tiangolo.com/tutorial/dependencies/
- StreamingResponse (for SSE alignment): https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse
- Pydantic v2 models/settings: https://docs.pydantic.dev/latest/
- SQLAlchemy 2.x asyncio ORM: https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html
- bcrypt hashing (official Python package): https://pypi.org/project/bcrypt/
- python-jose JWT: https://python-jose.readthedocs.io/en/latest/
- Redis (rate limiting / blacklist) client docs: https://redis-py.readthedocs.io/en/stable/
- pyotp (TOTP RFC 6238): https://pyauth.github.io/pyotp/
- Cryptography Fernet (secret encryption/rotation): https://cryptography.io/en/latest/fernet/
- RabbitMQ monitoring & DLQ guidance: https://www.rabbitmq.com/docs/monitoring
- Next.js 16 App Router & Server Actions: https://nextjs.org/docs
- EventSource (browser API defined by the WHATWG SSE spec, consumed from React 19 components): https://html.spec.whatwg.org/multipage/server-sent-events.html#the-eventsource-interface
Version Alignment (match compatibility matrix)
- Python 3.11, FastAPI 0.124.2, Uvicorn 0.38.0, Pydantic v2, SQLAlchemy 2.x (async), bcrypt, aio-pika 9.4.x (RabbitMQ client behind the SSE fan-out), Node 22, Next.js 16.0.8, React 19.2.1, TypeScript 5.9.3.
Architecture Overview
- Tenancy model: User (global identity) ↔ Team (owns resources) via TeamMember with Role.
- RBAC: Permission (static, seeded) ↔ Role (per-team, editable except Owner) ↔ TeamMember.
- Auth tokens: JWT (access short-lived, refresh longer), issued via FastAPI; stored as HttpOnly cookies.
- Enforcement: FastAPI dependencies + middleware to resolve user_id, team_id, role, and check permissions before handlers.
- SSE: Reuse existing events.topic exchange; auth will later scope SSE subscriptions by team/job once auth context is available.
- Team creation rules: Only the account that upgrades from mightybox_trial to mightybox_billing may create a team; that user becomes Owner. Ownership transfer must be supported (policy-guarded) to move Owner to another account.
- Virtuozzo session constraint: Only the Owner may retrieve/refresh a Virtuozzo session key for a team.
Data Model (to implement/migrate)
- users: id, email (unique), hashed_password (bcrypt), is_active, is_verified, is_superuser, timestamps.
- teams: id, owner_user_id, team_name, slug, status (TRIALING|ACTIVE|EXPIRED|ARCHIVED), timestamps.
- team_members: id, user_id, team_id, role_id, timestamps.
- permissions: slug (pk), description (seeded).
- roles: id, team_id, name, is_editable, description.
- role_permissions: role_id ↔ permission_slug (m2m).
- invite_tokens: token, team_id, role_id, created_by, expires_at, used_at, used_by.
- trusted_devices: per-user/device_fingerprint trust records with geo/IP/user_agent metadata and trusted/revoked timestamps.
- active_sessions: per-user refresh session state (refresh_token_jti, device_fingerprint, ip, geo, last_seen_at, expires_at, revoked_reason).
- login_activity: immutable audit log of login/refresh events with ip/ua/device_fingerprint/geo.
- email_verification_tokens, password_reset_tokens (per security flows).
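To make the relationships between team_members, roles, and role_permissions concrete, here is a minimal in-memory sketch of the permission lookup. The dataclasses and the `has_permission` helper are illustrative names, not the project's actual models; the real implementation would express the same join through SQLAlchemy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    id: int
    team_id: int
    name: str
    is_editable: bool
    # Stand-in for the role_permissions m2m rows (hypothetical field name).
    permission_slugs: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class TeamMember:
    user_id: int
    team_id: int
    role_id: int


def has_permission(members, roles, user_id, team_id, slug):
    """Walk team_members -> roles -> role_permissions for one (user, team) pair."""
    for member in members:
        if member.user_id == user_id and member.team_id == team_id:
            role = roles.get(member.role_id)
            # Roles are per-team, so a role owned by another team never grants access.
            return (
                role is not None
                and role.team_id == team_id
                and slug in role.permission_slugs
            )
    return False


roles = {
    1: Role(1, 10, "Owner", False, frozenset({"team.invite", "events:read"})),
    2: Role(2, 10, "Developer", True, frozenset({"events:read"})),
}
members = [TeamMember(100, 10, 1), TeamMember(101, 10, 2)]
owner_can_invite = has_permission(members, roles, 100, 10, "team.invite")
developer_can_invite = has_permission(members, roles, 101, 10, "team.invite")
```

Because membership is unique per (user_id, team_id), the lookup returns on the first matching row; a user who belongs to several teams is always evaluated against the role of the requested team only.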
Core Flows (backend)
1) Registration
- Door A (owner): create user + team + default roles; user becomes Owner only if upgraded to mightybox_billing; mark inactive until email verify; encrypt external creds if any.
- Door B (invite): validate invite token, create user, attach to team with invited role, mark active/verified.
2) Login (Step 1)
- Verify credentials (bcrypt hashing); return pre-auth token (scope team_select) + teams list.
3) Session Exchange (Step 2)
- Input: pre-auth token + team_id; verify membership via indexed team_members (unique user_id+team_id) and ensure team status is allowed (TRIALING|ACTIVE).
- Load team-scoped VZ session key by team_id: Redis cache (short TTL) → Postgres. If expired/near-expiry, take per-team lock and refresh using Owner’s encrypted VZ creds; others wait/reuse cached key. Enforce Owner-only refresh.
- Issue access/refresh JWT (claims: sub, team_id, role) as HttpOnly cookies after successful membership/status validation.
4) Refresh / Logout
- Refresh: validate refresh token, rotate, blacklist old (Redis); reissue cookies.
- Logout: blacklist the access token and revoke the refresh token (Redis), clear cookies.
5) Authorization
- Middleware: decode JWT, set request.state.user_id, team_id, role.
- Dependency require_permission("perm"): join team_members→roles→role_permissions; raise 403 if missing.
6) Invitations
- Create invite (team.invite permission), single-use token with expiry; Door B consumes and invalidates.
7) Password & Email
- Email verification tokens (24h), password reset tokens (1h), both single-use.
8) Ownership & Virtuozzo
- Team creation allowed only for upgraded (mightybox_billing) accounts; they become Owner.
- Ownership transfer flow (policy-guarded) to reassign Owner.
- Only Owner can request/refresh Virtuozzo session keys for the team.
- VZ email creation (modernized): on email verification or billing upgrade, generate a dedicated VZ email for the Owner (deterministic, collision-resistant, e.g., localpart_userid_timestamp@mightybox.app). Call the env-driven VZ registration endpoint to create the account; store VZ email + VZ UID + encrypted VZ password + the initial team-scoped session key together on teams (not on users). Do not recreate on each login; allow Owner-initiated repair if missing/invalid.
9) Device approval & suspicious login
- Device approval: for new device fingerprints or untrusted combinations of device/IP/UA, require email-based OTP before issuing long-lived refresh tokens; persist trusted device records and allow later revoke.
- Suspicious login: use recent login activity (IP/geo/device) plus a geo-IP lookup provider to flag high-risk logins, send email alerts with approve/deny actions, and gate refresh/session usage until the user makes a decision.
10) SSE auth & events
- SSE /api/v1/events is gated by team membership and an events:read permission; cross-team subscriptions are rejected and auth events are published to a RabbitMQ topic exchange for streaming to clients.
Client Integration (Next.js 16 + React 19)
- Use Server Actions / Route Handlers for auth POSTs; avoid pages/api.
- Store auth cookies HttpOnly; client reads auth state via server-side checks.
- SSE consumption: new EventSource("/api/v1/events?team_id=..."); when auth is wired, pass the team filter only if the user is a member.
Security Controls
- Hashing: bcrypt (official doc above); no plaintext storage.
- JWT: HS256 via python-jose; short access (~5–15m), refresh (~7d); HttpOnly + Secure + SameSite.
- Rate limiting: Redis counter for login/register, enforced in FastAPI middleware; respond 429 once the limit is exceeded (401 remains the response for invalid credentials).
- CSRF: double-submit token for state-changing requests if cookies are used cross-site.
- Audit logging: log login, logout, invite, role edits with user_id, team_id, ip, ua.
- Secrets & networking: all creds from env; service DNS (postgres, redis, rabbitmq) not localhost in containers.
- External auth URL (Virtuozzo sign-in legacy): EXTERNAL_AUTH_URL=https://app.mymightybox.io/1.0/users/authentication/rest/signin should be injected via env and consumed only by the Virtuozzo client. This repo also supports VZ_VIRTUOZZO_SIGNIN_URL as the preferred name; EXTERNAL_AUTH_URL is treated as an alias for compatibility.
- Session inactivity auto-expiry: enforce idle timeout (30–60 minutes, configurable) by requiring every refresh call to update active_sessions.last_seen_at; deny refresh if now - last_seen_at exceeds the idle threshold, return a session_inactive error payload, blacklist the stale tokens, surface the event via SSE, and emit an audit log entry.
- Device approval OTP policy: email OTPs carry a 5‑minute TTL, max 5 verification attempts, and are stored hashed (HMAC-SHA256 using ENCRYPTION_KEY) inside Redis; per-user and per-IP rate limits (Redis counters) throttle issuance and verification; each approval/denial is captured in login_activity with device/IP metadata.
- Geolocation/IP/device logging: capture IP, UA, and geo (via IP lookup) on login/refresh; store in audit log; surface recent-login metadata to users for security review (similar to Google’s device activity).
- New device authorization: detect new device/fingerprint (IP + UA + optional device hash). If new, require the OTP policy above before issuing refresh/access tokens, persist the trusted device row (fingerprint, ip, ua, geo, approved_by), and expose revoke/deny operations.
- SSE authentication & CSRF: /api/v1/events validates the same HttpOnly access token used by other endpoints via a dependency that runs on every connection (StreamingResponse per https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse). Connections require SameSite cookies; on 401/expiry the client replays the session-exchange flow before reconnecting.
- Single-session enforcement / double-login block: maintain an active-session record per user (device fingerprint, IP, geo, last_seen). On new login:
- If an active session exists, either block issuance of tokens or invalidate the previous session (blacklist prior access/refresh tokens) and notify both sessions.
- Notify current active session via frontend channel (e.g., SSE/WS) that another login was attempted with device/IP/geo details; prompt to approve/deny.
- If denied, reject new login and keep the existing session; if approved, revoke old session and allow new one.
- All session revocations must blacklist existing tokens and clear server-side session state.
- Secrets encryption & rotation: all Virtuozzo creds, TOTP secrets (once enabled), refresh session metadata, and backup codes are encrypted using Fernet (ENCRYPTION_KEY, 32-byte urlsafe). Rotate the key at least quarterly: mint new key, re-encrypt stored secrets in a single transaction per team/user, update Redis caches, and alert if any row cannot be re-encrypted (per https://cryptography.io/en/latest/fernet/). Never log plaintext secrets or keys.
- MFA/TOTP scope: MFA is deferred until Phase D completes; when enabled, use pyotp (https://pyauth.github.io/pyotp/) with otpauth URI/QR generation, encrypted secret storage, single-use backup codes, and mandatory step-up during new-device approval.
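The device approval OTP policy above (5-minute TTL, at most 5 verification attempts, HMAC-SHA256 digests instead of plaintext codes) can be sketched with the standard library alone. The `OtpStore` class is a hypothetical in-memory stand-in for the Redis-backed store, the 6-digit code format is an assumption, and the key is passed in as raw bytes rather than read from ENCRYPTION_KEY.

```python
import hashlib
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # 5-minute TTL from the policy
MAX_ATTEMPTS = 5       # maximum verification attempts from the policy


def hash_otp(key: bytes, user_id: str, code: str) -> str:
    """HMAC-SHA256 digest binding the code to one user, so only the digest is stored."""
    return hmac.new(key, f"{user_id}:{code}".encode(), hashlib.sha256).hexdigest()


class OtpStore:
    """In-memory stand-in for Redis; one pending OTP per user."""

    def __init__(self, key: bytes, clock=time.time):
        self._key = key
        self._clock = clock
        self._rows: dict[str, dict] = {}

    def issue(self, user_id: str) -> str:
        code = f"{secrets.randbelow(10**6):06d}"  # assumption: 6-digit numeric code
        self._rows[user_id] = {
            "digest": hash_otp(self._key, user_id, code),
            "expires_at": self._clock() + OTP_TTL_SECONDS,
            "attempts": 0,
        }
        return code  # handed to the email sender, never persisted in plaintext

    def verify(self, user_id: str, code: str) -> bool:
        row = self._rows.get(user_id)
        if row is None:
            return False
        if self._clock() >= row["expires_at"] or row["attempts"] >= MAX_ATTEMPTS:
            del self._rows[user_id]  # expired or exhausted: force a new OTP
            return False
        row["attempts"] += 1
        candidate = hash_otp(self._key, user_id, code)
        if hmac.compare_digest(row["digest"], candidate):
            del self._rows[user_id]  # a successful verification consumes the OTP
            return True
        return False
```

`hmac.compare_digest` is used so that the comparison time does not depend on how many leading characters match. The per-user and per-IP rate limits and the login_activity write described in the policy are deliberately left out of this sketch.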
Operational Safeguards (Redis / RabbitMQ)
- Redis availability: the /health/redis endpoint issues a PING and attempts a short-lived lock acquisition. Alert if two consecutive checks fail. When Redis is unavailable, reject rate-limited actions with 503 redis_unavailable instead of dropping locks silently.
- Redis retries/backoff: lock acquisition and counter increments use capped exponential backoff (max 5 tries, 200 ms cap) to avoid stampedes; failures emit structured logs so SRE can trace contention.
- RabbitMQ publishing: use publisher confirms with a 5s timeout (per https://www.rabbitmq.com/docs/monitoring#publishers) and retry at most 3 times with jittered delays to avoid thundering herds. On repeated failure, persist the event to Postgres for later replay.
- RabbitMQ DLQs: each topic exchange used for SSE fan-out includes a dead-letter queue; consumers monitor DLQ depth via Prometheus and raise alerts if >100 messages or if the redelivered flag spikes, forcing investigation before replay.
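The capped exponential backoff described for Redis (max 5 tries, 200 ms cap) could look like the sketch below. The 25 ms base delay and the full-jitter strategy are assumptions chosen for illustration, and `call_with_backoff` is a hypothetical helper that retries only on `ConnectionError`.

```python
import random
import time


def backoff_delays(max_tries=5, base=0.025, cap=0.2, rng=random.random):
    """Yield the sleep before each retry: jitter over min(cap, base * 2**attempt)."""
    for attempt in range(max_tries - 1):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(op, max_tries=5, sleep=time.sleep, **backoff_kwargs):
    """Run op(); on ConnectionError wait and retry, giving up after max_tries calls."""
    delays = backoff_delays(max_tries=max_tries, **backoff_kwargs)
    for attempt in range(1, max_tries + 1):
        try:
            return op()
        except ConnectionError:
            if attempt == max_tries:
                raise  # caller decides how to surface the outage (e.g. 503 redis_unavailable)
            sleep(next(delays))
```

With the defaults, the upper bounds of the four possible waits are 25, 50, 100, and 200 ms, so a caller is delayed by well under half a second before the failure is reported. The `sleep` parameter exists so tests can record delays instead of actually waiting; an async service would use an awaitable sleep in its place.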
Sequenced Work Plan (phases)
- Phase A (Foundation): async DB session, settings for JWT/bcrypt/Redis; models + Alembic for users/teams/roles/permissions/invites.
- Phase B (Auth): password hashing, JWT utils, login + session exchange + refresh + logout endpoints; middleware + require_permission.
- Phase C (Team/RBAC): seed permissions; seed Owner/Manager/Developer roles on team create; invite create/consume; role/permission edit endpoints.
- Phase D (User lifecycle + MFA prep): email verification, password reset, and gating work needed to launch TOTP (schema placeholders, secret storage, backup code UX) while keeping MFA disabled until security sign-off.
- Phase E (SSE alignment): apply auth context to /api/v1/events (filter by team_id, reject unauthorized), keep DB as source of truth.
- Phase F (Tests/CI): unit (security/jwt), integration (auth flows), E2E (invite/registration), load targets; enforce constraints install in CI.
Env & Config (examples)
- JWT_SECRET_KEY, ACCESS_TOKEN_EXPIRE_MINUTES, REFRESH_TOKEN_EXPIRE_DAYS, JWT_ALGORITHM=HS256
- BCRYPT_ROUNDS (bcrypt cost factor)
- REDIS_URL=redis://redis:6379/0 (session cache, locks, rate limit)
- ENCRYPTION_KEY (Fernet, base64 urlsafe 32-byte, for VZ secrets/TOTP)
- DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/db
- EMAIL_SENDER_CONFIG (SMTP) per provider docs; keep out of code.
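As an illustration of loading these variables and failing fast on a malformed ENCRYPTION_KEY, here is a standard-library sketch. The project itself points to Pydantic settings, so `AuthSettings` and `load_settings` are hypothetical stand-ins, and the fallback values (15 minutes, 7 days) are assumptions taken from the token lifetimes mentioned under Security Controls.

```python
import base64
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthSettings:
    jwt_secret_key: str
    jwt_algorithm: str
    access_token_expire_minutes: int
    refresh_token_expire_days: int
    redis_url: str
    encryption_key: str


def load_settings(env=os.environ) -> AuthSettings:
    """Read auth settings from a mapping (os.environ by default) and validate the Fernet key."""
    key = env["ENCRYPTION_KEY"]
    # A Fernet key is 32 random bytes encoded as url-safe base64.
    if len(base64.urlsafe_b64decode(key.encode())) != 32:
        raise ValueError("ENCRYPTION_KEY must decode to exactly 32 bytes")
    return AuthSettings(
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "15")),
        refresh_token_expire_days=int(env.get("REFRESH_TOKEN_EXPIRE_DAYS", "7")),
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        encryption_key=key,
    )
```

Required secrets are read with `env[...]` so a missing variable raises `KeyError` at startup rather than surfacing later as a signing or decryption failure.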
Deliverables (documentation + code alignment)
- Models & migrations for auth/RBAC/teams/invites.
- Auth router + service + security utils; middleware + permission dependency.
- Role/permission seed + team default roles on create.
- Invite endpoints; email verification + password reset endpoints.
- Tests: unit (hash/JWT), integration (login/session/refresh), E2E (Door A/B).
- Ops: rate limit + blacklist (Redis), audit log events, env docs with official links (above).
- Ownership transfer flow + policy checks; Owner-only Virtuozzo session retrieval enforcement.
- Session inactivity enforcement and recent-login geo/IP/device logging surfaced to users/admins.
- New-device authorization flow with trust/revoke tracking.
- Single-session enforcement with notify/approve/deny flow for concurrent login attempts.
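For the session inactivity deliverable, the core check on each refresh call is small enough to sketch directly. The session is modeled as a plain dict mirroring the active_sessions columns, `refresh_session` and `SessionInactive` are hypothetical names, and the 30-minute default is an assumption taken from the lower end of the configurable range.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # assumption: lower end of the 30-60 minute range


class SessionInactive(Exception):
    """Raised when a refresh must be denied; maps to the session_inactive error payload."""


def refresh_session(session: dict, now: datetime,
                    idle_timeout: timedelta = IDLE_TIMEOUT) -> dict:
    """Deny refresh for revoked or idle sessions; otherwise record activity."""
    if session.get("revoked_reason") is not None:
        raise SessionInactive(session["revoked_reason"])
    if now - session["last_seen_at"] > idle_timeout:
        # Mark the session so later refresh attempts are rejected as well.
        session["revoked_reason"] = "session_inactive"
        raise SessionInactive("session_inactive")
    session["last_seen_at"] = now  # every successful refresh updates last_seen_at
    return session
```

The caller that catches `SessionInactive` would be responsible for the remaining steps named in this guide: blacklisting the stale tokens, publishing the SSE event, and writing the audit log entry.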
Non-Goals (defer)
- MFA/OAuth providers remain disabled in production until Phase D completes and security signs off on TOTP rollout.
- Multi-region HA specifics and SOC2 audit runbooks (handled separately).
Virtuozzo API placement
- Create a dedicated client module under backend/app/infrastructure/external/virtuozzo/ (e.g., client.py, auth.py, accounts.py) to organize multiple endpoints. Keep backend/app/infrastructure/external/virtuozzo.py as a thin facade if desired; note that a package directory and a module with the same name in one directory shadow each other on import, so in practice the facade would move into virtuozzo/__init__.py once the package exists.
- All Virtuozzo endpoints (including https://app.mymightybox.io/1.0/users/authentication/rest/signin) must be env-driven; never hardcode.
- Enforce Owner-only access when requesting/refreshing Virtuozzo session keys.
Module placement (per 001-hybrid-modular-ddd)
- Interface/application for auth lives under backend/app/modules/auth/ (router, schemas, service, dependencies).
- Persistence models remain under backend/app/infrastructure/database/models/.
- API wiring uses api/v1 to include modules/auth/router.
Virtuozzo session key management (team-scoped, stable)
- Store one encrypted team-scoped Virtuozzo session key with metadata (expires_at/last_refreshed) in Postgres; optionally cache in Redis with shorter TTL.
- Refresh lazily only when the key is expired or near-expiry (e.g., within 5–10 minutes), never on every login. Use the Owner’s encrypted VZ creds to refresh; enforce per-team lock to avoid stampede.
- Serve all team members with the same valid session key server-side; never expose the key to clients. Members’ requests use the stored key; only Owner can trigger refresh if needed.
- Failure handling: on VZ 401/timeout, try one refresh; if still failing, surface a controlled error and prompt Owner to reauthenticate/update VZ creds.
- Bug/edge mitigations:
- Over-rotation: do not refresh per login; only on expiry/near-expiry with a single team-scoped key.
- Stampede: per-team mutex/lock around refresh; others reuse cached key.
- Stale key after failure: if refresh fails, invalidate cached key and require Owner reauth/update VZ creds; do not proceed with stale key.
- Ownership check: guard refresh path with Owner-only policy; members cannot trigger or view keys.
- Credentials drift: if stored encrypted VZ creds are rejected, bubble a clear error and force Owner reauth to replace creds.
- TTL mismatch: use configurable TTL/near-expiry window; refresh if expires_at < now + safety_window.
- Ownership transfer race: re-validate current Owner before refresh; lock per team to avoid old Owner overwriting after transfer.
- Partial write: treat refresh as an atomic update (write new key + metadata together); on failure, keep last-known-good key until replacement confirmed.
- Logging leakage: never log keys or VZ creds; redact sensitive fields in error logs.
- Cache/DB mismatch: Redis TTL shorter than VZ TTL; include last_refreshed/version to detect stale cache vs DB.
- Repeated 401s: after one failed refresh, drop the key and require Owner action instead of looping.
- Multi-team users: always scope lookup/lock by team_id to prevent cross-team key bleed.
- Time skew: include safety margin (e.g., 5–10 minutes) when checking expiry to avoid clock drift issues.
- VZ email lifecycle: Owner-only (re)creation; write VZ email + VZ UID + encrypted VZ password atomically with initial session key. Never log sensitive values; redact where necessary for support.
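The lazy-refresh and stampede rules above (refresh only when `expires_at < now + safety_window`, one refresh per team at a time, everyone else reuses the stored key) can be sketched with asyncio. `TeamKeyCache` is a hypothetical in-process stand-in: the real design uses Postgres plus Redis with a distributed per-team lock, and the Owner-only policy check and credential decryption are assumed to live inside the `fetch_new_key` callable.

```python
import asyncio
from collections import defaultdict
from datetime import datetime, timedelta, timezone

SAFETY_WINDOW = timedelta(minutes=5)  # assumption: lower end of the 5-10 minute margin


def needs_refresh(row, now, window=SAFETY_WINDOW):
    """True when no key is stored or the key expires within the safety window."""
    return row is None or row["expires_at"] < now + window


class TeamKeyCache:
    """In-process stand-in for the team-scoped key store with a per-team lock."""

    def __init__(self, now_fn=lambda: datetime.now(timezone.utc)):
        self._now = now_fn
        self._rows = {}                          # team_id -> {"key", "expires_at"}
        self._locks = defaultdict(asyncio.Lock)  # one lock per team_id

    async def get(self, team_id, fetch_new_key):
        row = self._rows.get(team_id)
        if not needs_refresh(row, self._now()):
            return row["key"]  # fast path: a valid key is reused, no lock taken
        async with self._locks[team_id]:
            # Re-check under the lock: another waiter may already have refreshed.
            row = self._rows.get(team_id)
            if needs_refresh(row, self._now()):
                key, expires_at = await fetch_new_key(team_id)
                # Key and metadata are replaced together, as one assignment.
                row = {"key": key, "expires_at": expires_at}
                self._rows[team_id] = row
            return row["key"]
```

Scoping both the row lookup and the lock by team_id is what prevents cross-team key bleed for users who belong to several teams. If `fetch_new_key` raises, nothing is written, so the failure-handling policy (invalidate and require Owner action) remains a decision for the caller.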
Keep all implementation choices traceable to the official references listed above; do not deviate from documented behaviors or rely on versions other than those stated in this guide.
Alignment matrix vs team_management.md (special rules)
- Team creation: Only upgraded (mightybox_billing) account can create a team → becomes Owner; enforced at team creation API and Door A registration.
- Ownership transfer: Must exist and be policy-guarded; re-validate Owner before VZ refresh; block old Owner after transfer.
- Team status gating: Allow session exchange/refresh only if team status in {TRIALING, ACTIVE}; deny otherwise.
- Forked registration: Door A (owner) creates team; Door B (invite) joins existing team with invited role.
- Virtuozzo access: Only Owner can retrieve/refresh session keys; members use stored team key server-side.
- Billing dependency: Team creation and active usage require billing upgrade; maintain status and enforce on auth/refresh/VZ calls.
- SSE gating: After auth integration, /events must filter by membership/team_id; deny expired teams.
- Invites: Single-use, expiring tokens; audit create/accept; respect role assignment on accept.
- New device/double-login: Require step-up for new device; single-session enforcement with approve/deny and token revocation.
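The membership and team status gating rules in this matrix reduce to two checks before any token is issued. The sketch below uses plain dicts in place of the team_members and teams tables; `authorize_session_exchange` and the error codes `not_a_member` and `team_inactive` are hypothetical names introduced for illustration.

```python
ALLOWED_TEAM_STATUSES = {"TRIALING", "ACTIVE"}


class SessionExchangeDenied(Exception):
    """Raised when the pre-auth token may not be exchanged for team-scoped tokens."""


def authorize_session_exchange(user_id, team_id, memberships, team_status):
    """Return the JWT claims (sub, team_id, role) or raise SessionExchangeDenied.

    memberships: dict mapping (user_id, team_id) -> role name
    team_status: dict mapping team_id -> status string
    """
    role = memberships.get((user_id, team_id))
    if role is None:
        raise SessionExchangeDenied("not_a_member")
    if team_status.get(team_id) not in ALLOWED_TEAM_STATUSES:
        raise SessionExchangeDenied("team_inactive")
    return {"sub": str(user_id), "team_id": team_id, "role": role}


memberships = {(100, 10): "Owner", (101, 10): "Developer", (100, 11): "Owner"}
team_status = {10: "ACTIVE", 11: "EXPIRED"}
claims = authorize_session_exchange(101, 10, memberships, team_status)
```

The same function can guard refresh, since the matrix requires the status check on both session exchange and refresh; a team that moves to EXPIRED or ARCHIVED then loses access at the next refresh rather than at the next login.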
Coverage / CI gates (minimum)
- Unit tests: security/JWT/crypto/helpers ≥95% coverage for core; fail CI under threshold.
- Integration tests: login → team select → refresh → logout; Door A/B flows; invite acceptance; SSE auth gate smoke; target ≥85% api/module coverage.
- E2E: Door A (with VZ session refresh), Door B invite accept, ownership transfer, new-device approval, single-session enforcement.
- Performance smoke: VZ refresh under contention (lock correctness) and auth rate-limit behavior.
- CI install: backend pip install --no-deps -c backend/constraints.txt -e .; frontend npm ci; run lint/type/tests; enforce coverage threshold.
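To give the auth rate-limit smoke test something deterministic to assert against, the limiter logic can be exercised with an injectable clock. The sketch below is a fixed-window counter held in memory; it only imitates what a Redis counter with an expiry would do, and `FixedWindowLimiter` together with the limit of 5 attempts per 60 seconds are illustrative assumptions, not values from this guide.

```python
import time


class FixedWindowLimiter:
    """Fixed-window counter; an in-memory stand-in for a Redis counter with a TTL."""

    def __init__(self, limit: int, window_seconds: float, clock=time.time):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._counters = {}  # key -> (window_start, count)

    def allow(self, key: str) -> bool:
        """Count one attempt for key; False means the caller should respond with 429."""
        now = self._clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self._window:
            start, count = now, 0  # the previous window has expired
        count += 1
        self._counters[key] = (start, count)
        return count <= self._limit


fake_now = [0.0]
limiter = FixedWindowLimiter(limit=5, window_seconds=60, clock=lambda: fake_now[0])
results = [limiter.allow("login:203.0.113.7") for _ in range(6)]
```

Keys are namespaced by action and client (here an IP from the documentation range), so login and register attempts are throttled independently. A fixed window permits short bursts around window boundaries; if that matters for the load targets, a sliding window would be the usual alternative.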