Documentation Gap Audit: 2026-05-01
This is a systematic audit of OneWallet documentation against the source code on the move-to-tigerbeetle branch as of 2026-05-01. It identifies "black holes": areas where the codebase contains behavior, contracts, or operational concerns that are not reflected (or only partially reflected) in the docs.
Severity legend
- 🔴 Critical — blocks development or causes compliance risk if undocumented
- 🟡 Important — developer will struggle without this
- 🟢 Nice to have — would improve developer experience
1. Payment Manager
POST /accounts/register-ipps not in API.md
Severity: 🔴 Critical
Where: projects/payment-manager/docs/API.md
What's missing: API.md has no entry for POST /accounts/register-ipps, although the endpoint is fully implemented (Zod body, idempotent 200/201, 403/404 errors, IPPS permission gate). It is the only way to make a user IPPS-capable, and the IPPS_NOT_REGISTERED error message points callers to it. Service consumers such as Auth Center cannot integrate correctly without this doc.
Source evidence: projects/payment-manager/src/accounts/register-ipps.ts (~118 lines, full Fastify plugin). src/server.ts registers it. src/intent/errors.ts references it.
Suggested doc: Add a section to projects/payment-manager/docs/API.md between POST /accounts and GET /accounts/:name/balance. Include the full body schema (externalUserId, customerEnglishName, senderTaxId, senderAccountName, senderMobileNumber, documentExpiry, dateOfBirth, phoneNumber, nationality, address), 201/200 idempotency semantics, and the 403 (lacks IPPS permission) and 404 (no USER_WALLET) responses.
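To make the 201/200/403/404 contract concrete, API.md could include a decision sketch like the one below. It is a minimal illustration, not the handler in register-ipps.ts: the RegisterIppsContext type, its field names, and the order of the checks (permission, then wallet lookup, then idempotency) are assumptions to confirm against the source.

```typescript
// Hypothetical inputs the handler would have resolved before choosing a response.
interface RegisterIppsContext {
  callerHasIppsPermission: boolean; // service permission gate (403 when false)
  userWalletExists: boolean;        // a USER_WALLET exists for externalUserId (404 when false)
  alreadyRegistered: boolean;       // user is already IPPS-capable (idempotent repeat)
}

type RegisterIppsStatus = 200 | 201 | 403 | 404;

// Sketch of the documented status semantics; the check order is an assumption.
function registerIppsStatus(ctx: RegisterIppsContext): RegisterIppsStatus {
  if (!ctx.callerHasIppsPermission) return 403;
  if (!ctx.userWalletExists) return 404;
  // Repeating the call is safe: 200 on a repeat, 201 on first registration.
  return ctx.alreadyRegistered ? 200 : 201;
}
```

Documenting the order matters to callers: under this sketch a caller without the IPPS permission always sees 403, even for a user who has no wallet.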
POST /admin/intents/:id/resolve only in OPERATIONS.md, not in API.md
Severity: 🟡 Important
Where: projects/payment-manager/docs/API.md
What's missing: The resolve endpoint is documented as a runbook (OPERATIONS.md), but it's a real HMAC-signed admin endpoint with body schema, 200/201/403/404/422 semantics, forceResolve permission gate, and side-effects (post_pending vs void_pending outbox emission). It belongs in API.md alongside /admin/fee-rules/dry-run.
Source evidence: projects/payment-manager/src/admin/resolve-intent.ts
Suggested doc: Add an Admin Endpoints subsection to API.md covering both /admin/fee-rules/dry-run and /admin/intents/:id/resolve, with explicit cross-references to OPERATIONS.md for the runbook.
OperationType registry / OperationTypeDefinition has no narrative docs
Severity: 🟡 Important
Where: PM TECHNICAL.md (or new projects/payment-manager/docs/OPERATION-TYPES.md)
What's missing: Module A introduced OperationTypeDefinition (parseBody + resolveAccounts + optional preValidate) as the canonical extension point for new operation types. Adding a new operationType (e.g. crypto withdrawal, off-ramp, agent cashout) is now a self-contained file in src/operation-types/. Nowhere is this developer flow documented; only the per-module list in TECHNICAL.md hints at it.
Source evidence: src/intent/operation-type.ts, src/intent/operation-registry.ts, all 5 files under src/operation-types/. Plan: docs/superpowers/plans/2026-05-01-pm-modules-a-b.md.
Suggested doc: Either expand TECHNICAL.md "Module Map" with a "How to add an operationType" section, or create projects/payment-manager/docs/OPERATION-TYPES.md with: OperationTypeDefinition interface, BaseIntentBody, resolveAccounts contract, preValidate use cases, and an end-to-end example.
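A narrative doc could anchor on a sketch like the following, which shows a registry lookup driving the three hooks in order. Apart from parseBody, resolveAccounts, preValidate and BaseIntentBody, every name here (ResolvedAccounts, registerOperationType, buildIntent, EXAMPLE_CASHOUT) is hypothetical, the hook signatures are guesses, and the real interface in src/intent/operation-type.ts should be quoted once the doc is written.

```typescript
// Hypothetical shapes; the real ones live in src/intent/operation-type.ts.
interface BaseIntentBody {
  operationType: string;
  amount: bigint; // assumption: minor units (satang)
  metadata: Record<string, unknown>;
}

interface ResolvedAccounts {
  debitAccountId: bigint;
  creditAccountId: bigint;
}

interface OperationTypeDefinition {
  operationType: string;
  parseBody(raw: unknown): BaseIntentBody;
  resolveAccounts(body: BaseIntentBody, userId: number): Promise<ResolvedAccounts>;
  preValidate?(body: BaseIntentBody, userId: number): Promise<void>;
}

const registry = new Map<string, OperationTypeDefinition>();

function registerOperationType(def: OperationTypeDefinition): void {
  if (registry.has(def.operationType)) {
    throw new Error(`duplicate operationType: ${def.operationType}`);
  }
  registry.set(def.operationType, def);
}

// Generic pipeline: look up the definition, then parse, pre-validate, resolve.
async function buildIntent(raw: { operationType: string; [key: string]: unknown }, userId: number) {
  const def = registry.get(raw.operationType);
  if (!def) throw new Error(`unknown operationType: ${raw.operationType}`);
  const body = def.parseBody(raw);
  if (def.preValidate) await def.preValidate(body, userId);
  const accounts = await def.resolveAccounts(body, userId);
  return { body, accounts };
}

// Hypothetical new operation type: one self-contained definition per file.
registerOperationType({
  operationType: "EXAMPLE_CASHOUT",
  parseBody(raw) {
    const r = raw as { amount?: unknown };
    if (typeof r.amount !== "string" || !/^[1-9]\d*$/.test(r.amount)) {
      throw new Error("amount must be a positive integer string");
    }
    return { operationType: "EXAMPLE_CASHOUT", amount: BigInt(r.amount), metadata: {} };
  },
  async resolveAccounts(_body, userId) {
    // Placeholder IDs; a real resolver would consult tb_account_map.
    return { debitAccountId: BigInt(userId), creditAccountId: 999n };
  },
});
```

The design point worth stating in the doc is that buildIntent never branches on operationType itself; everything type-specific sits behind the definition, which is why a new type is a single file.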
payment_route table semantics under-documented
Severity: 🟡 Important
Where: projects/payment-manager/docs/DATABASE.md and ARCHITECTURE.md
What's missing: pm.payment_route is described as a table but the routing semantics are buried in one line: "Resolver picks highest matching amount_min (ORDER BY amount_min DESC LIMIT 1)." No discussion of: how active=false is used (kill-switch), what happens if no row matches (NO_ROUTE error), how amount tiers work in practice, what changing a route does to in-flight intents. Also missing: a worked example showing P2P_TRANSFER vs IPPS_WITHDRAWAL routing decisions.
Source evidence: src/intent/router.ts (resolveChannel), drizzle/seed.ts (seed routes).
Suggested doc: Expand DATABASE.md pm.payment_route section with a "Routing Semantics" subsection covering: tier resolution by amount_min, fallback when no match (NO_ROUTE), how to gracefully retire a channel (active=false), and seeded defaults.
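The "Routing Semantics" subsection could pair the SQL rule with an in-memory equivalent such as this sketch. It assumes amount_min is an inclusive lower bound and that inactive rows are simply skipped; the PaymentRoute and NoRouteError shapes and the channel names (INTERNAL, IPPS_STANDARD, IPPS_HIGH_VALUE) are hypothetical, and the seeded routes in drizzle/seed.ts should replace them.

```typescript
interface PaymentRoute {
  operationType: string;
  amountMin: bigint; // assumption: inclusive lower bound, minor units
  channel: string;
  active: boolean;
}

class NoRouteError extends Error {
  readonly code = "NO_ROUTE";
}

// Same rule as "ORDER BY amount_min DESC LIMIT 1" over active, matching rows.
function resolveChannel(routes: PaymentRoute[], operationType: string, amount: bigint): string {
  let best: PaymentRoute | undefined;
  for (const r of routes) {
    if (!r.active || r.operationType !== operationType || r.amountMin > amount) continue;
    if (!best || r.amountMin > best.amountMin) best = r;
  }
  if (!best) throw new NoRouteError(`no active route for ${operationType} at ${amount}`);
  return best.channel;
}

// Hypothetical worked example: one P2P tier, two IPPS withdrawal tiers.
const routes: PaymentRoute[] = [
  { operationType: "P2P_TRANSFER", amountMin: 0n, channel: "INTERNAL", active: true },
  { operationType: "IPPS_WITHDRAWAL", amountMin: 0n, channel: "IPPS_STANDARD", active: true },
  { operationType: "IPPS_WITHDRAWAL", amountMin: 5_000_000n, channel: "IPPS_HIGH_VALUE", active: true },
];
```

Under these assumptions, setting active=false on a tier does not fail traffic outright: amounts fall through to the next-lower active tier, and NO_ROUTE appears only when no active row remains for that operationType.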
Rate limiting / concurrency control: no rate limits exist, no doc says so
Severity: 🟡 Important
Where: projects/payment-manager/docs/API.md and SECURITY.md
What's missing: PM has no application-level rate limiting (no @fastify/rate-limit, no token bucket). Concurrency control exists only at the worker level (PspWorker FOR UPDATE SKIP LOCKED with leases; OutboxWorker explicitly single-instance). Docs do not state this clearly — a security reviewer or new dev will assume PM has rate limits because nginx does. The reality (rate limiting happens at nginx; PM trusts HMAC + service permissions; OutboxWorker MUST be a singleton) needs a callout.
Source evidence: package.json has no rate-limit plugin; no rateLimit* symbol in src/. WORKERS.md correctly says outbox-worker is single-instance and PspWorker is multi-instance with lease semantics, but this is not surfaced in API.md or SECURITY.md.
Suggested doc: Add a "Rate Limiting & Concurrency" section to API.md (or to TECHNICAL.md "Key Invariants"): "PM has no application-level rate limiting; nginx limit_req enforces per-IP rate limits at the edge. PM-internal concurrency control is per-worker (outbox-worker singleton; psp-worker SKIP-LOCKED multi-instance)."
Quote endpoint metadata-driven fee tags not explained
Severity: 🟢 Nice to have
Where: projects/payment-manager/docs/API.md § POST /intents/quote
What's missing: API.md shows the request body has metadata and one example uses {"tags":["vip"]}. But it doesn't explain that metadata.tags is the input to the fee_rule tags_include/tags_exclude matcher, nor what other metadata keys the rule engine sees. Without this, a caller can't predict why fees differ.
Source evidence: src/rule-engine/fee-calculator.ts, src/rule-engine/evaluator.ts. FEE-RULES.md has the rule side; it isn't linked from API.md.
Suggested doc: Add a one-paragraph note in POST /intents/quote describing how metadata flows into the fee evaluator and link to FEE-RULES.md.
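The note in POST /intents/quote could show the matcher in a few lines. This sketch assumes metadata.tags is an array of strings, that tags_exclude wins over tags_include, and that a non-empty tags_include matches when any listed tag is present; FeeRuleTags and tagsMatch are hypothetical names, and FEE-RULES.md and src/rule-engine/evaluator.ts are the authority on the real semantics (for example, "any" vs "all").

```typescript
interface FeeRuleTags {
  tagsInclude: string[]; // assumption: rule applies if ANY listed tag is present
  tagsExclude: string[]; // rule is skipped if ANY listed tag is present
}

function tagsMatch(rule: FeeRuleTags, metadata: Record<string, unknown>): boolean {
  const raw = metadata["tags"];
  // Non-array or non-string entries are ignored rather than rejected (assumption).
  const tags = new Set(Array.isArray(raw) ? raw.filter((t): t is string => typeof t === "string") : []);
  if (rule.tagsExclude.some((t) => tags.has(t))) return false;
  return rule.tagsInclude.length === 0 || rule.tagsInclude.some((t) => tags.has(t));
}
```

With a rule of tagsInclude=["vip"], a quote sent with metadata {"tags":["vip"]} matches and the same quote without metadata does not, which is the kind of fee difference callers currently cannot predict.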
tb_account_map.id (surrogate sequential): usage rationale buried in a schema comment
Severity: 🟢 Nice to have
Where: PM DATABASE.md / Auth Center PM-INTEGRATION.md
What's missing: The surrogate id exists only because Serverpod ORM needs an int id PK on the public.v_user_tb_accounts view. This is a non-obvious cross-service constraint. DATABASE.md mentions it but doesn't explain why — devs adding new shared views will not know.
Source evidence: src/shared/schema.ts:55-58 (comment present); not in DATABASE.md narrative.
Suggested doc: One-paragraph note in DATABASE.md "Surrogate IDs and Serverpod views" linking to the relevant .spy.yaml.
metadata.recipientUserId injection at MINIAPP_CHARGE / SERVICE_DEPOSIT
Severity: 🟢 Nice to have
Where: API.md
What's missing: SERVICE_DEPOSIT requires metadata.recipientUserId (positive int). API.md has the row in the requirements table but does not explain why the recipient is passed in metadata rather than as a top-level recipientUserId (answer: to avoid ambiguity with the sender userId taken from X-User-Id).
Source evidence: src/operation-types/service-transfer.ts
Suggested doc: Short note in operationType-specific rules table.
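A short validation sketch could accompany that note. It assumes recipientUserId arrives as a JSON number; parseRecipientUserId is a hypothetical name, and the actual check lives in src/operation-types/service-transfer.ts.

```typescript
// Hypothetical: the recipient is read only from metadata, never from a top-level
// field, so it cannot be confused with the sender taken from the X-User-Id header.
function parseRecipientUserId(metadata: Record<string, unknown>): number {
  const v = metadata["recipientUserId"];
  if (typeof v !== "number" || !Number.isSafeInteger(v) || v <= 0) {
    throw new Error("metadata.recipientUserId must be a positive integer");
  }
  return v;
}
```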
2. Serverpod / Auth Center
Mini-app catalog: announced in TECHNICAL.md but no model, no endpoint, no doc
Severity: 🟡 Important
Where: projects/onewallet_base/docs/TECHNICAL.md and ARCHITECTURE.md
What's missing: TECHNICAL.md says "Mini-app catalog: [check source code]". But there is no miniapp.spy.yaml model, no endpoint, no service in lib/src/ for it. ARCHITECTURE.md and COMPLIANCE.md (OC-09) describe mini-app merchant onboarding as Phase 2A (planned). The current state — "not yet implemented" — should be stated explicitly so devs don't waste time hunting for it.
Source evidence: ls projects/onewallet_base/onewallet_base_server/lib/src/models/ — no miniapp model. grep -r 'miniapp\|mini-app' lib/src/ — no hits.
Suggested doc: Update TECHNICAL.md to say "Mini-app catalog: planned for Phase 2A — see docs/superpowers/specs/2026-04-23-miniapp-merchant-design.md. Not yet implemented." Same in ARCHITECTURE.md.
Agent / referral module: documented in flows/diagrams but not implemented
Severity: 🟡 Important
Where: projects/onewallet_base/docs/AUTH.md (registration flow Step 4) and docs/diagrams/agent-referral-flow.*
What's missing: AUTH.md says "Step 4 — Validates agent code via AgentService" and there's an agent-referral-flow.puml diagram. But there's no AgentService, no agent.spy.yaml model, no agent_code column anywhere. RBAC table lists agent role with "Agent portal only" but no portal exists. This contradicts the architecture (which only mentions user/admin roles) and the actual code.
Source evidence: grep -r 'AgentService\|agent_code\|referralCode' lib/src/ — empty. register_referral_screen.dart exists in Flutter but no backend.
Suggested doc: Either remove the Step 4 / agent references from AUTH.md (and archive the diagram), or add a "Status: Planned, not implemented" prefix. Update RBAC table footnotes.
PIN change / reset / forgotten-PIN flow: completely undocumented
Severity: 🔴 Critical
Where: projects/onewallet_base/docs/AUTH.md
What's missing: AUTH.md describes PIN setup and lockout, but there is no documentation of: PIN change (when user knows current PIN), PIN reset (forgot-PIN — does it require password? OTP? KYC re-verification?), what happens to biometric enrollment when PIN changes, what happens to active sessions when PIN resets. This is a regulatory / BOT-relevant flow.
Source evidence: register_password_screen.dart, setup_pin_screen.dart, pin_unlock_screen.dart exist; forgot_password_screen.dart and forgot_password_otp_screen.dart exist but only address password, not PIN. PinNotifier supports failedAttempts and lockoutUntil but not change/reset.
Suggested doc: Add PIN Change and PIN Reset sections to AUTH.md describing current behavior (or "Not implemented — tracked in TODO" if absent), and any lockout escalation when PIN is forgotten.
Biometric re-enrollment / revocation flow not documented
Severity: 🟡 Important
Where: AUTH.md
What's missing: AUTH.md says biometric key TTL is "30–60 days" and is "invalidated on logout, revoke, or password change." It doesn't describe: how the user re-enrolls after expiry, the UX flow when biometric is disabled at the OS level after enrollment, how PIN-fallback transitions back to biometric, whether server tracks biometric state at all (it doesn't seem to — Flutter uses flutter_secure_storage).
Source evidence: setup_biometric_screen.dart, auto_lock_service.dart. AUTH.md has the high-level info but not the lifecycle.
Suggested doc: Expand AUTH.md "Biometric" section with re-enrollment flow, OS-level revocation handling, server-side state (or lack thereof).
3-phase user lifecycle: timers documented but configurable values & override paths not
Severity: 🟡 Important
Where: AUTH.md "User Lifecycle (3 phases)"
What's missing: AUTH.md states timer values (Day 0–7 active, Day 7 archive, Day 30 hard delete) and lists config keys but does NOT explain: where the config is actually loaded (FutureCall vs .env vs auth.yaml), how allow_reactivation_from_archive works in practice, what fields are nulled at archive vs hard-delete, how the 09:00/02:00/Mon-03:00 cron schedule is enforced (Serverpod FutureCall? cron?), and whether financial records (pm.intent, pm.tx_history) are touched on hard delete. This is a PDPA / BOT-7-year-retention crossover.
Source evidence: lib/src/future_calls/registration_cleanup.dart, services/registration_service.dart. AUTH.md is reasonable for happy path but lacks ops detail.
Suggested doc: Expand "User Lifecycle" section with: actual config source, what data archive vs hard-delete touches, retention of pm.tx_history rows after hard delete (PDPA aspect), and the cleanup-job implementation (FutureCall name + schedule).
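The expanded section could state the phase boundaries as code so ops and compliance read the same rule. This sketch uses the AUTH.md defaults (archive at day 7, hard delete at day 30) and assumes both are counted in whole days from registration creation; the function and constant names are hypothetical. Because the real job acts only on its scheduled run, "due" here means eligible, not already applied.

```typescript
type LifecyclePhase = "active" | "archive_due" | "hard_delete_due";

const DAY_MS = 24 * 60 * 60 * 1000;
// Defaults from AUTH.md; where the real values are loaded from is the open question.
const ARCHIVE_AFTER_DAYS = 7;
const HARD_DELETE_AFTER_DAYS = 30;

function lifecyclePhase(createdAt: Date, now: Date): LifecyclePhase {
  // Whole days elapsed since creation (assumption: both timers share this origin).
  const ageDays = Math.floor((now.getTime() - createdAt.getTime()) / DAY_MS);
  if (ageDays >= HARD_DELETE_AFTER_DAYS) return "hard_delete_due";
  if (ageDays >= ARCHIVE_AFTER_DAYS) return "archive_due";
  return "active";
}
```

Writing it this way forces the doc to answer whether day 30 is measured from creation or from archive, which the current text leaves ambiguous.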
Serverpod sessions: client-vs-server-session distinction missing
Severity: 🟡 Important
Where: AUTH.md and TECHNICAL.md
What's missing: AUTH.md says "Server-side sessions stored in Redis via Serverpod Cache" but doesn't distinguish: Serverpod's Session (per-RPC), AuthKey/AuthSecret in DB, JWT (access_token / refresh_token), and the Redis pubsub session for streaming endpoints. The 4 concepts are conflated. This causes confusion when reading the code.
Source evidence: Serverpod serverpod_auth_idp package; lib/src/auth/jwt_refresh_endpoint.dart. The skill serverpod-sessions covers this concept generally.
Suggested doc: Add "Session Types" section to AUTH.md (or SERVERPOD-PATTERNS.md) clarifying Session (request-scoped) vs AuthKey (persistent identity) vs JWT (transport credential) vs Redis stream session.
Flutter state management approach: only mentioned in 1 line
Severity: 🟡 Important
Where: projects/onewallet_base/docs/TECHNICAL.md
What's missing: TECHNICAL.md says "State management via Riverpod" but doesn't explain the Notifier pattern used (PinNotifier, AuthNotifier, WalletNotifier, etc.), the provider naming convention, how Riverpod scopes interact with go_router, where the Serverpod Client is provided. There are 7 providers in lib/providers/ — no doc covers them.
Source evidence: onewallet_base_flutter/lib/providers/ contains: auth_provider, kyc_provider, payment_provider, phone_resolve_provider, pin_provider, transaction_provider, wallet_provider.
Suggested doc: Either a new projects/onewallet_base/docs/FLUTTER.md or expand TECHNICAL.md with a "Flutter Architecture" section covering Riverpod conventions, provider registry, and Serverpod-Client provider pattern.
Push-notification preferences / opt-out / per-channel mute: not implemented and not declared
Severity: 🟡 Important
Where: AUTH.md / new notifications.md
What's missing: PushText templates exist for payments, security, kyc, system channels but there is no user preferences model, no notification_preference table, no opt-out endpoint, no per-channel mute toggle. BOT or Apple/Google policies typically require this for marketing pushes. Currently silent — risk of compliance gap not even being on the radar.
Source evidence: grep -r 'preference\|opt.out' returns nothing in Serverpod or notifications-service.
Suggested doc: Either implement & document, or explicitly state in AUTH.md / projects/notifications-service/docs/ the policy: "All users currently receive all notification channels by default. Per-channel preferences not implemented — see TODO X." Compliance officer should sign off.
OTP flow: rate limit and delivery channel SMS-vs-email confusion
Severity: 🟢 Nice to have
Where: AUTH.md
What's missing: AUTH.md describes OTP via SMTP (email). COMPLIANCE.md and BOT 18/2568 1.1 reference SMS OTP. No clear statement about whether SMS OTP is in scope at all.
Source evidence: email_idp_endpoint.dart and serverpod_auth_idp are email-only.
Suggested doc: One-line clarification in AUTH.md that current OTP transport is email only; SMS OTP (if planned) is explicitly out of scope for current phase.
3. Admin Panel
Admin login flow not documented
Severity: 🟡 Important
Where: TECHNICAL.md
What's missing: Admin panel uses email/password (Serverpod-issued tokens) gated by role check (superadmin, operator, finance, support). 8-hour cookie + 14-day refresh cookie pattern. None of this is in TECHNICAL.md — only /nginx/auth/admin is mentioned in passing.
Source evidence: projects/admin-panel/src/routes/login/+page.server.ts. +layout.server.ts has session checks.
Suggested doc: Add "Admin Authentication" section to TECHNICAL.md covering: the four ADMIN_ROLES, login form action, cookie setup (admin_token 8h / admin_refresh_token 14d), and where validateToken() calls the Serverpod backend.
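The Admin Authentication section could summarize the role filter and cookie lifetimes as constants. Only the four role names and the 8h/14d lifetimes come from this audit; the ADMIN_COOKIES shape and the isAdminRole helper are hypothetical, and other cookie attributes (httpOnly, secure, sameSite) are deliberately omitted because their current values were not checked.

```typescript
const ADMIN_ROLES = ["superadmin", "operator", "finance", "support"] as const;
type AdminRole = (typeof ADMIN_ROLES)[number];

// Role filter applied after a successful email/password login.
function isAdminRole(role: string): role is AdminRole {
  return (ADMIN_ROLES as readonly string[]).includes(role);
}

// Cookie lifetimes in seconds, as a maxAge-style value.
const ADMIN_COOKIES = {
  admin_token: { maxAge: 8 * 60 * 60 },               // 8 hours
  admin_refresh_token: { maxAge: 14 * 24 * 60 * 60 }, // 14 days
} as const;
```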
Admin RBAC matrix not documented in admin-panel docs
Severity: 🟡 Important
Where: TECHNICAL.md
What's missing: AUTH.md (Serverpod) has the role matrix. Admin Panel docs reference roles in passing but don't document which UI pages/actions each role can access. A finance user vs operator user — what can each see? +layout.server.ts only counts pending_operator_review rows; the actual permission checks are elsewhere.
Source evidence: Login server filters to superadmin/operator/finance/support. Per-page permission gates not surveyed.
Suggested doc: Add an "RBAC by Page" matrix to admin-panel TECHNICAL.md: rows = pages, columns = roles, values = read/write/none.
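One way to keep the RBAC matrix reviewable is to express it as data next to the guard that reads it. In this sketch the page paths and every access value are placeholders, since the per-page gates have not been surveyed; only the four role names come from the login server. Denying unknown pages by default is a design assumption.

```typescript
type AdminRole = "superadmin" | "operator" | "finance" | "support";
type Access = "write" | "read" | "none";

// Placeholder rows: replace with the surveyed pages and real values.
const RBAC_BY_PAGE: Record<string, Record<AdminRole, Access>> = {
  "/kyc": { superadmin: "write", operator: "write", finance: "none", support: "read" },
  "/ledger": { superadmin: "write", operator: "read", finance: "write", support: "none" },
};

// "write" implies "read"; pages missing from the matrix deny everyone.
function canAccess(page: string, role: AdminRole, need: "read" | "write"): boolean {
  const granted: Access = RBAC_BY_PAGE[page]?.[role] ?? "none";
  if (need === "read") return granted !== "none";
  return granted === "write";
}
```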
Direct DB read paths: which routes hit pool.query vs API
Severity: 🟢 Nice to have
Where: TECHNICAL.md
What's missing: TECHNICAL.md says "some page loaders read PostgreSQL directly … will be removed in the redesign" but doesn't enumerate which. A new contributor working on KYC vs ledger pages will have to grep for pool.query to know what's transitional.
Source evidence: +layout.server.ts queries kyc_verification directly.
Suggested doc: Add a checklist to TECHNICAL.md "Current Limitations" section listing every route that uses pool directly.
4. KYC Service
CompreFace setup / configuration not in DEPLOYMENT.md
Severity: 🟡 Important
Where: DEPLOYMENT.md (or new INFRASTRUCTURE.md)
What's missing: INTEGRATIONS.md describes CompreFace API usage (endpoint, request, error mapping, threshold). It does not describe how to set up CompreFace: which Docker image / version, how the named volume is mounted, what GPU vs CPU mode does, how to provision the COMPREFACE_API_KEY (CompreFace UI? CLI?), and the initial-bootstrap step. CLAUDE.md mentions the volume requirement but not the setup.
Source evidence: src/services/facerecog.js. No CompreFace bootstrap script in scripts/.
Suggested doc: Add "CompreFace Setup" section to DEPLOYMENT.md: docker-compose snippet, volume creation, API key provisioning, ArcFace model selection.
Gemini OCR/face prompts: located in source, not linked from docs
Severity: 🟢 Nice to have
Where: INTEGRATIONS.md
What's missing: INTEGRATIONS.md says "OCR prompt (OCR_PROMPT in src/services/prompts.js)" but doesn't quote it. Compliance reviewers and prompt-engineers should see the exact text without grepping. The prompt file is small enough (~60 lines) to either inline or link with a permalink.
Source evidence: src/services/prompts.js (full text exists)
Suggested doc: Inline both OCR_PROMPT and FACE_PROMPT in INTEGRATIONS.md (or in TECHNICAL.md "Processing Pipeline" subsection) so any change is visible in PR diffs.
KYC OCR result schema: list of fields not enumerated in docs
Severity: 🟡 Important
Where: INTEGRATIONS.md or TECHNICAL.md
What's missing: The docs say OCR returns "structured JSON" with passport fields but do not list the canonical schema (fullName, dateOfBirth, nationality, countryOfResidence, nationalIdNumber, address, dateOfIssue, dateOfExpiry, gender). Auth Center stores this in kyc_verification.ocrResult (encrypted). Operators editing fields in the Admin Panel need to know the exact list.
Source evidence: src/services/prompts.js OCR_PROMPT schema; kyc_ocr_result.spy.yaml model.
Suggested doc: Add "OCR Result Schema" table to KYC docs (kyc-service or onewallet_base) listing all 9 fields, types, nullability, and which are PII-sensitive.
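A typed version of the proposed table could look like the following. The nine field names are the ones listed above; the string | null types, the date-string assumption, and the missingFields helper are hypothetical and should be checked against OCR_PROMPT and kyc_ocr_result.spy.yaml.

```typescript
// Types and nullability are assumptions; field names come from the audit.
interface KycOcrResult {
  fullName: string | null;
  dateOfBirth: string | null; // assumed ISO 8601 date string
  nationality: string | null;
  countryOfResidence: string | null;
  nationalIdNumber: string | null;
  address: string | null;
  dateOfIssue: string | null;
  dateOfExpiry: string | null;
  gender: string | null;
}

// A Record keyed by the interface makes the compiler flag a missing or extra field.
const OCR_FIELD_SET: Record<keyof KycOcrResult, true> = {
  fullName: true, dateOfBirth: true, nationality: true, countryOfResidence: true,
  nationalIdNumber: true, address: true, dateOfIssue: true, dateOfExpiry: true, gender: true,
};
const OCR_FIELDS = Object.keys(OCR_FIELD_SET) as (keyof KycOcrResult)[];

// Fields an operator would need to fill in manually (null or blank).
function missingFields(r: KycOcrResult): (keyof KycOcrResult)[] {
  return OCR_FIELDS.filter((f) => {
    const v = r[f];
    return v === null || v.trim() === "";
  });
}
```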
Retry / resubmission flow split-brain: kyc-service vs Serverpod
Severity: 🟡 Important
Where: Either projects/onewallet_base/docs/KYC.md or TECHNICAL.md
What's missing: kyc-service docs cover BullMQ retry (5 attempts, exponential backoff). Serverpod KYC.md describes user-side retry options (retryKycProcessing keeps S3 keys; pending requires retake). What's NOT documented: the interaction. If retryKycProcessing is called while BullMQ has the job in retry, what happens? What's the deduplication semantics across the boundary (jobId = kyc-{kycId} is mentioned in kyc-service but not its implication for Serverpod retries)?
Source evidence: KycQueueService.submitKycJob and BullMQ jobId collision handling.
Suggested doc: Add "End-to-End Retry Semantics" subsection covering: dedupe by jobId, what UnrecoverableError does cross-boundary, and how user retake (status → pending) interacts with in-flight jobs.
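The dedupe question can be reasoned about with an in-memory model of BullMQ's custom-jobId behaviour, as we understand it: adding a job whose ID already exists in the queue, in any state, does not create a new job until the old one is removed. DedupQueue is a hypothetical stand-in, not BullMQ's API, and whether kyc-service configures removeOnComplete or removeOnFail is not stated in this audit.

```typescript
type JobState = "waiting" | "active" | "delayed" | "completed" | "failed";

// Hypothetical model: an existing jobId blocks re-adds until the job is removed.
class DedupQueue {
  private jobs = new Map<string, JobState>();

  add(jobId: string): "added" | "duplicate_ignored" {
    if (this.jobs.has(jobId)) return "duplicate_ignored";
    this.jobs.set(jobId, "waiting");
    return "added";
  }

  setState(jobId: string, state: JobState): void {
    if (!this.jobs.has(jobId)) throw new Error(`no such job: ${jobId}`);
    this.jobs.set(jobId, state);
  }

  // Models removeOnComplete/removeOnFail or an explicit job removal.
  remove(jobId: string): void {
    this.jobs.delete(jobId);
  }
}

// jobId convention quoted in kyc-service docs.
const kycJobId = (kycId: number): string => `kyc-${kycId}`;
```

Under this model, a retryKycProcessing call that re-submits kyc-{kycId} while the original job is delayed for backoff, or still retained as failed, is silently ignored. That is exactly the cross-boundary behaviour the subsection needs to confirm or refute.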
5. Notifications
Notification template content / format: only in Dart source
Severity: 🟡 Important
Where: projects/notifications-service/docs/ and projects/onewallet_base/docs/
What's missing: Templates live in notification_templates.dart (Russian, Thai, English) — newDeviceLogin, passwordChanged, kycFullyVerified, transferReceived, etc. The notifications-service docs explicitly say "Publishers are responsible for localizing title and body" but neither service docs reference the template registry or list the available templates. Marketing/product/compliance can't review what users see.
Source evidence: projects/onewallet_base/onewallet_base_server/lib/src/services/notification_templates.dart
Suggested doc: Add projects/onewallet_base/docs/NOTIFICATIONS.md with: full template catalog (template name, locales supported, sample text), publish flow, and "Phase 2: editable templates in Admin Panel" stub.
Notification types / channel taxonomy not in service docs
Severity: 🟢 Nice to have
Where: INTEGRATIONS.md
What's missing: The four channels (payments | security | kyc | system) are defined in passing in INTEGRATIONS.md and the SQL schema. There's no narrative: what each channel is for, which notification types map to which channel, what the Android/iOS channel mapping looks like (the FCM channelId is set to the channel name), or the BOT/PDPA classification of each.
Source evidence: notification_publisher.dart and notification_log.channel column.
Suggested doc: Add "Channels and Categories" section to notifications-service INTEGRATIONS.md.
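The "Channels and Categories" section could pair the four channels with a template mapping like this. The template names come from notification_templates.dart, but their channel assignments here are guesses; the only grounded behaviour is that the FCM channelId equals the channel name. TEMPLATE_CHANNEL and fcmAndroidChannelId are hypothetical names.

```typescript
type NotificationChannel = "payments" | "security" | "kyc" | "system";

// Hypothetical assignments; confirm each against notification_publisher.dart.
const TEMPLATE_CHANNEL: Record<string, NotificationChannel> = {
  newDeviceLogin: "security",
  passwordChanged: "security",
  kycFullyVerified: "kyc",
  transferReceived: "payments",
};

// The FCM Android channelId is the channel name itself.
function fcmAndroidChannelId(template: string): NotificationChannel {
  const channel = TEMPLATE_CHANNEL[template];
  if (!channel) throw new Error(`unmapped template: ${template}`);
  return channel;
}
```

Failing loudly on an unmapped template keeps the taxonomy table and the code from drifting apart silently.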
Notification opt-out / preferences: not implemented and not declared
Severity: 🟡 Important (also listed in §2)
See: §2 "Push-notification preferences / opt-out / per-channel mute".
6. Cross-Cutting Gaps
nginx configuration: route table is documented; actual config not in repo
Severity: 🔴 Critical
Where: docs/ARCHITECTURE.md (table) and missing nginx config
What's missing: ARCHITECTURE.md has the route table (paths, auth_request, upstreams) but the actual nginx config file is not in the repo — there's no nginx.conf, no nginx/ directory, no infra-compose snippet. Auth_request endpoints (/nginx/auth, /nginx/auth/admin) are referenced everywhere but their server-side handlers are not in endpoints/ either. This is the single biggest "black hole" — anyone wanting to reproduce the production setup can't.
Source evidence: find . -name 'nginx*' -not -path '*/node_modules/*' returns nothing.
Suggested doc: Either commit a sample infra/nginx/nginx.conf or add docs/guides/nginx-setup.md with the full config (rate limits, auth_request blocks, HMAC header injection at /api/pm/*, limit_req zones, IP allowlists for /webhooks/ipps).
docker-compose is fragmented, no top-level orchestrator documented
Severity: 🔴 Critical
Where: docs/guides/local-dev.md (does not exist)
What's missing: There are TWO compose files: projects/payment-manager/docker-compose.yml (Postgres+PgBouncer+TigerBeetle+Redis+PM) and projects/blnkfinance-service/docker-compose-blnkfinance.yml (legacy ledger). There is no root compose that starts the full system: Auth Center + PM + KYC + Notifications + Admin + nginx + Garage + Typesense + CompreFace + Loki/Grafana. AUTH/DEPLOYMENT docs reference localhost:3900-3901 Garage and localhost:8108 Typesense but no compose is committed. This makes local dev a multi-hour archaeology exercise.
Source evidence: Only two compose files exist.
Suggested doc: Either commit a top-level docker-compose.yml (or infra/docker-compose.yml) and document its services, or write docs/guides/local-dev.md covering: which services to start in which compose file, in what order, and how they interconnect on Docker networks.
Local development end-to-end setup: spread across 5 docs
Severity: 🔴 Critical
Where: docs/guides/local-dev.md (does not exist)
What's missing: "How do I get the system running locally?" requires reading: PM DEPLOYMENT.md, Serverpod DEPLOYMENT.md, KYC DEPLOYMENT.md, Notifications DEPLOYMENT.md, Admin TECHNICAL.md. No single doc says: "1. Clone, 2. Bring up Postgres+Redis+TB+Garage+CompreFace+Typesense, 3. Run migrations in this order, 4. Seed in this order, 5. Start services, 6. Verify with these curl commands."
Source evidence: No LOCAL-DEV.md, QUICKSTART.md, or top-level compose-driven guide exists.
Suggested doc: Create docs/guides/local-dev.md as a numbered runbook. Reference the per-service DEPLOYMENT docs for details but provide the single happy-path sequence.
CI/CD pipeline: GitLab CI for kyc-service, GitHub Actions for onewallet_base, none elsewhere; not documented
Severity: 🟡 Important
Where: docs/guides/ci-cd.md (does not exist)
What's missing: projects/kyc-service/.gitlab-ci.yml (build only, deploy disabled) and projects/onewallet_base/.github/workflows/{format,analyze,tests}.yml exist. No PM/admin/notifications CI. No documentation of which pipelines run on which branches, what envs are deployed where, what SSH_PRIVATE_KEY/SSH_HOST deploy mode looks like end-to-end, or any cross-service release coordination.
Source evidence: CI configuration is split across service folders and across two platforms (GitLab and GitHub).
Suggested doc: New docs/guides/ci-cd.md covering: GitLab vs GitHub usage per service, deploy targets, branch policy, secrets management, version coordination across services (CHANGELOG/VERSION rules from CLAUDE.md).
Monitoring / Loki / Grafana: referenced but not explained
Severity: 🟡 Important
Where: docs/guides/observability.md (does not exist)
What's missing: SECURITY.md mentions "Fluent Bit → Loki → Grafana" with baseline alerts to be configured "in Month 1 Week 4, task 4.7." OPERATIONS.md (PM) references docker logs pm greps. PM emits structured pino events (state_transition, manual_review_required, balance_drift, low_partner_balance). There is no doc tying these to actual Grafana dashboards or alert rules. SLOs / SLIs are not documented.
Source evidence: src/workers/balance-monitor.ts emits structured events; no monitoring/ or grafana/ directory.
Suggested doc: New docs/guides/observability.md listing: each event taxonomy tag, recommended Grafana panel, alert threshold, and on-call escalation. Should track SECURITY.md task 4.7.
Garage (S3) setup: not documented
Severity: 🟡 Important
Where: docs/guides/storage.md or kyc-service DEPLOYMENT.md
What's missing: Garage is referenced in 6 docs but there's no setup runbook: garage container start, layout init (Garage requires garage layout assign+apply on first boot), bucket creation (kyc-data), the forcePathStyle: true quirk, key/secret generation, and lifecycle policy (listed among the compliance open items but absent from setup). The port references (localhost:3900–3901 and 3903) appear only in onewallet_base DEPLOYMENT.md.
Source evidence: src/services/s3.js (kyc-service); storage_service.dart (Serverpod).
Suggested doc: New docs/guides/storage.md with full Garage bootstrap, bucket layout, key naming convention, presigned URL TTL conventions, and link to the upcoming S3 lifecycle policy.
Typesense: declared in TECHNICAL.md, no usage documented
Severity: 🟢 Nice to have
Where: docs/guides/search.md or onewallet_base TECHNICAL.md
What's missing: Serverpod TECHNICAL.md and DEPLOYMENT.md mention Typesense (localhost:8108) but no service uses it (no typesense import grep hit). It's listed as "(optional)" — but is it scaffolded in compose? Is it a future feature? Nothing says so.
Source evidence: grep -r typesense lib/src/ returns no source code.
Suggested doc: One-line clarification: "Typesense is provisioned in compose for future search features (Phase 3?) — no service currently uses it."
Redis / Valkey configuration: persistence and memory limits not documented
Severity: 🟡 Important
Where: docs/guides/redis.md or service DEPLOYMENT docs
What's missing: Notifications-service CLAUDE.md says maxmemory-policy noeviction is required for BullMQ. Redis docker-compose in PM uses redis-server --requirepass with no maxmemory or persistence flags. There's no doc explaining: which services need persistence, AOF vs RDB choice, eviction policy implications (noeviction means OOM if you don't size for streams), or the single-Redis vs separate-instance topology decision.
Source evidence: PM docker-compose (no maxmemory); kyc-service uses noeviction; notifications uses noeviction.
Suggested doc: New docs/guides/redis.md covering: per-service Redis usage (PM pubsub, KYC BullMQ queue, notifications stream, Serverpod sessions+cache), persistence requirements per use, recommended maxmemory and policy, single-instance vs separate-instance trade-offs, and password/ACL setup.
7. Operations Gaps
Developer onboarding procedure: doesn't exist
Severity: 🔴 Critical
Where: docs/guides/onboarding.md (does not exist)
What's missing: No "Day 1 / Day 2 / First PR" runbook for a new dev. Skills, accounts (GitHub/GitLab/Loki), tools to install, .env templates, who to ask for credentials, where to find the latest plan/spec. CLAUDE.md/SKILLS_NEEDED.md has some of this but it's Claude-Code-oriented, not human-oriented.
Source evidence: No onboarding doc at root or in docs/guides/.
Suggested doc: New docs/guides/onboarding.md. Reference SETUP.md, DEVELOPMENT.md, TECH_STACK.md but provide the linear day-1 sequence.
Production deployment procedure: not documented
Severity: 🔴 Critical
Where: docs/guides/deploy-production.md (does not exist)
What's missing: Per-service DEPLOYMENT.md docs cover env vars and Docker. None covers: production VPS layout (Thai data residency open question per COMPLIANCE.md G-9), TLS certificate provisioning, production secrets management (Vault? Docker Secrets?), production migration strategy (PM blue/green? Serverpod downtime?), health-check gating before traffic shift, and the cross-service deploy order.
Source evidence: Per-service DEPLOYMENT.md exists; no production-deploy guide.
Suggested doc: New docs/guides/deploy-production.md covering deploy order: Postgres migrations (Serverpod first, then PM via drizzle), TigerBeetle (no migrations), Redis (config), then services (PM api, PM workers, Serverpod, Admin, KYC, Notifications, nginx). Cite SECURITY.md for secrets management.
Rollback procedure: not documented
Severity: 🔴 Critical
Where: docs/guides/rollback.md (does not exist)
What's missing: What if a release breaks production? PM has irreversible Drizzle migrations (forward-only); Serverpod migrations are forward-only too. TigerBeetle is append-only. There is no documented rollback procedure for: (a) buggy code (revert image, restart) — easy; (b) migration that broke a column — hard; (c) a poisoned outbox event — needs SQL surgery. Ops needs this.
Source evidence: OPERATIONS.md (PM) covers diagnostics but not rollback.
Suggested doc: New docs/guides/rollback.md with: code-only rollback, migration rollback strategies (forward-fix vs serverpod_migrations revert), data hot-fix workflows (poisoned outbox, stuck intent).
Database backup / restore: not documented
Severity: 🔴 Critical
Where: docs/guides/backup.md (does not exist)
What's missing: No doc covers: PostgreSQL backup strategy (pg_dump of public.* and pm.*, encrypted PII implications), retention (7 years for financial per COMPLIANCE.md G-5), where backups are stored, RPO/RTO, restore drill cadence, point-in-time recovery (WAL archiving), and how to test a restore without leaking decrypted PII to a dev environment.
Source evidence: Nothing in docs/.
Suggested doc: New docs/guides/backup.md covering pg_dump cadence, encryption-at-rest of backups, restore-test cadence, PII-aware staging restore.
TigerBeetle backup / restore: explicitly not documented
Severity: 🔴 Critical
Where: docs/guides/tigerbeetle-backup.md (does not exist)
What's missing: TigerBeetle stores each replica's state in a single data file (a replicated cluster in production). The backup strategy for that file (which docker-compose places on the tb-data volume) is not documented, nor are the recovery scenarios: corrupted file, accidental delete, cluster loss. Because TB account IDs and transfer IDs are deterministic, partial reconstruction from pm.intent + pm.outbox_event is theoretically possible, but this is not documented either.
Source evidence: projects/payment-manager/scripts/tb-init.sh formats the file; no backup script.
Suggested doc: New docs/guides/tigerbeetle-backup.md covering: cluster topology (single replica in dev, N-replica in prod), data file backup (or "backup is replication"), disaster recovery from pm.intent (theoretical), and the deterministic-ID property's role in audits.
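Deterministic IDs make reconstruction plausible, and a short sketch shows why. This audit does not say how PM derives its TigerBeetle IDs. The SHA-256 scheme, the `deriveTransferId` name, and the input format below are hypothetical. The property that matters is that the same input (for example, an intent ID plus a leg label) always yields the same 128-bit ID. A replay driven by pm.intent would therefore regenerate identical transfer IDs, and TigerBeetle reports a transfer whose ID already exists instead of applying it twice. The sketch returns the ID as a 32-character hex string for readability; the TigerBeetle Node.js client itself takes IDs as bigint.

```typescript
import { createHash } from "node:crypto";

// Hypothetical derivation; PM's real scheme may differ.
// Returns a 128-bit ID encoded as 32 lowercase hex characters.
function deriveTransferId(intentId: string, leg: string): string {
  const digest = createHash("sha256").update(`transfer:${intentId}:${leg}`).digest();
  // Keep the first 16 bytes (128 bits) of the digest.
  const id = digest.subarray(0, 16).toString("hex");
  // TigerBeetle rejects 0 and 2^128 - 1 as IDs; vanishingly unlikely here,
  // but checked so a reserved value can never be produced silently.
  if (/^0+$/.test(id) || /^f+$/.test(id)) {
    throw new Error("derived ID is a reserved value; change the derivation input");
  }
  return id;
}
```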
8. Compliance Gaps (code vs COMPLIANCE.md)
Hardcoded thresholds in code that COMPLIANCE.md doesn't mention
Severity: 🟡 Important
Where: COMPLIANCE.md and PM DEPLOYMENT.md
What's missing:
- IPPS_DRIFT_THRESHOLD_SATANG = 10000 (100 THB): alert threshold for nostro vs IPPS partner drift. COMPLIANCE.md G-8 (AML monitoring) doesn't mention that this is an existing partial mitigation.
- IPPS_LOW_BALANCE_THRESHOLD_SATANG = 100000 (1000 THB): low-balance alert. Not in COMPLIANCE.md.
- HMAC replay window ±60s: a security tolerance that is not in the SECURITY.md HMAC table.
- KYC service FACE_MATCH_THRESHOLD = 0.75: the COMPLIANCE.md "Compliance Architecture Notes" cite ≥0.85 auto-approve, 0.50–0.84 manual review, <0.50 reject (the Gemini-era policy). The actual production code uses CompreFace with a single 0.75 threshold. This is a contradiction: COMPLIANCE.md is wrong.
- BullMQ retries attempts=5, backoff=15s..240s: KYC service config that is also a compliance/operational parameter (failed-KYC SLA), not in COMPLIANCE.md.
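The three tolerances above (drift, low balance, HMAC replay window) read naturally as simple predicates, which is also how they would appear in a thresholds appendix. This is a minimal sketch, assuming amounts are integer satang and timestamps are epoch milliseconds. The constant values come from this audit. The function names, and the choice of strict versus inclusive comparisons, are hypothetical and should be checked against projects/payment-manager/src/shared/config.ts and the code that consumes it.

```typescript
// Values as reported in this audit; 1 THB = 100 satang.
const IPPS_DRIFT_THRESHOLD_SATANG = 10_000;        // 100 THB
const IPPS_LOW_BALANCE_THRESHOLD_SATANG = 100_000; // 1,000 THB
const HMAC_REPLAY_WINDOW_MS = 60_000;              // ±60 s

// Hypothetical: alert when nostro and IPPS partner balances differ by more
// than the threshold, in either direction.
function isDriftAlert(nostroSatang: number, ippsSatang: number): boolean {
  return Math.abs(nostroSatang - ippsSatang) > IPPS_DRIFT_THRESHOLD_SATANG;
}

// Hypothetical: alert when the partner balance falls below the floor.
function isLowBalanceAlert(ippsSatang: number): boolean {
  return ippsSatang < IPPS_LOW_BALANCE_THRESHOLD_SATANG;
}

// Hypothetical: accept a signed request only if its timestamp lies within
// ±60 s of the server clock; anything older or further ahead is rejected.
function isWithinReplayWindow(requestTsMs: number, nowMs: number): boolean {
  return Math.abs(nowMs - requestTsMs) <= HMAC_REPLAY_WINDOW_MS;
}
```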
Source evidence: projects/payment-manager/src/shared/config.ts; projects/kyc-service/src/services/facerecog.js; projects/onewallet_base/onewallet_base_server/config/development.yaml; PM DEPLOYMENT.md.
Suggested doc: Update COMPLIANCE.md "Compliance Architecture Notes" face-match section to match production threshold (0.75 single-threshold, no auto-approve tier — operator review for all). Add a "Hardcoded thresholds" appendix table to COMPLIANCE.md (or SECURITY.md) listing every threshold/tolerance with source file path.
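The face-match contradiction is easiest to see with the two policies side by side, as this audit describes them. The function names and decision labels are hypothetical. The production logic lives in projects/kyc-service/src/services/facerecog.js, and whether it compares with `>=` or `>` is an assumption here.

```typescript
type Decision = "auto_approve" | "manual_review" | "reject" | "operator_review";

// Policy currently stated in COMPLIANCE.md (Gemini era): three tiers.
// Scores falling between 0.84 and 0.85 are treated as manual review here.
function documentedPolicy(score: number): Decision {
  if (score >= 0.85) return "auto_approve";
  if (score >= 0.5) return "manual_review";
  return "reject";
}

const FACE_MATCH_THRESHOLD = 0.75; // production value reported by this audit

// Production behaviour as this audit describes it: one threshold, no
// auto-approve tier, and every case goes to an operator.
function productionPolicy(score: number): { faceMatched: boolean; decision: Decision } {
  return { faceMatched: score >= FACE_MATCH_THRESHOLD, decision: "operator_review" };
}
```

For example, a score of 0.8 is "manual_review" under the documented policy, while in production it is a face match that still goes to operator review.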
pm.intent_event.payload PII issue: declared open in COMPLIANCE.md but no mitigation timeline
Severity: 🟡 Important
Where: COMPLIANCE.md OC-04
What's missing: OC-04 documents the issue (PSP PII leaking into pm.intent_event.payload in plaintext). No estimated resolution date, no responsible owner, no link to a tracking spec/plan. This is a live PDPA risk.
Source evidence: pm.intent_event.payload in DATABASE.md.
Suggested doc: Add owner + timeline + tracking link to OC-04.
CompreFace (face matching) replaces Gemini face-match — COMPLIANCE.md still shows Gemini
Severity: 🟡 Important
Where: COMPLIANCE.md "Compliance Architecture Notes" → "KYC Pipeline"
What's missing: That section says "Face match via Google Gemini AI." Production uses CompreFace (ArcFace) per kyc-service TECHNICAL.md and CLAUDE.md. The Gemini face-match implementation is a fallback only.
Source evidence: src/services/facerecog.js (production path); src/services/gemini.js matchFaces (fallback).
Suggested doc: Update COMPLIANCE.md to: "Face match via CompreFace (ArcFace embeddings). Fallback Gemini implementation is dev-only."
IPPS SIT TLS workaround NODE_TLS_REJECT_UNAUTHORIZED=0 — production policy not codified
Severity: 🔴 Critical
Where: COMPLIANCE.md OC-10 (covers it) and SECURITY.md
What's missing: OC-10 records this as an open question, but SECURITY.md doesn't reference it anywhere. Anyone reading SECURITY.md (which says "TLS 1.2+, valid certs only" for IPPS) will not know the SIT workaround exists. There is also no deploy-time check that fails loudly if NODE_TLS_REJECT_UNAUTHORIZED=0 is set in production.
Source evidence: OC-10 in COMPLIANCE.md.
Suggested doc: Add cross-reference from SECURITY.md "External Traffic" table to OC-10. Add CI/deploy guard ensuring NODE_TLS_REJECT_UNAUTHORIZED is never set in production env.
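The deploy guard suggested above can be a few lines that run before the service accepts traffic. This is a minimal sketch, assuming the service identifies production via NODE_ENV. That detection rule and the function name are assumptions, not existing PM code. Node.js disables TLS certificate validation for the whole process when NODE_TLS_REJECT_UNAUTHORIZED is "0", so the guard refuses to start instead of merely logging a warning.

```typescript
// Hypothetical startup guard: refuse to boot in production if TLS certificate
// verification has been disabled process-wide via NODE_TLS_REJECT_UNAUTHORIZED.
function assertTlsVerificationEnabled(env: Record<string, string | undefined>): void {
  const isProduction = env.NODE_ENV === "production"; // assumption: how prod is detected
  // Node.js disables certificate validation only when the value is exactly "0".
  if (isProduction && env.NODE_TLS_REJECT_UNAUTHORIZED === "0") {
    throw new Error(
      "NODE_TLS_REJECT_UNAUTHORIZED=0 is set in production; " +
        "this SIT-only workaround (COMPLIANCE.md OC-10) must not ship.",
    );
  }
}

// Call once at process start, before any outbound connection is opened:
// assertTlsVerificationEnabled(process.env);
```

Failing at startup means a misconfigured deploy never serves traffic. A CI check on production env files would add a second layer.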
Summary
Counts by severity
- 🔴 Critical: 10
- POST /accounts/register-ipps not in API.md
- PIN change/reset/forgotten flow undocumented
- nginx config not in repo
- No top-level docker-compose
- No local-dev end-to-end runbook
- No developer onboarding doc
- No production deployment runbook
- No rollback procedure
- No DB / TigerBeetle backup/restore docs (the PostgreSQL and TigerBeetle backup gaps are counted together as one)
- No production safety guard for the IPPS SIT TLS workaround (NODE_TLS_REJECT_UNAUTHORIZED=0)
- 🟡 Important: 25
- PM: resolve-intent in API.md; OperationType registry guide; payment_route routing semantics; rate limit policy callout
- Serverpod: mini-app catalog status; agent module status; biometric re-enrollment; user-lifecycle config; session types; Flutter state; notification preferences; KYC OCR schema; KYC retry semantics
- Admin: login flow; RBAC matrix
- KYC: CompreFace setup
- Notifications: template content; opt-out
- Cross-cutting: CI/CD; observability; Garage S3; Redis config
- Compliance: hardcoded thresholds; intent_event.payload owner; CompreFace correction
- 🟢 Nice to have: 8
- PM: quote tags, surrogate id rationale, miniapp metadata
- Serverpod: SMS-vs-email OTP clarity
- Admin: direct DB read paths checklist
- KYC: prompts inlined
- Notifications: channels taxonomy
- Cross-cutting: Typesense status
Top 5 most critical gaps to address first
1. No nginx config in repo: this is the root of the public ingress and there's nothing to copy. Anyone setting up staging/prod is blocked.
2. No top-level docker-compose / local-dev runbook: a new dev cannot bring the system up end-to-end without reverse-engineering 5+ docs.
3. PIN change/reset/forgotten flow is undocumented: it is a regulatory (BOT) requirement, it directly shapes user UX, and its actual implementation status cannot be determined from the docs.
4. POST /accounts/register-ipps missing from PM API.md: this blocks integration for Auth Center and any service trying to make users IPPS-capable. The error message points users at this endpoint, but the contract is undiscoverable.
5. Backup/restore (PostgreSQL + TigerBeetle) and rollback procedures undocumented: a single hardware failure in production could become an unrecoverable incident.
Recommended order to close: 1 → 2 → 4 (these unblock day-to-day work), then 3, then 5 (these are pre-production-launch blockers).