Error Handling: Scenarios and Resolution
| Scope | Payment Manager, Auth Center, PSP adapters |
| Last updated | 2026-05-01 |
Error Taxonomy
| Layer | Error Type | HTTP | Code |
|---|---|---|---|
| Client | Invalid input | 400 | BAD_REQUEST |
| Auth | HMAC invalid | 401 | UNAUTHORIZED |
| Auth | JWT invalid / expired | 401 | — (nginx rejects) |
| Auth | Service key inactive or operationType forbidden | 403 | FORBIDDEN |
| Business | Rule violation (limit, KYC tier) | 422 | VALIDATION_ERROR |
| Business | Account not found | 422 | ACCOUNT_NOT_FOUND |
| Business | No route for operationType | 400 | NO_ROUTE |
| Business | Intent not found | 404 | NOT_FOUND |
| Infrastructure | TigerBeetle error | 500 | TB_TRANSFER_ERROR |
| Infrastructure | Service unavailable | 503 | — |
| Data | Duplicate intent (idempotency replay) | 200 | — (returns existing) |
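As a reading aid, the coded rows of the taxonomy can be expressed as a lookup from error code to HTTP status. This is only a sketch: the names `ErrorCode`, `HTTP_STATUS_BY_CODE`, and `toErrorResponse` are hypothetical, and the `{ error, message }` body shape is borrowed from the HMAC rejection example later in this document, on the assumption that other errors share it.

```typescript
// Hypothetical helper: only the codes and statuses come from the taxonomy table.
type ErrorCode =
  | "BAD_REQUEST"
  | "UNAUTHORIZED"
  | "FORBIDDEN"
  | "VALIDATION_ERROR"
  | "ACCOUNT_NOT_FOUND"
  | "NO_ROUTE"
  | "NOT_FOUND"
  | "TB_TRANSFER_ERROR";

const HTTP_STATUS_BY_CODE: Record<ErrorCode, number> = {
  BAD_REQUEST: 400,
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  VALIDATION_ERROR: 422,
  ACCOUNT_NOT_FOUND: 422,
  NO_ROUTE: 400,
  NOT_FOUND: 404,
  TB_TRANSFER_ERROR: 500,
};

interface ErrorBody {
  error: ErrorCode;
  message: string;
}

// Assumed response shape; builds the status and body for one coded error.
function toErrorResponse(
  code: ErrorCode,
  message: string,
): { status: number; body: ErrorBody } {
  return { status: HTTP_STATUS_BY_CODE[code], body: { error: code, message } };
}
```

Rows without a code (nginx JWT rejection, 503, idempotency replay) are deliberately left out because they never produce a PM error body.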
Payment Intent Errors
Insufficient Funds
- Where caught: Saga → TigerBeetle createTransfers() at AUTHORIZED step → TB returns EXCEEDS_CREDITS (or EXCEEDS_DEBITS on transit account flags).
- Intent state → FAILED
- TB state: Nothing committed. The linked transfer batch is atomic; TB rejects the entire batch, so no account changes state.
- User message: "Insufficient balance"
- pm.intent_event: status_from='VALIDATED', status_to='FAILED', reason='TB_TRANSFER_ERROR'
Daily/Monthly Limit Exceeded
- Where caught: LimitsService at the CREATED → VALIDATED transition, before any TB call.
- Intent state → FAILED (never reaches AUTHORIZED, so TB PENDING is never created)
- Resolution for user: Wait until limit resets. Daily limits reset at midnight Thailand time (Asia/Bangkok TZ). Monthly limits reset on the 1st of each month.
- Admin note: Limits are per-KYC-tier and are configured in pm.payment_limits via the Admin Panel AdminLimitsEndpoint.
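To make the reset rule concrete, the next daily and monthly reset instants can be computed as follows. This is a sketch, not the LimitsService implementation: the function names are hypothetical, and it hard-codes the UTC+7 offset on the grounds that Thailand currently observes no daylight saving time. It also assumes the monthly reset happens at midnight Bangkok time, which the text implies but does not state.

```typescript
// Asia/Bangkok is UTC+7 year-round, so a fixed offset is used here (assumption:
// no time-zone database lookup is needed).
const BANGKOK_OFFSET_MS = 7 * 60 * 60 * 1000;
const DAY_MS = 24 * 60 * 60 * 1000;

// Next midnight in Bangkok strictly after `now`.
function nextDailyReset(now: Date): Date {
  const localMs = now.getTime() + BANGKOK_OFFSET_MS;
  const startOfLocalDay = Math.floor(localMs / DAY_MS) * DAY_MS;
  return new Date(startOfLocalDay + DAY_MS - BANGKOK_OFFSET_MS);
}

// Midnight Bangkok time on the 1st of the following month.
function nextMonthlyReset(now: Date): Date {
  const local = new Date(now.getTime() + BANGKOK_OFFSET_MS);
  // Date.UTC rolls month 12 over to January of the next year.
  const firstOfNextMonthLocal = Date.UTC(
    local.getUTCFullYear(),
    local.getUTCMonth() + 1,
    1,
  );
  return new Date(firstOfNextMonthLocal - BANGKOK_OFFSET_MS);
}
```

Midnight in Bangkok is 17:00 UTC of the previous calendar day, so a limit check that buckets by UTC date would reset seven hours late.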
Recipient Account Not Found
- P2P_TRANSFER (internal): Caught during VALIDATED step when getAccountByName('user.{id}.THB') returns null. Intent → FAILED with ACCOUNT_NOT_FOUND before any TB call.
- IPPS_TRANSFER: PspWorker's IppsDriver calls IPPS query → IPPS returns bank error E004/E005 (receiver not found or inactive) → driver.classifyError() returns 'fail' → applyOutcome(failed) → void_pending outbox event → OutboxWorker voids TB PENDING → intent → FAILED.
- TB state (IPPS case): PENDING transfer was created at AUTHORIZED; voided when void_pending is processed by OutboxWorker. Funds are returned to sender's account.
- User message: "Recipient account not found or inactive"
Fee Rule Error (rule throws)
- Where caught: calculateFees() → rule-engine JavaScript sandbox evaluates pm.fee_rule.expression → throws or returns invalid shape.
- Intent state → FAILED during VALIDATED step.
- Admin action: Inspect pm.fee_rule table. Use POST /admin/fee-rules/dry-run (requires Authorization: Bearer <ADMIN_SECRET>) to test rule expressions without creating an intent. Fix the expression; until the fix is in place, set active=false on the bad rule.
IPPS Transfer Timeout (network or IPPS-side)
- Where caught: PspWorker → IppsDriver → HTTP call to IPPS times out (IPPS_HTTP_TIMEOUT_MS=8000).
- On query call: classified as 'retry' → in-progress (QUERY_PENDING, retryIncrement=true). psp_tx_map.state cycles through QUERY_PENDING retries. The TB PENDING transfer was already created at the AUTHORIZED step (before PspWorker runs), so funds stay held during these retries.
- On confirm call: classified as 'inquire' (transport error on confirm) → in-progress (CONFIRM_PENDING, retryIncrement=true). Worker transitions to INQUIRING to check if IPPS actually received the payment.
- After PSP_MAX_RETRIES=3: IppsDriver escalates to manual-review → psp_tx_map.state = 'MANUAL_REVIEW', pm.intent stays AUTHORIZED with TB PENDING frozen.
- Admin action required: Use the force-resolve procedure (see Admin Force-Resolve below). Check with IPPS support whether the transfer was received before voiding TB PENDING.
!!! CRITICAL: IPPS Confirm is NOT Idempotent !!!
SIT-confirmed 2026-04-29 (Q-IPPS-2): Two confirm calls with the same lookupRef produce two separate real-money transfers. IPPS does NOT deduplicate.
PM rule: NEVER retry a confirm call with the same lookupRef.
If the confirm HTTP call times out or the connection drops before PM receives a response:
1. IppsDriver classifies transport error on confirm as 'inquire'.
2. Worker sets state to CONFIRM_PENDING, increments retry_count.
3. On next pickup: worker transitions to INQUIRING and calls inquiry using the stored confirmRqUid.
4. If confirmRqUid was never saved (crash between confirm response and DB write): row enters MANUAL_REVIEW with reason orphan_lookup_ref. Do NOT retry the confirm. Contact IPPS support with the lookupRef to determine the outcome.
The only safe recovery path after a lost confirm response is inquiry — never re-confirm.
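The recovery steps above can be sketched as a small decision function. Everything here is illustrative: the `PspTxRow` shape, the function name, and the `max_retries_exceeded` reason string are hypothetical, and comparing `retryCount` with `>=` is an assumption about how PSP_MAX_RETRIES is enforced. Only `orphan_lookup_ref`, the state names, and the inquire-or-manual-review rule come from this document.

```typescript
type PspState = "CONFIRM_PENDING" | "INQUIRING" | "MANUAL_REVIEW";

// Hypothetical projection of a pm.psp_tx_map row.
interface PspTxRow {
  state: PspState;
  lookupRef: string;
  confirmRqUid: string | null;
  retryCount: number;
}

// There is intentionally no "confirm" variant: re-confirming cannot be expressed.
type NextAction =
  | { kind: "inquire"; rqUid: string }
  | { kind: "manual-review"; reason: string };

const PSP_MAX_RETRIES = 3;

function nextActionAfterLostConfirm(row: PspTxRow): NextAction {
  // Without the confirm-level rqUID, inquiry is impossible.
  if (row.confirmRqUid === null) {
    return { kind: "manual-review", reason: "orphan_lookup_ref" };
  }
  // Assumed retry bound; the reason string is a placeholder.
  if (row.retryCount >= PSP_MAX_RETRIES) {
    return { kind: "manual-review", reason: "max_retries_exceeded" };
  }
  return { kind: "inquire", rqUid: row.confirmRqUid };
}
```

Leaving a confirm action out of the return type means a caller cannot accidentally schedule a second confirm for the same lookupRef.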
Authentication Errors
JWT Expired (Flutter ↔ nginx)
- Behavior: nginx auth_request to Serverpod's GET /nginx/auth returns 401 → nginx returns 401 to Flutter.
- Flutter action: Call refresh endpoint (POST /auth/refresh) with refresh token → get new JWT pair → retry original request.
- Token TTL: Access 15 min, Refresh 30 days (one-time rotation on every refresh).
- If refresh also fails (refresh token expired or revoked): Flutter must prompt user to re-authenticate.
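The refresh-then-retry flow can be sketched as below. The real client is Flutter, so this TypeScript version only illustrates the control flow; `withRefresh`, `TokenPair`, and the injected `call` and `refresh` functions are hypothetical stand-ins for the original request and POST /auth/refresh.

```typescript
interface TokenPair {
  accessToken: string;
  refreshToken: string;
}

interface HttpResult<T> {
  status: number;
  body?: T;
}

type Outcome<T> =
  | { kind: "ok"; result: HttpResult<T>; tokens: TokenPair }
  | { kind: "reauthenticate" };

// Runs `call`; on 401 refreshes once and retries once with the new access token.
async function withRefresh<T>(
  tokens: TokenPair,
  call: (accessToken: string) => Promise<HttpResult<T>>,
  refresh: (refreshToken: string) => Promise<TokenPair | null>,
): Promise<Outcome<T>> {
  const first = await call(tokens.accessToken);
  if (first.status !== 401) return { kind: "ok", result: first, tokens };

  // `refresh` is assumed to resolve to null when the refresh token is expired or revoked.
  const rotated = await refresh(tokens.refreshToken);
  if (rotated === null) return { kind: "reauthenticate" };

  const second = await call(rotated.accessToken);
  return { kind: "ok", result: second, tokens: rotated };
}
```

The rotated pair is returned to the caller because refresh tokens rotate on every refresh; reusing the old refresh token afterwards would fail.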
Invalid HMAC Signature (inter-service)
- Behavior: PM returns HTTP 401 with { error: 'UNAUTHORIZED', message: 'HMAC invalid or timestamp expired' }.
- Common causes:
  - Clock skew between calling service and PM: X-Timestamp must be within ±60 seconds of PM's clock.
  - Wrong HMAC secret in calling service's config.
  - Signature algorithm mismatch: must be HMAC-SHA256 over {X-Timestamp}\n{METHOD}\n{PATH}\n{sha256(body)}.
  - Body hash mismatch (e.g., body was re-serialized with different key order).
- Debug: Check X-Timestamp value in the rejected request. Verify PM's system time is synchronized (NTP).
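When debugging a rejected request, it can help to recompute the signature locally. The sketch below follows the canonical string given above, using Node's built-in crypto module. Several details are assumptions, because this document does not specify them: hex encoding for both the body hash and the signature, X-Timestamp expressed in Unix seconds, and an upper-cased method. The function names are hypothetical.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 60;

// {X-Timestamp}\n{METHOD}\n{PATH}\n{sha256(body)} with a hex body hash (assumed).
function canonicalString(timestamp: string, method: string, path: string, body: string): string {
  const bodyHash = createHash("sha256").update(body, "utf8").digest("hex");
  return `${timestamp}\n${method.toUpperCase()}\n${path}\n${bodyHash}`;
}

function sign(secret: string, timestamp: string, method: string, path: string, body: string): string {
  return createHmac("sha256", secret)
    .update(canonicalString(timestamp, method, path, body))
    .digest("hex");
}

function verify(
  secret: string,
  timestamp: string,
  method: string,
  path: string,
  body: string,
  signature: string,
  nowSeconds: number,
): boolean {
  // Reject timestamps outside the ±60 second window.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;

  const expected = Buffer.from(sign(secret, timestamp, method, path, body), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

The hash is taken over the exact bytes sent, so a body that is parsed and re-serialized with a different key order produces a different signature even though the JSON is semantically equal.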
Service Key Inactive or Forbidden operationType
- Behavior: PM returns HTTP 403 with { error: 'FORBIDDEN' }.
- Causes:
  - pm.service_key.active = false for the caller's X-Service-Id.
  - Requested operationType is not in pm.service_key.allowedOperationTypes for this caller.
- Resolution: Check pm.service_key table. Re-enable or update allowedOperationTypes via Admin Panel or direct DB update (with care: this is a security boundary).
Session Revoked
- Behavior: 401 on next Serverpod request after logout, password change, or manual revoke.
- Mechanism: All Redis session keys for the user's userId are deleted on logout/password-change. Subsequent JWT validation fails because session is not found in Redis.
- User action: Re-authenticate.
Database Errors
Duplicate Intent ID (Idempotency)
- Scenario: POST /intents received twice with the same idempotencyKey and serviceId.
- Behavior: PM returns HTTP 200 (not 201) with the existing intent's state. No duplicate TB transfers, no duplicate DB rows.
- When this matters: Network retry by Flutter/Serverpod after a timeout. The second call is safe.
- When this is a bug: If the same idempotencyKey is reused for a semantically different payment, the second caller gets back the first intent's result, which may be wrong. Ensure idempotencyKey is a UUID v4 generated per payment intent.
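The replay behaviour can be illustrated with an in-memory stand-in for the intent store. This is not how PM persists intents; `InMemoryIntentStore` and its fields are hypothetical, and a real store would presumably enforce uniqueness of (serviceId, idempotencyKey) in the database rather than in a map.

```typescript
interface Intent {
  id: string;
  serviceId: string;
  idempotencyKey: string;
  amount: number;
  status: string;
}

class InMemoryIntentStore {
  private readonly byKey = new Map<string, Intent>();
  private nextId = 1;

  // Returns 201 for a new intent and 200 with the existing intent on replay.
  create(serviceId: string, idempotencyKey: string, amount: number): { status: 200 | 201; intent: Intent } {
    // JSON-encode the pair so that separator characters in either part cannot collide.
    const key = JSON.stringify([serviceId, idempotencyKey]);
    const existing = this.byKey.get(key);
    if (existing !== undefined) return { status: 200, intent: existing };

    const intent: Intent = {
      id: `intent-${this.nextId++}`,
      serviceId,
      idempotencyKey,
      amount,
      status: "CREATED",
    };
    this.byKey.set(key, intent);
    return { status: 201, intent };
  }
}
```

Calling `create` again with the same key but a different amount returns the first intent unchanged, which is exactly the key-reuse bug described above.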
Concurrent Worker Pick-up (OutboxWorker)
- Prevention: SELECT FOR UPDATE SKIP LOCKED in OutboxWorker's pickup query, so only one worker can lock a given outbox_event row at a time.
- OutboxWorker is currently single-instance (see WORKER_ROLES=outbox-worker deployment note). Running two instances is technically safe at the DB level due to SKIP LOCKED, but the single-instance constraint is tracked in backlog (tech-debt-outbox-multi-instance).
PspWorker Concurrent Pick-up
- Prevention: Atomic CTE in pickUpJobs(): WITH picked AS (SELECT ... FOR UPDATE SKIP LOCKED) UPDATE ... RETURNING. State flip (NEW → QUERY_PENDING) happens in the same transaction, so a second concurrent worker sees the row already in QUERY_PENDING and does not pick it.
- Multiple PspWorker instances are safe. This is the designed deployment mode for scale-out (WORKER_ROLES=psp-worker PSP_NAMES=IPPS).
Infrastructure Errors
TigerBeetle Unavailable
- PM behavior: createTransfers() or lookupAccounts() throws → PM returns HTTP 500 with TB_TRANSFER_ERROR.
- Retry: PM does NOT auto-retry TB calls. The error is surfaced immediately.
- Intent state: Stays at the step that failed (VALIDATED, or AUTHORIZED if OutboxWorker post-pending failed).
- Resolution: See fault-tolerance.md Scenario 2.
Redis PUBLISH Fails (intent.{id})
- Impact: Serverpod does not receive real-time intent status push for requiresMonitoring=true intents.
- Fallback: Flutter polls getIntent(id) via Serverpod → PM GET /intents/:id until a terminal state (SETTLED or FAILED) is returned or a timeout is reached.
- User impact: Status update may be delayed by the polling interval but is not lost: the final state is always readable from pm.intent via GET /intents/:id.
- PM behavior: Redis PUBLISH failure is logged but does not affect intent state or TB transfers.
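The polling fallback can be sketched as a loop that stops on a terminal state or a deadline. The actual client is Flutter, so this TypeScript version only shows the logic; `pollUntilTerminal`, its options, and the injected `getStatus` function (standing in for getIntent(id)) are hypothetical, and no particular interval or timeout value is implied.

```typescript
const TERMINAL_STATES = new Set(["SETTLED", "FAILED"]);

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  // Injectable for tests; default to real timers and the real clock.
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

async function pollUntilTerminal(
  getStatus: () => Promise<string>,
  opts: PollOptions,
): Promise<{ status: string; timedOut: boolean }> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = opts.now ?? (() => Date.now());
  const deadline = now() + opts.timeoutMs;

  let status = await getStatus();
  // Keep polling while the state is non-terminal and another interval fits before the deadline.
  while (!TERMINAL_STATES.has(status) && now() + opts.intervalMs <= deadline) {
    await sleep(opts.intervalMs);
    status = await getStatus();
  }
  return { status, timedOut: !TERMINAL_STATES.has(status) };
}
```

A timed-out result is not a failure of the payment: the last non-terminal status is returned so the caller can show "still processing" and check again later.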
Notifications Stream Write Fails
- Impact: Push notification is not queued.
- No retry at producer: PM, Auth Center, and KYC Service publish to stream.notifications.jobs as fire-and-forget. If Redis is down, the XADD fails; the notification is lost for that event.
- Mitigation: Redis persistence must be enabled so the stream survives Redis restart. The stream consumer (notifications-worker group) uses XAUTOCLAIM to re-deliver unACK'd messages after 30s idle.
IPPS-Specific Error Classification
IppsDriver classifies errors via classifyError(err, op):
| Error source | Classification | Effect |
|---|---|---|
| Network timeout / connection reset on confirm | inquire | → CONFIRM_PENDING → next pick runs inquiry |
| Network timeout on query or inquiry | retry | → QUERY_PENDING / INQUIRING with retryIncrement |
| HTTP 429, 502, 503, 504 | retry | → QUERY_PENDING / INQUIRING with retryIncrement |
| HTTP 400, 401, 403, 404, 409, 422 | fail | → void_pending → FAILED |
| Bank code E008 (service unavailable) | retry | → QUERY_PENDING / INQUIRING with retryIncrement |
| Bank codes E001–E007, E009, E010 | fail | → void_pending → FAILED |
| HTTP 500 on any operation | inquire (on confirm) / retry (on query/inquiry) | Intentionally conservative: avoids double-sending |
| Unknown bank error code on confirm | inquire | Safety-first: don't retry confirm |
Inquiry rqUID: IPPS inquiry only accepts the confirm-level rqUID (the rqUID field from the confirm response). lookupRef, retrievalReferenceNumber, and query-level rqUID all return 404. If confirmRqUid is null, inquiry is impossible → MANUAL_REVIEW.
IPPS error response shape (SIT-observed vs docs): IPPS actual error responses use { statusCode, code: "BBL_undefined", message: "..." } format — different from the data: { code: 'E001' } format shown in PPXC v1.2.4 docs. Pending clarification from IPPS (B5 in ipps-questions.md).
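The classification table in this section can be read as the following decision function. It is a sketch, not IppsDriver's actual code: the `PspError` shape is hypothetical, and two behaviours are assumptions the table does not cover, namely treating HTTP statuses that are not listed like HTTP 500, and classifying an unknown bank code on query or inquiry as retry.

```typescript
type Op = "query" | "confirm" | "inquiry";
type Classification = "retry" | "inquire" | "fail";

// Hypothetical normalised error shape.
type PspError =
  | { kind: "transport" } // timeout or connection reset
  | { kind: "http"; status: number }
  | { kind: "bank"; code: string };

const RETRY_HTTP = new Set([429, 502, 503, 504]);
const FAIL_HTTP = new Set([400, 401, 403, 404, 409, 422]);
const FAIL_BANK = new Set(["E001", "E002", "E003", "E004", "E005", "E006", "E007", "E009", "E010"]);

function classifyError(err: PspError, op: Op): Classification {
  // When the outcome is ambiguous, confirm must be followed by inquiry, never re-sent.
  const ambiguous: Classification = op === "confirm" ? "inquire" : "retry";

  switch (err.kind) {
    case "transport":
      return ambiguous;
    case "http":
      if (RETRY_HTTP.has(err.status)) return "retry";
      if (FAIL_HTTP.has(err.status)) return "fail";
      // HTTP 500 per the table; other unlisted statuses are handled the same way (assumption).
      return ambiguous;
    case "bank":
      if (err.code === "E008") return "retry";
      if (FAIL_BANK.has(err.code)) return "fail";
      // Unknown code: inquire on confirm per the table; retry otherwise is an assumption.
      return ambiguous;
  }
}
```

With the SIT-observed code "BBL_undefined", this sketch lands in the unknown-code branch, so a confirm failure of that shape would go to inquiry rather than being voided.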
Error Logging Requirements
For every error in PM logs:
- Always log intentId (= trace_id) for all payment-related errors.
- Always log userId (integer) — not name, not phone, not document number.
- Always log service name and error code.
- Always log event field matching the Required Log Events catalog in docs/SECURITY.md.
Must NOT log:
- Full request bodies (may contain PII metadata).
- HMAC secrets or HMAC signature values.
- JWT token values (log last 8 characters of token ID only if identification is needed).
- IPPS request/response bodies in full — log only rqUID, status_code, and classified error code.
- OCR extraction results from KYC (contains national ID, name, DOB).
- Notification payload content (title/body may contain account info).
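One way to satisfy both lists is to build log records from an allowlist, so that anything not explicitly permitted is dropped. The sketch below assumes this approach; PM's logger may work differently. The field names `service`, `errorCode`, and `tokenIdSuffix` are hypothetical, while `intentId`, `userId`, `event`, `rqUID`, and `status_code` follow the wording above.

```typescript
const REQUIRED_FIELDS = ["event", "service", "errorCode", "intentId", "userId"] as const;
const OPTIONAL_FIELDS = ["rqUID", "status_code", "tokenIdSuffix"] as const;

type LogRecord = Record<string, string | number>;

// Last 8 characters of a token ID, for the rare case where identification is needed.
function tokenIdSuffix(tokenId: string): string {
  return tokenId.slice(-8);
}

// Copies only allowlisted fields; request bodies, signatures, and tokens never pass through.
function buildErrorLog(context: Record<string, unknown>): LogRecord {
  const record: LogRecord = {};
  for (const field of REQUIRED_FIELDS) {
    const value = context[field];
    if (typeof value !== "string" && typeof value !== "number") {
      throw new Error(`missing required log field: ${field}`);
    }
    record[field] = value;
  }
  if (!Number.isInteger(record.userId)) {
    throw new Error("userId must be an integer");
  }
  for (const field of OPTIONAL_FIELDS) {
    const value = context[field];
    if (typeof value === "string" || typeof value === "number") record[field] = value;
  }
  return record;
}
```

An allowlist fails closed: a new sensitive field added to the request context stays out of the logs until someone deliberately adds it to one of the two lists.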
Admin Force-Resolve (Plan 3, pending implementation)
For intents stuck in MANUAL_REVIEW or AUTHORIZED with unresolvable PSP state, the Admin Panel will provide a force-resolve endpoint. Until Plan 3 is complete, resolution requires direct DB update + manual TB operation via PASSPORT.md helper scripts.
Procedure (manual, pre-Plan 3):
1. Identify the pm.intent.id and pm.psp_tx_map row.
2. For MANUAL_REVIEW — determine actual PSP outcome by checking IPPS support dashboard with the lookupRef or confirmRqUid.
3. If transfer was received by IPPS: create post_pending outbox_event manually → OutboxWorker processes it.
4. If transfer was NOT received: create void_pending outbox_event manually → OutboxWorker voids TB PENDING.
5. Write pm.intent_event row documenting the manual resolution with actor and reason.
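Steps 3 to 5 of the manual procedure reduce to a single decision once the PSP outcome is known. The sketch below only plans the rows to write; it does not touch the database, and the type names, the `planManualResolution` function, and the row shapes are hypothetical rather than the schema of outbox_event or pm.intent_event.

```typescript
type PspOutcome = "received" | "not_received" | "unknown";
type OutboxEventType = "post_pending" | "void_pending";

interface ManualResolutionPlan {
  outboxEvent: { intentId: string; type: OutboxEventType };
  intentEvent: { intentId: string; actor: string; reason: string };
}

// Returns null while the outcome is unknown: neither post nor void is safe yet.
function planManualResolution(
  intentId: string,
  outcome: PspOutcome,
  actor: string,
  reason: string,
): ManualResolutionPlan | null {
  if (outcome === "unknown") return null;
  const type: OutboxEventType = outcome === "received" ? "post_pending" : "void_pending";
  return {
    outboxEvent: { intentId, type },
    intentEvent: { intentId, actor, reason },
  };
}
```

Returning null for an unknown outcome mirrors the rule that the transfer status must be confirmed with IPPS before TB PENDING is voided.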