
Error Handling — Scenarios and Resolution

Scope: Payment Manager, Auth Center, PSP adapters
Last updated: 2026-05-01

Error Taxonomy

Layer | Error type | HTTP status | Error code
Client | Invalid input | 400 | BAD_REQUEST
Auth | HMAC invalid | 401 | UNAUTHORIZED
Auth | JWT invalid / expired | 401 | none (rejected by nginx)
Auth | Service key inactive or operationType forbidden | 403 | FORBIDDEN
Business | Rule violation (limit, KYC tier) | 422 | VALIDATION_ERROR
Business | Account not found | 422 | ACCOUNT_NOT_FOUND
Business | No route for operationType | 400 | NO_ROUTE
Business | Intent not found | 404 | NOT_FOUND
Infrastructure | TigerBeetle error | 500 | TB_TRANSFER_ERROR
Infrastructure | Service unavailable | 503 |
Data | Duplicate intent (idempotency replay) | 200 | none (returns the existing intent)
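Services that branch on these codes can mirror the table as a lookup. The sketch below is hypothetical. The toErrorResponse helper and the PmErrorResponse type are illustrative names, and the { error, message } body shape is assumed from the 401 example in the HMAC section. The JWT row and the idempotency-replay row are left out because they carry no PM error code. The 503 row is left out because the table does not name a code for it.

```typescript
// Hypothetical sketch: codes and HTTP statuses are copied from the taxonomy
// table; the helper name and the { error, message } body are assumptions.
type PmErrorCode =
  | "BAD_REQUEST"
  | "UNAUTHORIZED"
  | "FORBIDDEN"
  | "VALIDATION_ERROR"
  | "ACCOUNT_NOT_FOUND"
  | "NO_ROUTE"
  | "NOT_FOUND"
  | "TB_TRANSFER_ERROR";

const HTTP_STATUS: Record<PmErrorCode, number> = {
  BAD_REQUEST: 400,
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  VALIDATION_ERROR: 422,
  ACCOUNT_NOT_FOUND: 422,
  NO_ROUTE: 400,
  NOT_FOUND: 404,
  TB_TRANSFER_ERROR: 500,
};

interface PmErrorResponse {
  status: number;
  body: { error: PmErrorCode; message: string };
}

// Build the HTTP status and JSON body for one taxonomy error.
function toErrorResponse(code: PmErrorCode, message: string): PmErrorResponse {
  return { status: HTTP_STATUS[code], body: { error: code, message } };
}
```

The explicit Record<PmErrorCode, number> type makes the compiler reject any code that is added to the union without a status.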

Payment Intent Errors

Insufficient Funds

  • Where caught: Saga → TigerBeetle createTransfers() on the VALIDATED → AUTHORIZED transition → TB returns EXCEEDS_CREDITS (or EXCEEDS_DEBITS, depending on the transit account's flags).
  • Intent state → FAILED
  • TB state: Nothing committed. The transfers are submitted as a linked chain, which TB applies atomically: when one transfer fails, TB rejects the whole chain and no account changes.
  • User message: "Insufficient balance"
  • pm.intent_event: status_from='VALIDATED', status_to='FAILED', reason='TB_TRANSFER_ERROR'

Daily/Monthly Limit Exceeded

  • Where caught: LimitsService at the CREATED → VALIDATED transition, before any TB call.
  • Intent state → FAILED (never reaches AUTHORIZED — TB PENDING is never created)
  • Resolution for user: Wait until the limit resets. Daily limits reset at midnight Thailand time (Asia/Bangkok TZ). Monthly limits reset on the 1st of each month.
  • Admin note: Limits are per-KYC-tier and are configured in pm.payment_limits via the Admin Panel AdminLimitsEndpoint.
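The reset boundaries can be computed without a time-zone library, because Asia/Bangkok is UTC+7 all year and Thailand does not observe daylight saving time. The sketch below is not the LimitsService code. The function names are hypothetical, and it assumes the monthly reset happens at 00:00 Bangkok time on the 1st.

```typescript
// Asia/Bangkok is a fixed UTC+7 offset (no DST), so the offset is a constant
// here instead of a tz-database lookup. Names are hypothetical.
const BANGKOK_OFFSET_MS = 7 * 60 * 60 * 1000;

// Next daily reset: the next midnight in Bangkok, returned as a UTC instant.
function nextDailyReset(now: Date): Date {
  // Shift into "Bangkok wall-clock" time and read it with the UTC getters.
  const local = new Date(now.getTime() + BANGKOK_OFFSET_MS);
  // Date.UTC normalizes day overflow (e.g. day 32 rolls into the next month).
  const nextLocalMidnight = Date.UTC(
    local.getUTCFullYear(),
    local.getUTCMonth(),
    local.getUTCDate() + 1,
  );
  return new Date(nextLocalMidnight - BANGKOK_OFFSET_MS);
}

// Next monthly reset: 00:00 Bangkok time on the 1st of the next month (assumed).
function nextMonthlyReset(now: Date): Date {
  const local = new Date(now.getTime() + BANGKOK_OFFSET_MS);
  // Month 12 rolls over into January of the following year.
  const firstOfNextMonth = Date.UTC(local.getUTCFullYear(), local.getUTCMonth() + 1, 1);
  return new Date(firstOfNextMonth - BANGKOK_OFFSET_MS);
}
```

For example, 2026-05-01T18:30:00Z is already 01:30 on May 2 in Bangkok, so the next daily reset is 2026-05-02T17:00:00Z.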

Recipient Account Not Found

  • P2P_TRANSFER (internal): Caught during the VALIDATED step when getAccountByName('user.{id}.THB') returns null. Intent → FAILED with ACCOUNT_NOT_FOUND before any TB call.
  • IPPS_TRANSFER: PspWorker's IppsDriver calls IPPS query → IPPS returns bank error E004/E005 (receiver not found or inactive) → driver.classifyError() returns 'fail' → applyOutcome(failed) → void_pending outbox event → OutboxWorker voids TB PENDING → intent → FAILED.
  • TB state (IPPS case): The PENDING transfer created at AUTHORIZED is voided when OutboxWorker processes void_pending. Voiding releases the hold, so the reserved amount becomes available in the sender's account again.
  • User message: "Recipient account not found or inactive"

Fee Rule Error (rule throws)

  • Where caught: calculateFees() → rule-engine JavaScript sandbox evaluates pm.fee_rule.expression → throws or returns invalid shape.
  • Intent state → FAILED during VALIDATED step.
  • Admin action: Inspect the pm.fee_rule table. Use POST /admin/fee-rules/dry-run (requires Authorization: Bearer <ADMIN_SECRET>) to test rule expressions without creating an intent. Temporarily set active=false on the bad rule, then fix the expression.
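Both failure modes, a rule that throws and a rule that returns an invalid shape, can be caught in one place. The sketch below assumes a few things that the section does not state. It uses Node's node:vm as a stand-in for the rule-engine sandbox, and the Node documentation warns that vm is not a security mechanism. It also assumes the rule returns { fee: <non-negative integer> }. The real pm.fee_rule contract and sandbox may differ, and evaluateFeeRule is a hypothetical name.

```typescript
import { runInNewContext } from "node:vm";

// Assumed result shape: a fee in minor units as a non-negative integer.
interface FeeResult {
  fee: number;
}

type FeeEval =
  | { ok: true; result: FeeResult }
  | { ok: false; reason: "RULE_THREW" | "INVALID_SHAPE" };

function isFeeResult(value: unknown): value is FeeResult {
  if (typeof value !== "object" || value === null) return false;
  const fee = (value as { fee?: unknown }).fee;
  return typeof fee === "number" && Number.isInteger(fee) && fee >= 0;
}

// Evaluate an expression against the intent context. Any throw (including
// hitting the timeout) or a malformed result is reported as a failure, which
// the caller would map to intent -> FAILED during the VALIDATED step.
function evaluateFeeRule(expression: string, ctx: Record<string, unknown>): FeeEval {
  let value: unknown;
  try {
    value = runInNewContext(`(${expression})`, { ...ctx }, { timeout: 50 });
  } catch {
    return { ok: false, reason: "RULE_THREW" };
  }
  if (isFeeResult(value)) {
    return { ok: true, result: value };
  }
  return { ok: false, reason: "INVALID_SHAPE" };
}
```

The same check is a reasonable model of what a dry-run should report before a fixed rule is set back to active=true.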

IPPS Transfer Timeout (network or IPPS-side)

  • Where caught: PspWorker → IppsDriver → HTTP call to IPPS times out (IPPS_HTTP_TIMEOUT_MS=8000).
  • On query call: classified as 'retry' → in-progress(QUERY_PENDING, retryIncrement=true); psp_tx_map.state cycles through QUERY_PENDING retries. The TB PENDING transfer was already created at the AUTHORIZED step (before PspWorker runs), so the funds stay held throughout these retries.

  • On confirm call: classified as 'inquire' (transport error on confirm) → in-progress(CONFIRM_PENDING, retryIncrement=true). On the next pickup, the worker transitions to INQUIRING to check whether IPPS actually received the payment.
  • After PSP_MAX_RETRIES=3: IppsDriver escalates to manual-review → psp_tx_map.state = 'MANUAL_REVIEW'; pm.intent stays AUTHORIZED with the TB PENDING transfer frozen.
  • Admin action required: Resolve the intent with force-resolve (see Admin Force-Resolve below; until Plan 3 is implemented, use the manual procedure described there). Before voiding TB PENDING, confirm with IPPS support whether the transfer was received.
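The retry budget can be modelled as a small pure function. The sketch below is hypothetical. PspRow and applyRetry are illustrative names, and it assumes escalation happens when the incremented counter reaches PSP_MAX_RETRIES. The section does not say whether that boundary is inclusive.

```typescript
// Hypothetical model of the retry budget; escalating when the counter
// reaches the limit (>=) is an assumption.
const PSP_MAX_RETRIES = 3;

type PspState = "QUERY_PENDING" | "CONFIRM_PENDING" | "INQUIRING" | "MANUAL_REVIEW";

interface PspRow {
  state: PspState;
  retryCount: number;
}

// Apply an in-progress outcome with retryIncrement=true. Once the budget is
// spent the row is parked in MANUAL_REVIEW; the TB PENDING transfer is left
// alone here (neither posted nor voided) until an admin resolves it.
function applyRetry(row: PspRow, nextState: Exclude<PspState, "MANUAL_REVIEW">): PspRow {
  const retryCount = row.retryCount + 1;
  if (retryCount >= PSP_MAX_RETRIES) {
    return { state: "MANUAL_REVIEW", retryCount };
  }
  return { state: nextState, retryCount };
}
```

Keeping the function pure makes the escalation rule easy to unit-test apart from the database and the IPPS client.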

CRITICAL: IPPS Confirm is NOT Idempotent

SIT-confirmed 2026-04-29 (Q-IPPS-2): Two confirm calls with the same lookupRef produce two separate real-money transfers. IPPS does NOT deduplicate.

PM rule: NEVER retry a confirm call with the same lookupRef.

If the confirm HTTP call times out or the connection drops before PM receives a response:
  1. IppsDriver classifies the transport error on confirm as 'inquire'.
  2. The worker sets state to CONFIRM_PENDING and increments retry_count.
  3. On the next pickup, the worker transitions to INQUIRING and calls inquiry using the stored confirmRqUid.
  4. If confirmRqUid was never saved (crash between the confirm response and the DB write), the row enters MANUAL_REVIEW with reason orphan_lookup_ref. Do NOT retry the confirm. Contact IPPS support with the lookupRef to determine the outcome.

The only safe recovery path after a lost confirm response is inquiry — never re-confirm.
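The recovery rule can be encoded so that re-sending confirm cannot even be expressed. In the hypothetical sketch below, the result type offers only two actions: run an inquiry with the stored confirm-level rqUID, or escalate to MANUAL_REVIEW. The type and function names are illustrative, not IppsDriver's actual API.

```typescript
// Hypothetical decision helper for a row whose confirm response was lost.
// There is deliberately no "CONFIRM" action in the result type.
interface ConfirmPendingRow {
  lookupRef: string;
  confirmRqUid: string | null;
}

type ConfirmRecovery =
  | { action: "INQUIRE"; rqUid: string }
  | { action: "MANUAL_REVIEW"; reason: "orphan_lookup_ref"; lookupRef: string };

function recoverLostConfirm(row: ConfirmPendingRow): ConfirmRecovery {
  if (row.confirmRqUid === null) {
    // Inquiry needs the confirm-level rqUID; without it a human must ask
    // IPPS support about this lookupRef.
    return { action: "MANUAL_REVIEW", reason: "orphan_lookup_ref", lookupRef: row.lookupRef };
  }
  // Inquiry accepts only the confirm-level rqUID (lookupRef returns 404).
  return { action: "INQUIRE", rqUid: row.confirmRqUid };
}
```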


Authentication Errors

JWT Expired (Flutter ↔ nginx)

  • Behavior: nginx auth_request to Serverpod's GET /nginx/auth returns 401 → nginx returns 401 to Flutter.
  • Flutter action: Call refresh endpoint (POST /auth/refresh) with refresh token → get new JWT pair → retry original request.
  • Token TTL: Access 15 min, Refresh 30 days. Refresh tokens are single-use: every refresh rotates the refresh token and returns a new pair.
  • If refresh also fails (refresh token expired or revoked): Flutter must prompt user to re-authenticate.

Invalid HMAC Signature (inter-service)

  • Behavior: PM returns HTTP 401 with { error: 'UNAUTHORIZED', message: 'HMAC invalid or timestamp expired' }.
  • Common causes:
      • Clock skew between calling service and PM: X-Timestamp must be within ±60 seconds of PM's clock.
      • Wrong HMAC secret in calling service's config.
      • Signature algorithm mismatch: must be HMAC-SHA256 over {X-Timestamp}\n{METHOD}\n{PATH}\n{sha256(body)}.
      • Body hash mismatch (e.g., body was re-serialized with different key order).
  • Debug: Check the X-Timestamp value in the rejected request, and verify that the system clocks of both PM and the calling service are NTP-synchronized.
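The signing scheme above can be sketched with Node's node:crypto. Several details are assumptions not stated in this section: X-Timestamp is taken to be Unix epoch milliseconds, and both the body hash and the signature are taken to be hex-encoded. The function names are hypothetical. The verifier hashes the raw body exactly as received, which is why re-serializing the JSON first produces the body-hash mismatch listed above.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// ±60 s acceptance window for X-Timestamp.
const MAX_SKEW_MS = 60_000;

// {X-Timestamp}\n{METHOD}\n{PATH}\n{sha256(body)} over the raw body string.
function canonicalString(timestamp: string, method: string, path: string, rawBody: string): string {
  const bodyHash = createHash("sha256").update(rawBody, "utf8").digest("hex");
  return `${timestamp}\n${method}\n${path}\n${bodyHash}`;
}

// Caller side: hex-encoded HMAC-SHA256 of the canonical string (encoding assumed).
function sign(secret: string, timestamp: string, method: string, path: string, rawBody: string): string {
  return createHmac("sha256", secret)
    .update(canonicalString(timestamp, method, path, rawBody), "utf8")
    .digest("hex");
}

// PM side: reject stale or malformed timestamps, then compare in constant time.
function verify(
  secret: string,
  timestamp: string,
  method: string,
  path: string,
  rawBody: string,
  signature: string,
  nowMs: number,
): boolean {
  const ts = Number(timestamp); // assumed: Unix epoch milliseconds
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts) > MAX_SKEW_MS) return false;
  const expected = Buffer.from(sign(secret, timestamp, method, path, rawBody), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```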

Service Key Inactive or Forbidden operationType

  • Behavior: PM returns HTTP 403 with { error: 'FORBIDDEN' }.
  • Causes:
      • pm.service_key.active = false for the caller's X-Service-Id.
      • Requested operationType is not in pm.service_key.allowedOperationTypes for this caller.
  • Resolution: Check pm.service_key table. Re-enable or update allowedOperationTypes via Admin Panel or direct DB update (with care — this is a security boundary).
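The two 403 causes reduce to one check that runs after HMAC verification has succeeded. A bad signature is a 401, not a 403. In this hypothetical sketch, the ServiceKey interface is an in-memory view of a pm.service_key row. Only active and allowedOperationTypes come from this section, and the other names are illustrative.

```typescript
// Hypothetical view of a pm.service_key row; only active and
// allowedOperationTypes are named in this document.
interface ServiceKey {
  serviceId: string;
  active: boolean;
  allowedOperationTypes: readonly string[];
}

type AuthzResult = { allowed: true } | { allowed: false; status: 403; error: "FORBIDDEN" };

// Both causes return the same 403 FORBIDDEN body.
function authorizeOperation(key: ServiceKey, operationType: string): AuthzResult {
  const forbidden: AuthzResult = { allowed: false, status: 403, error: "FORBIDDEN" };
  if (!key.active) return forbidden;
  if (!key.allowedOperationTypes.includes(operationType)) return forbidden;
  return { allowed: true };
}
```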

Session Revoked

  • Behavior: 401 on next Serverpod request after logout, password change, or manual revoke.
  • Mechanism: All Redis session keys for the user's userId are deleted on logout/password-change. Subsequent JWT validation fails because session is not found in Redis.
  • User action: Re-authenticate.

Database Errors

Duplicate Intent ID (Idempotency)

  • Scenario: POST /intents received twice with the same idempotencyKey and serviceId.
  • Behavior: PM returns HTTP 200 (not 201) with the existing intent's state. No duplicate TB transfers, no duplicate DB rows.
  • When this matters: Network retry by Flutter/Serverpod after a timeout — the second call is safe.
  • When this is a bug: If the same idempotencyKey is reused for a semantically different payment, the second caller gets back the first intent's result, which may be wrong. Generate a fresh UUID v4 idempotencyKey per payment intent, and reuse it only when retrying that same request.
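The replay semantics fit in a few lines. The hypothetical sketch below keys an in-memory map on the (serviceId, idempotencyKey) pair. A first call returns 201 and a replay returns 200 with the same intent. How PM actually persists and enforces uniqueness is not shown here, and IntentStore and newIdempotencyKey are illustrative names.

```typescript
import { randomUUID } from "node:crypto";

interface Intent {
  id: string;
  status: string;
}

// Hypothetical in-memory stand-in for PM's idempotency check.
class IntentStore {
  private readonly byKey = new Map<string, Intent>();

  // First call creates the intent (201); a replay returns it unchanged (200).
  create(serviceId: string, idempotencyKey: string): { httpStatus: 200 | 201; intent: Intent } {
    // JSON-encode the pair so ("a:b", "c") and ("a", "b:c") cannot collide.
    const key = JSON.stringify([serviceId, idempotencyKey]);
    const existing = this.byKey.get(key);
    if (existing !== undefined) return { httpStatus: 200, intent: existing };
    const intent: Intent = { id: randomUUID(), status: "CREATED" };
    this.byKey.set(key, intent);
    return { httpStatus: 201, intent };
  }
}

// Caller side: one fresh UUID v4 per payment intent, reused only on retries
// of that same request.
const newIdempotencyKey = (): string => randomUUID();
```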

Concurrent Worker Pick-up (OutboxWorker)

  • Prevention: SELECT FOR UPDATE SKIP LOCKED in OutboxWorker's pickup query — only one worker can lock a given outbox_event row at a time.
  • OutboxWorker is currently single-instance (see the WORKER_ROLES=outbox-worker deployment note). Running two instances would be safe at the DB level thanks to SKIP LOCKED, but lifting the single-instance restriction is tracked in the backlog (tech-debt-outbox-multi-instance).

PspWorker Concurrent Pick-up

  • Prevention: Atomic CTE in pickUpJobs(): WITH picked AS (SELECT ... FOR UPDATE SKIP LOCKED) UPDATE ... RETURNING. The state flip (NEW → QUERY_PENDING) happens in the same transaction, so a second concurrent worker skips the row while it is locked and afterwards sees it already in QUERY_PENDING, so it does not pick it.
  • Multiple PspWorker instances are safe — this is the designed deployment mode for scale-out (WORKER_ROLES=psp-worker PSP_NAMES=IPPS).
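A rendering of the pickup CTE is sketched below. It covers only the NEW → QUERY_PENDING flip described above. The id column and the $1 batch limit are assumptions, and the real query may select more states and columns. The Queryable interface is declared locally so the sketch has no dependencies. Its shape matches node-postgres' client.query(text, values).

```typescript
// Minimal stand-in for a Postgres client (node-postgres' client.query(text,
// values) has this shape), declared locally to keep the sketch self-contained.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: unknown[] }>;
}

// Lock a batch of NEW rows, skipping rows another worker holds, and flip them
// to QUERY_PENDING in the same statement. Column "id" and $1 are assumptions.
const PICK_UP_JOBS_SQL = `
WITH picked AS (
  SELECT id
  FROM pm.psp_tx_map
  WHERE state = 'NEW'
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
UPDATE pm.psp_tx_map AS t
SET state = 'QUERY_PENDING'
FROM picked
WHERE t.id = picked.id
RETURNING t.*
`;

async function pickUpJobs(db: Queryable, limit: number): Promise<unknown[]> {
  const { rows } = await db.query(PICK_UP_JOBS_SQL, [limit]);
  return rows;
}
```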

Infrastructure Errors

TigerBeetle Unavailable

  • PM behavior: createTransfers() or lookupAccounts() throws → PM returns HTTP 500 with TB_TRANSFER_ERROR.
  • Retry: PM does NOT auto-retry TB calls. The error is surfaced immediately.
  • Intent state: Stays in the last state it reached (VALIDATED if the pending transfer could not be created, or AUTHORIZED if OutboxWorker's post_pending failed).
  • Resolution: See fault-tolerance.md Scenario 2.

Redis PUBLISH Fails (intent.{id})

  • Impact: Serverpod does not receive real-time intent status push for requiresMonitoring=true intents.
  • Fallback: Flutter polls getIntent(id) via Serverpod → PM GET /intents/:id until a terminal state (SETTLED or FAILED) is returned or a timeout is reached.
  • User impact: Status update may be delayed by polling interval, not lost — final state is always readable from pm.intent via GET /intents/:id.
  • PM behavior: Redis PUBLISH failure is logged but does not affect intent state or TB transfers.
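The polling fallback could look like the sketch below. fetchIntent is a hypothetical stand-in for getIntent(id), which calls PM GET /intents/:id through Serverpod. The 2 s interval and 60 s timeout are assumed defaults. A TIMEOUT result only ends the polling. The final state can still be read later from pm.intent.

```typescript
// Hypothetical polling loop; interval and timeout defaults are assumptions.
type IntentStatus = "CREATED" | "VALIDATED" | "AUTHORIZED" | "SETTLED" | "FAILED";

const TERMINAL: ReadonlySet<IntentStatus> = new Set<IntentStatus>(["SETTLED", "FAILED"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilTerminal(
  fetchIntent: () => Promise<{ status: IntentStatus }>,
  intervalMs = 2000,
  timeoutMs = 60000,
): Promise<IntentStatus | "TIMEOUT"> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { status } = await fetchIntent();
    if (TERMINAL.has(status)) return status;
    // Stop before sleeping past the deadline; the intent itself is unaffected.
    if (Date.now() + intervalMs > deadline) return "TIMEOUT";
    await sleep(intervalMs);
  }
}
```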

Notifications Stream Write Fails

  • Impact: Push notification is not queued.
  • No retry at producer: PM, Auth Center, and KYC Service publish to stream.notifications.jobs as fire-and-forget. If Redis is down, the XADD fails; the notification is lost for that event.
  • Mitigation: Redis persistence must be enabled so the stream survives a Redis restart. The stream consumer (notifications-worker group) uses XAUTOCLAIM to re-deliver messages that remain unacknowledged after 30 s of idle time.

IPPS-Specific Error Classification

IppsDriver classifies errors via classifyError(err, op):

Error source | Classification | Effect
Network timeout / connection reset on confirm | inquire | CONFIRM_PENDING → next pickup runs inquiry
Network timeout on query or inquiry | retry | QUERY_PENDING / INQUIRING with retryIncrement
HTTP 429, 502, 503, 504 | retry | as above
HTTP 400, 401, 403, 404, 409, 422 | fail | void_pending → FAILED
Bank code E008 (service unavailable) | retry | as above
Bank codes E001–E007, E009, E010 | fail | void_pending → FAILED
HTTP 500 on any operation | inquire (on confirm) / retry (on query or inquiry) | Intentionally conservative: avoid double-sending
Unknown bank error code on confirm | inquire | Safety-first: never retry confirm
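The table translates almost directly into a classifier. The hypothetical sketch below takes a normalized error rather than the real classifyError(err, op) input, whose shape is not documented here. The table does not cover unlisted HTTP statuses or unknown bank codes on query and inquiry. For those cases the sketch assumes 'retry', and it keeps 'inquire' for confirm. What a 'retry' does next is left to the worker state machine.

```typescript
type Op = "query" | "confirm" | "inquiry";
type Classification = "retry" | "inquire" | "fail";

// Hypothetical normalized input; the real classifyError(err, op) differs.
type IppsError =
  | { kind: "network" } // timeout or connection reset
  | { kind: "http"; status: number }
  | { kind: "bank"; code: string }; // e.g. "E004"

const RETRYABLE_HTTP = new Set([429, 502, 503, 504]);
const FATAL_HTTP = new Set([400, 401, 403, 404, 409, 422]);
const FATAL_BANK = new Set(["E001", "E002", "E003", "E004", "E005", "E006", "E007", "E009", "E010"]);

function classifyError(err: IppsError, op: Op): Classification {
  // Outcome unknown: on confirm, ask via inquiry instead of re-sending.
  const uncertain: Classification = op === "confirm" ? "inquire" : "retry";
  switch (err.kind) {
    case "network":
      return uncertain;
    case "http":
      if (RETRYABLE_HTTP.has(err.status)) return "retry";
      if (FATAL_HTTP.has(err.status)) return "fail";
      return uncertain; // HTTP 500, plus any unlisted status (assumption)
    case "bank":
      if (err.code === "E008") return "retry";
      if (FATAL_BANK.has(err.code)) return "fail";
      return uncertain; // unknown code; 'retry' off confirm is an assumption
  }
}
```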

Inquiry rqUID: IPPS inquiry only accepts the confirm-level rqUID (the rqUID field from the confirm response). lookupRef, retrievalReferenceNumber, and query-level rqUID all return 404. If confirmRqUid is null, inquiry is impossible → MANUAL_REVIEW.

IPPS error response shape (SIT-observed vs docs): Error responses observed in SIT use the { statusCode, code: "BBL_undefined", message: "..." } format, which differs from the data: { code: 'E001' } format shown in the PPXC v1.2.4 docs. Pending clarification from IPPS (B5 in ipps-questions.md).
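Until B5 is resolved, a tolerant parser can accept both shapes. The hypothetical sketch below extracts a bank code only when one matching the E0xx pattern is present. With the SIT shape, bankCode therefore stays null and rawCode holds values such as "BBL_undefined". The function name and the numeric type of statusCode are assumptions.

```typescript
// Hypothetical normalizer for the documented and the SIT-observed shapes.
interface NormalizedIppsError {
  httpStatus: number | null;
  bankCode: string | null; // E0xx-style code, when one is present
  rawCode: string | null; // whichever code field was found, e.g. "BBL_undefined"
  message: string | null;
}

const asRecord = (v: unknown): Record<string, unknown> =>
  typeof v === "object" && v !== null ? (v as Record<string, unknown>) : {};

const asString = (v: unknown): string | null => (typeof v === "string" ? v : null);

function normalizeIppsError(body: unknown): NormalizedIppsError {
  const top = asRecord(body);
  const docCode = asString(asRecord(top["data"])["code"]); // documented: { data: { code: 'E001' } }
  const sitCode = asString(top["code"]); // SIT: { statusCode, code, message }
  const rawCode = docCode ?? sitCode;
  const status = top["statusCode"]; // assumed numeric
  return {
    httpStatus: typeof status === "number" ? status : null,
    bankCode: rawCode !== null && /^E\d{3}$/.test(rawCode) ? rawCode : null,
    rawCode,
    message: asString(top["message"]),
  };
}
```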


Error Logging Requirements

For every error in PM logs:
  • Always log intentId (= trace_id) for all payment-related errors.
  • Always log userId (integer), never name, phone, or document number.
  • Always log the service name and error code.
  • Always log an event field matching the Required Log Events catalog in docs/SECURITY.md.

Must NOT log:
  • Full request bodies (may contain PII metadata).
  • HMAC secrets or HMAC signature values.
  • JWT token values (if identification is needed, log only the last 8 characters of the token ID).
  • IPPS request/response bodies in full; log only rqUID, status_code, and the classified error code.
  • OCR extraction results from KYC (contain national ID, name, DOB).
  • Notification payload content (title/body may contain account info).
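An allow-list is the simplest way to satisfy both lists at once. Any field that is not explicitly permitted is dropped, so bodies, secrets, OCR output and notification payloads cannot leak by accident. In the hypothetical sketch below, intentId, userId and event come from the requirements above. The other field names are illustrative and not PM's actual log schema.

```typescript
// Hypothetical allow-list; names other than intentId, userId and event are
// illustrative assumptions.
const ALLOWED_FIELDS = [
  "event", // must match the Required Log Events catalog
  "service",
  "errorCode",
  "intentId", // = trace_id
  "userId", // integer id only
  "rqUID", // IPPS: identifiers and classified code only
  "statusCode",
  "tokenIdSuffix",
] as const;

// Copy only allow-listed fields; everything else passed in is discarded.
function redactForLog(fields: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of ALLOWED_FIELDS) {
    if (key in fields) out[key] = fields[key];
  }
  // userId must be the integer id, never a name or phone number.
  if ("userId" in out && !Number.isInteger(out.userId)) delete out.userId;
  return out;
}

// When a token must be identifiable, keep only the last 8 characters of its ID.
function tokenIdSuffix(tokenId: string): string {
  return tokenId.slice(-8);
}
```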


Admin Force-Resolve (Plan 3, pending implementation)

For intents stuck in MANUAL_REVIEW or AUTHORIZED with unresolvable PSP state, the Admin Panel will provide a force-resolve endpoint. Until Plan 3 is complete, resolution requires direct DB update + manual TB operation via PASSPORT.md helper scripts.

Procedure (manual, pre-Plan 3):
  1. Identify the pm.intent.id and the pm.psp_tx_map row.
  2. For MANUAL_REVIEW, determine the actual PSP outcome by checking the IPPS support dashboard with the lookupRef or confirmRqUid.
  3. If IPPS received the transfer: create a post_pending outbox_event manually → OutboxWorker processes it.
  4. If IPPS did NOT receive the transfer: create a void_pending outbox_event manually → OutboxWorker voids TB PENDING.
  5. Write a pm.intent_event row documenting the manual resolution with actor and reason.