
04 — Reliability & Data Integrity


The Core Guarantee

No payment is ever lost, duplicated, or left in an unresolvable state.

This guarantee is achieved through three interlocking mechanisms:

  1. TigerBeetle — atomic financial operations that cannot be partially applied
  2. Outbox pattern — durable work queue that survives PM crashes
  3. Status Poller — continuous crash recovery for external PSP flows

TigerBeetle — Financial Source of Truth

Atomic linked transfers

A P2P transfer moves funds from a sender, through a transit account, to a recipient, with a revenue account collecting any fee. All legs must move together, or not at all.

TigerBeetle's flags.linked chains consecutive transfers into one atomic unit. The chain closes at the first transfer without the flag:

Batch:
  [0] user.sender  → system.transit   ฿1,005  PENDING | linked
  [1] system.transit → user.recipient  ฿1,000  PENDING | linked
  [2] system.transit → system.revenue  ฿5      PENDING          ← last, no linked

If any one fails → all fail → no state change in TigerBeetle
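The all-or-nothing behaviour can be illustrated with a small in-memory model. This is a sketch under stated assumptions and not the tigerbeetle-node client. Leg, buildP2PBatch and applyLinked are hypothetical names, amounts are plain numbers in minor units, and a rejected leg is simulated with a predicate. In real TigerBeetle, the other transfers of a failed chain are reported with the linked_event_failed result.

```typescript
// Sketch only: an in-memory stand-in for a TigerBeetle linked chain, not the
// real client. Leg, buildP2PBatch and applyLinked are hypothetical names.
type Leg = { debit: string; credit: string; amount: number; linked: boolean };

// Builds the three legs shown above. Every leg except the last carries
// linked = true, so the chain closes on the revenue leg.
function buildP2PBatch(sender: string, recipient: string, amount: number, fee: number): Leg[] {
  return [
    { debit: sender, credit: "system.transit", amount: amount + fee, linked: true },
    { debit: "system.transit", credit: recipient, amount: amount, linked: true },
    { debit: "system.transit", credit: "system.revenue", amount: fee, linked: false },
  ];
}

// Applies the chain to a copy of the balances. The copy is written back only
// if every leg is accepted, so a rejected leg leaves no partial state behind.
function applyLinked(
  balances: Map<string, number>,
  legs: Leg[],
  rejects: (leg: Leg) => boolean,
): boolean {
  const draft = new Map<string, number>();
  balances.forEach((value, account) => draft.set(account, value));
  for (const leg of legs) {
    if (rejects(leg)) return false; // whole chain fails; balances untouched
    draft.set(leg.debit, (draft.get(leg.debit) ?? 0) - leg.amount);
    draft.set(leg.credit, (draft.get(leg.credit) ?? 0) + leg.amount);
  }
  draft.forEach((value, account) => balances.set(account, value)); // commit all legs at once
  return true;
}

const balances = new Map<string, number>([["user.sender", 2000]]);
const batch = buildP2PBatch("user.sender", "user.recipient", 1000, 5);
console.log(applyLinked(balances, batch, (leg) => leg.credit === "system.revenue")); // false
console.log(applyLinked(balances, batch, () => false)); // true
```

In this model, a failure on the last leg rolls back the first two. That matches the "If any one fails → all fail" rule above.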

Two-phase transfers — freeze before release

AUTHORIZE:  createTransfer(PENDING)       → funds frozen, deducted from available
SETTLE:     createTransfer(POST_PENDING)  → funds released to recipient
CANCEL:     createTransfer(VOID_PENDING)  → funds returned to sender

This mirrors the authorize/capture model used by card networks. Applying it to every payment type means funds are never in an ambiguous state: they are always frozen, posted, or returned.
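A minimal model of the AUTHORIZE / SETTLE / CANCEL lifecycle shows why retries are safe. It is a hypothetical sketch. TwoPhaseTransfer and its result strings follow this document's shorthand (already_posted, already_voided), not TigerBeetle's own result codes, which use names such as pending_transfer_already_posted. A timeout of 0 models a pending transfer that never expires.

```typescript
// Sketch only: hypothetical model of one two-phase transfer's lifecycle.
// Result names follow this document's shorthand, not tigerbeetle-node codes.
type PendingState = "PENDING" | "POSTED" | "VOIDED" | "EXPIRED";
type PhaseResult = "ok" | "already_posted" | "already_voided" | "expired";

class TwoPhaseTransfer {
  state: PendingState = "PENDING"; // AUTHORIZE: funds frozen
  private readonly timeoutSec: number; // 0 = never expires on its own
  private readonly createdAtSec: number;

  constructor(timeoutSec: number, createdAtSec: number) {
    this.timeoutSec = timeoutSec;
    this.createdAtSec = createdAtSec;
  }

  private expireIfDue(nowSec: number): void {
    if (this.state === "PENDING" && this.timeoutSec > 0 && nowSec >= this.createdAtSec + this.timeoutSec) {
      this.state = "EXPIRED"; // funds go back to the sender
    }
  }

  // A transfer that has already left PENDING reports how it ended.
  private finished(): PhaseResult | null {
    if (this.state === "POSTED") return "already_posted";
    if (this.state === "VOIDED") return "already_voided";
    if (this.state === "EXPIRED") return "expired";
    return null;
  }

  postPending(nowSec: number): PhaseResult { // SETTLE
    this.expireIfDue(nowSec);
    const done = this.finished();
    if (done !== null) return done;
    this.state = "POSTED";
    return "ok";
  }

  voidPending(nowSec: number): PhaseResult { // CANCEL
    this.expireIfDue(nowSec);
    const done = this.finished();
    if (done !== null) return done;
    this.state = "VOIDED";
    return "ok";
  }
}

// Retry-safe callers treat a repeated post as success.
const settled = (r: PhaseResult): boolean => r === "ok" || r === "already_posted";
```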

transit.balance = 0 invariant

The transit account acts as an escrow. After every settlement batch:

In:   amount + preFee                             (sender → transit)
Out:  (amount − postFee)  +  (preFee + postFee)   (transit → recipient, transit → revenue)
      = amount + preFee

∴ transit.balance = 0  ← the legs cancel by construction; flags.linked makes them all-or-nothing

A non-zero transit balance therefore means a transfer was left incomplete, which monitoring can detect and alert on.
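The escrow arithmetic can be checked mechanically. The helpers below are a hypothetical sketch. They read preFee as a fee added on top of the amount and postFee as one deducted from it, which is how the In/Out lines combine them, and they reject a postFee larger than the amount (an assumption, not stated above).

```typescript
// Sketch only: hypothetical helpers deriving the settlement legs from amount,
// preFee and postFee, in integer minor units.
interface SettlementLegs {
  senderToTransit: number;    // In
  transitToRecipient: number; // Out
  transitToRevenue: number;   // Out (fees)
}

function settlementLegs(amount: number, preFee: number, postFee: number): SettlementLegs {
  // Assumption of this sketch: a deducted fee cannot exceed the amount.
  if (postFee > amount) throw new RangeError("postFee cannot exceed amount");
  return {
    senderToTransit: amount + preFee,
    transitToRecipient: amount - postFee,
    transitToRevenue: preFee + postFee,
  };
}

// Net effect on transit: credited by the inbound leg, debited by both outbound legs.
function transitNet(legs: SettlementLegs): number {
  return legs.senderToTransit - legs.transitToRecipient - legs.transitToRevenue;
}

// Monitoring side: any non-zero posted transit balance signals an incomplete transfer.
function transitAlert(postedBalance: number): boolean {
  return postedBalance !== 0;
}

console.log(settlementLegs(1000, 5, 0)); // the ฿1,005 / ฿1,000 / ฿5 legs from the batch example
console.log(transitNet(settlementLegs(1000, 5, 0))); // 0
```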

Immutability

TigerBeetle transfers cannot be updated or deleted. The ledger history is permanent. Corrections are made via compensating transfers — a separate intent with operationType: CORRECTION — which creates a full audit trail of the correction.


Outbox Pattern — Crash-Safe Settlement

The outbox pattern ensures that every payment that reaches the IPPS network eventually completes, even if PM crashes between steps.

How it works

1. IPPS confirm returns success
2. PM writes outbox_event { action: "post_pending", status: "pending" }  ← in PG transaction
3. PM returns 201 AUTHORIZED to caller

[PM can crash here — outbox_event survives in PostgreSQL]

4. OutboxWorker (setInterval 1s) reads pending outbox_events
5. Calls TB postPending   ← idempotent: already_posted = OK
6. Writes tx_history + updates intent to SETTLED ← in PG transaction
7. Publishes to Redis    ← Flutter gets notified
8. Marks outbox_event as processed
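Steps 4 to 8 might look roughly like the worker loop below. The Deps interface is a hypothetical stand-in for the PostgreSQL, TigerBeetle and Redis calls. The sketch assumes finalizeInPg is itself safe to repeat (for example, a no-op once the intent is final), because a crash after step 6 but before step 8 re-runs the whole event. Since step 7 runs before step 8, Redis subscribers may occasionally see a duplicate notification.

```typescript
// Sketch only: Deps stands in for PostgreSQL, TigerBeetle and Redis; every
// name here is hypothetical. Each call must be safe to repeat.
type OutboxAction = "post_pending" | "void_pending";

interface OutboxEvent {
  id: string;
  intentId: string;
  action: OutboxAction;
  status: "pending" | "processed";
}

interface Deps {
  loadPending(): Promise<OutboxEvent[]>;                              // step 4
  tbPostPending(intentId: string): Promise<"ok" | "already_posted">;  // step 5
  tbVoidPending(intentId: string): Promise<"ok" | "already_voided">;
  finalizeInPg(intentId: string, action: OutboxAction): Promise<void>; // step 6, one PG transaction
  publish(intentId: string): Promise<void>;                           // step 7
  markProcessed(eventId: string): Promise<void>;                      // step 8
}

// One pass over pending events. A failure leaves the event pending for the next tick.
async function tick(deps: Deps): Promise<number> {
  let processed = 0;
  for (const ev of await deps.loadPending()) {
    try {
      if (ev.action === "post_pending") await deps.tbPostPending(ev.intentId);
      else await deps.tbVoidPending(ev.intentId);
      await deps.finalizeInPg(ev.intentId, ev.action);
      await deps.publish(ev.intentId);
      await deps.markProcessed(ev.id);
      processed++;
    } catch (err) {
      console.error(`outbox event ${ev.id} failed; will retry`, err);
    }
  }
  return processed;
}

// setInterval driver for the 1s cadence; skips a tick while the previous one is still running.
function startOutboxWorker(deps: Deps, intervalMs = 1000): ReturnType<typeof setInterval> {
  let running = false;
  return setInterval(() => {
    if (running) return;
    running = true;
    tick(deps)
      .catch((err) => console.error("outbox tick failed", err))
      .then(() => {
        running = false;
      });
  }, intervalMs);
}
```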

Idempotency at every step

| Operation | Idempotent? | How |
|---|---|---|
| POST /intents | Yes | idempotencyKey: same key returns same result |
| TigerBeetle PENDING | Yes | Transfer ID = uuidv5(intentId + phase + index), deterministic |
| TigerBeetle POST_PENDING | Yes | already_posted status is not an error |
| TigerBeetle VOID_PENDING | Yes | already_voided status is not an error |
| OutboxWorker processing | Yes | outbox_event.status = processed prevents re-execution |
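The deterministic-ID row can be sketched with Node's built-in crypto module, without assuming a particular UUID library. PM_TRANSFER_NAMESPACE is a made-up namespace. Joining the three parts with ":" is an assumption of this sketch (the table only says intentId + phase + index); a separator keeps two different input triples from concatenating to the same string. TigerBeetle IDs are 128-bit integers, so the 32 hex digits returned here would still need converting (for example, to a bigint) before use.

```typescript
import { createHash } from "node:crypto";

// RFC 4122 version-5 (SHA-1, name-based) UUID built on Node's crypto module.
function uuidv5(name: string, namespace: string): string {
  const ns = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const digest = createHash("sha1").update(ns).update(name, "utf8").digest();
  const bytes = Buffer.from(digest.subarray(0, 16)); // copy the first 16 of 20 bytes
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

// Hypothetical namespace UUID reserved for PM transfer IDs.
const PM_TRANSFER_NAMESPACE = "3f8e2a6c-9b1d-4c7e-8a25-6d0f4b9e1c37";

// The same (intentId, phase, index) always yields the same 128-bit ID, returned
// as 32 hex digits. The ":" separator is an assumption of this sketch.
function transferId(intentId: string, phase: string, index: number): string {
  return uuidv5(`${intentId}:${phase}:${index}`, PM_TRANSFER_NAMESPACE).replace(/-/g, "");
}

console.log(transferId("intent-42", "authorize", 0) === transferId("intent-42", "authorize", 0)); // true
```

A retried PENDING request therefore reuses the same transfer ID, so TigerBeetle sees a repeat of an existing transfer rather than a new one.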

Crash Recovery Matrix

Internal channels (P2P — tbPendingTimeout = 300s)

| PM crashed at | State | Recovery |
|---|---|---|
| Before TB pending | Nothing moved | Retry with same idempotency key |
| After TB pending, no PG commit | AUTHORIZED + TB pending active | TB auto-expires in 300s → funds returned. Status Poller voids and marks FAILED. |
| After PG commit, before TB post | AUTHORIZED + PG record exists | OutboxWorker retries TB post (idempotent) |
| After TB post, before PG SETTLED | AUTHORIZED + TB posted | OutboxWorker retries PG update (no TB call needed) |

External channels (IPPS — tbPendingTimeout = 0)

Pending TB transfers never auto-expire, so frozen funds stay frozen until the Status Poller resolves them.

| psp_tx_map state | What happened | Recovery |
|---|---|---|
| QUERYING > 5 min | PM crashed before IPPS query returned | StatusPoller: insert void_pending → OutboxWorker voids |
| CONFIRMING > 5 min | PM crashed between query and confirm | StatusPoller: retry IPPS confirm with same lookupRef |
| CONFIRMED + no outbox_event | PM crashed after confirm, before outbox insert | StatusPoller: insert post_pending → OutboxWorker posts |
| CONFIRMED + outbox_event pending | Normal OutboxWorker delay | OutboxWorker processes next tick |
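The recovery table reduces to a pure decision function. It is sketched here with hypothetical names (PspTxRow, PollerAction, decideRecovery). The sketch follows the table literally: the 5-minute threshold applies only to QUERYING and CONFIRMING, and a CONFIRMED row is only ever posted, never voided.

```typescript
// Sketch only: hypothetical decision function for the external-channel table.
type PspTxState = "QUERYING" | "CONFIRMING" | "CONFIRMED";

interface PspTxRow {
  state: PspTxState;
  updatedAtMs: number;
  lookupRef: string;
  outboxEvent: "none" | "pending" | "processed";
}

type PollerAction =
  | { kind: "insert_outbox"; action: "void_pending" | "post_pending" }
  | { kind: "retry_confirm"; lookupRef: string }
  | { kind: "wait" };

const STUCK_AFTER_MS = 5 * 60 * 1000; // the "> 5 min" threshold

function decideRecovery(row: PspTxRow, nowMs: number): PollerAction {
  if (row.state === "CONFIRMED") {
    // Confirmed by IPPS: finish the settlement, never void.
    return row.outboxEvent === "none"
      ? { kind: "insert_outbox", action: "post_pending" }
      : { kind: "wait" }; // OutboxWorker picks it up (or already has)
  }
  if (nowMs - row.updatedAtMs <= STUCK_AFTER_MS) return { kind: "wait" };
  if (row.state === "QUERYING") return { kind: "insert_outbox", action: "void_pending" };
  // CONFIRMING: retry the confirm, reusing the same lookupRef as the table specifies.
  return { kind: "retry_confirm", lookupRef: row.lookupRef };
}
```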

Dead Letter Queue (DLQ) — Phase 2B

For PSP adapters using Redis Streams (Phase 2B), the two dead-letter streams are handled differently:

stream.{psp}.jobs.dlq   — adapter never processed the job:
  PSP request never sent → safe to void
  → TB void_pending → intent FAILED

stream.{psp}.results.dlq — PM could not process the PSP response:
  PSP may have settled → NOT safe to auto-void
  → insert pm.reconciliation_alert → ops review → manual resolution
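The two DLQ rules can be expressed as a routing function keyed on the stream name. resolveDlqEntry is a hypothetical helper. Treating any unrecognised stream as alert-only is a conservative choice of this sketch, in line with the rule that auto-void is safe only when the PSP request was never sent.

```typescript
// Sketch only: hypothetical routing of a dead-lettered entry by stream name.
type DlqResolution =
  | { kind: "auto_void" }             // TB void_pending, then intent FAILED
  | { kind: "reconciliation_alert" }; // insert pm.reconciliation_alert for ops review

const JOBS_DLQ = /^stream\.[^.]+\.jobs\.dlq$/;

function resolveDlqEntry(stream: string): DlqResolution {
  // Only a jobs DLQ shows the PSP request was never sent, so only it may auto-void.
  if (JOBS_DLQ.test(stream)) return { kind: "auto_void" };
  // results.dlq, and any stream this code does not recognise, goes to a human.
  return { kind: "reconciliation_alert" };
}
```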

Reconciliation

What is checked

The Reconciliation Service (Phase 2C / B16) cross-checks three data sources:

pm.intent  ←→  pm.tx_history  ←→  TigerBeetle
(status)        (amounts)           (actual transfers)
| Check | Detects |
|---|---|
| SETTLED intent + TB transfers match | Normal ✅ |
| SETTLED intent + TB amount mismatch | Ledger discrepancy (alert) |
| SETTLED intent + no TB transfers found | Missing transfer (alert) |
| AUTHORIZED intent + pending TB transfers + no outbox_event | Stuck payment (auto-recover) |
| pm.account_balance snapshot vs TB live balance | Balance drift (alert) |
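One way to express these checks is a per-intent classifier, sketched below with hypothetical field names. It assumes the TigerBeetle side has already been summarised into a posted amount (null when no transfers were found) and a flag for active pending transfers.

```typescript
// Sketch only: hypothetical per-intent classifier for the checks above.
interface ReconRow {
  intentStatus: string;          // pm.intent.status
  historyAmount: number | null;  // amount recorded in pm.tx_history
  tbPostedAmount: number | null; // posted TB transfers for the intent, null if none found
  tbPendingActive: boolean;
  hasOutboxEvent: boolean;
}

type Finding = "normal" | "ledger_discrepancy" | "missing_transfer" | "stuck_payment" | "not_covered";

function classify(row: ReconRow): Finding {
  if (row.intentStatus === "SETTLED") {
    if (row.tbPostedAmount === null) return "missing_transfer"; // alert
    return row.tbPostedAmount === row.historyAmount ? "normal" : "ledger_discrepancy"; // alert on mismatch
  }
  if (row.intentStatus === "AUTHORIZED" && row.tbPendingActive && !row.hasOutboxEvent) {
    return "stuck_payment"; // auto-recover
  }
  return "not_covered"; // no check in the table applies
}

// Account-level check: stored snapshot vs live TigerBeetle balance.
function balanceDrift(snapshotBalance: number, tbLiveBalance: number): boolean {
  return snapshotBalance !== tbLiveBalance;
}
```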

Phase 1 coverage (B6 startup reconciler)

At PM startup, the reconciler automatically resolves stuck AUTHORIZED intents for internal channels (safe to void — TB pending will expire or is already expired).

For external channels, the reconciler instead generates a pm.reconciliation_alert entry and waits for ops review. Auto-void is never applied to external channels, because the PSP may already have settled the payment.
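The startup pass can be planned as a pure function, shown here as a hypothetical sketch. The mayMarkFailed guard is an extra assumption of this sketch, not something the design above specifies. If voiding an internal transfer reports that it was already posted, the payment actually completed and should not be marked FAILED.

```typescript
// Sketch only: hypothetical planner for the Phase 1 startup reconciler.
type Channel = "internal" | "external";

interface StuckIntent {
  intentId: string;
  channel: Channel;
}

type StartupAction =
  | { intentId: string; action: "void_and_fail" }
  | { intentId: string; action: "raise_alert" };

function planStartupRecovery(stuck: StuckIntent[]): StartupAction[] {
  return stuck.map((s): StartupAction =>
    s.channel === "internal"
      ? { intentId: s.intentId, action: "void_and_fail" } // TB pending expires anyway
      : { intentId: s.intentId, action: "raise_alert" },  // PSP may already have settled
  );
}

// Assumption of this sketch: "expired" and "already_voided" both mean the funds
// went back, but "already_posted" means the payment completed.
type VoidResult = "ok" | "already_voided" | "expired" | "already_posted";

function mayMarkFailed(result: VoidResult): boolean {
  return result !== "already_posted";
}
```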

Correction flow

TigerBeetle is immutable — existing transfers cannot be edited. Corrections use a compensating transfer:

1. Ops reviews pm.reconciliation_alert
2. Creates a CORRECTION intent (requires manual approval)
3. PM creates a direct (non-pending) TB transfer for the missing amount
4. intent_event: CORRECTION_APPLIED with reference to original intentId
5. tx_history: new CREDIT row linking back to the discrepancy
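Steps 3 to 5 can be sketched as a function that builds the three new records from an approved correction. All names are hypothetical, and the transfer ID format is for illustration only. The key property is that the function only creates new records and never edits an existing transfer.

```typescript
// Sketch only: hypothetical builder for the records a CORRECTION produces.
interface CorrectionIntent {
  intentId: string;
  operationType: "CORRECTION";
  originalIntentId: string;
  amount: number;            // the missing amount, minor units
  debitAccount: string;
  creditAccount: string;
  approvedBy: string | null; // set by the manual approval step
}

interface CorrectionRecords {
  transfer: { id: string; debit: string; credit: string; amount: number; pending: false };    // step 3
  event: { type: "CORRECTION_APPLIED"; intentId: string; originalIntentId: string };         // step 4
  history: { direction: "CREDIT"; account: string; amount: number; originalIntentId: string }; // step 5
}

function buildCorrection(c: CorrectionIntent): CorrectionRecords {
  if (c.approvedBy === null) throw new Error("CORRECTION intents require manual approval");
  if (!(c.amount > 0)) throw new RangeError("correction amount must be positive");
  return {
    // A new direct transfer; nothing already in the ledger is modified.
    transfer: { id: `${c.intentId}:correction:0`, debit: c.debitAccount, credit: c.creditAccount, amount: c.amount, pending: false },
    event: { type: "CORRECTION_APPLIED", intentId: c.intentId, originalIntentId: c.originalIntentId },
    history: { direction: "CREDIT", account: c.creditAccount, amount: c.amount, originalIntentId: c.originalIntentId },
  };
}
```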