Operational runbooks

First-response playbooks for the most common production issues.

Webhook failure rate spike

Trigger: webhook delivery success rate stays below 95% for 10 minutes or longer.
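
As a rough illustration, the trigger condition can be evaluated over per-minute delivery counts. This is a minimal sketch, not the platform's alerting logic: the Sample type, the one-minute granularity, and the treatment of minutes with no traffic are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD = 0.95      # from the trigger: success rate below 95%
WINDOW_MINUTES = 10   # from the trigger: sustained for 10+ minutes


@dataclass
class Sample:
    """Hypothetical per-minute delivery counts (not a platform type)."""
    minute: int
    delivered: int
    attempted: int


def trigger_fires(samples: List[Sample]) -> bool:
    """Return True if each of the last WINDOW_MINUTES samples is below THRESHOLD."""
    recent = samples[-WINDOW_MINUTES:]
    if len(recent) < WINDOW_MINUTES:
        return False  # not enough history to call it sustained
    for s in recent:
        # Assumption: a minute with no attempts breaks the streak.
        if s.attempted == 0:
            return False
        if s.delivered / s.attempted >= THRESHOLD:
            return False
    return True
```

A success rate of exactly 95% does not fire here, because the trigger says "below 95%".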

Checks

  • Verify downstream endpoint availability and TLS validity.
  • Inspect response body/status in delivery logs.
  • Confirm webhook secret rotations are synced downstream.
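
One way to test whether a secret rotation has reached the consumer is to check which secret a captured delivery's signature matches. The sketch below assumes deliveries are signed with hex-encoded HMAC-SHA256 over the raw request body. That scheme is an assumption for illustration, and matching_secret is a hypothetical helper, so check the actual signing format before relying on it.

```python
import hashlib
import hmac
from typing import Dict, Optional


def matching_secret(body: bytes, signature_hex: str,
                    secrets: Dict[str, bytes]) -> Optional[str]:
    """Return the label of the secret that produced signature_hex, or None.

    Assumption: the signature is hex-encoded HMAC-SHA256 of the raw body.
    """
    for label, secret in secrets.items():
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking timing information
        if hmac.compare_digest(expected, signature_hex):
            return label
    return None
```

If a delivery only matches the previous secret, one side of the rotation is likely still using the old value.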

Recovery

  • Replay failed deliveries after endpoint recovery.
  • If a secret was compromised, rotate it and notify the consumer.

Missing referrer payouts

Trigger: calls are attributed, but the referrer share is zero.

Checks

  • Confirm attribution header or API key is present on paid requests.
  • Confirm the deployment is configured with a non-zero referrer share in basis points (bps).
  • Check whether the affected rows are marked settled or distribute_failed.

Recovery

  • Fix attribution source and run a fresh paid-call test.
  • Retry failed distributions and validate final split amounts.
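
When validating final split amounts, it can help to recompute the expected referrer share from the configured bps. The following is a minimal sketch. It assumes amounts are integers in the smallest currency unit and that the referrer share is rounded down with the remainder going to the other party. The real rounding rule may differ, and both function names are hypothetical.

```python
from typing import Tuple

BPS_DENOMINATOR = 10_000  # 1 bps = 0.01%


def expected_split(gross: int, referrer_bps: int) -> Tuple[int, int]:
    """Return (referrer_share, remainder) for a gross amount in minor units.

    Assumption: the referrer share is floored; the remainder absorbs rounding.
    """
    if not 0 <= referrer_bps <= BPS_DENOMINATOR:
        raise ValueError("referrer_bps must be between 0 and 10000")
    referrer = gross * referrer_bps // BPS_DENOMINATOR
    return referrer, gross - referrer


def split_is_valid(gross: int, referrer_bps: int,
                   referrer_paid: int, remainder_paid: int) -> bool:
    """Compare the recorded payout amounts against the recomputed split."""
    return (referrer_paid, remainder_paid) == expected_split(gross, referrer_bps)
```

With referrer_bps set to 0, the expected referrer share is always 0, which reproduces the symptom in this runbook's trigger.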

Attribution mismatch report

Trigger: partner-reported volume does not match the dashboard.

Checks

  • Compare partner logs to call IDs and timestamps.
  • Validate the wallet/handle mapping used by the integration.
  • Check downstream filtering and delayed event ingestion.
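
The comparison of call IDs and timestamps can be sketched as a simple set difference. This example assumes both sides can be exported as a mapping from call ID to an epoch-seconds timestamp. The reconcile function, its field names, and the 60-second tolerance are hypothetical choices for the example.

```python
from typing import Dict, List


def reconcile(partner: Dict[str, float], dashboard: Dict[str, float],
              tolerance_s: float = 60.0) -> Dict[str, List[str]]:
    """Bucket call IDs by where they appear and whether timestamps agree."""
    shared = partner.keys() & dashboard.keys()
    return {
        # Calls the partner logged that the dashboard never recorded
        "only_partner": sorted(partner.keys() - dashboard.keys()),
        # Calls on the dashboard that the partner did not see (e.g. filtered downstream)
        "only_dashboard": sorted(dashboard.keys() - partner.keys()),
        # Calls on both sides whose timestamps differ by more than the tolerance
        "skewed": sorted(
            cid for cid in shared
            if abs(partner[cid] - dashboard[cid]) > tolerance_s
        ),
    }
```

Entries in the "skewed" bucket are candidates for delayed event ingestion, since the call exists on both sides but landed in different time windows.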

Recovery

  • Replay affected deliveries for downstream correction.
  • Share export snapshot for reconciliation with partner.

Escalation threshold

Escalate to incident mode when payout-impacting failures persist for 30 minutes, or when the delivery backlog continues to grow after a replay.
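
The escalation rule can be written down as a small decision function. This is a sketch under assumptions: "continues to grow" is interpreted here as a strictly increasing backlog across at least two samples taken after the replay, and the function and parameter names are hypothetical.

```python
from typing import List

PAYOUT_FAILURE_LIMIT_MIN = 30  # from the threshold: failures persist for 30 minutes


def should_escalate(payout_failure_minutes: float,
                    backlog_after_replay: List[int]) -> bool:
    """Return True when either escalation condition is met."""
    persistent = payout_failure_minutes >= PAYOUT_FAILURE_LIMIT_MIN
    # Assumption: "growing" means every post-replay sample exceeds the previous one.
    growing = len(backlog_after_replay) >= 2 and all(
        later > earlier
        for earlier, later in zip(backlog_after_replay, backlog_after_replay[1:])
    )
    return persistent or growing
```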