Retries

Configure retry behavior per Step with work_config. Covers what triggers a retry (5xx and transport errors), what is permanent (4xx), the three backoff strategies, and the global defaults that apply when fields are omitted.

When Retries Happen

Retries are triggered when a Work Item is marked work_not_completed:

  • Network or transport failures (connection errors, timeouts)
  • HTTP 5xx responses from the Step endpoint

HTTP 4xx responses are permanent failures and do not retry. If business logic failed permanently, return a 4xx with application/problem+json.

Configuration

Retry behavior is configured in work_config on the Step:

{
  "id": "charge-card",
  "type": "async",
  "http": { "endpoint": "https://api.example.com/payments", "timeout": 5000 },
  "work_config": {
    "max_retries": 3,
    "init_backoff": 1000,
    "max_backoff": 30000,
    "backoff_type": "exponential"
  }
}
FieldMeaning
max_retriesMax retry attempts. -1 = unlimited. 0 = use global default
init_backoffInitial delay in milliseconds. 0 = use global default
max_backoffMaximum delay in milliseconds. 0 = use global default
backoff_typefixed, linear, or exponential. Empty = use global default

Steps with max_retries = 0 and empty backoff fields inherit the engine’s global defaults set at startup.

Backoff Strategies

  • Fixed: constant delay between every attempt
  • Linear: delay grows by a fixed increment each attempt
  • Exponential: delay doubles each attempt, up to max_backoff

Retry Lifecycle

Each Work Item retries independently:

graph TD Start["Work item starts"] Execute["Execute step"] Success["2xx response"] Fail4xx["4xx response"] Fail5xx["5xx / transport error"] Retry["Schedule retry
(with backoff)"] Done["Work item complete"] Failed["Work item failed permanently"] Exhausted["Max retries reached"] Start --> Execute Execute --> Success --> Done Execute --> Fail4xx --> Failed Execute --> Fail5xx --> Retry Retry --> Execute Retry -->|"max_retries hit"| Exhausted --> Failed

Compensation Retries

When a step has a compensate endpoint configured, compensation attempts use the same work_config retry settings as normal work. The engine treats 5xx compensation responses as transient and retries with the configured backoff. When max_retries is exhausted, the compensation is marked comp_failed.

See Compensation for full details.

Design Tips

  • Use fixed backoff for quick retry of flaky dependencies
  • Use exponential backoff for rate-limited or unstable services
  • Keep max_retries low unless your Step is idempotent (honor Argyll-Receipt-Token)
  • Return HTTP 4xx when the failure is permanent and retrying would be wrong