Retries

Configure retry behavior per Step with work_config. This page covers what triggers a retry (5xx responses and transport errors), what counts as a permanent failure (4xx responses), the three backoff strategies, and the global defaults that apply when fields are omitted.

When Retries Happen

A retry is triggered when a Work Item enters the not_completed status, which is recorded by a work_not_completed event. This happens on:

  • Network or transport failures (connection errors, timeouts)
  • HTTP 5xx responses from the Step endpoint

HTTP 4xx responses are permanent failures and are never retried. If the Step's business logic fails permanently, return a 4xx response with an application/problem+json body.
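The rules above can be sketched as a small classifier, together with a helper that builds a permanent-failure response. This is an illustrative sketch, not the engine's implementation: the names classify_outcome and permanent_failure are hypothetical, and the problem body uses the standard Problem Details members (type, title, status, detail). The source does not say how other status codes such as 3xx are handled, so the sketch rejects them.

```python
import json
from typing import Dict, Optional, Tuple


def classify_outcome(status: Optional[int], transport_error: bool = False) -> str:
    """Classify one Step attempt as 'completed', 'retry', or 'failed'.

    Hypothetical helper. `status` is None when no HTTP response arrived.
    """
    if transport_error or status is None:
        return "retry"  # connection error or timeout
    if 200 <= status < 300:
        return "completed"
    if 500 <= status < 600:
        return "retry"  # temporary failure
    if 400 <= status < 500:
        return "failed"  # permanent failure, never retried
    # Assumption: other codes are outside what this page documents.
    raise ValueError(f"unhandled status code: {status}")


def permanent_failure(status: int, title: str, detail: str) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for a permanent failure."""
    if not 400 <= status < 500:
        raise ValueError("permanent failures must use a 4xx status")
    body = json.dumps(
        {"type": "about:blank", "title": title, "status": status, "detail": detail}
    )
    return status, {"Content-Type": "application/problem+json"}, body
```

A Step handler could return permanent_failure(422, "Card declined", "The issuer rejected the charge") so the engine stops retrying instead of treating the failure as temporary.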

Configuration

Retry behavior is configured in work_config on the Step:

{
  "id": "charge-card",
  "type": "async",
  "http": { "endpoint": "https://api.example.com/payments", "timeout": 5000 },
  "work_config": {
    "max_retries": 3,
    "init_backoff": 1000,
    "max_backoff": 30000,
    "backoff_type": "exponential"
  }
}

Field         Meaning
max_retries   Max retry attempts. -1 = unlimited. 0 = use global default
init_backoff  Initial delay in milliseconds. 0 = use global default
max_backoff   Maximum delay in milliseconds. 0 = use global default
backoff_type  fixed, linear, or exponential. Empty = use global default

Steps with max_retries = 0 and empty backoff fields inherit the engine’s global defaults set at startup.
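As a rough illustration of the fallback rule, the sketch below resolves a Step's effective settings field by field. The WorkConfig class, the resolve function, and the values in GLOBAL_DEFAULTS are all hypothetical; the real defaults are whatever the engine was started with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkConfig:
    """Mirror of the work_config fields; zero or empty means 'not set'."""
    max_retries: int = 0
    init_backoff: int = 0   # milliseconds
    max_backoff: int = 0    # milliseconds
    backoff_type: str = ""


# Hypothetical values for illustration only.
GLOBAL_DEFAULTS = WorkConfig(
    max_retries=5, init_backoff=500, max_backoff=60000, backoff_type="exponential"
)


def resolve(step: WorkConfig, defaults: WorkConfig = GLOBAL_DEFAULTS) -> WorkConfig:
    """Return the effective settings, replacing unset fields with defaults."""
    return WorkConfig(
        # -1 (unlimited) is a real value and must survive; only 0 falls back.
        max_retries=step.max_retries if step.max_retries != 0 else defaults.max_retries,
        init_backoff=step.init_backoff or defaults.init_backoff,
        max_backoff=step.max_backoff or defaults.max_backoff,
        backoff_type=step.backoff_type or defaults.backoff_type,
    )
```

Under these assumed defaults, a Step that sets only max_retries to 3 keeps that value and inherits the three backoff fields.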

Backoff Strategies

Backoff is the delay before another retry attempt.

  • Fixed: constant delay between every attempt
  • Linear: delay grows by a fixed increment each attempt
  • Exponential: delay doubles each attempt, up to max_backoff
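The three strategies can be sketched as one function. Several details are assumptions, because this page does not specify them: the first retry waits init_backoff, the linear increment equals init_backoff, and max_backoff caps every strategy rather than only the exponential one. The function name backoff_delay is hypothetical.

```python
def backoff_delay(attempt: int, init_backoff: int, max_backoff: int, backoff_type: str) -> int:
    """Delay in milliseconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if backoff_type == "fixed":
        delay = init_backoff
    elif backoff_type == "linear":
        # Assumption: the fixed increment equals init_backoff.
        delay = init_backoff * attempt
    elif backoff_type == "exponential":
        delay = init_backoff * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown backoff_type: {backoff_type!r}")
    # Assumption: the cap applies to all three strategies.
    return min(delay, max_backoff)
```

With init_backoff 1000 and max_backoff 30000, the exponential delays under these assumptions are 1000, 2000, 4000, 8000, and 16000 ms, and every later retry waits the capped 30000 ms.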

Retry Lifecycle

Each Work Item retries independently:

graph TD
  Start["Work item starts"]
  Execute["Execute step"]
  Success["2xx response"]
  Fail4xx["4xx response"]
  Fail5xx["5xx / transport error"]
  Retry["Schedule retry<br/>(with backoff)"]
  Done["Work item complete"]
  Failed["Work item failed permanently"]
  Exhausted["Max retries reached"]
  Start --> Execute
  Execute --> Success --> Done
  Execute --> Fail4xx --> Failed
  Execute --> Fail5xx --> Retry
  Retry --> Execute
  Retry -->|"max_retries hit"| Exhausted --> Failed
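The lifecycle can also be read as a loop. The sketch below is a simplified model, not the engine's code: run_work_item, execute, and schedule are hypothetical names, execute returns an HTTP status or None for a transport error, and schedule stands in for waiting out the backoff delay.

```python
from typing import Callable, Optional


def run_work_item(
    execute: Callable[[], Optional[int]],
    max_retries: int,
    schedule: Callable[[int], None] = lambda retry_number: None,
) -> str:
    """Run one Work Item until it completes or fails permanently."""
    retries = 0
    while True:
        status = execute()
        if status is not None and 200 <= status < 300:
            return "completed"
        if status is not None and 400 <= status < 500:
            return "failed"  # permanent failure, no retry
        if status is not None and not 500 <= status < 600:
            # Assumption: other codes are outside what this page documents.
            raise ValueError(f"unhandled status code: {status}")
        # 5xx or transport error: retry unless the budget is spent.
        if max_retries != -1 and retries >= max_retries:
            return "failed"  # max retries reached
        retries += 1
        schedule(retries)  # placeholder for the backoff wait
```

In this model, max_retries = 3 allows up to four executions in total: the initial attempt plus three retries.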

Compensation Retries

When a step has a compensate endpoint configured, compensation attempts use the same work_config retry settings as normal work. The engine treats 5xx compensation responses as temporary failures and retries after the configured delay. When max_retries is exhausted, the compensation is marked compensation_failed.

See Compensation for full details.

Design Tips

  • Use fixed backoff for quick retry of unreliable dependencies
  • Use exponential backoff for rate-limited or unstable services
  • Keep max_retries low unless your Step is idempotent, meaning repeated requests do not repeat external changes (honor Argyll-Receipt-Token)
  • Return HTTP 4xx when the failure is permanent and retrying would be wrong
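One way a Step might stay idempotent under retries is to remember the result for each receipt token and replay it on repeated requests. The sketch assumes that Argyll-Receipt-Token arrives with each request and keeps the same value across retries of one Work Item; this page does not confirm that behavior. ReceiptCache is a hypothetical in-memory helper, and a real Step would need durable storage.

```python
from typing import Callable, Dict


class ReceiptCache:
    """Run an action at most once per receipt token (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}

    def run_once(self, token: str, action: Callable[[], dict]) -> dict:
        # A repeated token means a retry: replay the stored result
        # instead of repeating the external change.
        if token in self._results:
            return self._results[token]
        result = action()
        self._results[token] = result
        return result
```

A payment Step could wrap its charge call in run_once so that a retry after a timed-out response does not charge the card twice.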