Retries
Configure retry behavior per Step with work_config. Covers what triggers a retry (5xx and transport errors), what is permanent (4xx), the three retry delay strategies, and the global defaults that apply when fields are omitted.
When Retries Happen
Retries are triggered when a Work Item enters not_completed status, recorded by a work_not_completed event:
- Network or transport failures (connection errors, timeouts)
- HTTP 5xx responses from the Step endpoint
HTTP 4xx responses are permanent failures and are not retried. If the business logic fails permanently, return a 4xx response with an application/problem+json body.
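The rule above can be sketched as a small classifier. This is only an illustration of the documented behavior, not the engine's code: the function name `classify_attempt` and the outcome labels are hypothetical, and treating 2xx as success is an assumption.

```python
from typing import Optional


def classify_attempt(status_code: Optional[int]) -> str:
    """Classify one attempt against a Step endpoint.

    status_code is None when no HTTP response arrived at all
    (connection error or timeout).
    """
    if status_code is None:
        return "retry"        # transport failure: retryable
    if 500 <= status_code <= 599:
        return "retry"        # 5xx: retryable
    if 400 <= status_code <= 499:
        return "permanent"    # 4xx: permanent failure, never retried
    if 200 <= status_code <= 299:
        return "success"      # assumption: 2xx means the work completed
    return "unspecified"      # 1xx/3xx are not covered by the rules above


print(classify_attempt(503))   # retry
print(classify_attempt(422))   # permanent
```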
Configuration
Retry behavior is configured in work_config on the Step:
```json
{
  "id": "charge-card",
  "type": "async",
  "http": { "endpoint": "https://api.example.com/payments", "timeout": 5000 },
  "work_config": {
    "max_retries": 3,
    "init_backoff": 1000,
    "max_backoff": 30000,
    "backoff_type": "exponential"
  }
}
```
| Field | Meaning |
|---|---|
| max_retries | Max retry attempts. -1 = unlimited. 0 = use global default |
| init_backoff | Initial delay in milliseconds. 0 = use global default |
| max_backoff | Maximum delay in milliseconds. 0 = use global default |
| backoff_type | fixed, linear, or exponential. Empty = use global default |
Steps with max_retries = 0 and empty backoff fields inherit the engine’s global defaults set at startup.
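The inheritance rule can be sketched as a field-by-field merge. This is a minimal sketch under stated assumptions: `WorkConfig`, `resolve`, and the values in `GLOBAL_DEFAULTS` are hypothetical placeholders, since the real defaults are whatever the engine was started with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkConfig:
    max_retries: int = 0      # -1 = unlimited, 0 = use global default
    init_backoff: int = 0     # milliseconds, 0 = use global default
    max_backoff: int = 0      # milliseconds, 0 = use global default
    backoff_type: str = ""    # "" = use global default


# Hypothetical values for illustration only; the real ones are set at engine startup.
GLOBAL_DEFAULTS = WorkConfig(
    max_retries=5, init_backoff=500, max_backoff=60000, backoff_type="exponential"
)


def resolve(step: WorkConfig, defaults: WorkConfig = GLOBAL_DEFAULTS) -> WorkConfig:
    """Fill every omitted (zero or empty) field from the global defaults."""
    return WorkConfig(
        # -1 is truthy, so "unlimited" survives the merge unchanged.
        max_retries=step.max_retries or defaults.max_retries,
        init_backoff=step.init_backoff or defaults.init_backoff,
        max_backoff=step.max_backoff or defaults.max_backoff,
        backoff_type=step.backoff_type or defaults.backoff_type,
    )


# Only max_retries and init_backoff are set; the rest is inherited.
print(resolve(WorkConfig(max_retries=3, init_backoff=1000)))
```

One consequence of these semantics is that 0 cannot be used to mean "no retries", because 0 always resolves to the global default.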
Backoff Strategies
Backoff is the delay before another retry attempt.
- Fixed: constant delay between every attempt
- Linear: delay grows by a fixed increment each attempt
- Exponential: delay doubles each attempt, up to max_backoff
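The three strategies can be sketched as one delay function. This is an illustration rather than the engine's exact formula: the name `backoff_delay` is hypothetical, the linear increment is assumed to equal init_backoff, and capping the linear delay at max_backoff is also an assumption.

```python
def backoff_delay(attempt: int, init_backoff: int, max_backoff: int, backoff_type: str) -> int:
    """Delay in milliseconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if backoff_type == "fixed":
        return init_backoff                         # constant delay
    if backoff_type == "linear":
        delay = init_backoff * attempt              # assumption: increment == init_backoff
    elif backoff_type == "exponential":
        delay = init_backoff * 2 ** (attempt - 1)   # doubles each attempt
    else:
        raise ValueError(f"unknown backoff_type: {backoff_type!r}")
    return min(delay, max_backoff)                  # never exceed the cap


# With init_backoff=1000 and max_backoff=30000 (exponential):
print([backoff_delay(n, 1000, 30000, "exponential") for n in range(1, 7)])
# [1000, 2000, 4000, 8000, 16000, 30000]
```

The sixth delay would be 32000 ms uncapped, so max_backoff limits it to 30000 ms.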
Retry Lifecycle
Each Work Item retries independently:
- Each attempt executes the Step endpoint.
- On success, the Work Item is complete.
- On a 4xx response, the Work Item fails permanently.
- On a 5xx response, the Work Item is retried (with backoff) and executed again.
- When max_retries is hit, retries are exhausted and the Work Item fails permanently.
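The lifecycle of a single Work Item can be sketched as a loop. This is a simplified model, not the engine's implementation: `run_work_item` and its return labels are hypothetical, 2xx is assumed to mean success, and the backoff wait is left out so the sketch runs instantly.

```python
from typing import Callable, Optional


def run_work_item(execute: Callable[[], Optional[int]], max_retries: int) -> str:
    """Drive one Work Item to a final state.

    `execute` performs one attempt and returns the HTTP status code,
    or None for a transport failure. max_retries of -1 means unlimited.
    """
    retries = 0
    while True:
        status = execute()
        if status is None or 500 <= status <= 599:
            # Temporary failure: retry unless the budget is spent.
            if max_retries != -1 and retries >= max_retries:
                return "failed"      # max retries reached
            retries += 1
            continue                 # a real engine waits for the backoff delay here
        if 400 <= status <= 499:
            return "failed"          # permanent failure, no retry
        if 200 <= status <= 299:
            return "complete"        # assumption: 2xx means success
        raise ValueError(f"status {status} is not covered by this sketch")


# Two temporary failures followed by a success.
responses = iter([503, None, 200])
print(run_work_item(lambda: next(responses), max_retries=3))   # complete
```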
Compensation Retries
When a Step has a compensate endpoint configured, compensation attempts use the same work_config retry settings as normal work. The engine treats 5xx compensation responses as temporary failures and retries after the configured delay. When max_retries is exhausted, the compensation is marked compensation_failed.
See Compensation for full details.
Design Tips
- Use fixed backoff for quick retry of unreliable dependencies
- Use exponential backoff for rate-limited or unstable services
- Keep max_retries low unless your Step is idempotent, meaning repeated requests do not repeat external changes (honor Argyll-Receipt-Token)
- Return HTTP 4xx when the failure is permanent and retrying would be wrong
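The last two tips can be sketched as a Step handler that deduplicates by token. This is a framework-free illustration built on assumptions: it assumes the engine sends the same Argyll-Receipt-Token header value on every retry of a Work Item, and `handle_charge`, the in-memory store, the request body shape, and the 400 response for a missing token are all hypothetical.

```python
from typing import Dict, Tuple

# In-memory store for illustration; a real Step would use durable storage.
_results: Dict[str, Tuple[int, dict]] = {}
charges_made = 0   # counts real side effects, to show retries do not repeat them


def handle_charge(headers: Dict[str, str], body: dict) -> Tuple[int, dict]:
    """Return (status_code, response_body) for one request to the Step."""
    global charges_made
    token = headers.get("Argyll-Receipt-Token")
    if not token:
        # Assumption: a request without a token is rejected as a client error.
        return 400, {"title": "Missing Argyll-Receipt-Token", "status": 400}
    if token in _results:
        return _results[token]       # retry of a known request: replay, no new charge
    if body.get("amount", 0) <= 0:
        # Permanent business failure: 4xx with a problem+json style body.
        result = (422, {"type": "about:blank", "title": "Invalid amount", "status": 422})
    else:
        charges_made += 1            # the external change happens exactly once
        result = (200, {"charged": body["amount"]})
    _results[token] = result
    return result


headers = {"Argyll-Receipt-Token": "tok-1"}
handle_charge(headers, {"amount": 50})
handle_charge(headers, {"amount": 50})   # simulated retry with the same token
print(charges_made)                      # 1
```

Storing the result under the token before returning is what makes a higher max_retries safe in this sketch: a repeated request is answered from the store instead of charging again.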