Reliability & Error Handling
Production agentic pipelines fail in predictable ways, external APIs time out, data arrives in inconsistent formats, and tools return malformed results. This module covers the three patterns that make pipelines resilient: programmatic tool calling, structured error responses, and output normalization hooks.
1. Programmatic Tool Calling (Code Execution)
The code execution tool lets Claude call tools multiple times inside a sandboxed container before results reach the context window. This is the right pattern for filtering large prospect lists, Claude can run Python to process 10,000 records and surface only the 50 that match your ICP.
- Pattern: Include
code_execution_20260120alongside your custom tools. Claude can loop, filter, and aggregate tool results programmatically. - ZDR note: Programmatic tool calling is not ZDR-eligible, execution state is held server-side during the run.
2. Structured Error Responses
When a tool fails (CRM API down, quota exceeded), return a structured JSON error object instead of a raw string. This gives Claude the information it needs to decide whether to retry or pivot.
errorCategory, Categorize the failure:API_TIMEOUT,INVALID_PERMISSIONS,RATE_LIMIT_EXCEEDED.isRetryable, Boolean.true= try again;false= pivot to a different strategy.message, Human-readable explanation Claude can reason over.
def handle_crm_tool(tool_input):
try:
result = crm_client.search_contacts(tool_input["query"])
return {"contacts": result}
except TimeoutError:
return {
"errorCategory": "API_TIMEOUT",
"isRetryable": True,
"message": "CRM API timed out after 30s. Retry with a narrower query."
}
except PermissionError:
return {
"errorCategory": "INVALID_PERMISSIONS",
"isRetryable": False,
"message": "API key lacks read access to the contacts endpoint. Escalate to admin."
}
3. PostToolUse Hook: Output Normalization
A PostToolUse hook is a shell command that runs after each tool call but before the result reaches the model. Use it to normalize inconsistent outputs, different date formats, currency representations, or encoding issues, so Claude always sees clean, uniform data.
{
"hooks": {
"PostToolUse": [
{
"matcher": {"tool_name": "fetch_crm_data"},
"hooks": [
{
"type": "command",
"command": "python normalize_dates.py"
}
]
}
]
}
}
The isRetryable field is the key signal in your agentic loop. On true, Claude can retry with adjusted parameters (e.g., a shorter timeout or a narrower query). On false, Claude should branch to a fallback strategy rather than looping indefinitely. Always structure errors, raw string errors give Claude no actionable signal.
Lab Exercise: Resilient Tool Error Handling
Self-driven lab Module8_Self_Driven_Lab.ipynbObjective: make agent recovery decisions based on structured errors instead of generic failure text.
- Implement a tool dispatcher that returns `errorCategory`, `isRetryable`, and a concise message.
- Add retry logic for transient errors and pivot behavior for non-retryable failures.
- Sketch a `PostToolUse` hook that normalizes inconsistent provider output.
- Log enough context for an operator to debug the failed tool call.
A resilient tool-handling pattern for production agents.