Module 8

Reliability & Error Handling

Production agentic pipelines fail in predictable ways: external APIs time out, data arrives in inconsistent formats, and tools return malformed results. This module covers three patterns that make pipelines resilient: programmatic tool calling, structured error responses, and output normalization hooks.

Answer key: Module8_Complete.ipynb

1. Programmatic Tool Calling (Code Execution)

The code execution tool lets Claude call tools multiple times inside a sandboxed container before results reach the context window. This is the right pattern for filtering large prospect lists: Claude can run Python to process 10,000 records and surface only the 50 that match your ideal customer profile (ICP).

  • Pattern: Include code_execution_20260120 alongside your custom tools. Claude can loop, filter, and aggregate tool results programmatically.
  • ZDR note: Programmatic tool calling is not eligible for Zero Data Retention (ZDR), because execution state is held server-side during the run.
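
As a rough illustration of the filtering step, the script Claude writes inside the sandbox might resemble the sketch below. The record fields (`industry`, `employees`), the ICP thresholds, and the `matches_icp` and `shortlist` helpers are hypothetical, chosen only for illustration; real prospect data and the tool-call plumbing will differ.

```python
from typing import Iterable

# Hypothetical ICP definition; real criteria would come from your own sales playbook.
ICP_INDUSTRIES = {"saas", "fintech"}
MIN_EMPLOYEES = 200


def matches_icp(record: dict) -> bool:
    """Return True when a prospect record fits the (assumed) ICP."""
    return (
        record.get("industry", "").lower() in ICP_INDUSTRIES
        and record.get("employees", 0) >= MIN_EMPLOYEES
    )


def shortlist(records: Iterable[dict], limit: int = 50) -> list[dict]:
    """Filter and rank in the sandbox so only the top matches reach the context window."""
    matches = [r for r in records if matches_icp(r)]
    matches.sort(key=lambda r: r["employees"], reverse=True)
    return matches[:limit]


# Synthetic stand-in for 10,000 tool results.
prospects = [
    {"name": f"Company {i}", "industry": "SaaS" if i % 2 == 0 else "Retail", "employees": i}
    for i in range(10_000)
]
top = shortlist(prospects)
print(len(top), top[0]["name"])
```

The point of the pattern is that the 10,000 raw records stay in the container; only the shortlist is returned to the model.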

2. Structured Error Responses

When a tool fails (CRM API down, quota exceeded), return a structured JSON error object instead of a raw string. This gives Claude the information it needs to decide whether to retry or pivot.

  • errorCategory: Categorizes the failure, e.g. API_TIMEOUT, INVALID_PERMISSIONS, RATE_LIMIT_EXCEEDED.
  • isRetryable: Boolean. true = try again; false = pivot to a different strategy.
  • message: Human-readable explanation Claude can reason over.

Python (tool handler)
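# crm_client is assumed to be an already-configured CRM SDK client defined elsewhere.
def handle_crm_tool(tool_input):
    """Search CRM contacts; return a structured error object on failure."""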
    try:
        result = crm_client.search_contacts(tool_input["query"])
        return {"contacts": result}
    except TimeoutError:
        return {
            "errorCategory": "API_TIMEOUT",
            "isRetryable": True,
            "message": "CRM API timed out after 30s. Retry with a narrower query."
        }
    except PermissionError:
        return {
            "errorCategory": "INVALID_PERMISSIONS",
            "isRetryable": False,
            "message": "API key lacks read access to the contacts endpoint. Escalate to admin."
        }
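
One way to hand the handler's return value back to Claude is to serialize it into a `tool_result` content block and set the Messages API `is_error` flag whenever the payload carries an `errorCategory`. The sketch below assumes that convention; `to_tool_result` is a hypothetical helper and the `tool_use_id` value is a placeholder.

```python
import json


def to_tool_result(tool_use_id: str, payload: dict) -> dict:
    """Wrap a handler's return value in a tool_result content block.

    Assumption: any payload containing "errorCategory" is a structured error.
    """
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(payload),  # structured fields stay machine-readable
        "is_error": "errorCategory" in payload,
    }


error_payload = {
    "errorCategory": "RATE_LIMIT_EXCEEDED",
    "isRetryable": True,
    "message": "CRM quota exhausted. Retry after the quota window resets.",
}
block = to_tool_result("toolu_placeholder", error_payload)
print(block["is_error"], block["content"])
```

Serializing the whole object, rather than only the message, keeps `errorCategory` and `isRetryable` visible to the model alongside the explanation.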

3. PostToolUse Hook: Output Normalization

A PostToolUse hook is a shell command that runs after each matching tool call but before the result reaches the model. Use it to normalize inconsistent outputs (different date formats, currency representations, or encoding issues) so Claude always sees clean, uniform data.

JSON (claude hooks config)
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "fetch_crm_data",
        "hooks": [
          {
            "type": "command",
            "command": "python normalize_dates.py"
          }
        ]
      }
    ]
  }
}
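
The normalization logic inside a script such as normalize_dates.py could look like the sketch below. The accepted input formats and the `created_at` and `last_contacted` field names are assumptions made for illustration, and the functions `normalize_date` and `normalize_record` are hypothetical. How the hook runtime passes the tool output to the script and reads the result back depends on your hooks setup and is not shown.

```python
from datetime import datetime

# Assumed input formats; extend the list to match what your providers actually send.
# Note: "%m/%d/%Y" reads 03/04/2024 as March 4 (US order), which is itself an assumption.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%Y-%m-%dT%H:%M:%SZ")


def normalize_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or the input unchanged if unrecognized."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return value


def normalize_record(record: dict, date_fields=("created_at", "last_contacted")) -> dict:
    """Normalize the (hypothetical) date fields of one CRM record; other fields pass through."""
    return {
        key: normalize_date(val) if key in date_fields and isinstance(val, str) else val
        for key, val in record.items()
    }


raw = {"name": "Acme", "created_at": "03/15/2024", "last_contacted": "2 Apr 2024"}
print(normalize_record(raw))
```

Returning unrecognized values unchanged is a deliberate choice in this sketch: a hook that silently drops or guesses at data it cannot parse would hide the inconsistency instead of fixing it.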

Architect Tip for the Exam

The isRetryable field is the key signal in your agentic loop. When it is true, Claude can retry with adjusted parameters (e.g., a shorter timeout or a narrower query). When it is false, Claude should branch to a fallback strategy rather than looping indefinitely. Always structure errors: raw string errors give Claude no actionable signal.
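
The same retry-or-pivot decision can also be enforced in orchestration code, so that a retryable error never turns into an unbounded loop. The following is a minimal sketch under that assumption; `call_with_recovery`, `flaky_crm`, and `cached_lookup` are hypothetical names, and the retry limit and backoff values are arbitrary.

```python
import time
from typing import Callable


def call_with_recovery(
    tool: Callable[[dict], dict],
    tool_input: dict,
    fallback: Callable[[dict], dict],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> dict:
    """Retry retryable errors with exponential backoff; otherwise pivot to the fallback."""
    for attempt in range(max_retries + 1):
        result = tool(tool_input)
        if "errorCategory" not in result:
            return result  # success
        if not result.get("isRetryable", False):
            break  # non-retryable: stop immediately and pivot
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)
    return fallback(tool_input)


calls = {"n": 0}


def flaky_crm(tool_input: dict) -> dict:
    """Stand-in tool that times out twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        return {"errorCategory": "API_TIMEOUT", "isRetryable": True, "message": "timed out"}
    return {"contacts": ["Ada"]}


def cached_lookup(tool_input: dict) -> dict:
    """Stand-in fallback strategy, e.g. serving stale data from a cache."""
    return {"contacts": [], "source": "cache"}


result = call_with_recovery(flaky_crm, {"query": "acme"}, cached_lookup, base_delay=0.0)
print(result)
```

Capping retries matters even for errors marked retryable: a timeout that persists past the limit is treated like a non-retryable failure and routed to the fallback.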

Lab Exercise: Resilient Tool Error Handling

Self-driven lab: Module8_Self_Driven_Lab.ipynb

Objective: make agent recovery decisions based on structured errors instead of generic failure text.

  1. Implement a tool dispatcher that returns `errorCategory`, `isRetryable`, and a concise message.
  2. Add retry logic for transient errors and pivot behavior for non-retryable failures.
  3. Sketch a `PostToolUse` hook that normalizes inconsistent provider output.
  4. Log enough context for an operator to debug the failed tool call.

Expected Deliverable

A resilient tool-handling pattern for production agents.