Module 10

Advanced Reliability & Compaction

This final module focuses on the operational stability and long-term viability of your marketing agent. You must demonstrate the ability to manage extremely long conversation histories and handle tool failures gracefully using Structured Error Responses and Server-Side Compaction.

Answer key Module10_Complete.ipynb

1. Context Management with Server-Side Compaction

As your "Marketing Strategist" agent conducts weeks of research, its conversation history will eventually approach the 1M-token context window limit. While large windows are powerful, they can lead to context rot, where the model loses focus on early instructions.

The Feature: Use the compact_20260112 strategy (requires the compact-2026-01-12 beta header).

How it works: When the conversation hits a specific token trigger (minimum 50,000 tokens), Claude automatically generates a concise summary of the history and replaces the older, "stale" content with a compaction block.
Key Parameter: pause_after_compaction. Set this to true if you want to inspect the summary or add manual context (like a "must-remember" brand rule) before the model continues its marketing task.

Architect Tip for the Exam

You must pass the resulting compaction block back to the API on all subsequent turns. When the API receives a compaction block, it automatically drops all messages prior to it, effectively resetting the context while preserving the narrative summary.

2. Resilient Tooling: Structured Error Responses

In production agentic loops, tools will fail (e.g., a CRM API is down or a web search returns no results). A senior architect doesn't just return a raw string error, they use Structured Error Responses to guide Claude's next decision.

Implementation Task: When your custom "Lead Generation" tool fails, return a JSON object with these specific fields:

errorCategory, Categorize the failure (e.g., API_TIMEOUT, INVALID_PERMISSIONS).
isRetryable, A boolean telling Claude if it should try the same action again or pivot to a different marketing strategy.
message, A human-readable explanation the model can reason over.

3. Architectural Strategy: Task Decomposition

To prevent a single agent from becoming overwhelmed by a "Master Marketing Plan," you must implement Task Decomposition.

Pattern: Instead of one session for "All Marketing," break the work into discrete sub-tasks:

Analyst Subagent: Performs search and data gathering.
Copywriter Subagent: Generates email/social drafts based on analysis.
Coordinator: Uses the Advisor tool (Module 3) to review subagent outputs against the high-level strategy.

This keeps individual context windows small, improves reliability, and allows you to use different models (e.g., Haiku for research, Opus for coordination) to optimize costs.

4. Reliability & Prompt Caching

To maintain performance during compaction, follow the Cache-System-Prompt pattern:

Place a cache_control breakpoint at the very end of your system prompt.
When compaction occurs and a new summary is written, the system prompt cache remains valid. This ensures you only pay for the "write" of the new summary, rather than re-caching your entire 20,000-token marketing instruction set.

5. The Case Facts Pattern

Compaction summarizes conversation history, but summaries can lose precise transactional details, agreed-upon meeting dates, signed contract terms, confirmed budget figures. The Case Facts pattern prevents this loss.

Pattern: Extract critical, non-negotiable facts into a persistent "case_facts" block placed after the compaction block, outside the summary. This block is never compacted.
Implementation: After each session turn, update the case facts block with any newly confirmed data before the next compaction trigger fires.

6. Context Awareness: Remaining Token Budget

Claude Sonnet 4.6 and Haiku 4.5 can track their remaining token budget during a session. As the context window fills, the model can signal that it is approaching capacity, enabling your orchestrator to trigger early compaction, summarize, or hand off to a subagent before the window is exhausted.

Use case: Long-running research loops where you cannot predict how many tool calls will be needed.
Pattern: Poll usage.context_window_remaining in the response and trigger compaction proactively at 80% utilization rather than waiting for a hard limit.

7. Thinking Persistence & Context Exhaustion

On Opus 4.5+ and Sonnet 4.6+, thinking blocks from previous turns are kept in context by default, they are not stripped the way earlier models stripped them. This is intentional for reasoning continuity, but has a critical side effect in long conversations.

Risk: Each turn's thinking blocks accumulate in the context window. A 20-turn conversation with 8,000-token thinking blocks per turn consumes 160,000 tokens of context for thinking alone.
Mitigation: Use server-side compaction (compact_20260112) with a lower trigger threshold when Adaptive Thinking is active. Or explicitly omit thinking output ("thinking_output": "omitted") on turns where continuity is not needed.

Lab Complete

Exam Readiness Checklist

You have now built a robust architectural blueprint for an autonomous agent. Before your exam, make sure you can answer:

✓ Which features are ZDR-eligible (Adaptive Thinking, Compaction, Citations) and which are not (Batch API, Files API)?
✓ How do you use the CLAUDE.md hierarchy and .claude/rules/ to scope brand voice in Claude Code?
✓ When should you choose Adaptive Thinking over the deprecated budget_tokens? (Always for Opus 4.7)
✓ What are the Structured Outputs limits? (20 strict tools, 24 optional params, 16 union types)
✓ What is the Batch API discount and its ZDR eligibility?

View Quick Reference Sheet →

Lab Exercise: Compaction & Long-Horizon Reliability

Self-driven lab Module10_Self_Driven_Lab.ipynb

Objective: preserve important facts while keeping long-running agent sessions inside context limits.

Create a persistent case-facts block with non-negotiable data.
Configure or sketch server-side compaction with instructions to preserve case facts verbatim.
Simulate a subagent failure report with category, retryability, partial findings, and recommended next action.
Count or estimate tokens before and after compaction and document the operational threshold.

Expected Deliverable

A long-horizon session management plan for research agents.