Advanced Reliability & Compaction
This final module focuses on the operational stability and long-term viability of your marketing agent. You must demonstrate the ability to manage extremely long conversation histories and handle tool failures gracefully using Structured Error Responses and Server-Side Compaction.
1. Context Management with Server-Side Compaction
As your "Marketing Strategist" agent conducts weeks of research, its conversation history will eventually approach the 1M-token context window limit. While large windows are powerful, they can lead to context rot, where the model loses focus on early instructions.
The Feature: Use the compact_20260112 strategy (requires the compact-2026-01-12 beta header).
- How it works: When the conversation hits a specific token
trigger(minimum 50,000 tokens), Claude automatically generates a concise summary of the history and replaces the older, "stale" content with a compaction block. - Key Parameter:
pause_after_compaction. Set this totrueif you want to inspect the summary or add manual context (like a "must-remember" brand rule) before the model continues its marketing task.
You must pass the resulting compaction block back to the API on all subsequent turns. When the API receives a compaction block, it automatically drops all messages prior to it, effectively resetting the context while preserving the narrative summary.
2. Resilient Tooling: Structured Error Responses
In production agentic loops, tools will fail (e.g., a CRM API is down or a web search returns no results). A senior architect doesn't just return a raw string error, they use Structured Error Responses to guide Claude's next decision.
Implementation Task: When your custom "Lead Generation" tool fails, return a JSON object with these specific fields:
errorCategory, Categorize the failure (e.g.,API_TIMEOUT,INVALID_PERMISSIONS).isRetryable, A boolean telling Claude if it should try the same action again or pivot to a different marketing strategy.message, A human-readable explanation the model can reason over.
3. Architectural Strategy: Task Decomposition
To prevent a single agent from becoming overwhelmed by a "Master Marketing Plan," you must implement Task Decomposition.
Pattern: Instead of one session for "All Marketing," break the work into discrete sub-tasks:
- Analyst Subagent: Performs search and data gathering.
- Copywriter Subagent: Generates email/social drafts based on analysis.
- Coordinator: Uses the Advisor tool (Module 3) to review subagent outputs against the high-level strategy.
This keeps individual context windows small, improves reliability, and allows you to use different models (e.g., Haiku for research, Opus for coordination) to optimize costs.
4. Reliability & Prompt Caching
To maintain performance during compaction, follow the Cache-System-Prompt pattern:
- Place a
cache_controlbreakpoint at the very end of your system prompt. - When compaction occurs and a new summary is written, the system prompt cache remains valid. This ensures you only pay for the "write" of the new summary, rather than re-caching your entire 20,000-token marketing instruction set.
5. The Case Facts Pattern
Compaction summarizes conversation history, but summaries can lose precise transactional details, agreed-upon meeting dates, signed contract terms, confirmed budget figures. The Case Facts pattern prevents this loss.
- Pattern: Extract critical, non-negotiable facts into a persistent
"case_facts"block placed after the compaction block, outside the summary. This block is never compacted. - Implementation: After each session turn, update the case facts block with any newly confirmed data before the next compaction trigger fires.
6. Context Awareness: Remaining Token Budget
Claude Sonnet 4.6 and Haiku 4.5 can track their remaining token budget during a session. As the context window fills, the model can signal that it is approaching capacity, enabling your orchestrator to trigger early compaction, summarize, or hand off to a subagent before the window is exhausted.
- Use case: Long-running research loops where you cannot predict how many tool calls will be needed.
- Pattern: Poll
usage.context_window_remainingin the response and trigger compaction proactively at 80% utilization rather than waiting for a hard limit.
7. Thinking Persistence & Context Exhaustion
On Opus 4.5+ and Sonnet 4.6+, thinking blocks from previous turns are kept in context by default, they are not stripped the way earlier models stripped them. This is intentional for reasoning continuity, but has a critical side effect in long conversations.
- Risk: Each turn's thinking blocks accumulate in the context window. A 20-turn conversation with 8,000-token thinking blocks per turn consumes 160,000 tokens of context for thinking alone.
- Mitigation: Use server-side compaction (
compact_20260112) with a lower trigger threshold when Adaptive Thinking is active. Or explicitly omit thinking output ("thinking_output": "omitted") on turns where continuity is not needed.
Exam Readiness Checklist
You have now built a robust architectural blueprint for an autonomous agent. Before your exam, make sure you can answer:
- ✓ Which features are ZDR-eligible (Adaptive Thinking, Compaction, Citations) and which are not (Batch API, Files API)?
- ✓ How do you use the
CLAUDE.mdhierarchy and.claude/rules/to scope brand voice in Claude Code? - ✓ When should you choose Adaptive Thinking over the deprecated
budget_tokens? (Always for Opus 4.7) - ✓ What are the Structured Outputs limits? (20 strict tools, 24 optional params, 16 union types)
- ✓ What is the Batch API discount and its ZDR eligibility?
Lab Exercise: Compaction & Long-Horizon Reliability
Self-driven lab Module10_Self_Driven_Lab.ipynbObjective: preserve important facts while keeping long-running agent sessions inside context limits.
- Create a persistent case-facts block with non-negotiable data.
- Configure or sketch server-side compaction with instructions to preserve case facts verbatim.
- Simulate a subagent failure report with category, retryability, partial findings, and recommended next action.
- Count or estimate tokens before and after compaction and document the operational threshold.
A long-horizon session management plan for research agents.