Module 3

Advanced Agent Orchestration

This module teaches you how to optimize your autonomous agent for both high intelligence and cost-efficiency using the Advisor tool and the effort parameter. A Sonnet 4.6 executor at medium effort handles the bulk of generation; an Opus 4.7 advisor at high or xhigh effort provides strategic guidance mid-generation without extra round-trips.

Answer key: Module3_Complete.ipynb

1. The Core Concept: Tactical vs. Strategic Reasoning

In marketing automation, many tasks are mechanical (e.g., formatting an email or extracting data), but the high-level strategy (e.g., deciding which market segment to target) requires the deepest reasoning.

  • Executor Model: Handles the bulk of token generation at lower rates.
  • Advisor Model: Reads the full transcript and provides a high-level plan or course correction (typically 400–700 text tokens) to keep the executor on track.

2. Model Compatibility

For the architect certification, remember that the advisor must be at least as capable as the executor. Valid pairs include:

  • Executor: Claude Sonnet 4.6 → Advisor: Claude Opus 4.7
  • Executor: Claude Haiku 4.5 → Advisor: Claude Opus 4.7
  • Executor: Claude Opus 4.6 → Advisor: Claude Opus 4.7
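
A quick way to keep the listed pairs in front of you is a small lookup table. The sketch below only encodes the three pairs above; since the list says "include", treat a miss as "not listed here" rather than "rejected by the API". The model IDs for Haiku 4.5 and Opus 4.6 are assumptions that follow the naming pattern of the IDs used later in this module, and `is_listed_pair` is a hypothetical helper.

```python
# Executor -> advisors, copied from the pairs listed above.
# "claude-haiku-4-5" and "claude-opus-4-6" are assumed IDs (naming pattern only).
LISTED_PAIRS = {
    "claude-sonnet-4-6": {"claude-opus-4-7"},
    "claude-haiku-4-5": {"claude-opus-4-7"},
    "claude-opus-4-6": {"claude-opus-4-7"},
}


def is_listed_pair(executor: str, advisor: str) -> bool:
    """Return True if the executor/advisor pair appears in the module's list."""
    return advisor in LISTED_PAIRS.get(executor, set())


print(is_listed_pair("claude-sonnet-4-6", "claude-opus-4-7"))
print(is_listed_pair("claude-opus-4-7", "claude-sonnet-4-6"))
```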

3. Mechanics of the Sub-Inference

The entire process happens within a single /v1/messages request; no extra round trips are required on the client side.

  • The executor model emits a server_tool_use block for the advisor.
  • Anthropic runs a separate, server-side inference on the advisor model, passing it the full transcript (system prompt, tools, and history).
  • The advisor's thinking blocks are dropped; only the final advice reaches the executor as an advisor_tool_result.
  • The executor resumes generating, informed by the advisor's strategic plan.
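
To see the sequence above in a response, you can walk the content blocks and pair each advisor call with its result. This is a minimal sketch over plain dicts: the block types `server_tool_use` and `advisor_tool_result` come from the steps above, while the other fields (`name`, `id`, `tool_use_id`, `content`) and the sample data are assumptions for illustration, and `advisor_exchanges` is a hypothetical helper.

```python
def advisor_exchanges(content):
    """Pair each advisor server_tool_use block with the result that follows it."""
    exchanges = []
    pending = None
    for block in content:
        kind = block.get("type")
        if kind == "server_tool_use" and block.get("name") == "advisor":
            pending = block  # The executor asked for advice
        elif kind == "advisor_tool_result" and pending is not None:
            exchanges.append((pending, block))  # The advice came back
            pending = None
    return exchanges


# Hypothetical content list; field names other than "type" are assumed.
sample_content = [
    {"type": "text", "text": "Drafting the launch plan."},
    {"type": "server_tool_use", "id": "example_id", "name": "advisor", "input": {}},
    {"type": "advisor_tool_result", "tool_use_id": "example_id", "content": "Lead with one segment."},
    {"type": "text", "text": "Revised plan, following the advice."},
]

for call, result in advisor_exchanges(sample_content):
    print(f"Advisor call {call['id']} -> advice: {result['content']}")
```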

4. Advisor-Side Caching

For long-horizon marketing research involving repeated calls, you should enable advisor-side caching in the tool definition.

  • Break-even Point: Caching costs more than it saves for one or two calls. It generally breaks even at three advisor calls per conversation and improves thereafter.
  • Implementation: Set the caching parameter to {"type": "ephemeral", "ttl": "1h"} to preserve the advisor's transcript across multiple calls.
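
The break-even rule can be turned into a simple switch when you build the tool definition. The sketch below adds the caching entry only when you expect three or more advisor calls in the conversation; `build_advisor_tool` and its `expected_advisor_calls` argument are hypothetical names, and the threshold is the rule of thumb stated above, not a measured value.

```python
BREAK_EVEN_CALLS = 3  # Rule of thumb from this section


def build_advisor_tool(expected_advisor_calls: int) -> dict:
    """Return an advisor tool definition, enabling caching only when it should pay off."""
    tool = {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-7",
    }
    if expected_advisor_calls >= BREAK_EVEN_CALLS:
        tool["caching"] = {"type": "ephemeral", "ttl": "1h"}
    return tool


print(build_advisor_tool(1))
print(build_advisor_tool(5))
```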

5. Guiding Throughput with Effort

The effort parameter is soft guidance for how thoroughly Claude reasons across all tokens: text, tool calls, and thinking. The strategic pairing for this pattern:

  • Executor (Sonnet 4.6): Set effort="medium" for balanced cost and quality on high-volume generation tasks.
  • Advisor (Opus 4.7): Set effort="high" or effort="xhigh" for deep strategic reasoning. xhigh is exclusive to Opus 4.7 and recommended for long-horizon agentic and coding tasks.
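
Because xhigh is exclusive to Opus 4.7, it can help to check the pairing before sending a request. The following is a minimal sketch, assuming only the effort levels named in this section; `check_effort` is a hypothetical helper and the resulting `pairing` dict is a planning aid, not an API payload.

```python
XHIGH_MODELS = {"claude-opus-4-7"}  # Per this section, xhigh is exclusive to Opus 4.7


def check_effort(model: str, effort: str) -> str:
    """Return the effort level, or raise if xhigh is requested for an unsupported model."""
    if effort == "xhigh" and model not in XHIGH_MODELS:
        raise ValueError(f"effort='xhigh' is not available for {model}")
    return effort


pairing = {
    "executor": {
        "model": "claude-sonnet-4-6",
        "effort": check_effort("claude-sonnet-4-6", "medium"),
    },
    "advisor": {
        "model": "claude-opus-4-7",
        "effort": check_effort("claude-opus-4-7", "xhigh"),
    },
}
print(pairing)
```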

6. Implementation Task

Add the advisor tool to your strategist's tool configuration with effort tuning.

Beta Header Required: advisor-tool-2026-03-01

Python
# Executor-Advisor pattern with effort tuning
import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # Faster, lower-cost executor
    max_tokens=4096,
    # Send the required beta header with the request
    extra_headers={"anthropic-beta": "advisor-tool-2026-03-01"},
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-7",          # High-intelligence advisor
            "effort": "xhigh",                   # Deep reasoning; Opus 4.7 only
            "max_uses": 3,                       # Cap on advisor calls in this request
            "caching": {"type": "ephemeral", "ttl": "1h"}  # Breaks even at ~3 calls
        }
    ],
    messages=[{"role": "user", "content": "Develop a complex multi-channel marketing launch for our new AI tool."}]
)

7. Usage and Billing Analysis

  • Separate Iterations: Advisor calls are billed as advisor_message iterations at the advisor model's rates.
  • Top-level usage: The top-level input_tokens and output_tokens in the API response represent only the executor's usage; you must inspect the usage.iterations array for a full cost breakdown.
  • Iteration items are dicts: use .get() for safe access. The list is empty unless a sub-inference (Advisor, Compaction) actually fired.

Python
# Iterations are only populated if a sub-inference (Advisor, Compaction) occurred
if hasattr(response.usage, "iterations") and response.usage.iterations:
    for i, it in enumerate(response.usage.iterations):
        # Use .get() to safely access dictionary keys
        kind = it.get("type", "?")
        input_tokens = it.get("input_tokens", 0)
        output_tokens = it.get("output_tokens", 0)

        print(f"Iteration {i}: Type='{kind}' | Input={input_tokens} | Output={output_tokens}")

        if kind == "advisor_message":
            # Advisor sub-inferences are billed at the advisor model's specific rates
            print("  -> Billed at Advisor rates")
        elif kind == "message":
            print("  -> Billed at Executor rates")
else:
    print("No iterations recorded.")
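
To separate executor cost from advisor cost, you can total the tokens per iteration type and multiply by each model's rates. The sketch below assumes the iteration items are dicts with `type`, `input_tokens`, and `output_tokens`, as described above. The rates in `PLACEHOLDER_RATES` are made-up numbers (USD per million tokens) used only to show the arithmetic, so substitute the current published prices; the helper names and sample data are hypothetical.

```python
from collections import defaultdict

# Made-up rates in USD per million tokens: (input, output). NOT real prices.
PLACEHOLDER_RATES = {
    "message": (1.0, 2.0),
    "advisor_message": (10.0, 20.0),
}


def tokens_by_type(iterations):
    """Sum input and output tokens for each iteration type."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for it in iterations or []:
        kind = it.get("type", "unknown")
        totals[kind]["input_tokens"] += it.get("input_tokens", 0)
        totals[kind]["output_tokens"] += it.get("output_tokens", 0)
    return dict(totals)


def estimated_cost(tokens, input_rate, output_rate):
    """Convert token totals to a cost using per-million-token rates."""
    return (tokens["input_tokens"] * input_rate
            + tokens["output_tokens"] * output_rate) / 1_000_000


# Hypothetical iterations: executor, advisor, then executor again.
sample_iterations = [
    {"type": "message", "input_tokens": 1200, "output_tokens": 300},
    {"type": "advisor_message", "input_tokens": 5000, "output_tokens": 600},
    {"type": "message", "input_tokens": 1900, "output_tokens": 800},
]

totals = tokens_by_type(sample_iterations)
for kind, tokens in totals.items():
    input_rate, output_rate = PLACEHOLDER_RATES[kind]
    cost = estimated_cost(tokens, input_rate, output_rate)
    print(f"{kind}: {tokens} -> ${cost:.6f} at placeholder rates")
```

In the lab, pass `response.usage.iterations` in place of `sample_iterations` once you have confirmed the list is populated.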

Architect Tip for the Exam

When using the advisor tool with streaming, be aware that the advisor sub-inference does not stream. The executor's stream will pause while the advisor runs, and the advisor's result will arrive in a single event once the sub-inference is complete.
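
One practical consequence is that any idle timeout on the client must be longer than the advisor pause. As a rough sketch, you could record when each stream event arrives and look at the largest gap. The helper below works on a plain list of timestamps; `longest_pause` and the sample times are hypothetical and do not depend on any particular streaming API.

```python
def longest_pause(arrival_times):
    """Return the longest gap, in seconds, between consecutive stream events."""
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return max(gaps, default=0.0)


# Hypothetical arrival times in seconds; the jump from 1.2 to 9.7
# stands in for the executor waiting on the advisor sub-inference.
sample_times = [0.0, 0.4, 0.8, 1.2, 9.7, 10.1]
print(f"Longest pause: {longest_pause(sample_times):.1f}s")
```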

Lab Exercise: Executor-Advisor Strategy

Self-driven lab: Module3_Self_Driven_Lab.ipynb

Objective: compare tactical generation with advisor-assisted strategic reasoning.

  1. Configure a Sonnet executor with an Opus advisor and the required beta header.
  2. Run the same strategic prompt with and without the advisor tool.
  3. Inspect `usage.iterations` and separate executor costs from advisor costs.
  4. Adjust advisor effort and caching, then explain when the extra cost is justified.

Expected Deliverable

A cost and quality comparison for advisor-assisted orchestration.