{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 3 — Advanced Agent Orchestration\n",
    "\n",
    "The **Executor-Advisor pattern**: a cheap, fast executor (Sonnet 4.6 at `effort=\"medium\"`) handles the bulk of generation, while an Opus 4.7 advisor at `xhigh` effort steps in mid-generation for strategic course correction. Both run inside a single `/v1/messages` request — no extra round trips.\n",
    "\n",
    "> **Beta header required:** `advisor-tool-2026-03-01`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import anthropic\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()\n",
    "client = anthropic.Anthropic()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Executor-Advisor with effort tuning + advisor-side caching\n",
    "\n",
    "The advisor is configured as a server-side tool (`advisor_20260301`). Anthropic runs the advisor inference server-side with the full transcript and returns just the final advice as an `advisor_tool_result`. Advisor thinking blocks are dropped.\n",
    "\n",
    "**Caching note:** advisor-side caching costs more than it saves at one or two calls; it breaks even around three calls per conversation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "response = client.messages.create(\n    model=\"claude-sonnet-4-6\",   # cheap/fast executor\n    max_tokens=4096,\n    tools=[\n        {\n            \"type\": \"advisor_20260301\",\n            \"name\": \"advisor\",\n            \"model\": \"claude-opus-4-7\",          # high-intelligence advisor\n            \"effort\": \"xhigh\",                   # deep reasoning; Opus 4.7 only\n            \"max_uses\": 3,\n            \"caching\": {\"type\": \"ephemeral\", \"ttl\": \"1h\"},  # breaks even at ~3 calls\n        }\n    ],\n    messages=[{\n        \"role\": \"user\",\n        \"content\": \"Develop a complex multi-channel marketing launch for our new AI tool.\",\n    }],\n    extra_headers={\"anthropic-beta\": \"advisor-tool-2026-03-01\"},\n)\n\nprint(\"executor stop_reason:\", response.stop_reason)\nprint(\"executor input/output tokens:\", response.usage.input_tokens, \"/\", response.usage.output_tokens)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Read advisor billing from `usage.iterations`\n",
    "\n",
    "Top-level `usage.input_tokens` / `usage.output_tokens` show **only the executor's** usage. Advisor calls appear as separate `advisor_message` iterations and are billed at the advisor model's rates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Iterations are only populated if a sub-inference (Advisor, Compaction) occurred\nif hasattr(response.usage, \"iterations\") and response.usage.iterations:\n    for i, it in enumerate(response.usage.iterations):\n        # Use .get() to safely access dictionary keys\n        kind = it.get(\"type\", \"?\")\n        input_tokens = it.get(\"input_tokens\", 0)\n        output_tokens = it.get(\"output_tokens\", 0)\n\n        print(f\"Iteration {i}: Type='{kind}' | Input={input_tokens} | Output={output_tokens}\")\n\n        if kind == \"advisor_message\":\n            # Advisor sub-inferences are billed at the advisor model's specific rates\n            print(f\"  -> Billed at Advisor rates\")\n        elif kind == \"message\":\n            print(f\"  -> Billed at Executor rates\")\nelse:\n    print(\"No iterations recorded.\")"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}