{
    "cells":  [
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "*Module 5 Self-Driven Lab*\r\n",
                                     "\r\n",
                                     "# Scaling \u0026 Batch Constraints\r\n",
                                     "\r\n",
                                     "**Objective:** manage 50% cost-saving asynchronous workloads and implement targeted failure recovery.\r\n",
                                     "\r\n",
                                     "## Challenge Outline\r\n",
                                     "\r\n",
                                     "Build a complete notebook that demonstrates the following outcomes:\r\n",
                                     "\r\n",
                                     "- **Batch submission:** prepare a `.jsonl` file with 10 requests, each including a unique `custom_id`. Submit the batch and poll for the `\"ended\"` status.\r\n",
                                     "- **Single-turn constraint:** intentionally include a tool-calling loop in a batch request. Observe why the resulting output is unusable for agentic work.\r\n",
                                     "- **Extended output:** use the `output-300k-2026-03-24` header to generate a long-form intelligence report from a batch.\r\n",
                                     "- **Targeted recovery:** simulate a batch where 2 of the 10 requests fail, such as from context overflow. Walk the result stream, identify failures by `custom_id`, and generate a new, smaller batch for only those failed items.\r\n",
                                     "\r\n",
                                     "Your solution should include enough code, output, or written observations to prove each outcome worked. Keep the notebook focused on final behavior and evidence rather than a guided walkthrough.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "## Student Workspace\n",
                                     "\n",
                                     "Use the sections below to build your solution. Each section maps to one required outcome from the challenge outline.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Part 1: Batch submission\n",
                                     "\n",
                                     "prepare a `.jsonl` file with 10 requests, each including a unique `custom_id`. Submit the batch and poll for the `\"ended\"` status.\n"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# Part 1: Batch submission\n",
                                     "# Add your implementation, outputs, or notes here.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Part 2: Single-turn constraint\n",
                                     "\n",
                                     "intentionally include a tool-calling loop in a batch request. Observe why the resulting output is unusable for agentic work.\n"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# Part 2: Single-turn constraint\n",
                                     "# Add your implementation, outputs, or notes here.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Part 3: Extended output\n",
                                     "\n",
                                     "use the `output-300k-2026-03-24` header to generate a long-form intelligence report from a batch.\n"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# Part 3: Extended output\n",
                                     "# Add your implementation, outputs, or notes here.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Part 4: Targeted recovery\n",
                                     "\n",
                                     "simulate a batch where 2 of the 10 requests fail, such as from context overflow. Walk the result stream, identify failures by `custom_id`, and generate a new, smaller batch for only those failed items.\n"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# Part 4: Targeted recovery\n",
                                     "# Add your implementation, outputs, or notes here.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Verification Notes\n",
                                     "\n",
                                     "Summarize the evidence that each part worked. Capture API signals, validation outcomes, errors, recovery behavior, cost observations, or comparisons required by this lab.\n"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# Verification notes\n",
                                     "# Record the evidence that proves each lab outcome worked.\n"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "---\n",
                                     "\n",
                                     "## Answer Key\n",
                                     "\n",
                                     "The cells below contain the completed reference implementation/content for this module. Use this section only after attempting the self-driven lab."
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "*Module 5 of 12*\n",
                                     "\n",
                                     "# Scaling \u0026 Batch Constraints\n",
                                     "\n",
                                     "The Messages API is synchronous and built for the agentic loop, the model takes a turn, you respond, repeat. The Message Batches API is the opposite: asynchronous, single-shot, and 50% cheaper, but stripped of the multi-turn tool-call machinery agents depend on. This module covers the trade-off and the failure-handling pattern that keeps a 100,000-request batch recoverable.\n",
                                     "\n",
                                     "\n",
                                     "## 1. Synchronous Loops vs. Asynchronous Batches\n",
                                     "\n",
                                     "Batches are not just \"the same API, cheaper.\" They are a fundamentally different execution model, and using them well means knowing when *not* to reach for them.\n",
                                     "\n",
                                     "|  | Synchronous Messages (agentic loop) | Message Batches (async) |\n",
                                     "| --- | --- | --- |\n",
                                     "| Latency | Real-time, per turn | Up to 24h, no guarantees on order |\n",
                                     "| Pricing | Standard | **50% discount** on input + output |\n",
                                     "| Multi-turn tool use | Yes, the loop is the point | **No, single request only** |\n",
                                     "| ZDR-eligible | Yes | No (results stored 29 days) |\n",
                                     "| Use it for | Agents, chats, anything iterative | Fan-out generation, classification, extraction over fixed inputs |\n",
                                     "\n",
                                     "The decision rule: if the work needs the model to call a tool, see the result, and reason again, you need synchronous Messages (or the Agent SDK). If you can express the work as N independent prompts whose results you only need eventually, batches are the right tool and you\u0027ll pay half.\n",
                                     "\n",
                                     "## 2. The Economics of Scale\n",
                                     "\n",
                                     "- **50% Discount:** All usage in a batch is charged at half the standard API price for both input and output tokens.\n",
                                     "- **Throughput:** Batches allow significantly higher concurrency than synchronous requests.\n",
                                     "- **Latency Trade-off:** Most batches complete in under 1 hour; the API guarantees completion within **24 hours**.\n",
                                     "\n",
                                     "## 3. Batch Constraint: No Multi-Turn Tool Calling\n",
                                     "\n",
                                     "Each request inside a batch is a **single, one-shot inference**. If Claude emits a `tool_use` block inside a batched request, there is no second turn, no `tool_result` can be sent back, no `end_turn` follow-up will happen. The result line in the `.jsonl` simply contains the unanswered `tool_use` and the response is effectively unusable.\n",
                                     "\n",
                                     "- **What\u0027s safe in a batch:** direct generation (drafts, summaries, classifications), extraction with structured outputs, anything that finishes in one model turn.\n",
                                     "- **What is *not* safe in a batch:** server-side built-ins that require multiple turns to come back with results (e.g. `web_search` followed by reasoning), client-side custom tools, the Advisor pattern, hub-and-spoke delegation.\n",
                                     "\n",
                                     "Practical implication: do your tool-driven research with synchronous Messages or the Agent SDK first, *then* use a batch to fan that research out into 1,000 personalized emails. Don\u0027t try to do both inside the batch.\n",
                                     "\n",
                                     "## 4. Extended Output: 300k Tokens\n",
                                     "\n",
                                     "To generate \"book-length\" assets, complete technical guides or comprehensive market reports, use the extended output beta header.\n",
                                     "\n",
                                     "- **Beta Header:** Include `output-300k-2026-03-24` in your request headers.\n",
                                     "- **Capacity:** Raises `max_tokens` to **300,000 tokens** per turn on Claude Opus 4.7 and Sonnet 4.6.\n",
                                     "\n",
                                     "\u003e **Tip.** ZDR Warning\n",
                                     "The Message Batches API is **not ZDR-eligible**. Inputs and outputs are stored server-side until the batch completes, with results available for download for **29 days**. Do not use batch processing for requests containing client PHI if ZDR is required.\n",
                                     "\n",
                                     "## 5. Batch Lifecycle \u0026 `custom_id`\n",
                                     "\n",
                                     "Results are returned asynchronously and **not in submission order**. The `custom_id` is the only way to map outputs back to inputs.\n",
                                     "\n",
                                     "- **Creation:** Submit up to 100,000 requests. Each must have a **unique `custom_id`**.\n",
                                     "- **Tracking:** Poll `processing_status` until it reaches `\"ended\"`.\n",
                                     "- **Retrieval:** Results are in **`.jsonl` format**, one line per request (succeeded / errored / canceled / expired).\n",
                                     "\n",
                                     "## 6. Implementation Task"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "import anthropic\n",
                                     "from dotenv import load_dotenv\n",
                                     "from anthropic.types.message_create_params import MessageCreateParamsNonStreaming\n",
                                     "from anthropic.types.messages.batch_create_params import Request\n",
                                     "\n",
                                     "load_dotenv()  # reads ANTHROPIC_API_KEY from your .env file\n",
                                     "\n",
                                     "client = anthropic.Anthropic()\n",
                                     "\n",
                                     "message_batch = client.messages.batches.create(\n",
                                     "    requests=[\n",
                                     "        Request(\n",
                                     "            custom_id=\"prospect-fintech-001\",  # unique ID maps results back to input\n",
                                     "            params=MessageCreateParamsNonStreaming(\n",
                                     "                model=\"claude-sonnet-4-6\",\n",
                                     "                max_tokens=300000,\n",
                                     "                extra_headers={\"anthropic-beta\": \"output-300k-2026-03-24\"},\n",
                                     "                messages=[{\"role\": \"user\", \"content\": \"Write a 200-page AI consulting guide for Fintech CTOs.\"}]\n",
                                     "            )\n",
                                     "        ),\n",
                                     "        Request(\n",
                                     "            custom_id=\"prospect-healthcare-002\",\n",
                                     "            params=MessageCreateParamsNonStreaming(\n",
                                     "                model=\"claude-sonnet-4-6\",\n",
                                     "                max_tokens=1024,\n",
                                     "                messages=[{\"role\": \"user\", \"content\": \"Write a personalized outreach email for a Healthcare CEO.\"}]\n",
                                     "            )\n",
                                     "        )\n",
                                     "    ]\n",
                                     ")"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "## 7. Task: Retrieving and Mapping Batch Results\n",
                                     "\n",
                                     "Because batches are processed asynchronously, your application must poll for completion before results can be accessed. Once complete, results are streamed to handle up to 100,000 responses without memory overflow.\n",
                                     "\n",
                                     "### Polling for Completion\n",
                                     "\n",
                                     "A batch\u0027s `processing_status` starts as `\"in_progress\"`. Poll the retrieve endpoint until it reaches `\"ended\"`, indicating all requests have finished (succeeded, errored, or expired)."
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "import time\n",
                                     "\n",
                                     "# Use the ID captured from your creation call\n",
                                     "BATCH_ID = message_batch.id\n",
                                     "\n",
                                     "while True:\n",
                                     "    status_update = client.messages.batches.retrieve(BATCH_ID)\n",
                                     "\n",
                                     "    if status_update.processing_status == \"ended\":\n",
                                     "        print(\"Batch processing complete!\")\n",
                                     "        break\n",
                                     "\n",
                                     "    counts = status_update.request_counts\n",
                                     "    print(f\"Still processing... (Succeeded: {counts.succeeded}, Errored: {counts.errored})\")\n",
                                     "    time.sleep(60)  # poll every 60 seconds"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "### Streaming and Mapping Results\n",
                                     "\n",
                                     "Use `.results()` to stream responses. Results arrive in `.jsonl` format and are **not in submission order**, always use `custom_id` to map output back to input."
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "for result in client.messages.batches.results(BATCH_ID):\n",
                                     "    request_id = result.custom_id\n",
                                     "\n",
                                     "    if result.result.type == \"succeeded\":\n",
                                     "        content = result.result.message.content[0].text\n",
                                     "        print(f\"Success for {request_id}: {content[:50]}...\")\n",
                                     "\n",
                                     "    elif result.result.type == \"errored\":\n",
                                     "        error_type = result.result.error.error.type\n",
                                     "        print(f\"Error for {request_id}: {error_type}\")\n",
                                     "\n",
                                     "    elif result.result.type == \"expired\":\n",
                                     "        print(f\"Request {request_id} timed out (24-hour limit reached).\")\n",
                                     "\n",
                                     "    elif result.result.type == \"canceled\":\n",
                                     "        print(f\"Request {request_id} was canceled before completion.\")"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "\u003e **Tip.** Architect Rules for the Exam\n",
                                     "\n",
                                     "**Result Retention:** Batch results are available for download for **29 days** after creation. After that, metadata remains but result files are deleted. \n",
                                     "**Billing Logic:** You are only billed for requests that **succeed**. Errored, expired, and canceled requests are not charged. \n",
                                     "**ID Format:** All valid Batch IDs begin with the `msgbatch_` prefix. \n",
                                     "**Authentication:** The `results_url` is a protected endpoint, provide your `x-api-key` even when downloading directly via curl.\n",
                                     "\n",
                                     "## 8. Handling Failures: Targeted Resubmission by `custom_id`\n",
                                     "\n",
                                     "In a 100,000-request batch, some failures are inevitable: a few inputs will be longer than the model\u0027s context window, a transient error will hit a handful, others will time out. The correct response is **not** to resubmit the entire batch, that doubles your bill and re-runs the 99,000 requests that already succeeded. Instead, walk the result stream, collect the failures by their `custom_id`, fix the underlying issue (chunk the oversized inputs, narrow a query), and resubmit only the failures as a new, much smaller batch.\n",
                                     "\n",
                                     "*build a resubmit batch from failures*"
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "# 1. Walk results, separating succeeded from recoverable failures.\n",
                                     "to_resubmit = []   # list of (custom_id, original_prompt, error_type)\n",
                                     "for result in client.messages.batches.results(BATCH_ID):\n",
                                     "    if result.result.type == \"succeeded\":\n",
                                     "        continue\n",
                                     "    original_prompt = original_prompts_by_id[result.custom_id]   # your local lookup\n",
                                     "    err = result.result.error.error.type if result.result.type == \"errored\" else result.result.type\n",
                                     "    to_resubmit.append((result.custom_id, original_prompt, err))\n",
                                     "\n",
                                     "# 2. Repair each failure based on its error type. Most common: context overflow.\n",
                                     "def repair(prompt: str, err: str) -\u003e list[str]:\n",
                                     "    if err == \"invalid_request_error\":          # often: input too long\n",
                                     "        return chunk_into_pieces(prompt, max_chars=80_000)\n",
                                     "    return [prompt]   # transient errors: just resubmit as-is\n",
                                     "\n",
                                     "# 3. Build the resubmit batch. Reuse custom_id (suffix -part-N for chunks)\n",
                                     "#    so you can stitch outputs back to the original record.\n",
                                     "resubmit_requests = []\n",
                                     "for cid, prompt, err in to_resubmit:\n",
                                     "    for i, piece in enumerate(repair(prompt, err)):\n",
                                     "        resubmit_requests.append(\n",
                                     "            Request(\n",
                                     "                custom_id=f\"{cid}\" if len(repair(prompt, err)) == 1 else f\"{cid}-part-{i}\",\n",
                                     "                params=MessageCreateParamsNonStreaming(\n",
                                     "                    model=\"claude-sonnet-4-6\",\n",
                                     "                    max_tokens=4096,\n",
                                     "                    messages=[{\"role\": \"user\", \"content\": piece}],\n",
                                     "                ),\n",
                                     "            )\n",
                                     "        )\n",
                                     "\n",
                                     "if resubmit_requests:\n",
                                     "    repair_batch = client.messages.batches.create(requests=resubmit_requests)\n",
                                     "    print(f\"Resubmitted {len(resubmit_requests)} requests as {repair_batch.id}\")"
                                 ]
                  },
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "Lab checkpoint: download or stream the `.jsonl` results, identify every failure by `custom_id`, fix the underlying cause before resubmission, and create a new batch containing only those repaired IDs. For context-limit failures, demonstrate chunking the oversized source document and suffixing the original `custom_id` with `-part-N`.\n",
                                     "\n",
                                     "\u003e **Tip.** Architect Tip for the Exam\n",
                                     "Two things to internalize: (1) `custom_id` is your *only* handle on a failed request, never randomize it, derive it from your source-of-truth ID so you can look the original input back up. (2) Suffix the `custom_id` when you split one request into chunks (`doc-001-part-0`, `doc-001-part-1`) so the merge step downstream stays unambiguous.\n",
                                     "\n",
                                     "## 9. Stacking Discounts with Prompt Caching\n",
                                     "\n",
                                     "Add `cache_control` blocks to identical prefix content (system prompt, shared context) across all requests in the batch. Cache hits are provided on a best-effort basis, include the same breakpoints in every request to maximize hit rates (typically 30% to 98%)."
                                 ]
                  },
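                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "A minimal sketch of the shared-prefix pattern, assuming a system prompt that is identical across every request. `SHARED_CONTEXT`, the prompts, and `build_request` are placeholders for illustration, and a real prefix must be long enough to meet the model\u0027s minimum cacheable length.\n",
                                     "\n",
                                     "```python\n",
                                     "SHARED_CONTEXT = \"You are an analyst. Follow the house style guide.\"  # placeholder text\n",
                                     "\n",
                                     "\n",
                                     "def build_request(custom_id: str, prompt: str) -\u003e dict:\n",
                                     "    return {\n",
                                     "        \"custom_id\": custom_id,\n",
                                     "        \"params\": {\n",
                                     "            \"model\": \"claude-sonnet-4-6\",\n",
                                     "            \"max_tokens\": 1024,\n",
                                     "            # Identical system block with the same cache breakpoint in every request.\n",
                                     "            \"system\": [\n",
                                     "                {\"type\": \"text\", \"text\": SHARED_CONTEXT, \"cache_control\": {\"type\": \"ephemeral\"}}\n",
                                     "            ],\n",
                                     "            \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
                                     "        },\n",
                                     "    }\n",
                                     "\n",
                                     "\n",
                                     "requests = [build_request(f\"email-{n:03d}\", f\"Draft outreach email #{n}.\") for n in range(1, 4)]\n",
                                     "# Every request carries the same cacheable prefix; only the user message differs.\n",
                                     "assert len({str(request[\"params\"][\"system\"]) for request in requests}) == 1\n",
                                     "```"
                                 ]
                  },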
                  {
                      "cell_type":  "markdown",
                      "metadata":  {

                                   },
                      "source":  [
                                     "## Targeted Resubmission Drill\n",
                                     "\n",
                                     "Walk a `.jsonl` result stream, identify failures by `custom_id`, fix the underlying issue, and resubmit only the repaired IDs. This simulates chunking an oversized document before resubmission."
                                 ]
                  },
                  {
                      "cell_type":  "code",
                      "execution_count":  null,
                      "metadata":  {

                                   },
                      "outputs":  [

                                  ],
                      "source":  [
                                     "import json\n",
                                     "\n",
                                     "jsonl_results = \u0027{\"custom_id\": \"doc-001\", \"result\": {\"type\": \"succeeded\"}}\\n{\"custom_id\": \"doc-002\", \"result\": {\"type\": \"errored\", \"error\": {\"error\": {\"type\": \"invalid_request_error\", \"message\": \"input exceeds context limit\"}}}}\\n{\"custom_id\": \"doc-003\", \"result\": {\"type\": \"errored\", \"error\": {\"error\": {\"type\": \"api_error\", \"message\": \"transient service error\"}}}}\u0027\n",
                                     "\n",
                                     "original_prompts_by_id = {\n",
                                     "    \u0027doc-001\u0027: \u0027short prospect summary\u0027,\n",
                                     "    \u0027doc-002\u0027: \u0027A\u0027 * 210_000,\n",
                                     "    \u0027doc-003\u0027: \u0027standard classification request\u0027,\n",
                                     "}\n",
                                     "\n",
                                     "def chunk_into_pieces(text, max_chars=80_000):\n",
                                     "    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]\n",
                                     "\n",
                                     "failures = []\n",
                                     "for line in jsonl_results.splitlines():\n",
                                     "    result = json.loads(line)\n",
                                     "    if result[\u0027result\u0027][\u0027type\u0027] == \u0027succeeded\u0027:\n",
                                     "        continue\n",
                                     "    error_type = result[\u0027result\u0027].get(\u0027error\u0027, {}).get(\u0027error\u0027, {}).get(\u0027type\u0027, result[\u0027result\u0027][\u0027type\u0027])\n",
                                     "    failures.append({\n",
                                     "        \u0027custom_id\u0027: result[\u0027custom_id\u0027],\n",
                                     "        \u0027prompt\u0027: original_prompts_by_id[result[\u0027custom_id\u0027]],\n",
                                     "        \u0027error_type\u0027: error_type,\n",
                                     "    })\n",
                                     "\n",
                                     "resubmit_requests = []\n",
                                     "for failure in failures:\n",
                                     "    pieces = chunk_into_pieces(failure[\u0027prompt\u0027]) if failure[\u0027error_type\u0027] == \u0027invalid_request_error\u0027 else [failure[\u0027prompt\u0027]]\n",
                                     "    for index, piece in enumerate(pieces):\n",
                                     "        custom_id = failure[\u0027custom_id\u0027] if len(pieces) == 1 else f\"{failure[\u0027custom_id\u0027]}-part-{index}\"\n",
                                     "        resubmit_requests.append({\u0027custom_id\u0027: custom_id, \u0027body\u0027: piece})\n",
                                     "\n",
                                     "print(\u0027Failures:\u0027, [failure[\u0027custom_id\u0027] for failure in failures])\n",
                                     "print(\u0027Resubmitted IDs:\u0027, [request[\u0027custom_id\u0027] for request in resubmit_requests])\n",
                                     "assert [failure[\u0027custom_id\u0027] for failure in failures] == [\u0027doc-002\u0027, \u0027doc-003\u0027]\n",
                                     "assert \u0027doc-001\u0027 not in [request[\u0027custom_id\u0027] for request in resubmit_requests]\n",
                                     "assert any(request[\u0027custom_id\u0027].startswith(\u0027doc-002-part-\u0027) for request in resubmit_requests)\n",
                                     "assert {\u0027doc-003\u0027} \u003c= {request[\u0027custom_id\u0027] for request in resubmit_requests}\n"
                                 ]
                  }
              ],
    "metadata":  {
                     "kernelspec":  {
                                        "display_name":  "Python 3",
                                        "language":  "python",
                                        "name":  "python3"
                                    },
                     "language_info":  {
                                           "codemirror_mode":  {
                                                                   "name":  "ipython",
                                                                   "version":  3
                                                               },
                                           "file_extension":  ".py",
                                           "mimetype":  "text/x-python",
                                           "name":  "python",
                                           "nbconvert_exporter":  "python",
                                           "pygments_lexer":  "ipython3",
                                           "version":  "3.11.0"
                                       }
                 },
    "nbformat":  4,
    "nbformat_minor":  5
}