{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 7 — Prompt Engineering for Extraction\n",
    "\n",
    "Two pillars of structured outputs:\n",
    "\n",
    "- **`output_config.format`** — forces Claude's final response into a JSON object matching your schema.\n",
    "- **`strict: true` on tools** — guarantees tool inputs match the schema exactly.\n",
    "\n",
    "Limits: **20** strict tools, **24** optional params, **16** union-type params per request. Citations and structured outputs are mutually exclusive — sending both returns a 400."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import anthropic\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()\n",
    "client = anthropic.Anthropic()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. JSON output + strict tool input\n",
    "\n",
    "The first request a new schema is sent on incurs grammar-compilation latency; compiled grammars are cached for 24 hours from last use. Changing the schema or the set of tools invalidates the cache."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.messages.create(\n",
    "    model=\"claude-opus-4-7\",\n",
    "    max_tokens=4096,\n",
    "    output_config={\n",
    "        \"format\": {\n",
    "            \"type\": \"json_schema\",\n",
    "            \"schema\": {\n",
    "                \"type\": \"object\",\n",
    "                \"properties\": {\n",
    "                    \"campaign_name\": {\"type\": \"string\"},\n",
    "                    \"budget_allocation\": {\"type\": \"integer\"},\n",
    "                    \"channels\": {\"type\": \"array\", \"items\": {\"type\": \"string\"}},\n",
    "                },\n",
    "                \"required\": [\"campaign_name\", \"budget_allocation\", \"channels\"],\n",
    "                \"additionalProperties\": False,\n",
    "            },\n",
    "        }\n",
    "    },\n",
    "    tools=[{\n",
    "        \"name\": \"search_market_data\",\n",
    "        \"strict\": True,\n",
    "        \"input_schema\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"region\": {\"type\": \"string\"},\n",
    "                \"industry\": {\"type\": \"string\"},\n",
    "            },\n",
    "            \"required\": [\"region\", \"industry\"],\n",
    "            \"additionalProperties\": False,\n",
    "        },\n",
    "    }],\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Plan a $10k AI consulting launch in London.\"}],\n",
    ")\n",
    "\n",
    "import json\n",
    "for block in response.content:\n",
    "    if block.type == \"text\":\n",
    "        # Final response is guaranteed to parse as JSON.\n",
    "        print(json.dumps(json.loads(block.text), indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Nullable fields prevent hallucination\n",
    "\n",
    "Use a type array `[\"string\", \"null\"]` so missing data comes back as `null` instead of an invented placeholder. Each nullable field counts against the 16-union-type budget per request."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prospect_schema = {\n",
    "    \"type\": \"object\",\n",
    "    \"properties\": {\n",
    "        \"company_name\":  {\"type\": \"string\"},\n",
    "        \"contact_email\": {\"type\": \"string\"},\n",
    "        \"budget_range\":  {\"type\": [\"string\", \"null\"]},  # may be unknown\n",
    "        \"decision_date\": {\"type\": [\"string\", \"null\"]},  # may be unknown\n",
    "    },\n",
    "    \"required\": [\"company_name\", \"contact_email\", \"budget_range\", \"decision_date\"],\n",
    "    \"additionalProperties\": False,\n",
    "}\n",
    "\n",
    "import json\n",
    "print(json.dumps(prospect_schema, indent=2))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
