Module 3

Tool Design & Tool Choice Strategy

Once you decide to build an agent (Module 2), the model is the one picking which tool to call. This module gives you the levers that control that decision: disambiguating tool descriptions, the tool_choice parameter for forcing prerequisites, strict-mode schema validation, and the agentic loop that ties it all together.

Answer key Module3_Complete.ipynb

1. Concepts: Model-Driven Routing

In an agentic system, the model picks which tool to call, that decision is driven by the tool's name, description, schema, and the surrounding system prompt. Workflows hard-code the call site; agents read your tool surface and route. Four levers shape that routing:

Tool descriptions are the primary selection mechanism. Claude reads the name and description on each tool to choose. If two tools have overlapping purposes (e.g. analyze_content vs. analyze_document), Claude will misroute, write descriptions that disambiguate exactly when each tool is appropriate.
Functional overlap must be removed, not merely explained. If two tools do the same job with different names, rename or split them so each one owns a distinct input shape and outcome.
System prompts can override good descriptions. Keyword-sensitive instructions like "for any document, call analyze_document" can drag routing away from a better tool description. Audit prompt rules for accidental keyword traps.
tool_choice forces or constrains routing on a per-request basis: require any tool call, require a specific tool, or let the model decide.
Strict mode (strict: True) uses grammar-constrained sampling to force tool inputs to match your JSON schema 100%. No trailing commas, no missing required fields, no surprise keys.

1a. Eliminating Functional Overlap

"Tool description" is the most under-engineered part of most agentic systems. A bad description is one that tells Claude what the tool does; a good description tells Claude when to pick it over its siblings. A better design removes the sibling conflict entirely. Compare:

Python (anti-pattern: ambiguous)

# BAD: descriptions overlap, Claude will guess.
tools = [
    {"name": "analyze_content",  "description": "Analyzes content."},
    {"name": "analyze_document", "description": "Analyzes a document."},
]

Python (good: disambiguated)

# GOOD: each description names what the OTHER tool is NOT for.
tools = [
    {
        "name": "analyze_content",
        "description": (
            "Analyzes a single piece of inline text or HTML passed as a string "
            "(blog post body, email draft, social copy). Use this when the caller "
            "has the content in-hand. DO NOT use for files on disk or document IDs, "
            "use analyze_document for those."
        ),
    },
    {
        "name": "analyze_document",
        "description": (
            "Analyzes a stored document referenced by document_id (PDF, Word, "
            "uploaded file). Use this when the caller has only an identifier or "
            "a file path. DO NOT use for raw inline text, use analyze_content."
        ),
    },
]

The pattern: state the input shape, the canonical use case, and an explicit "DO NOT use for X, use sibling_tool instead" clause. That last clause is what eliminates misrouting between similar tools.

Python (lab task: rename the overlapping tool)

# STARTING POINT: both tools sound like generic analysis.
tools = [
    {"name": "analyze_content", "description": "Analyze content and return key points."},
    {"name": "analyze_document", "description": "Analyze a document and return key points."},
]

# TARGET DESIGN: split by function and input source.
tools = [
    {
        "name": "analyze_uploaded_document",
        "description": (
            "Extracts claims and entities from an uploaded PDF/DOCX by document_id. "
            "Use only when the file already exists in document storage. Do not use "
            "for web search results or raw HTML."
        ),
    },
    {
        "name": "extract_web_results",
        "description": (
            "Extracts title, URL, snippet, publication date, and source credibility "
            "from web_search results. Use only after web_search/web_fetch. Do not "
            "use for uploaded files or inline drafts."
        ),
    },
]

1b. Prompt Audit for Keyword-Sensitive Instructions

Even a clean toolbox can misroute if the system prompt contains broad keyword rules. Review instructions for phrases that bind routing to words in the user request instead of the tool's true preconditions.

Prompt audit

Risky: "Whenever the user mentions a document, call analyze_document."
Better: "When the user provides a stored document_id or file path, use analyze_uploaded_document.
When the user asks about search output or fetched web pages, use extract_web_results."

1c. Controlling Routing with `tool_choice`

Descriptions guide the model's preference; tool_choice overrides it. Three values matter for the exam:

{"type": "auto"} (default), Claude decides whether to use a tool, which one, and may also reply with plain text.
{"type": "any"}, Claude must emit a tool call (no plain-text reply allowed). Use this when you want a structured response and any tool is acceptable.
{"type": "tool", "name": "verify_identity"}, Claude must call the named tool on this turn. Use this to enforce a prerequisite, e.g., identity verification before any account action.

Python (forced prerequisite, then any-tool)

import anthropic
from dotenv import load_dotenv

load_dotenv()  # reads ANTHROPIC_API_KEY from your .env file

client = anthropic.Anthropic()

# Turn 1: force identity verification before anything else can happen.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=tools,
    tool_choice={"type": "tool", "name": "verify_identity"},
    messages=[{"role": "user", "content": "Refund my last order."}],
)

# ... append assistant tool_use and the tool_result for verify_identity ...

# Turn 2: identity is confirmed. Force a tool-based response (no chit-chat).
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=tools,
    tool_choice={"type": "any"},   # must call SOME tool, model picks which
    messages=messages,
)

Architect Tip for the Exam

The exam loves the prerequisite pattern: tool_choice={"type":"tool","name":...} on the first turn forces a specific call (verify identity, fetch customer, look up account); on subsequent turns you switch to {"type":"any"} or {"type":"auto"}. Combine this with Module 9's hooks for defense in depth, tool_choice nudges the model, hooks guarantee the prerequisite at execution time.

1d. Access Failures vs. Valid Empty Results

When your code executes a tool, distinguish between two outcomes that look superficially similar but must be reported very differently:

A valid empty result is NOT an error. If a query ran successfully and simply matched nothing (no orders for that customer, zero rows), return the empty list as a normal tool_result with isError false. The model should tell the user "no orders found", not retry.
An access failure IS an error. A timeout, a permission denial, or a downstream outage means the tool never produced an answer. Return a structured error with an errorCategory (transient, validation, business, or permission) and an isRetryable flag so the agent knows whether retrying can help.

JSON (valid empty result)

{
  "type": "tool_result",
  "tool_use_id": "toolu_01A",
  "is_error": false,
  "content": "{\"orders\": [], \"query_status\": \"success\"}"
}

JSON (structured access failure)

{
  "type": "tool_result",
  "tool_use_id": "toolu_01B",
  "is_error": true,
  "content": "{\"errorCategory\": \"transient\", \"isRetryable\": true, \"message\": \"Order service timed out after 5s\"}"
}

Exam tip: "The query returned nothing" and "the query could not run" are different signals. Conflating them either makes the agent retry pointlessly on genuinely empty data or give up on a recoverable outage. The full error taxonomy is covered in Modules 7 and 9.

2. Defining the Toolbox

Our scenario is a support/CRM agent with two client-side custom tools: lookup_order reads order history, and add_lead_to_crm writes a new prospect. Client-side means that when Claude emits a tool_use block for either tool, your code is responsible for executing the action and returning the result as a tool_result.

Apply the disambiguation rule from Section 1a: the add_lead_to_crm description states the canonical use case ("a validated prospect") and the precondition ("after industry research is complete"), so Claude won't fire it before the research step has run. Adding strict: True to a tool guarantees, via grammar-constrained sampling, that the tool inputs Claude produces are 100% valid against your JSON schema.

Python

# DEFINE THE TOOLS
# Note: Use 'strict: True' to guarantee schema-valid tool inputs.

tools = [
    {
        "name": "lookup_order",
        "description": (
            "Looks up a customer's order history by customer_id in the order "
            "database. Returns a list of orders (possibly empty). Use this for "
            "any question about existing orders. DO NOT use to create records, "
            "use add_lead_to_crm for new prospects."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
            },
            "required": ["customer_id"],
            "additionalProperties": False,
        },
    },
    {
        "name": "add_lead_to_crm",
        "description": "Adds a validated prospect to the enterprise CRM. Use only after industry research is complete.",
        "strict": True,
        "input_schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "contact_email": {"type": "string", "format": "email"},
                "budget_range": {"type": ["string", "null"]},   # Nullable to prevent hallucinations
            },
            "required": ["company", "contact_email", "budget_range"],
            "additionalProperties": False,
        },
    },
]

3. The Initial Request & `stop_reason`

Send the user's support request with the toolbox attached. Don't print just the final text, inspect stop_reason and walk the content blocks. On the first turn you should see tool_use (Claude asking your code to run lookup_order), not end_turn.

Python

# THE INITIAL REQUEST
messages = [{
    "role": "user",
    "content": "What did customer C-1042 order last month, and are any orders still open?",
}]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    tools=tools,
    messages=messages,
)

# SHOW OUTPUT
print(f"Stop Reason: {response.stop_reason}")   # Expected: 'tool_use'
for block in response.content:
    if block.type == "tool_use":
        print(f"Claude requested tool: {block.name}")   # Expected: 'lookup_order'
        print(f"Parameters: {block.input}")

4. Handling the Agentic Loop

Both tools are client-side: the API never executes them for you. Claude emits a tool_use block, your code performs the actual lookup or write, and you send the outcome back as a tool_result block before Claude can finish its turn.

Turn 1: Claude emits a tool_use block for lookup_order with the customer_id it extracted from the request.
Turn 2: your application queries the order database and appends a tool_result block (keyed by tool_use_id) on a user turn, then re-calls the API.
Turn 3: Claude reasons over the returned orders. It may request another tool (continuing the loop) or generate the final end_turn response.

Python (client-side loop)

def handle_tool_use(block):
    if block.name == "lookup_order":
        orders = order_db.query(block.input["customer_id"])  # your client-side read
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps({"orders": orders, "query_status": "success"}),
        }
    if block.name == "add_lead_to_crm":
        # crm.write(block.input)  # your client-side write
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": "ok: lead written",
        }
    raise ValueError(f"unhandled client-side tool: {block.name}")

# Loop until Claude is done: stop_reason drives the control flow.
while response.stop_reason == "tool_use":
    tool_results = [handle_tool_use(b) for b in response.content if b.type == "tool_use"]
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=8192,
        tools=tools,
        messages=messages,
    )

# stop_reason == "end_turn": the task is complete.

5. Verifying the Final Answer

Once the loop terminates with end_turn, inspect the final text blocks and sanity-check them against the tool_result data your code actually returned. If Claude's summary claims three open orders but the lookup returned two, the discrepancy should be caught here, before the answer reaches the user.

Python

# FINAL OUTPUT VERIFICATION
# Compare the final answer to the tool_result data you returned.
assert response.stop_reason == "end_turn"
final_text = "".join(b.text for b in response.content if b.type == "text")
print("Final Answer:", final_text)

open_orders = [o for o in orders if o["status"] == "open"]
if str(len(open_orders)) not in final_text:
    print("WARNING: answer may not match tool data, review before sending.")

Architectural Summary

Safety: nullable types in the schema (["string", "null"]) prevent Claude from inventing a budget figure when one isn't in the source data.
Selection: tool descriptions, not names alone, drive Claude's routing. Disambiguate any tools whose purposes overlap.

Lab Exercise: Disambiguation & Forced Prerequisites

Self-driven lab Module3_Self_Driven_Lab.ipynb

Objective: eliminate functional overlap between tools and enforce mandatory sequences using tool_choice.

Tool disambiguation: define two tools with overlapping descriptions, such as get_user and query_database. Rewrite their descriptions using the [Input Shape] + [Canonical Use Case] + [Boundary] pattern to eliminate misrouting.
Forced selection: use tool_choice: {"type": "tool", "name": "verify_identity"} to ensure identity verification happens on the first turn.
Strict mode: enable strict: true on a complex JSON schema tool. Intentionally send a malformed input and observe the grammar-constrained sampling enforcement.
Empty vs. failed: make lookup_order return an empty result set for one query and a transient structured error (isRetryable: true) for another. Verify the agent tells the user "no orders found" for the first and retries for the second.

1. Concepts: Model-Driven Routing

1a. Eliminating Functional Overlap

1b. Prompt Audit for Keyword-Sensitive Instructions

1c. Controlling Routing with tool_choice

1d. Access Failures vs. Valid Empty Results

2. Defining the Toolbox

3. The Initial Request & stop_reason

4. Handling the Agentic Loop

5. Verifying the Final Answer

Lab Exercise: Disambiguation & Forced Prerequisites

1c. Controlling Routing with `tool_choice`

3. The Initial Request & `stop_reason`