Module 8

Prompt Engineering for Precision: Criteria, Constraints, and Structured Output

This module covers the advanced prompting techniques needed for production-grade extraction and classification, so that Claude's routing and output reliably match a target schema. The goal is to move from fuzzy instructions to repeatable precision: explicit criteria, few-shot reasoning, strict schema constraints, hallucination-resistant nullable fields, and validation-retry loops.

Answer key: Module8_Complete.ipynb

1. Explicit Categorical Criteria vs. Vague Instructions

Vague instructions like "be conservative" or "only report high-confidence findings" fail because they do not define objective thresholds. Replace adjectives with numeric or behavioral criteria.

Anti-pattern

Classify lead quality as low, medium, or high. Be conservative.

Precise criteria

HIGH
- Revenue is greater than $10M, AND
- Company has an active AI initiative or open data/ML roles.
Example: "$25M revenue and hiring an ML platform lead."

MEDIUM
- Revenue is $2M-$10M, OR
- Revenue is unknown but company has a clear AI adoption signal.
Example: "$6M revenue and recently published an AI case study."

LOW
- Revenue is below $2M, OR
- No AI adoption signal is present.
Example: "$900k revenue and no technical hiring signal."

If evidence supports two categories, choose the higher category.
If revenue and AI signal are both missing, output needs_clarification.
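
The criteria above are concrete enough to write as code, which is a useful sanity check: a rule that cannot be expressed as a condition is probably still too vague for a prompt. The following is a minimal sketch, assuming revenue arrives as a number in USD (None when undisclosed) and the AI signal as a boolean. The name classify_lead and both parameters are illustrative and do not come from the course notebook.

```python
from typing import Optional

# Ordered from lowest to highest so max() can apply the tie-break rule.
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def classify_lead(revenue_usd: Optional[float], has_ai_signal: bool) -> str:
    """Illustrative reference version of the lead-quality criteria."""
    # Escape hatch: neither revenue nor an AI signal is available.
    if revenue_usd is None and not has_ai_signal:
        return "needs_clarification"

    matches = []
    if revenue_usd is not None and revenue_usd > 10_000_000 and has_ai_signal:
        matches.append("HIGH")
    if (revenue_usd is not None and 2_000_000 <= revenue_usd <= 10_000_000) or (
        revenue_usd is None and has_ai_signal
    ):
        matches.append("MEDIUM")
    if (revenue_usd is not None and revenue_usd < 2_000_000) or not has_ai_signal:
        matches.append("LOW")

    # "If evidence supports two categories, choose the higher category."
    return max(matches, key=RANK.__getitem__) if matches else "needs_clarification"
```

Running the same inputs through the prompt and through a function like this gives a quick check that the model applies the tie-break and escape-hatch rules as written.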

1a. Behavioral Thresholds and Escape Hatches

Behavioral thresholds make classification reproducible: "production is down for all users" is stronger than "severe outage." Escape hatches like needs_clarification (equivalently named unclear or other in exam scenarios) prevent the model from forcing a fuzzy input into a false category.

2. Few-Shot Examples for Generalization

Few-shot examples are often the most effective way to handle ambiguous cases where two tools, fields, or categories each look reasonable.

2a. The Reasoning Column

High-quality few-shot examples include a Why: line. The reasoning teaches the rule behind the pattern, so the model generalizes to novel cases rather than merely matching examples.

Few-shot extraction examples

Example 1
Input: "Acme reports $14M ARR and is hiring a VP of AI Platform."
Output: {"lead_quality": "HIGH", "annual_revenue": "$14M", "ai_signal": "hiring VP of AI Platform"}
Why: Revenue is greater than $10M and the hiring signal confirms active AI investment.

Example 2
Input: "BetaCo launched a chatbot pilot, but revenue is not disclosed."
Output: {"lead_quality": "MEDIUM", "annual_revenue": null, "ai_signal": "chatbot pilot"}
Why: Revenue is missing, but a clear AI adoption signal qualifies for MEDIUM rather than LOW.

Example 3
Input: "Gamma LLC has $800k revenue and no AI-related hiring or projects."
Output: {"lead_quality": "LOW", "annual_revenue": "$800k", "ai_signal": null}
Why: Revenue is below $2M and there is no AI signal.
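
Keeping examples as data and rendering them into the prompt makes it harder to drop the Why: line by accident. Below is a minimal sketch that reproduces the Input / Output / Why layout used above; the EXAMPLES list and the render_few_shot helper are illustrative names, not part of any library.

```python
import json

# Each example carries the input, the expected output, and the reasoning.
EXAMPLES = [
    {
        "input": "Acme reports $14M ARR and is hiring a VP of AI Platform.",
        "output": {"lead_quality": "HIGH", "annual_revenue": "$14M",
                   "ai_signal": "hiring VP of AI Platform"},
        "why": "Revenue is greater than $10M and the hiring signal confirms active AI investment.",
    },
    {
        "input": "BetaCo launched a chatbot pilot, but revenue is not disclosed.",
        "output": {"lead_quality": "MEDIUM", "annual_revenue": None,
                   "ai_signal": "chatbot pilot"},
        "why": "Revenue is missing, but a clear AI adoption signal qualifies for MEDIUM rather than LOW.",
    },
]


def render_few_shot(examples: list[dict]) -> str:
    """Render examples as numbered Input / Output / Why blocks."""
    blocks = []
    for i, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {i}\n"
            f"Input: {json.dumps(ex['input'])}\n"
            f"Output: {json.dumps(ex['output'])}\n"  # None is serialized as null
            f"Why: {ex['why']}"
        )
    return "\n\n".join(blocks)


few_shot_text = render_few_shot(EXAMPLES)
```

The rendered text can be placed in the system prompt or ahead of the document in the user message; which placement works better is something to test against your own data.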

2b. Negative Triggers

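Include examples that distinguish acceptable patterns from genuine issues. Negative examples reduce false positives by teaching what not to flag or extract. For instance, pair "hiring a VP of AI Platform" (extract as ai_signal) with a generic phrase such as "we are passionate about innovation" (leave ai_signal null).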

3. Structured Output via Tool Use + JSON Schemas

The structured-output mechanism is tool use: define an extraction tool whose input_schema is exactly the output schema you want, force Claude to call it with tool_choice, and read your structured data from the tool_use block's input. The "tool" never executes anything; its schema simply constrains what the model must produce.

Python (extraction tool as output schema)

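import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# document_text is assumed to hold the source document as a string.
response = client.messages.create(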
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "record_prospect",
        "description": "Record the extracted prospect profile.",
        "input_schema": {
            "type": "object",
            "properties": {
                "company_name": {"type": "string"},
                "annual_revenue": {"type": ["string", "null"]},
                "lead_quality": {"type": "string", "enum": ["HIGH", "MEDIUM", "LOW"]}
            },
            "required": ["company_name", "annual_revenue", "lead_quality"],
            "additionalProperties": False
        }
    }],
    tool_choice={"type": "tool", "name": "record_prospect"},  # force the extraction
    messages=[{"role": "user", "content": document_text}]
)

# The structured data is the tool_use block's input
data = next(b for b in response.content if b.type == "tool_use").input

Adding strict: true to the tool definition guarantees that tool inputs conform to your schema exactly.
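
A bare next(...) raises StopIteration with no context if the expected block is not in the response, which is awkward to debug in a pipeline. One option is a small helper that looks the block up by tool name and fails with a clear message. This is a sketch only: tool_input is a hypothetical helper, and the SimpleNamespace objects are stand-ins shaped like response.content blocks so the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def tool_input(content: Iterable[Any], tool_name: str) -> dict:
    """Return the input of the first tool_use block for tool_name."""
    for block in content:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"No tool_use block for {tool_name!r} in response content")


# Stand-in blocks for a dry run; a real call would pass response.content.
fake_content = [
    SimpleNamespace(type="text", text="Recording the prospect."),
    SimpleNamespace(
        type="tool_use",
        name="record_prospect",
        input={"company_name": "Acme", "annual_revenue": None, "lead_quality": "HIGH"},
    ),
]
data = tool_input(fake_content, "record_prospect")
```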

3a. Choosing the Right `tool_choice`

The tool_choice parameter controls how strongly you force structured output. Know all three modes and when each applies:

{"type": "auto"}: the model decides. It may answer in plain text instead of calling a tool, so this is not sufficient when you need guaranteed structured output.
{"type": "any"}: the model must call some tool, but picks which one. Use when you have multiple extraction schemas (e.g. invoice vs. contract vs. resume) and the document type is unknown: the model routes to the right schema but can never fall back to prose.
{"type": "tool", "name": "..."}: the model must call that specific tool. Use to force a prerequisite extraction step, or whenever exactly one output schema is acceptable.
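
The decision between the three modes can be written down as a small function. The sketch below is one possible encoding of the guidance above, not an SDK feature; pick_tool_choice and its parameters are hypothetical names.

```python
from typing import Optional


def pick_tool_choice(
    tool_names: list[str],
    required_tool: Optional[str] = None,
    structured_required: bool = True,
) -> dict:
    """Choose a tool_choice value from the three modes described above."""
    if required_tool is not None:
        if required_tool not in tool_names:
            raise ValueError(f"{required_tool!r} is not among the defined tools")
        # Exactly one schema is acceptable, or a prerequisite step must run.
        return {"type": "tool", "name": required_tool}
    if not structured_required:
        # Plain-text answers are acceptable, so let the model decide.
        return {"type": "auto"}
    if not tool_names:
        raise ValueError("structured output requires at least one tool")
    if len(tool_names) == 1:
        return {"type": "tool", "name": tool_names[0]}
    # Several schemas, unknown document type: the model routes, prose is ruled out.
    return {"type": "any"}
```

For example, pick_tool_choice(["extract_invoice", "extract_contract"]) selects the any mode, while passing required_tool="extract_metadata" forces that specific tool.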

4. Advanced Schema Design & Safety

Safe schemas prevent hallucinations and API errors in multi-step workflows.

4a. Hallucination Prevention with Nullable Fields

Use nullable fields, such as ["string", "null"], for information that may legitimately be missing from a source document. This lets the model return null instead of fabricating placeholder values to satisfy a required field.

JSON (Prospect Profile schema excerpt)

{
  "type": "object",
  "properties": {
    "company_name": {"type": "string"},
    "annual_revenue": {"type": ["string", "null"]},
    "contact_email": {"type": ["string", "null"]},
    "ai_signal": {"type": ["string", "null"]},
    "budget_range": {"type": ["string", "null"]},
    "decision_date": {"type": ["string", "null"]}
  },
  "required": ["company_name", "annual_revenue", "contact_email", "ai_signal", "budget_range", "decision_date"],
  "additionalProperties": false
}
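
A nullable schema permits null but does not stop the model from writing placeholder strings such as "N/A". A simple post-check can catch that. This is a minimal sketch: the PLACEHOLDERS set is an assumed starting list to adapt to your own data, and find_placeholder_fields is a hypothetical helper.

```python
# Assumed list of strings that usually mean "missing"; extend as needed.
PLACEHOLDERS = {"", "n/a", "na", "none", "null", "unknown", "not provided", "tbd", "-"}


def find_placeholder_fields(record: dict) -> list[str]:
    """Return the keys whose string value looks like a placeholder instead of null."""
    return [
        key
        for key, value in record.items()
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS
    ]


sample = {
    "company_name": "BetaCo",
    "annual_revenue": "N/A",      # should have been null
    "contact_email": None,        # correct: missing data returned as null
    "ai_signal": "chatbot pilot",
    "budget_range": "Unknown",    # should have been null
    "decision_date": None,
}
flagged = find_placeholder_fields(sample)
```

Any flagged field can be fed back to the model as a validation error, or simply normalized to null, depending on how strict the pipeline needs to be.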

4b. Extensible Categorization with `other`

Enums are reliable, but production categories evolve. Use a bounded enum with an other escape hatch (the exam also uses the equivalent name "unclear" for this pattern) plus a required detail field so the system stays extensible without losing structure.

JSON (category schema)

{
  "type": "object",
  "properties": {
    "category": {
      "type": "string",
      "enum": ["billing", "technical", "security", "other"]
    },
    "category_detail": {
      "type": ["string", "null"],
      "description": "Required when category is other; otherwise null."
    }
  },
  "required": ["category", "category_detail"],
  "additionalProperties": false
}

Validation rule: if category is "other", category_detail must explain the specific category. If category is a known enum value, category_detail should be null.
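
The schema as written does not encode this conditional rule, so one option is to check it in code after extraction. The sketch below assumes the record is the parsed tool input as a dict; validate_category is a hypothetical helper name.

```python
def validate_category(record: dict) -> list[str]:
    """Check the other / category_detail rule and return a list of error messages."""
    errors = []
    category = record.get("category")
    detail = record.get("category_detail")
    if category == "other":
        # "other" must come with a non-empty explanation.
        if not isinstance(detail, str) or not detail.strip():
            errors.append('category_detail must describe the category when category is "other"')
    elif detail is not None:
        # Known enum values should leave the detail field null.
        errors.append(f'category_detail must be null when category is "{category}"')
    return errors
```

An empty list means the record passes; any messages returned can be appended to a retry prompt as specific validation errors.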

5. Validation-Retry Loops

Schema constraints guarantee shape, not truth. A response can be perfectly schema-valid and still semantically wrong: line items that don't sum to the stated total, or a decision date that falls before the document date. Catch these with code-level validation. When validation fails, retry with feedback: send the original document, the failed extraction, and the specific validation errors, so the model can see exactly what to correct rather than guessing.

Python (validation-retry loop)

import json

MAX_RETRIES = 2

def extract_with_retry(document_text: str):
    attempt = extract(document_text)                # forced tool_choice extraction
    for retries_left in range(MAX_RETRIES, -1, -1):
        errors = validate(attempt)                  # returns [] when clean
        if not errors:
            return attempt
        # Absent information cannot be recovered by retrying, and neither can
        # an exhausted retry budget: escalate both to a human.
        if retries_left == 0 or any(e.kind == "missing_from_source" for e in errors):
            return route_to_human_review(document_text, attempt, errors)
        feedback = (
            f"Document:\n{document_text}\n\n"
            f"Your previous extraction:\n{json.dumps(attempt)}\n\n"
            f"It failed these validation checks:\n"
            + "\n".join(f"- {e.message}" for e in errors)
            + "\nCall record_prospect again with a corrected extraction."
        )
        attempt = extract(feedback)                 # retry with the errors appended

The key limit: retries fix format and structural errors, such as a miscopied number, a field mapped to the wrong key, or an invalid enum value. Retries cannot fix information that is absent from the source document. If the contact email simply is not in the text, no amount of retrying will produce it; retrying will only burn tokens or, worse, pressure the model into fabricating one. Detect the absent-information case (validation flags a required field as missing, and the model still returns null for it on retry) and route it to human review instead of retrying.
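
One way to tell a fixable error from absent information is to check the source text directly. The sketch below does this for the contact email, assuming a business rule that every record needs one. The ValidationIssue class, the "fixable" kind, the check_contact_email helper, and the simplified email regex are all assumptions made for illustration; only the "missing_from_source" kind matches the loop above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately simple pattern; it is not a full email-address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ValidationIssue:
    kind: str     # "missing_from_source" or "fixable"
    message: str


def check_contact_email(document_text: str, extracted: Optional[str]) -> list[ValidationIssue]:
    """Classify a contact_email problem as retryable or as absent from the source."""
    emails_in_source = EMAIL_RE.findall(document_text)
    if not emails_in_source:
        # Nothing to find: retrying cannot help, so this goes to human review.
        return [ValidationIssue("missing_from_source", "No email address appears in the document.")]
    if extracted is None:
        return [ValidationIssue("fixable", "contact_email is null but the document contains an email address.")]
    if extracted not in emails_in_source:
        return [ValidationIssue("fixable", f"contact_email {extracted!r} does not appear in the document.")]
    return []
```

Because the first branch fires whenever the document contains no email at all, a fabricated address is also routed to review rather than retried.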

5a. Self-Correction Fields

You can also make the schema itself do validation work. Have the model extract calculated_total (the sum it computes from the line items) alongside stated_total (the total printed on the document) and flag any discrepancy: a mismatch is a built-in semantic check that needs no extra API call. Similarly, add a conflict_detected boolean that the model must set when sources disagree (e.g. two documents state different revenue figures), so inconsistency surfaces as data instead of silently resolving to one arbitrary value.
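
The comparison itself is cheap to run in code, and code can also recheck the model's own arithmetic. A minimal sketch follows, assuming the extraction includes a line_items list of amounts next to the two totals; the line_items field name and the check_totals helper are hypothetical. Decimal is used so that currency amounts compare exactly.

```python
from decimal import Decimal


def check_totals(record: dict) -> list[str]:
    """Compare line items, calculated_total, and stated_total; return any mismatches."""
    # Convert through str() so floats such as 19.99 keep their printed value.
    items_sum = sum((Decimal(str(x)) for x in record["line_items"]), Decimal("0"))
    calculated = Decimal(str(record["calculated_total"]))
    stated = Decimal(str(record["stated_total"]))

    issues = []
    if calculated != items_sum:
        issues.append(f"calculated_total {calculated} does not equal the line item sum {items_sum}")
    if calculated != stated:
        issues.append(f"calculated_total {calculated} differs from stated_total {stated}")
    return issues
```

A mismatch between calculated_total and stated_total may be a real discrepancy in the document rather than an extraction error, so it is often better surfaced to a reviewer than silently corrected.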

Lab Exercise: Designing for Deterministic Extraction

Self-driven lab: Module8_Self_Driven_Lab.ipynb

Objective: master categorical criteria, union-type constraints, prerequisite forcing, and validation loops.

Explicit classification: create a Lead Quality classification tool using numeric thresholds such as Revenue > $10M; include one-line examples for each category.
Few-shot generalization: provide 2-4 extraction examples. Each example must include a Why: line explaining why specific data was mapped to a field.
Union-type constraints: design a Prospect Profile schema with 5 nullable fields and verify missing data returns null, not placeholder text.
Extensible categorization: design a schema where category is an enum that includes other; require a separate category_detail string when other is selected.
Prerequisite forcing: use tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure metadata extraction runs before enrichment.
Validation-retry loop: check for a semantic error, such as line items not summing to a total. If validation fails, send a follow-up request with the original document, the failed extraction, and the specific validation errors to guide self-correction.
Absent-information detection: run the loop on a document that is genuinely missing a required field (e.g. no contact email anywhere in the text). Verify your code recognizes that retries cannot recover absent information and routes the record to human review instead of retrying.

Architect Tip for the Exam

Precision is not "more prompt." It is objective criteria, examples with reasoning, schemas that allow legitimate missingness, strict tool constraints, and validation feedback that tells the model exactly what semantic invariant failed.