Module 7

Prompt Engineering for Extraction

This module ensures that your agent's deliverables (prospect lists, campaign briefs, lead scores) are returned in a valid, parseable format with no hallucinated fields. You will master Structured Outputs for guaranteed JSON, nullable schemas for missing data, and the critical constraint that citations and structured outputs cannot be combined in one request.

Answer key: Module7_Complete.ipynb

1. The Two Pillars of Structured Outputs

Structured outputs constrain Claude's responses to a specific JSON schema, eliminating JSON.parse() errors and missing fields.

  • JSON Outputs (output_config.format): Controls what Claude says in its final response, forcing it to return a JSON object instead of conversational text.
  • Strict Tool Use (strict: true): Guarantees that when Claude calls a tool (e.g., add_lead_to_crm), the inputs follow your schema exactly.

2. Performance: Grammar Compilation & Caching

For the Architect exam, understand how the "engine" works. Structured outputs use constrained sampling with compiled grammar artifacts.

  • Initial Latency: The first request with a new schema has extra latency during grammar compilation.
  • 24-Hour Cache: Compiled grammars are cached for 24 hours from the last use, making subsequent requests much faster.
  • Invalidation: Changing the schema structure or the set of tools in the request invalidates this cache.
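
To make the caching behavior observable in your own logs, one option is to fingerprint the parts of a request that the notes above say affect the compiled grammar (the output schema and the tool set). The sketch below is a client-side approximation only: grammar_fingerprint, is_probably_cold, and the choice of SHA-256 over canonical JSON are assumptions made for illustration, not how the API keys its cache, and the sketch does not model the 24-hour expiry.

```python
import hashlib
import json


def grammar_fingerprint(output_schema, tools):
    """Hash the request parts that the notes say affect the grammar cache.

    Assumption: this is a local approximation for logging. It is not the
    key the API uses internally.
    """
    payload = {
        "schema": output_schema,
        "tools": sorted(
            (
                {
                    "name": t["name"],
                    "strict": t.get("strict", False),
                    "input_schema": t.get("input_schema"),
                }
                for t in tools
            ),
            key=lambda t: t["name"],
        ),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


_seen = set()


def is_probably_cold(output_schema, tools):
    """Return True the first time this process sees a schema/tool combination."""
    key = grammar_fingerprint(output_schema, tools)
    if key in _seen:
        return False
    _seen.add(key)
    return True
```

Logging the result of is_probably_cold next to request latency can help attribute a slow first call to grammar compilation rather than to the model itself.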

3. Critical Constraints

The API enforces several complexity limits to ensure efficient compilation. These are important exam numbers:

  • Strict Tools: Maximum of 20 strict tools per request.
  • Optional Parameters: Maximum of 24 optional parameters in total across all strict schemas in a single request.
  • Union Types: Maximum of 16 parameters in total that use anyOf or type arrays (such as ["string", "null"]).
  • Property Ordering: Claude orders properties with required properties first, followed by optional properties, regardless of their order in your schema.
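
A pre-flight check can catch these limits before a request is sent. The following is a minimal sketch, assuming that a property missing from "required" counts as one optional parameter, that a property using anyOf or a type array counts as one union parameter, and that the output schema counts alongside strict tool schemas. The names count_params and check_limits are hypothetical, and the API's own counting rules may differ in edge cases.

```python
MAX_STRICT_TOOLS = 20
MAX_OPTIONAL_PARAMS = 24
MAX_UNION_PARAMS = 16


def count_params(schema):
    """Return (optional, union) parameter counts for one schema, nested included."""
    if not isinstance(schema, dict):
        return 0, 0
    optional = union = 0
    required = set(schema.get("required", []))
    for name, sub in schema.get("properties", {}).items():
        if name not in required:
            optional += 1
        if isinstance(sub, dict) and ("anyOf" in sub or isinstance(sub.get("type"), list)):
            union += 1
        nested_optional, nested_union = count_params(sub)
        optional += nested_optional
        union += nested_union
    # Array items and anyOf branches may themselves contain object schemas.
    for child in [schema.get("items"), *schema.get("anyOf", [])]:
        nested_optional, nested_union = count_params(child)
        optional += nested_optional
        union += nested_union
    return optional, union


def check_limits(tools, output_schema=None):
    """Return a list of human-readable limit violations (empty if none found)."""
    strict_tools = [t for t in tools if t.get("strict")]
    schemas = [t.get("input_schema", {}) for t in strict_tools]
    if output_schema is not None:
        schemas.append(output_schema)
    counts = [count_params(s) for s in schemas]
    optional = sum(c[0] for c in counts)
    union = sum(c[1] for c in counts)

    problems = []
    if len(strict_tools) > MAX_STRICT_TOOLS:
        problems.append(f"{len(strict_tools)} strict tools (max {MAX_STRICT_TOOLS})")
    if optional > MAX_OPTIONAL_PARAMS:
        problems.append(f"{optional} optional parameters (max {MAX_OPTIONAL_PARAMS})")
    if union > MAX_UNION_PARAMS:
        problems.append(f"{union} union-type parameters (max {MAX_UNION_PARAMS})")
    return problems
```

Calling check_limits(tools, output_schema) in a unit test for each agent configuration keeps a growing tool set from drifting past a limit unnoticed.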

4. Implementation Task

Configure the agent to output a structured marketing plan while ensuring its research tool is called with strict parameters.

Python
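# Implementing Structured Response and Strict Tool Use
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

response = client.messages.create(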
    model="claude-opus-4-7",
    max_tokens=4096,
    # JSON Output: Final response format
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "campaign_name": {"type": "string"},
                    "budget_allocation": {"type": "integer"},
                    "channels": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["campaign_name", "budget_allocation", "channels"],
                "additionalProperties": False
            }
        }
    },
    # Strict Tool Use: Guaranteed input format
    tools=[{
        "name": "search_market_data",
        "strict": True,  # Enforce schema validation
        "input_schema": {
            "type": "object",
            "properties": {
                "region": {"type": "string"},
                "industry": {"type": "string"}
            },
            "required": ["region", "industry"],
            "additionalProperties": False
        }
    }],
    messages=[{"role": "user", "content": "Plan a $10k AI consulting launch in London."}]
)
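
Once the call returns, the response still has to be read correctly: Claude may first ask to call the strict tool, and only a completed turn carries the final JSON plan. The sketch below shows one way to separate the two cases. read_result is a hypothetical helper, and the SimpleNamespace object is a stand-in shaped like an SDK response so the example runs offline. Treating any stop_reason other than "end_turn" or "tool_use" as incomplete is an assumption chosen for safety.

```python
import json
from types import SimpleNamespace


def read_result(response):
    """Split a Messages API response into tool calls and the final JSON plan."""
    tool_calls = [
        {"id": block.id, "name": block.name, "input": block.input}
        for block in response.content
        if block.type == "tool_use"
    ]
    if response.stop_reason == "tool_use":
        # Claude needs tool results before it can write the final plan.
        return {"tool_calls": tool_calls, "plan": None}
    if response.stop_reason != "end_turn":
        # For example "max_tokens" (the JSON may be cut off) or "refusal".
        raise ValueError(f"no complete plan: stop_reason={response.stop_reason}")
    text = "".join(block.text for block in response.content if block.type == "text")
    return {"tool_calls": tool_calls, "plan": json.loads(text)}


# Stand-in for a real SDK response (assumption: same attribute names).
fake_response = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(
            type="text",
            text='{"campaign_name": "London Launch", '
                 '"budget_allocation": 10000, "channels": ["linkedin"]}',
        )
    ],
)
result = read_result(fake_response)
```

When plan is None, the agent loop would run each entry in tool_calls, send the tool results back, and call read_result again on the next response.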

5. Hallucination Prevention: Nullable Fields

When extracting prospect data, some fields will legitimately be missing (e.g., budget information not yet disclosed). Define those fields as nullable to prevent Claude from inventing placeholder values.

  • Pattern: Use a type array ["string", "null"] to allow both a real value and null when data is absent.
  • Counts against union limit: Each nullable field uses one of the 16 allowed union-type parameters per request.

JSON (schema excerpt; budget_range and decision_date are nullable because they may be unknown)
{
  "type": "object",
  "properties": {
    "company_name":  {"type": "string"},
    "contact_email": {"type": "string"},
    "budget_range":  {"type": ["string", "null"]},
    "decision_date": {"type": ["string", "null"]}
  },
  "required": ["company_name", "contact_email", "budget_range", "decision_date"],
  "additionalProperties": false
}
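
The same schema can be built in Python with a small helper, which keeps the nullable pattern consistent across fields. This is a sketch only: nullable, prospect_schema, and unknown_fields are hypothetical names introduced for illustration. It also shows how downstream code might list the fields that came back as null instead of treating them as real values.

```python
def nullable(json_type):
    """Return a schema fragment that accepts either json_type or null."""
    return {"type": [json_type, "null"]}


def prospect_schema():
    """Build the prospect schema: every field is required, two may be null."""
    return {
        "type": "object",
        "properties": {
            "company_name": {"type": "string"},
            "contact_email": {"type": "string"},
            "budget_range": nullable("string"),
            "decision_date": nullable("string"),
        },
        "required": ["company_name", "contact_email", "budget_range", "decision_date"],
        "additionalProperties": False,
    }


def unknown_fields(record):
    """List the fields that were returned as null (None in Python)."""
    return sorted(name for name, value in record.items() if value is None)
```

Keeping nullable fields in "required" means every key is always present in the output, so a null reads as "looked for it, not found" rather than as a field the model skipped.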

Architect Tip for the Exam

Citations and Structured Outputs are fundamentally incompatible. Citations require interleaving text and citation blocks, which violates strict JSON schema constraints. The API returns a 400 error if both are active in the same request. Architects must choose one or the other based on the use case: citations for grounded research, structured outputs for CRM-ready extraction.
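
A local guard can surface this conflict before the request reaches the API. The sketch below assumes citations are switched on per content block with "citations": {"enabled": True} and that the conflict applies to the output_config.format setting described in this module. uses_citations and check_citations_conflict are hypothetical helper names.

```python
def uses_citations(messages):
    """Return True if any content block in the messages enables citations."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            continue  # Plain string content cannot carry a citations setting.
        for block in content or []:
            if isinstance(block, dict) and (block.get("citations") or {}).get("enabled"):
                return True
    return False


def check_citations_conflict(request):
    """Raise ValueError if a request combines citations with a JSON output format."""
    has_format = bool((request.get("output_config") or {}).get("format"))
    if has_format and uses_citations(request.get("messages", [])):
        raise ValueError(
            "citations and output_config.format cannot be used together; "
            "pick citations for grounded research or structured output for extraction"
        )
```

Passing the keyword arguments of a planned client.messages.create call through check_citations_conflict turns a 400 response into a clear error during development.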

Lab Exercise: Reliable Extraction Schema

Self-driven lab: Module7_Self_Driven_Lab.ipynb

Objective: produce structured extraction outputs without forcing Claude to invent missing values.

  1. Design a JSON schema with required nullable fields for uncertain business data.
  2. Run or sketch an extraction call using output_config.format.
  3. Add a validation step that returns the original document, failed extraction, and exact validation error for retry.
  4. Explain when citations must be chosen instead of structured output.

Expected Deliverable

A strict extraction workflow with validation and retry behavior.
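
One possible starting point for step 3 of the lab is sketched below. It assumes a flat object schema with additionalProperties set to false and uses a small hand-written type check in place of a full JSON Schema validator. PROSPECT_SCHEMA, validate, and extraction_result are hypothetical names, and the shape of the retry payload is a design choice, not a required format.

```python
import json

PROSPECT_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "contact_email": {"type": "string"},
        "budget_range": {"type": ["string", "null"]},
        "decision_date": {"type": ["string", "null"]},
    },
    "required": ["company_name", "contact_email", "budget_range", "decision_date"],
    "additionalProperties": False,
}

# Only the JSON types used in this module are covered (assumption).
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(record, schema):
    """Return the first validation error as a string, or None if the record is valid."""
    if not isinstance(record, dict):
        return "expected a JSON object"
    for name in schema.get("required", []):
        if name not in record:
            return f"missing required field: {name}"
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            return f"unexpected field: {name}"
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if not any(TYPE_CHECKS[t](value) for t in allowed):
            return f"field {name!r} must be one of {allowed}, got {type(value).__name__}"
    return None


def extraction_result(document, raw_output, schema):
    """Return the parsed record, or a retry payload with everything needed to try again."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        error = f"invalid JSON: {exc.msg}"
    else:
        error = validate(record, schema)
        if error is None:
            return {"ok": True, "record": record}
    return {
        "ok": False,
        "document": document,
        "failed_extraction": raw_output,
        "validation_error": error,
    }
```

On a failed result, the three retry fields can be placed in a follow-up user message so Claude sees the source document, its previous attempt, and the exact error in one place.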