Module 4

Scaling with Message Batches

The Message Batches API is the right tool for high-volume, non-time-sensitive work: generating thousands of personalized outreach emails, long-form whitepapers, or market analysis reports asynchronously at 50% off standard pricing.

Answer key: Module4_Complete.ipynb

1. The Economics of Scale

50% Discount: All usage in a batch is charged at half the standard API price for both input and output tokens.
Throughput: Batches allow significantly higher concurrency than synchronous requests.
Latency Trade-off: Most batches complete in under 1 hour, but completion is not guaranteed. Any request still unfinished 24 hours after submission expires and must be resubmitted.
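
To make the discount concrete, here is a minimal sketch of the cost arithmetic. The per-million-token prices are placeholder values chosen for illustration, not actual list prices, and `request_cost` is a hypothetical helper; substitute the current rates for your model.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 batch: bool = False) -> float:
    """Return the dollar cost of one request; batch=True applies the 50% discount."""
    cost = (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
    return cost * 0.5 if batch else cost


# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
standard = request_cost(2_000, 1_000, 3.0, 15.0)
batched = request_cost(2_000, 1_000, 3.0, 15.0, batch=True)
print(f"standard=${standard:.4f} batched=${batched:.4f}")
```

Because the discount applies to both input and output tokens, the batched cost is exactly half the standard cost regardless of the input/output mix.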

2. Extended Output: 300k Tokens

To generate "book-length" assets such as complete technical guides or comprehensive market reports, use the extended output beta header.

Beta Header: Include output-300k-2026-03-24 in your request headers.
Capacity: Raises max_tokens to 300,000 tokens per turn on Claude Opus 4.7 and Sonnet 4.6.

ZDR Warning

The Message Batches API is not eligible for Zero Data Retention (ZDR). Inputs and outputs are stored server-side while the batch is processed, and results remain available for download for 29 days. Do not use batch processing for requests containing client protected health information (PHI) if ZDR is required.

3. Batch Lifecycle & `custom_id`

Results are returned asynchronously and not in submission order. The custom_id is the only way to map outputs back to inputs.

Creation: Submit up to 100,000 requests. Each must have a unique custom_id.
Tracking: Poll processing_status until it reaches "ended".
Retrieval: Results are in .jsonl format, one line per request (succeeded / errored / canceled / expired).
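
Since custom_id is the only join key, it can help to validate the payload before submission. The following is a minimal sketch: the helper name `validate_batch` and the plain-dict request shape are assumptions for illustration, and it checks only the two constraints stated above (the 100,000-request limit and custom_id uniqueness).

```python
from collections import Counter

MAX_REQUESTS = 100_000  # per-batch request limit stated above


def validate_batch(requests: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the payload passed."""
    problems = []
    if len(requests) > MAX_REQUESTS:
        problems.append(f"too many requests: {len(requests)} > {MAX_REQUESTS}")
    counts = Counter(req["custom_id"] for req in requests)
    for custom_id, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate custom_id: {custom_id} (used {n} times)")
    return problems


# Hypothetical payload with one deliberately duplicated ID.
payload = [
    {"custom_id": "prospect-fintech-001", "params": {}},
    {"custom_id": "prospect-healthcare-002", "params": {}},
    {"custom_id": "prospect-fintech-001", "params": {}},
]
print(validate_batch(payload))
```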

4. Implementation Task

Python

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message_batch = client.messages.batches.create(
    # Beta headers are HTTP headers, so they are set on the create call,
    # not inside an individual request's params.
    extra_headers={"anthropic-beta": "output-300k-2026-03-24"},
    requests=[
        Request(
            custom_id="prospect-fintech-001",  # unique ID maps results back to input
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=300000,  # requires the extended output beta header
                messages=[{"role": "user", "content": "Write a 200-page AI consulting guide for Fintech CTOs."}]
            )
        ),
        Request(
            custom_id="prospect-healthcare-002",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": "Write a personalized outreach email for a Healthcare CEO."}]
            )
        )
    ]
)

5. Task: Retrieving and Mapping Batch Results

Because batches are processed asynchronously, your application must poll for completion before it can access results. Once the batch has ended, stream the results rather than loading them all at once, so that even a full batch of 100,000 responses does not exhaust memory.

Polling for Completion

A batch's processing_status starts as "in_progress". Poll the retrieve endpoint until it reaches "ended", which indicates that every request has finished (succeeded, errored, canceled, or expired).

Python

import time

# Use the ID captured from your creation call
BATCH_ID = message_batch.id

while True:
    status_update = client.messages.batches.retrieve(BATCH_ID)

    if status_update.processing_status == "ended":
        print("Batch processing complete!")
        break

    counts = status_update.request_counts
    print(f"Still processing... (Succeeded: {counts.succeeded}, Errored: {counts.errored})")
    time.sleep(60)  # poll every 60 seconds
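
The loop above runs until the batch ends. One possible variation bounds the wait with a deadline and takes the status lookup as a callable, which also makes the loop testable without network access. This is a sketch only: `wait_until_ended` and its parameters are hypothetical names, not part of the SDK.

```python
import time
from typing import Callable


def wait_until_ended(fetch_status: Callable[[], str],
                     interval_s: float = 60.0,
                     timeout_s: float = 24 * 60 * 60,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status() until it returns "ended"; return False on timeout."""
    waited = 0.0
    while True:
        if fetch_status() == "ended":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Simulated status sequence standing in for real retrieve calls.
statuses = iter(["in_progress", "in_progress", "ended"])
done = wait_until_ended(lambda: next(statuses), sleep=lambda s: None)
print(done)
```

In a real run, the callable would wrap the retrieve call used above, for example `lambda: client.messages.batches.retrieve(BATCH_ID).processing_status`, with the default `time.sleep`.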

Streaming and Mapping Results

Use .results() to stream responses. Results arrive in .jsonl format and are not in submission order, so always use custom_id to map each output back to its input.

Python

for result in client.messages.batches.results(BATCH_ID):
    request_id = result.custom_id

    if result.result.type == "succeeded":
        content = result.result.message.content[0].text
        print(f"Success for {request_id}: {content[:50]}...")

    elif result.result.type == "errored":
        error_type = result.result.error.error.type
        print(f"Error for {request_id}: {error_type}")

    elif result.result.type == "expired":
        print(f"Request {request_id} timed out (24-hour limit reached).")

    elif result.result.type == "canceled":
        print(f"Request {request_id} was canceled before completion.")
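
To tie outputs back to the source records, one option is to keep a dictionary keyed by custom_id and merge each result into it as it arrives. The sketch below works on plain dictionaries shaped like lines of the .jsonl results file (a `custom_id` plus a `result` object with a `type`); the `prospects` records and the `attach_results` helper are hypothetical.

```python
import json


def attach_results(prospects: dict[str, dict], jsonl_text: str) -> dict[str, dict]:
    """Merge each result line into a copy of its source record, keyed by custom_id."""
    merged = {cid: {**record, "status": "missing", "text": None}
              for cid, record in prospects.items()}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        row = merged.get(entry["custom_id"])
        if row is None:
            continue  # result for an ID that was never submitted; skip it
        row["status"] = entry["result"]["type"]
        if row["status"] == "succeeded":
            row["text"] = entry["result"]["message"]["content"][0]["text"]
    return merged


prospects = {
    "prospect-fintech-001": {"company": "Example Fintech"},
    "prospect-healthcare-002": {"company": "Example Health"},
}
# Two result lines, deliberately out of submission order.
sample = "\n".join([
    json.dumps({"custom_id": "prospect-healthcare-002",
                "result": {"type": "errored"}}),
    json.dumps({"custom_id": "prospect-fintech-001",
                "result": {"type": "succeeded",
                           "message": {"content": [{"type": "text", "text": "Dear CTO"}]}}}),
])
merged = attach_results(prospects, sample)
print(merged["prospect-fintech-001"]["status"], merged["prospect-healthcare-002"]["status"])
```

Records that never receive a result keep the "missing" status, which makes gaps visible instead of silently dropping them.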

Architect Rules for the Exam

Result Retention: Batch results are available for download for 29 days after creation. After that, metadata remains but result files are deleted.
Billing Logic: You are only billed for requests that succeed. Errored, expired, and canceled requests are not charged.
ID Format: All valid Batch IDs begin with the msgbatch_ prefix.
Authentication: The results_url is a protected endpoint. Provide your x-api-key header even when downloading directly via curl.
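
The ID and authentication rules can be illustrated with a short standard-library sketch. It only builds the request object and does not send it. The helper names, the placeholder key, and the example URL are assumptions for illustration; the `anthropic-version` value shown is the one commonly used for raw HTTP calls and should be checked against the current API reference.

```python
import urllib.request


def is_valid_batch_id(batch_id: str) -> bool:
    """Check the msgbatch_ prefix that all valid Batch IDs carry."""
    return batch_id.startswith("msgbatch_")


def build_results_request(results_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for a results_url."""
    return urllib.request.Request(
        results_url,
        headers={
            "x-api-key": api_key,               # required even for direct downloads
            "anthropic-version": "2023-06-01",  # assumed API version header
        },
    )


# Placeholder values; take results_url from the retrieved batch object in practice.
req = build_results_request(
    "https://api.anthropic.com/v1/messages/batches/msgbatch_example/results",
    "sk-placeholder",
)
print(req.get_method(), req.full_url)
```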

6. Stacking Discounts with Prompt Caching

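The batch discount and prompt-caching discounts stack. Add cache_control blocks to the identical prefix content (system prompt, shared context) in every request in the batch. Cache hits are provided on a best-effort basis because requests are processed concurrently and in any order; including the same breakpoints in every request maximizes the hit rate (typically 30% to 98%, depending on traffic patterns).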
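
As a rough sketch of what this looks like in a payload, the snippet below builds plain-dict batch requests that all share one system block marked with `cache_control`. The system prompt text, the prompts, and the `build_request` helper are hypothetical; the model name is the one used earlier in this module.

```python
# Shared prefix: identical in every request so that it can be served from cache.
SHARED_SYSTEM = [
    {
        "type": "text",
        "text": "You are an outreach assistant for an AI consulting firm.",
        "cache_control": {"type": "ephemeral"},  # cache breakpoint after this block
    }
]


def build_request(custom_id: str, prompt: str,
                  model: str = "claude-sonnet-4-6", max_tokens: int = 1024) -> dict:
    """Build one batch request that reuses the shared, cacheable system prompt."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            "system": SHARED_SYSTEM,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


requests = [
    build_request("prospect-fintech-001", "Write an outreach email for a Fintech CTO."),
    build_request("prospect-healthcare-002", "Write an outreach email for a Healthcare CEO."),
]
print(len(requests), requests[0]["params"]["system"] == requests[1]["params"]["system"])
```

Only the per-prospect user message varies, so everything before it forms the common prefix. The resulting list could be passed as the `requests` argument of the batch creation call.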

Lab Exercise: Batch Campaign Recovery

Self-driven lab: Module4_Self_Driven_Lab.ipynb

Objective: use Message Batches for scaled work while preserving traceability and retry behavior.

Create a batch payload with unique `custom_id` values and realistic outreach prompts.
Poll or inspect batch status and map results back to the original records.
Simulate a failed record and build a retry batch containing only the failed `custom_id`.
Document the 50 percent discount and the ZDR/storage trade-off.

Expected Deliverable

A batch workflow that supports out-of-order results and selective retry.
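
One way to approach the selective-retry step is sketched below, using plain dictionaries shaped like the batch requests and .jsonl result lines. The helper name `build_retry_requests` is hypothetical, and treating only errored and expired requests as retryable is an assumption; adjust it to your own recovery policy.

```python
RETRYABLE = {"errored", "expired"}  # assumption: canceled requests were stopped on purpose


def build_retry_requests(original_requests: list[dict], results: list[dict]) -> list[dict]:
    """Return only the original requests whose results ended in a retryable state."""
    failed_ids = {r["custom_id"] for r in results if r["result"]["type"] in RETRYABLE}
    return [req for req in original_requests if req["custom_id"] in failed_ids]


original = [
    {"custom_id": "prospect-fintech-001", "params": {"max_tokens": 1024}},
    {"custom_id": "prospect-healthcare-002", "params": {"max_tokens": 1024}},
    {"custom_id": "prospect-retail-003", "params": {"max_tokens": 1024}},
]
# Simulated results, out of submission order, with one failure.
results = [
    {"custom_id": "prospect-retail-003", "result": {"type": "succeeded"}},
    {"custom_id": "prospect-fintech-001", "result": {"type": "succeeded"}},
    {"custom_id": "prospect-healthcare-002", "result": {"type": "errored"}},
]
retry = build_retry_requests(original, results)
print([req["custom_id"] for req in retry])
```

Reusing the original custom_id in the retry batch keeps the mapping to the source record intact across attempts.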
