Scaling with Message Batches
The Message Batches API is the right tool for high-volume, non-time-sensitive work: generating thousands of personalized outreach emails, long-form whitepapers, or market analysis reports asynchronously at 50% off standard pricing.
1. The Economics of Scale
- 50% Discount: All usage in a batch is charged at half the standard API price for both input and output tokens.
- Throughput: Batches allow significantly higher concurrency than synchronous requests.
- Latency Trade-off: Most batches complete in under 1 hour. A batch ends within 24 hours at the latest, and any request still unprocessed at that point is marked expired.
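As a rough illustration of the discount, the sketch below compares the cost of one workload at standard and batch rates. The per-million-token prices and token counts are placeholder values chosen to keep the arithmetic simple, not published pricing, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float,
                  batch: bool = False) -> float:
    """Estimate cost in dollars; batch usage is charged at half the standard price."""
    standard = ((input_tokens / 1_000_000) * input_price_per_mtok
                + (output_tokens / 1_000_000) * output_price_per_mtok)
    return standard * 0.5 if batch else standard


# Placeholder prices (assumptions): $4 per million input tokens, $20 per million output tokens.
# Workload: 10,000 emails at roughly 500 input and 300 output tokens each.
standard_cost = estimate_cost(5_000_000, 3_000_000, 4.0, 20.0)
batch_cost = estimate_cost(5_000_000, 3_000_000, 4.0, 20.0, batch=True)
print(f"standard: ${standard_cost:.2f}, batch: ${batch_cost:.2f}")
```

Because the discount applies to both input and output tokens, the saving is half the standard bill regardless of how the workload splits between the two.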
2. Extended Output: 300k Tokens
To generate "book-length" assets such as complete technical guides or comprehensive market reports, use the extended output beta header.
- Beta Header: Include output-300k-2026-03-24 in your request headers.
- Capacity: Raises max_tokens to 300,000 tokens per turn on Claude Opus 4.7 and Sonnet 4.6.
The Message Batches API is not ZDR-eligible. Inputs and outputs are stored server-side until the batch completes, with results available for download for 29 days. Do not use batch processing for requests containing client PHI if ZDR is required.
3. Batch Lifecycle & custom_id
Results are returned asynchronously and not in submission order. The custom_id is the only way to map outputs back to inputs.
- Creation: Submit up to 100,000 requests. Each must have a unique custom_id.
- Tracking: Poll processing_status until it reaches "ended".
- Retrieval: Results are in .jsonl format, one line per request (succeeded / errored / canceled / expired).
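Since custom_id is the only link between outputs and inputs, it helps to generate the IDs and a lookup table in the same pass. The following is a minimal sketch under assumptions: the record fields (`id`, `industry`, `title`) and the helper `build_requests` are hypothetical, and the requests are plain dictionaries in the same shape as the typed request objects used later in this module.

```python
def build_requests(records, model="claude-sonnet-4-6", max_tokens=1024):
    """Return (requests, index): batch request dicts plus a custom_id -> record lookup."""
    requests = []
    index = {}
    for record in records:
        # Derive a deterministic ID from the source record (hypothetical naming scheme)
        custom_id = f"prospect-{record['industry']}-{record['id']:03d}"
        if custom_id in index:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        index[custom_id] = record
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": f"Write a personalized outreach email for a {record['title']}.",
                }],
            },
        })
    return requests, index


records = [
    {"id": 1, "industry": "fintech", "title": "Fintech CTO"},
    {"id": 2, "industry": "healthcare", "title": "Healthcare CEO"},
]
requests, index = build_requests(records)
print([r["custom_id"] for r in requests])
```

Keeping `index` alongside the submitted batch means a result can be joined back to its source record with a single dictionary lookup, whatever order the results arrive in.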
4. Implementation Task
import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message_batch = client.messages.batches.create(
    # The beta flag is an HTTP header, so it is set on the create call
    # rather than inside an individual request's params.
    extra_headers={"anthropic-beta": "output-300k-2026-03-24"},
    requests=[
        Request(
            custom_id="prospect-fintech-001",  # unique ID maps results back to input
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=300000,  # requires the extended output beta header
                messages=[{"role": "user", "content": "Write a 200-page AI consulting guide for Fintech CTOs."}],
            ),
        ),
        Request(
            custom_id="prospect-healthcare-002",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": "Write a personalized outreach email for a Healthcare CEO."}],
            ),
        ),
    ],
)
5. Task: Retrieving and Mapping Batch Results
Because batches are processed asynchronously, your application must poll for completion before results can be accessed. Once the batch has ended, results are streamed rather than loaded all at once, so even a 100,000-request batch can be processed without exhausting memory.
Polling for Completion
A batch's processing_status starts as "in_progress". Poll the retrieve endpoint until it reaches "ended", indicating all requests have finished (succeeded, errored, canceled, or expired).
import time

# Use the ID captured from your creation call
BATCH_ID = message_batch.id

while True:
    status_update = client.messages.batches.retrieve(BATCH_ID)
    if status_update.processing_status == "ended":
        print("Batch processing complete!")
        break
    counts = status_update.request_counts
    print(f"Still processing... (Succeeded: {counts.succeeded}, Errored: {counts.errored})")
    time.sleep(60)  # poll every 60 seconds
Streaming and Mapping Results
Use .results() to stream responses. Results arrive in .jsonl format and are not in submission order, so always use custom_id to map output back to input.
for result in client.messages.batches.results(BATCH_ID):
    request_id = result.custom_id
    if result.result.type == "succeeded":
        content = result.result.message.content[0].text
        print(f"Success for {request_id}: {content[:50]}...")
    elif result.result.type == "errored":
        error_type = result.result.error.error.type
        print(f"Error for {request_id}: {error_type}")
    elif result.result.type == "expired":
        print(f"Request {request_id} timed out (24-hour limit reached).")
    elif result.result.type == "canceled":
        print(f"Request {request_id} was canceled before completion.")
- Result Retention: Batch results are available for download for 29 days after creation. After that, metadata remains but result files are deleted.
- Billing Logic: You are only billed for requests that succeed. Errored, expired, and canceled requests are not charged.
- ID Format: All valid Batch IDs begin with the msgbatch_ prefix.
- Authentication: The results_url is a protected endpoint, so provide your x-api-key even when downloading directly via curl.
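If the .jsonl file is downloaded directly instead of streamed through the SDK, each line can be parsed on its own and indexed by custom_id. The sketch below assumes each line is a JSON object with a `custom_id` field and a `result` object carrying a `type`; the sample lines are simplified stand-ins rather than real API output, and `index_results` is a hypothetical helper.

```python
import json


def index_results(jsonl_text: str) -> dict:
    """Parse .jsonl batch results into a custom_id -> result dictionary."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        entry = json.loads(line)
        results[entry["custom_id"]] = entry["result"]
    return results


# Simplified sample, deliberately not in submission order
sample = "\n".join([
    json.dumps({"custom_id": "prospect-healthcare-002", "result": {"type": "errored"}}),
    json.dumps({"custom_id": "prospect-fintech-001", "result": {"type": "succeeded"}}),
])
by_id = index_results(sample)
print(by_id["prospect-fintech-001"]["type"])
```

Looking results up by ID this way makes the arrival order irrelevant to the rest of the pipeline.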
6. Stacking Discounts with Prompt Caching
Add cache_control blocks to identical prefix content (system prompt, shared context) across all requests in the batch. Cache hits are provided on a best-effort basis, so include the same breakpoints in every request to maximize hit rates (typically 30% to 98%).
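One way to keep the prefix identical is to build every request from the same system block. This is a sketch under assumptions: `SHARED_CONTEXT` and `with_cached_prefix` are hypothetical names, the context text is a placeholder, and the requests are plain dictionaries whose system prompt is a single text block marked with an ephemeral cache_control breakpoint.

```python
# Placeholder shared context (assumption); a real prefix would hold the full brief
SHARED_CONTEXT = "You are an outreach assistant for an AI consulting firm."


def with_cached_prefix(custom_id: str, user_prompt: str,
                       model: str = "claude-sonnet-4-6", max_tokens: int = 1024) -> dict:
    """Build one batch request whose system prompt carries a cache breakpoint."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            # Identical block in every request, so the cached prefix can be reused
            "system": [{
                "type": "text",
                "text": SHARED_CONTEXT,
                "cache_control": {"type": "ephemeral"},
            }],
            # Only the user turn varies between requests
            "messages": [{"role": "user", "content": user_prompt}],
        },
    }


cached_requests = [
    with_cached_prefix("prospect-fintech-001", "Write an outreach email for a Fintech CTO."),
    with_cached_prefix("prospect-healthcare-002", "Write an outreach email for a Healthcare CEO."),
]
print(cached_requests[0]["params"]["system"] == cached_requests[1]["params"]["system"])
```

Keeping the per-prospect text out of the system block is the design choice that matters here: anything that differs between requests belongs after the breakpoint.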
Lab Exercise: Batch Campaign Recovery
Self-driven lab: Module4_Self_Driven_Lab.ipynb
Objective: Use Message Batches for scaled work while preserving traceability and retry behavior.
- Create a batch payload with unique `custom_id` values and realistic outreach prompts.
- Poll or inspect batch status and map results back to the original records.
- Simulate a failed record and build a retry batch containing only the failed `custom_id`.
- Document the 50 percent discount and the ZDR/storage trade-off.
A batch workflow that supports out-of-order results and selective retry.
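As a starting point for the retry step, the sketch below selects only the requests whose outcome warrants resubmission. It is one possible shape, not a reference solution: `build_retry_requests` is a hypothetical helper, the outcomes dictionary is assumed to have been collected from the batch results, and treating both errored and expired requests as retryable is an assumption you may want to change.

```python
def build_retry_requests(original_requests, outcomes, retry_on=("errored", "expired")):
    """Return the subset of original requests whose outcome is in retry_on.

    original_requests: list of dicts, each with a "custom_id" key
    outcomes: dict mapping custom_id -> result type string
    """
    return [
        request for request in original_requests
        if outcomes.get(request["custom_id"]) in retry_on
    ]


original_requests = [
    {"custom_id": "prospect-fintech-001", "params": {"max_tokens": 1024}},
    {"custom_id": "prospect-healthcare-002", "params": {"max_tokens": 1024}},
    {"custom_id": "prospect-retail-003", "params": {"max_tokens": 1024}},
]
# Simulated outcomes: one failed record among three
outcomes = {
    "prospect-fintech-001": "succeeded",
    "prospect-healthcare-002": "errored",
    "prospect-retail-003": "succeeded",
}
retry_requests = build_retry_requests(original_requests, outcomes)
print([r["custom_id"] for r in retry_requests])
```

Reusing the original custom_id values in the retry batch keeps the mapping back to source records unchanged. A request that failed because it was malformed would presumably fail again, so it is worth inspecting the error type before resubmitting.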