Published Date: 18/09/2025
As organizations scale their use of generative AI, many workloads require cost-efficient, bulk processing rather than real-time responses. Amazon Bedrock batch inference addresses this need by enabling large datasets to be processed in bulk with predictable performance—at 50% lower cost than on-demand inference. This makes it ideal for tasks such as historical data analysis, large-scale text summarization, and background processing workloads.
In this post, we explore how to monitor and manage Amazon Bedrock batch inference jobs using Amazon CloudWatch metrics, alarms, and dashboards to optimize performance, cost, and operational efficiency.
## New Features in Amazon Bedrock Batch Inference
Batch inference in Amazon Bedrock is constantly evolving, and recent updates bring significant enhancements to performance, flexibility, and cost transparency:
- **Expanded Model Support** – Batch inference now supports additional model families, including Anthropic’s Claude Sonnet 4 and OpenAI GPT OSS models. For the most up-to-date list, refer to Supported Regions and models for batch inference.
- **Performance Enhancements** – Optimizations for the newer Anthropic Claude and OpenAI GPT OSS models deliver higher batch throughput than previous models, helping you process large workloads more quickly.
- **Job Monitoring Capabilities** – You can now track how your submitted batch jobs are progressing directly in CloudWatch, without the heavy lifting of building custom monitoring solutions. This capability provides AWS account-level visibility into job progress, making it straightforward to manage large-scale workloads.
## Use Cases for Batch Inference
AWS recommends batch inference for the following scenarios:
- Jobs are **not time-sensitive** and can tolerate minutes to hours of delay
- Processing is periodic, such as daily or weekly summarization of large datasets (news, reports, transcripts)
- Bulk or historical data needs to be analyzed, such as archives of call center transcripts, emails, or chat logs
- Knowledge bases need enrichment, including generating embeddings, summaries, tags, or translations at scale
- Content requires large-scale transformation, such as classification, sentiment analysis, or converting unstructured text into structured outputs
- Experimentation or evaluation is needed, for example testing prompt variations or generating synthetic datasets
- Compliance and risk checks must be run on historical content for sensitive data detection or governance
## Launch an Amazon Bedrock Batch Inference Job
You can start a batch inference job in Amazon Bedrock using the AWS Management Console, AWS SDKs, or AWS Command Line Interface (AWS CLI). For detailed instructions, see Create a batch inference job.
To use the console, complete the following steps:
1. On the Amazon Bedrock console, choose **Batch inference** under **Infer** in the navigation pane.
2. Choose **Create batch inference job**.
3. For **Job name**, enter a name for your job.
4. For **Model**, choose the model to use.
5. For **Input data**, enter the location of the Amazon Simple Storage Service (Amazon S3) input bucket (JSONL format).
6. For **Output data**, enter the S3 location of the output bucket.
7. For **Service access**, select your method to authorize Amazon Bedrock.
8. Choose **Create batch inference job**.
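If you prefer to automate job submission with an AWS SDK, the following sketch shows the equivalent call using the boto3 `create_model_invocation_job` API. The job name, IAM role ARN, model ID, and S3 URIs are placeholder values to replace with your own.

```python
import boto3

# Amazon Bedrock control-plane client (batch jobs are created through the "bedrock" service)
bedrock = boto3.client("bedrock", region_name="us-east-1")

# All values below are placeholders -- substitute your own job name, IAM role,
# model ID, and S3 locations.
response = bedrock.create_model_invocation_job(
    jobName="my-batch-summarization-job",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchInferenceRole",
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-input-bucket/batch-input/records.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-output-bucket/batch-output/"}
    },
)

print("Batch inference job ARN:", response["jobArn"])
```

You can then poll `get_model_invocation_job` with the returned ARN to check the job status as it moves from submitted to in progress to completed.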
## Monitor Batch Inference with CloudWatch Metrics
Amazon Bedrock now automatically publishes metrics for batch inference jobs under the AWS/Bedrock/Batch namespace, giving you AWS account-level visibility into batch workload progress. For supported Amazon Bedrock models, these metrics include records pending processing and input and output tokens processed per minute; for Anthropic Claude models, they also include tokens pending processing.
The following metrics can be monitored by `modelId`:
- **NumberOfTokensPendingProcessing** – Shows how many tokens are still waiting to be processed, helping you gauge backlog size
- **NumberOfRecordsPendingProcessing** – Tracks how many inference requests remain in the queue, giving visibility into job progress
- **NumberOfInputTokensProcessedPerMinute** – Measures how quickly input tokens are being consumed, indicating overall processing throughput
- **NumberOfOutputTokensProcessedPerMinute** – Measures how quickly output tokens are being generated, indicating overall generation throughput
To view these metrics using the CloudWatch console, complete the following steps:
1. On the CloudWatch console, choose **Metrics** in the navigation pane.
2. Filter metrics by AWS/Bedrock/Batch.
3. Select your `modelId` to view detailed metrics for your batch job.
To learn more about how to use CloudWatch to monitor metrics, refer to Query your CloudWatch metrics with CloudWatch Metrics Insights.
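If you prefer to query these metrics programmatically, the following sketch uses the boto3 CloudWatch client. The `ModelId` dimension name and the model ID value are assumptions based on the per-`modelId` grouping described above; confirm the exact dimension in the CloudWatch console under the AWS/Bedrock/Batch namespace.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Average batch backlog (records still waiting) for one model over the last 6 hours.
# The "ModelId" dimension name and value are assumptions -- verify them in your account.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock/Batch",
    MetricName="NumberOfRecordsPendingProcessing",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-sonnet-4-20250514-v1:0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
    Period=300,  # 5-minute data points
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].isoformat(), point["Average"])
```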
## Best Practices for Monitoring and Managing Batch Inference
Consider the following best practices for monitoring and managing your batch inference jobs:
- **Cost Monitoring and Optimization** – By monitoring the token throughput metrics (`NumberOfInputTokensProcessedPerMinute` and `NumberOfOutputTokensProcessedPerMinute`) alongside your batch job schedules, you can estimate inference costs using the information on the Amazon Bedrock pricing page. This helps you understand how fast tokens are being processed, what that means for cost, and how to adjust job size or scheduling to stay within budget while still meeting throughput needs (see the cost-estimation sketch after this list).
- **SLA and Performance Tracking** – The `NumberOfTokensPendingProcessing` metric is useful for understanding your batch backlog size and tracking overall job progress, but it should not be relied on to predict job completion times, because completion times can vary with overall inference traffic to Amazon Bedrock. To understand batch processing speed, we recommend monitoring the throughput metrics (`NumberOfInputTokensProcessedPerMinute` and `NumberOfOutputTokensProcessedPerMinute`) instead. If these throughput rates fall significantly below your expected baseline, you can configure automated alerts to trigger remediation steps, such as shifting some jobs to on-demand processing to meet your expected timelines.
- **Job Completion Tracking** – When the metric `NumberOfRecordsPendingProcessing` reaches zero, it indicates that all running batch inference jobs are complete. You can use this signal to trigger stakeholder notifications or start downstream workflows.
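To make the cost-monitoring practice concrete, here is a minimal sketch that turns observed token throughput into a rough spend estimate. The per-token prices are hypothetical placeholders; substitute the on-demand rates for your model from the Amazon Bedrock pricing page, and note that the 50% batch discount reflects the pricing described earlier in this post.

```python
# Rough cost estimate from observed batch throughput.
# The prices below are hypothetical placeholders -- substitute the on-demand
# per-1,000-token rates for your model from the Amazon Bedrock pricing page.

input_tokens_per_minute = 800_000    # e.g., average NumberOfInputTokensProcessedPerMinute
output_tokens_per_minute = 120_000   # e.g., average NumberOfOutputTokensProcessedPerMinute
job_duration_minutes = 360           # expected batch window (6 hours)

price_per_1k_input = 0.003    # USD, on-demand placeholder
price_per_1k_output = 0.015   # USD, on-demand placeholder
batch_discount = 0.5          # batch inference runs at 50% of on-demand pricing

input_cost = input_tokens_per_minute * job_duration_minutes / 1_000 * price_per_1k_input
output_cost = output_tokens_per_minute * job_duration_minutes / 1_000 * price_per_1k_output
estimated_cost = (input_cost + output_cost) * batch_discount

print(f"Estimated batch inference cost: ${estimated_cost:,.2f}")
```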
## Example of CloudWatch Metrics
In this section, we demonstrate how you can use CloudWatch metrics to set up proactive alerts and automation.
For example, you can create a CloudWatch alarm that sends an Amazon Simple Notification Service (Amazon SNS) notification when the average `NumberOfInputTokensProcessedPerMinute` exceeds 1 million within a 6-hour period. This alert could prompt an Ops team review or trigger downstream data pipelines.
The following screenshot shows the alarm in the **In alarm** state because the batch inference job exceeded the threshold. The alarm triggers the target action, in our case an SNS notification email to the Ops team.
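A minimal sketch of this alarm configuration with boto3 follows. The alarm name, SNS topic ARN, and `ModelId` dimension are placeholders or assumptions to adapt to your own account.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Notify an SNS topic when average input-token throughput exceeds 1 million tokens
# per minute over a 6-hour window. The topic ARN and ModelId dimension are
# placeholders/assumptions -- adjust them for your account.
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-batch-high-input-throughput",
    Namespace="AWS/Bedrock/Batch",
    MetricName="NumberOfInputTokensProcessedPerMinute",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-sonnet-4-20250514-v1:0"}],
    Statistic="Average",
    Period=21600,            # one 6-hour evaluation window
    EvaluationPeriods=1,
    Threshold=1_000_000,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:bedrock-batch-ops-alerts"],
)
```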
You can also build a CloudWatch dashboard displaying the relevant metrics. This is ideal for centralized operational monitoring and troubleshooting.
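As a starting point, the following sketch creates a simple two-widget dashboard with the `put_dashboard` API. The dashboard name, Region, and `ModelId` dimension are placeholders or assumptions to adjust for your environment.

```python
import json

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Placeholder model ID -- use the model IDs you actually run batch jobs with.
model_id = "anthropic.claude-sonnet-4-20250514-v1:0"

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "Batch backlog",
                "region": "us-east-1",
                "stat": "Average",
                "metrics": [
                    ["AWS/Bedrock/Batch", "NumberOfRecordsPendingProcessing", "ModelId", model_id],
                    ["AWS/Bedrock/Batch", "NumberOfTokensPendingProcessing", "ModelId", model_id],
                ],
            },
        },
        {
            "type": "metric",
            "x": 12, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "Batch throughput",
                "region": "us-east-1",
                "stat": "Average",
                "metrics": [
                    ["AWS/Bedrock/Batch", "NumberOfInputTokensProcessedPerMinute", "ModelId", model_id],
                    ["AWS/Bedrock/Batch", "NumberOfOutputTokensProcessedPerMinute", "ModelId", model_id],
                ],
            },
        },
    ]
}

cloudwatch.put_dashboard(
    DashboardName="bedrock-batch-inference",
    DashboardBody=json.dumps(dashboard_body),
)
```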
## Conclusion
Amazon Bedrock batch inference now offers expanded model support, improved performance, deeper visibility into the progress of your batch workloads, and enhanced cost monitoring.
Get started today by launching an Amazon Bedrock batch inference job, setting up CloudWatch alarms, and building a monitoring dashboard, so you can maximize efficiency and value from your generative AI workloads.
## Frequently Asked Questions
Q: What is Amazon Bedrock batch inference?
A: Amazon Bedrock batch inference is a feature that allows you to process large datasets in bulk with predictable performance at a lower cost compared to on-demand inference. It is ideal for tasks such as historical data analysis, large-scale text summarization, and background processing workloads.
Q: How can I monitor batch inference jobs?
A: You can monitor batch inference jobs using Amazon CloudWatch metrics, alarms, and dashboards. CloudWatch provides metrics such as `NumberOfTokensPendingProcessing`, `NumberOfRecordsPendingProcessing`, `NumberOfInputTokensProcessedPerMinute`, and `NumberOfOutputTokensProcessedPerMinute` to track job progress and performance.
Q: What are the best practices for managing batch inference jobs?
A: Best practices include cost monitoring and optimization by tracking token throughput metrics, SLA and performance tracking by monitoring throughput metrics, and job completion tracking by using the `NumberOfRecordsPendingProcessing` metric to trigger stakeholder notifications or downstream workflows.
Q: How can I set up alerts for batch inference jobs?
A: You can set up CloudWatch alarms to send notifications when specific metrics, such as `NumberOfInputTokensProcessedPerMinute`, exceed a certain threshold. This can help you proactively manage and optimize your batch inference jobs.
Q: What new features does Amazon Bedrock batch inference offer?
A: Recent updates to Amazon Bedrock batch inference include expanded model support, performance enhancements, and job monitoring capabilities. These features help improve the efficiency and cost-effectiveness of your generative AI workloads.