


AWS Graviton processors have improved steadily across generations, with each iteration delivering advances in compute performance, price-performance, and energy efficiency. At re:Invent 2025, we announced Amazon EC2 M9g, the first Graviton5-powered instances, in preview. Since then, customers have tested M9g across a wide range of workloads and shared their results. ClickHouse saw a 36% performance boost compared to M8g, with zero code changes. Honeycomb achieved 36% better throughput per core compared to Graviton4, across a 6-month A/B test of production observability workloads. HubSpot deployed M9g for MySQL databases and saw query duration drop by up to 60%. Today, M9g instances are generally available, alongside the new M9gd instances for customers who need high-speed, low-latency local NVMe SSD storage. Both are powered by Graviton5, the most powerful and most energy efficient processor AWS has ever built.
While many Arm-based instances have been introduced across the industry, no one comes close to the breadth and depth of the AWS Graviton footprint. After five generations of custom silicon and eight years of continuous investment, Graviton powers over 350 instance types serving more than 120,000 customers, from startups to large enterprises, a robust ISV partner ecosystem, and a broad set of managed services. You can use Graviton for a broad variety of workloads, including web applications, microservices, analytics, databases, machine learning (ML) inference, electronic design automation (EDA), gaming, and video encoding. As workloads grow more compute-intensive and data-driven, many have asked for more processing power, along with greater network and storage bandwidth to move more data and complete workloads faster. We’ve also designed these instances to efficiently package compute, memory, and I/O to maximize energy utilization.
As AI shifts from answering questions to taking actions, running code, using tools, evaluating results, and orchestrating multi-step tasks, the demand for CPU compute is growing rapidly. Graviton5 is built for this shift. With 192 cores, a 5x larger L3 cache, up to 33% lower inter-core latency, and DDR5 memory delivering high bandwidth, Graviton5 helps agents spend less time waiting on CPU-bound steps, processing more instructions, handling large numbers of concurrent environments, and keeping accelerators moving.
Meta is deploying Graviton at scale starting with tens of millions of cores to support its agentic AI efforts, making Meta one of the largest Graviton customers in the world. Agentic AI workloads, including real-time reasoning, code generation, and the orchestration of multi-step tasks, are CPU-intensive and benefit from the higher compute performance, larger caches, higher memory bandwidth, and core density in Graviton5.
What’s new in M9g and M9gd
Built on the sixth-generation AWS Nitro System, M9g instances are powered by AWS Graviton5 processors that deliver higher compute performance, larger caches, and improved memory and I/O scalability compared to Graviton4 processors. Graviton5 offers up to 25% better compute performance compared to Graviton4-based instances, with up to 35% faster performance for web applications, up to 35% for machine learning inference, and up to 30% for databases. As the first CPU in the AWS fleet to support the latest generation of PCIe Gen6 and DDR5-8800 memory, AWS Graviton5 instances deliver the fastest memory of any processor instances in the cloud, and 5 times more L3 cache compared to the previous generation. These improvements also come with better energy efficiency, helping you meet sustainability targets without compromising capability.
Networking and storage bandwidth have been expanded to keep pace with compute growth. M9g and M9gd instances offer up to 15% higher network bandwidth and 20% higher Amazon Elastic Block Store (Amazon EBS) bandwidth on average across sizes, with up to twice the network bandwidth for the largest instance size. M9g and M9gd instances also support Instance Bandwidth Configuration (IBC), a feature that helps you adjust the allocation of bandwidth between Amazon EBS and Amazon Virtual Private Cloud (Amazon VPC) networking for an Amazon EC2 instance by up to 25%. IBC can help optimize performance for workloads with specific bandwidth requirements, such as database read and write performance, query processing, and logging. These enhancements support faster data movement and improved throughput for workloads that rely on high I/O performance.
Security and isolation are foundational requirements for running workloads in the cloud. Within the Nitro System, the AWS Nitro Hypervisor is designed to isolate instances from each other as well as AWS operators. With M9g and M9gd instances we are raising the bar on security even further with the introduction of Nitro Isolation Engine. Nitro Isolation Engine is an enhancement to the Nitro System, which enforces isolation of instances and harnesses formal verification to provide assurances of isolation with mathematical precision. Nitro Isolation Engine is a purpose-built component that is responsible for enforcing isolation between virtual machines, including mediation of all access to virtual machine memory, CPU register state, and I/O devices through a minimal set of APIs. Nitro Isolation Engine leverages formal verification, a technique to mathematically demonstrate that the hardware or software behaves as intended, and not just in specific test cases. This intensive verification technique establishes Nitro as the first formally verified cloud hypervisor, pioneering a new standard for mathematically proven cloud security.
M9g instances provide one vCPU for every four GiB of memory and are well suited for a broad range of general-purpose workloads, including application servers, microservices, midsize data stores, gaming servers, caching fleets, containerized applications, large-scale Java applications, code repositories, web applications, and agentic AI.
For workloads that need high-speed, low-latency local storage, M9gd instances provide up to 11.4 TB of NVMe SSD storage and 30% higher IOPS and storage performance compared to Graviton4-based M8gd instances. M9gd instances are well suited for general-purpose workloads that require a balance of compute and memory with high-speed, low-latency local storage, including application servers, microservices, gaming servers, midsize key-value data stores, caching fleets, data logging, media processing, batch and log processing, and applications that need temporary storage such as caches and scratch files.
Here are the key specifications across the family:
| M9g | vCPUs | Memory (GiB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps) |
| medium | 1 | 4 | Up to 15 | Up to 12 |
| large | 2 | 8 | Up to 15 | Up to 12 |
| xlarge | 4 | 16 | Up to 15 | Up to 12 |
| 2xlarge | 8 | 32 | Up to 17 | Up to 12 |
| 4xlarge | 16 | 64 | Up to 17 | Up to 12 |
| 8xlarge | 32 | 128 | 17 | 12 |
| 12xlarge | 48 | 192 | 25 | 18 |
| 16xlarge | 64 | 256 | 34 | 24 |
| 24xlarge | 96 | 384 | 50 | 36 |
| 48xlarge | 192 | 768 | 100 | 72 |
| metal-48xl | 192 | 768 | 100 | 72 |
M9gd instances include local NVMe SSD storage. The table below shows the instance storage for each size. Compute, memory, network, and EBS bandwidth specifications are the same as M9g.
| M9gd | vCPUs | Memory (GiB) | Instance storage (GB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps) |
| medium | 1 | 4 | 1 x 59 NVMe SSD | Up to 15 | Up to 12 |
| large | 2 | 8 | 1 x 118 NVMe SSD | Up to 15 | Up to 12 |
| xlarge | 4 | 16 | 1 x 237 NVMe SSD | Up to 15 | Up to 12 |
| 2xlarge | 8 | 32 | 1 x 475 NVMe SSD | Up to 17 | Up to 12 |
| 4xlarge | 16 | 64 | 1 x 950 NVMe SSD | Up to 17 | Up to 12 |
| 8xlarge | 32 | 128 | 1 x 1900 NVMe SSD | 17 | 12 |
| 12xlarge | 48 | 192 | 3 x 950 NVMe SSD | 25 | 18 |
| 16xlarge | 64 | 256 | 1 x 3800 NVMe SSD | 34 | 24 |
| 24xlarge | 96 | 384 | 3 x 1900 NVMe SSD | 50 | 36 |
| 48xlarge | 192 | 768 | 3 x 3800 NVMe SSD | 100 | 72 |
| metal-48xl | 192 | 768 | 3 x 3800 NVMe SSD | 100 | 72 |
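Each entry in the instance storage column is a drive count multiplied by a per-drive capacity in GB. As a quick check on the arithmetic, the sketch below parses a few entries copied from the table and totals them. The helper name `total_storage_gb` is illustrative and not part of any AWS tooling.

```python
import re

# Instance storage strings copied from the M9gd table above.
M9GD_STORAGE = {
    "medium": "1 x 59 NVMe SSD",
    "12xlarge": "3 x 950 NVMe SSD",
    "24xlarge": "3 x 1900 NVMe SSD",
    "48xlarge": "3 x 3800 NVMe SSD",
}


def total_storage_gb(spec: str) -> int:
    """Return drive count times per-drive capacity (GB) for 'N x SIZE NVMe SSD'."""
    match = re.fullmatch(r"(\d+) x (\d+) NVMe SSD", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized storage spec: {spec!r}")
    drives, size_gb = int(match.group(1)), int(match.group(2))
    return drives * size_gb


for size, spec in M9GD_STORAGE.items():
    print(f"{size}: {total_storage_gb(spec)} GB")
```

For the largest size, three 3,800 GB drives total 11,400 GB, which lines up with the "up to 11.4 TB" figure quoted for M9gd.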
Now available
M9g and M9gd instances are available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Frankfurt) Regions. M9g and M9gd instances are available for purchase through Savings Plans, On-Demand, Spot Instances, Dedicated Instances, or Dedicated Hosts. For more information, visit Amazon EC2 pricing.
To get started with M9g and M9gd instances, several resources are available. The AWS Graviton Getting Started Guide is a technical guide covering how to build, run, and optimize workloads on Graviton-based instances. The Graviton Savings Dashboard helps you track and measure the cost savings from running workloads on Graviton-based instances. And AWS Transform is an AI-powered service that automates code transformations for migrating Java applications from x86 to Graviton-based Amazon EC2 instances, handling compatibility analysis, automated recompilation, dependency updates, and validation.
To learn more about Graviton-based instances, visit AWS Graviton Processors or Level up your compute with AWS Graviton.
— Esra
Today, we’re announcing the availability of Claude Fable 5 on Amazon Bedrock and Claude Platform on AWS. Claude Fable 5 makes Mythos-level capabilities available to all customers, with strong safeguards designed to make it safe for broader use. Fable 5 is state-of-the-art on nearly all tested benchmarks and delivers exceptional performance in software engineering, knowledge work tasks, and vision – built for ambitious, long running work.
With Claude Fable 5 on Bedrock, you can build within your existing AWS environment and scale inference workloads. You can also use Claude Fable 5 through the Claude Platform on AWS, giving you Anthropic’s native platform experience.
According to Anthropic, Claude Fable 5 represents a step-change in what you can accomplish with AI models. Here is what makes this model different:
Claude Fable 5 includes safeguards that limit its performance in specific areas where misuse risk is elevated. Harmful prompts related to cybersecurity, biology, chemistry, and health fall back to receive a response from Opus 4.8 instead. Anthropic is able to expand access to nearly all of Claude Fable 5’s state-of-the-art capabilities by developing more powerful safeguards. The same model without these limits is Claude Mythos 5 and it will only be available to a small group of vetted customers.
Claude Fable 5 model in action
You can use Claude Fable 5 in both Amazon Bedrock and Claude Platform on AWS. This post will cover guidance on how to access and use on Amazon Bedrock. For guidance on the Claude Platform on AWS, visit the documentation to learn more.
To get started with Amazon Bedrock, you can currently access the model only programmatically, using the Anthropic Messages API to call the bedrock-runtime or bedrock-mantle endpoints through the Anthropic SDK. You can also keep using the Invoke and Converse APIs on bedrock-runtime through the AWS Command Line Interface (AWS CLI) and AWS SDKs. Console support is coming soon.
To access the Claude Fable 5 model, you must first opt in to data sharing by using the Data Retention API to set the mode to provider_data_share, as in the following request. There is no console user interface for this setting at launch.
curl -X PUT https://bedrock-mantle.us-east-1.api.aws/v1/data_retention \
  -H "x-api-key: <your-bedrock-api-key>" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "provider_data_share" }'
This mode allows Amazon Bedrock to retain and share your inference data with model providers per their requirements. Anthropic requires 30-day retention of inputs and outputs, as well as human review. To learn more, visit the Amazon Bedrock abuse detection documentation.
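If you prefer to script the opt-in from Python rather than curl, the same request can be put together with the standard library. This is a minimal sketch that mirrors the curl example: the helper name `build_data_retention_request` is made up for illustration, and substituting other Regions into the URL is an assumption based on the endpoint pattern shown in this post. It builds the request without sending it.

```python
import json
import urllib.request


def build_data_retention_request(
    api_key: str, region: str = "us-east-1"
) -> urllib.request.Request:
    """Build (but do not send) the PUT request shown in the curl example."""
    # Assumption: other Regions follow the same URL pattern as us-east-1.
    url = f"https://bedrock-mantle.{region}.api.aws/v1/data_retention"
    body = json.dumps({"mode": "provider_data_share"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


request = build_data_retention_request("<your-bedrock-api-key>")
print(request.get_method(), request.full_url)
# To actually send it, pass the request to urllib.request.urlopen(request).
```

Keeping construction separate from sending makes it easy to inspect the method, URL, and body before any data-sharing setting is changed.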
Let’s start with the Anthropic SDK for Python, using the Messages API on the bedrock-mantle endpoint. First, install the Anthropic SDK.
pip install anthropic
Here is sample Python code to call the Claude Fable 5 model:
import anthropic

# Point the Anthropic client at the bedrock-mantle endpoint.
client = anthropic.Anthropic(
    base_url="https://bedrock-mantle.us-east-1.api.aws/anthropic",
    api_key="<your-bedrock-api-key>",  # replace with your Bedrock API key
)

message = client.messages.create(
    model="anthropic.claude-fable-5",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions",
        },
    ],
)
print(message.content[0].text)
To learn more, check out Anthropic Messages API code examples and notebook examples for multiple use cases and a variety of programming languages.
You can also use Claude Fable 5 with the Invoke API and Converse API on the bedrock-runtime endpoint. Here’s an example that calls the Converse API, which provides a unified multi-model experience, using the AWS SDK for Python (Boto3):
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-fable-5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions."
                }
            ],
        }
    ],
    inferenceConfig={"maxTokens": 4096},
)
print(response["output"]["message"]["content"][0]["text"])
To learn more, visit code examples that show how to use Amazon Bedrock Runtime with AWS SDKs.
Things to know
Let me share some important technical details that I think you’ll find useful.
Now available
Anthropic’s Claude Fable 5 model is available today on Amazon Bedrock in the US East (N. Virginia) and Europe (Stockholm) Regions; check the full list of Regions for future updates. Claude Fable 5 is also available on the Claude Platform on AWS in North America, South America, Europe, and Asia Pacific.
Give Claude Fable 5 a try with the Amazon Bedrock APIs, in the Claude Platform on AWS, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.
— Channy
This week, the AWS IoT Device SDK for Swift reached general availability. As a member of the Swift Server Workgroup (SSWG), I took particular notice of this one. The SDK brings production-ready MQTT 5 connectivity, Device Shadow, Jobs, and fleet provisioning to Swift developers on macOS, iOS, tvOS, and Linux.
I’m curious to see what you will build with it. Swift on the server has matured over the past few years, and now it reaches IoT devices too. This connects to a broader trend of running Swift at the edge. WendyOS, for example, is an open-source operating system for physical AI that offers first-class Swift support for deploying apps to NVIDIA Jetson and Raspberry Pi hardware. Between server-side Swift, IoT, and edge computing, the language is showing up in places that would have surprised most people a few years ago.
Now, let’s get into this week’s AWS news.
Headlines
Amazon RDS for SQL Server supports Bring Your Own Media — Customers who migrate SQL Server applications from on-premises environments can now reuse their existing Microsoft SQL Server licenses, including Software Assurance, through Microsoft’s License Mobility program on Amazon RDS. BYOM is integrated with AWS License Manager for tracking license usage and compliance. Read more.
Amazon Cognito now supports multi-Region replication — You can now synchronize user and machine identity data, including credentials, user pool configurations, and federation setups, to a secondary user pool in a standby Region in near real-time. In the event of a disruption in the primary Region, signed-in users continue accessing their applications without re-authenticating, and registered users can sign in with their existing credentials. Multi-Region replication is available as an add-on for user pools in Essentials or Plus feature tiers across 16 Regions. Read more.
GPT-5.5, GPT-5.4, and Codex from OpenAI are now generally available on Amazon Bedrock — You can now use GPT-5.5 and GPT-5.4 in production workloads on Amazon Bedrock and build with Codex for AI-powered software development, with the same security, governance, and operational controls you already use across AWS. GPT-5.5 is the most capable model from OpenAI, excelling at agentic coding, data analysis, and multi-step autonomous tasks. Codex is available through the Codex App, the Codex CLI, and IDE integrations with Visual Studio Code, JetBrains, and Xcode. Pricing matches OpenAI first-party rates, and usage counts toward existing AWS commitments. Read more.
Last week’s launches
Here are some launches and updates from this past week that caught my attention:
For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.
Upcoming AWS events
Learn more about AWS, browse and join upcoming AWS-led in-person and virtual events, startup events, and developer-focused events as well as AWS Summits and AWS Community Days. Join the AWS Builder Center to connect with builders, share solutions, and access content that supports your development.
That’s all for this week. Check back next Monday for another Weekly Roundup!
— seb
Today, we’re announcing a new console experience in Amazon Bedrock for you to experiment, iterate, and scale with the latest AI models on Amazon Bedrock’s next-generation inference engine built for high performance, reliability, and security. This console has a refreshed workflow optimized for bedrock-mantle endpoint, which supports the latest GPT, Claude, and open-weight models with the OpenAI Responses API, OpenAI Chat Completions API, and the Anthropic Messages API.
The new console experience makes it simple to find the right model and move quickly from evaluation to production.
How to get started
You can try the new experience by choosing Try the Bedrock Mantle Console from within the Amazon Bedrock console, or by using the new console link directly.

You can find a project-based dashboard that shows inference requests and errors over a recent date range, recently used models, and the project list. You can create a project, assign models, configure API keys, and start making inference requests in minutes.

A new model catalog shows the latest GPT, Claude, and open-weight models that are supported on the bedrock-mantle engine. You can see details of each model's features, tokens, input/output, pricing information, and Regional availability. You can also compare up to 3 models in a single view.

When you choose the project dashboard, you can see the models used in the project, the distribution of your token usage such as total token usage, token usage per minute, inference requests per minute, and tokens per inference request. This can inform your model selection, prompt optimization, and workload consistency decisions.

You can select up to 3 models and compare their responses to the same prompt side by side.

To build your application in the project, choose Getting started. You can migrate existing code, build a new app with the Anthropic or OpenAI SDK, or connect an AI coding assistant to Bedrock.
Choose API & SDK, then select your SDK (either Anthropic or OpenAI), your preferred programming language, and your authentication method. The console then shows environment variable settings that you can run in your terminal for a quick test or save to a .env file for your application. You can also send your first request with sample code snippets to verify your setup.
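If you save the settings to a .env file, your application needs a way to read them back. The sketch below is a minimal standard-library parser for simple KEY=VALUE lines, under stated assumptions: the variable names in the sample are placeholders borrowed from the OpenAI SDK example in this post, and the console's actual output may use different names or syntax.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style lines such as: export KEY=VALUE
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical file contents; the real variable names may differ.
SAMPLE = """
# Bedrock settings
OPENAI_BASE_URL="https://bedrock-mantle.us-east-2.api.aws/openai/v1"
export OPENAI_API_KEY=<BEDROCK_API_KEY>
"""

settings = parse_env_text(SAMPLE)
print(sorted(settings))
```

In an application you would typically copy these values into os.environ at startup, or use an established .env library instead of a hand-rolled parser.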
When you choose Clients, you can select the AI coding agent source such as Claude Code, Cline, Codex, Cursor, or OpenCode that you want to connect to the bedrock-mantle engine. It provides instructions on how to install the AI agent, use your AWS IAM credentials or use a Bedrock API key, set environment variables, and route requests from each AI agent through Bedrock.

To learn about Anthropic- and OpenAI-compatible APIs, choose Live API docs. You can choose Anthropic API Protocol for access to Claude model features like the Messages API or OpenAI API Protocol for access to features like Responses API.
For example, when you choose OpenAI Responses API, it retrieves a model response with the given model ID. These API references are automatically prefilled with the project’s selected model ID, Region, bedrock-mantle endpoint URL, and API key reference, and they update in place as you change models or settings.

You can also choose the existing Bedrock console to use fully managed features such as Agents, Knowledge Bases, Guardrails, and fine-tuning, or to run the InvokeModel and Converse APIs on the bedrock-runtime endpoint.
Now available
The new console experience is available in all AWS Regions where the bedrock-mantle endpoint is offered: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Sydney, Tokyo), Europe (Frankfurt, Ireland, London, Milan, Stockholm), and South America (São Paulo). Check the full list of Regions for future updates.
Give the new console experience a try in the new Amazon Bedrock console and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.
— Channy
As a developer advocate working with web and mobile application developers, I’ve often heard about the need to maintain consistent user authentication in the unlikely event of a regional service interruption. The increasing use of agentic AI, microservices, automation, and service accounts has sparked a similar need for machine-to-machine authentication. Today, I’m excited to share two important updates to Amazon Cognito: multi-Region replication for improved resilience, and support for customer managed keys for more encryption control.
Many applications rely on Amazon Cognito to handle user and machine-to-machine authentication, and to manage user profiles. When building for high availability, having consistent data across different AWS Regions is a key approach, and until now, achieving that consistency came with significant challenges. Engineering teams spent significant time building and maintaining custom replication solutions to synchronize configurations across Regions. Manual export and import of user data between Regions created security risks from potential data exposure and introduced opportunities for data inconsistencies. During regional transitions, end users experienced disruptions like forced password resets and re-authentication. For machine-to-machine communications, teams had to create new app clients in the secondary region, which meant reconfiguring their applications and updating OAuth-protected resources to accept access tokens issued by the new regional issuer. These challenges made it difficult to maintain uninterrupted operations across Regions.
With multi-Region replication, Amazon Cognito automatically maintains a synchronized copy of your user data and machine secrets in a secondary AWS Region of your choice. The replication flows in one direction, from your primary Region to the secondary Region. This includes user profiles, credentials, and pool configurations. The secondary Region operates in read-only mode, focusing on maintaining authentication capabilities. Existing sessions continue uninterrupted.
When you need to direct traffic to the secondary Region, your existing users can continue signing in with their existing credentials without disruption, and currently signed-in users remain authenticated because both regions recognize access tokens issued by either region. Multi-Region replication supports all authentication methods, including federated sign-in through social providers (Amazon, Google, Apple, Facebook), Security Assertion Markup Language (SAML) and OpenID Connect (OIDC) integrations, and API authorization flows. This approach maintains availability for both customer-facing applications and machine-to-machine communications in your backend services. While authentication continues without interruption, operations like new user registration or profile updates are not available during failover.
Before configuring multi-Region replication, you must configure a multi-Region customer managed key stored in AWS Key Management Service (AWS KMS) to encrypt your user data at rest. These keys provide consistent encryption across Regions while giving you control over your encryption strategy.
How this works in practice
I start this demo with an existing Cognito user pool in the us-west-2 (Oregon) Region. I want to configure replication to us-east-1 (Northern Virginia). I also have a customer managed key replicated in these two Regions.
Configuring multi-Region replication is just three steps. The AWS Management Console guides me through the steps: set up a custom key for encryption, configure multi-region OIDC endpoints, and configure the replication itself.
First, I set up a custom AWS KMS key to encrypt the data at rest.
I select the custom key I created. I also update the key policy to allow Amazon Cognito to access and use the key. The console shows the correct IAM policy statements to add to my key policy.
The console confirms when the custom key is selected and correctly configured.
Second, I follow the console instructions to configure the OIDC issuer type. On Step 2 – optional, I choose Configure.
I make sure to update my client applications with these new endpoints. This is a required change that will need a redeployment of server-side applications and an update submission for mobile apps on the App Store and Google Play. If I don’t update the endpoints, my users will experience disruptions because requests to the old endpoints will no longer be routed correctly.
On the next screen, I select Updated. I take note of the new URLs. I confirm the changes and choose Change issuer type.
Finally, I select the target Region for replication. Only Regions where the custom encryption key is replicated are available for selection. After choosing the target Region, I choose Create.
The service prepares the replication. The time needed depends on the amount of data in the user pool.
When the replicated user pool is ready, I manually Activate it.
The replication status becomes Active. It is ready to direct traffic to the replica.
Additional configurations
The console helps me to keep track of additional configurations I have to plan. When I’m using Lambda functions for custom authentication flows or SMS or email notifications, I must also deploy and configure these resources in the new Region.
Similarly, log streaming or AWS WAF configuration must be manually configured in the target Region before I start directing authentication traffic to it.
Health checks and failover
Both primary and secondary regional endpoints remain active and ready to serve your traffic at all times. To monitor system health and manage failovers, you design a strategy that aligns with your application’s specific requirements and security posture. You can implement health checks to monitor the status of authentication services in your primary Region and define criteria for when to initiate failover. These checks might look for error rates, latency patterns, or specific service alerts.
When your monitoring system detects issues meeting your failover criteria, you can redirect traffic to the secondary Region through DNS updates. This approach gives you control over the failover process while maintaining security. Consider testing your failover strategy during off-peak hours by redirecting a small portion of traffic to verify that authentication continues working as expected in the secondary Region.
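As one way to express failover criteria like these in code, here is a minimal sketch of a decision function. It assumes you already collect periodic health samples for the primary Region. The `HealthSample` type, the threshold values, and the "three consecutive breaches" rule are all hypothetical choices for illustration, not Amazon Cognito features or recommendations. The DNS update itself is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    error_rate: float      # fraction of failed authentication requests, 0.0 to 1.0
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def should_fail_over(
    samples,
    max_error_rate: float = 0.05,    # hypothetical threshold
    max_latency_ms: float = 2000.0,  # hypothetical threshold
    consecutive: int = 3,            # breaches required before acting
) -> bool:
    """Return True when the most recent `consecutive` samples all breach a threshold."""
    if len(samples) < consecutive:
        return False
    recent = samples[-consecutive:]
    return all(
        s.error_rate > max_error_rate or s.p99_latency_ms > max_latency_ms
        for s in recent
    )


history = [
    HealthSample(0.01, 300.0),
    HealthSample(0.20, 350.0),
    HealthSample(0.02, 5000.0),
    HealthSample(0.30, 400.0),
]
print(should_fail_over(history))
```

Requiring several consecutive breaches is a simple guard against failing over on a single noisy measurement. Tune the window and thresholds to your own requirements.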
When using managed login and federation with custom domains, you can also use the built-in traffic routing feature by providing an Amazon Route 53 health check ID.
Pricing and availability
Multi-Region replication is available today as an add-on feature for Amazon Cognito customers using the Essentials and Plus tiers. For user authentication, the add-on costs $0.0045 per monthly active user per replica Region for Essentials tier customers and $0.006 per monthly active user per replica Region for Plus tier customers. For machine-to-machine (M2M) authentication, the add-on is a 30% charge on top of the standard volume-based pricing for successful tokens issued. For detailed pricing information, see Amazon Cognito pricing.
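To make the add-on arithmetic concrete, the sketch below applies the rates quoted above. It covers only the replication add-on, not the underlying tier price, and it ignores any free tier or volume discounts because they are not described here. The function names and example figures are illustrative.

```python
from decimal import Decimal

# Add-on rates quoted above, in USD per monthly active user per replica Region.
ADD_ON_RATE = {"essentials": Decimal("0.0045"), "plus": Decimal("0.006")}
# M2M add-on: 30% on top of the standard charge for successful tokens issued.
M2M_SURCHARGE = Decimal("0.30")


def user_add_on_cost(tier: str, monthly_active_users: int, replica_regions: int = 1) -> Decimal:
    """Monthly replication add-on for user authentication."""
    return ADD_ON_RATE[tier] * monthly_active_users * replica_regions


def m2m_add_on_cost(standard_m2m_charge: Decimal) -> Decimal:
    """Monthly replication add-on for M2M, given the standard M2M charge."""
    return standard_m2m_charge * M2M_SURCHARGE


# Example: 100,000 monthly active users replicated to one Region.
print(user_add_on_cost("essentials", 100_000))
print(user_add_on_cost("plus", 100_000))
```

With 100,000 monthly active users and one replica Region, the add-on works out to $450 per month on Essentials and $600 per month on Plus.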
Multi-Region replication is available in the following Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Stockholm), and South America (São Paulo).
Any of these Regions can be used as the source or the destination for the replication.
Support for customer managed keys is available for the Essentials and Plus tiers. It is available in the following Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, New Zealand, Osaka, Seoul, Singapore, Sydney, Thailand, Tokyo), Canada (Central), Canada West (Calgary), Europe (Frankfurt, Ireland, London, Milan, Paris, Spain, Stockholm, Zurich), Israel (Tel Aviv), Mexico (Central), South America (São Paulo), and AWS GovCloud (US-East, US-West).
From my conversations with customers, maintaining business continuity during regional incidents while meeting security requirements is a high priority. Multi-Region replication provides the capability to build more resilient applications without managing complex replication logic yourself. The automatic synchronization of user data and configurations reduces operational overhead while maintaining security.
For customers in regulated industries, the new support for customer managed keys provides additional control over data encryption. You can now use your own encryption keys to protect user data at rest, helping you meet regulatory requirements in industries like healthcare and financial services.
To get started with multi-Region replication and customer managed key encryption, visit the Amazon Cognito console or see the documentation for detailed setup instructions. I look forward to hearing how you use this feature to strengthen your application architecture.
— seb
As we previewed in What’s Next with AWS 2026, we’re announcing the general availability of OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock, giving you access to frontier models and a coding agent for software development.
According to OpenAI, GPT-5.5 and GPT-5.4 models are excellent for coding, reasoning, agentic workflows, and complex professional work. You can use GPT-5.5 for the hardest customer workloads and GPT-5.4 for the best price-performance. You can call them through Responses API on Amazon Bedrock’s next-generation inference engine built for high performance, reliability, and security.
Codex is the OpenAI coding agent for AI-powered software development. According to OpenAI, more than 4 million developers use Codex every week to write, refactor, debug, test, and validate code across large codebases. With GPT-5.5 powering inference, Codex introduces a new class of intelligence optimized for complex, long-horizon developer workflows. You can use the Codex App, the Codex CLI, and IDE integrations with Visual Studio Code, JetBrains, and Xcode, with all model inference routed through the Responses API on Amazon Bedrock.
For customers with data residency requirements, all processing stays within the Bedrock Region you select. You pay per token with no seat licenses and no per-developer commitments.
GPT-5.5 and GPT-5.4 models on Bedrock in action
You can access the models programmatically using the OpenAI Responses API to call the bedrock-mantle endpoints through the OpenAI SDK or command-line tools such as curl.
Let’s start with the OpenAI SDK for Python. First, install the OpenAI SDK.
pip install -U openai
Set the environment variables for authentication.
export OPENAI_BASE_URL="https://bedrock-mantle.us-east-2.api.aws/openai/v1"
export OPENAI_API_KEY="<BEDROCK_API_KEY>"
export BEDROCK_OPENAI_MODEL_ID="openai.gpt-5.5"
Here is sample Python code to call the GPT-5.5 model on Bedrock:
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

response = client.responses.create(
    model=os.environ["BEDROCK_OPENAI_MODEL_ID"],
    input=[
        {
            "role": "developer",
            "content": "You are a software engineer with excellent AWS cloud knowledge. Be concise and practical.",
        },
        {
            "role": "user",
            "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions.",
        },
    ],
    reasoning={"effort": "medium"},
    text={"verbosity": "low"},
)

print(response.output_text)
You can also call the model endpoint directly using curl.
curl "$OPENAI_BASE_URL/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai.gpt-5.5",
    "input": [
      {
        "role": "developer",
        "content": "You are a software engineer with excellent AWS cloud knowledge."
      },
      {
        "role": "user",
        "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions."
      }
    ],
    "reasoning": {"effort": "medium"},
    "text": {"verbosity": "low"}
  }'
Use the Responses API when you want model-managed multi-turn state; when you need hosted tools, function tools, or richer tool orchestration; or when you run background or long-running work. To learn more, visit the OpenAI Cookbook Responses examples.
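As a minimal sketch of model-managed multi-turn state, the second request below points at the first response instead of resending the conversation. It assumes the Bedrock endpoint honors the `previous_response_id` parameter of the OpenAI Responses API; the helper names `follow_up_kwargs` and `run_two_turns` are hypothetical and used only for illustration.

```python
import os


def follow_up_kwargs(model, previous_response_id, user_text):
    """Build arguments for a follow-up turn that reuses server-side state."""
    return {
        "model": model,
        # Assumption: the Bedrock endpoint accepts previous_response_id.
        "previous_response_id": previous_response_id,
        "input": [{"role": "user", "content": user_text}],
    }


def run_two_turns():
    """Send two chained requests; needs the openai package and the env vars set earlier."""
    from openai import OpenAI  # imported lazily so the helper above has no dependency

    client = OpenAI(
        base_url=os.environ["OPENAI_BASE_URL"],
        api_key=os.environ["OPENAI_API_KEY"],
    )
    model = os.environ["BEDROCK_OPENAI_MODEL_ID"]

    first = client.responses.create(
        model=model,
        input="List three AWS services for routing traffic across Regions.",
    )
    second = client.responses.create(
        **follow_up_kwargs(
            model, first.id, "Which one has the least operational overhead?"
        )
    )
    return second.output_text
```

Call `print(run_two_turns())` once the environment variables from the earlier step are set. Only the new user message travels with the second request, which is the main reason to prefer this pattern over resending the full history.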
Using OpenAI Codex with GPT-5.5 on Amazon Bedrock
You can download the Codex CLI, Codex App, or Codex VS Code extension and get started with Bedrock for model inference. Codex supports two Bedrock authentication pathways: an Amazon Bedrock API key or the AWS SDK credential chain. If you set AWS_BEARER_TOKEN_BEDROCK, Codex uses it first; otherwise, Codex falls back to the AWS SDK credential chain.
Set AWS_BEARER_TOKEN_BEDROCK in the environment that Codex will read:
export AWS_BEARER_TOKEN_BEDROCK=<your-bedrock-api-key>
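The authentication order described above can be sketched as a small check you run before launching Codex. This is an illustration of the documented precedence, not Codex's own code; the function name `pick_bedrock_auth` and the returned labels are hypothetical, and treating an empty variable as unset is an assumption.

```python
import os


def pick_bedrock_auth(env=None):
    """Return which authentication pathway would apply, per the order above."""
    env = os.environ if env is None else env
    # Assumption: an empty value counts as "not set".
    if env.get("AWS_BEARER_TOKEN_BEDROCK"):
        return "bedrock-api-key"
    # Otherwise the AWS SDK credential chain is the fallback.
    return "aws-sdk-credential-chain"


print(pick_bedrock_auth())
```

Running this in the same shell that starts Codex is a quick way to confirm which pathway that environment will select.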
Then, configure your preferred Region and set the model ID to openai.gpt-5.5 in ~/.codex/config.toml, which is required for Bedrock API-key authentication. You can also choose openai.gpt-5.4, openai.gpt-oss-120b, or openai.gpt-oss-20b. For the desktop app or VS Code extension, put any environment variables the app needs in ~/.codex/.env.
model = "openai.gpt-5.5"
model_provider = "amazon-bedrock"
[model_providers.amazon-bedrock.aws]
region = "us-east-2"
Restart the desktop app or VS Code extension after changing ~/.codex/config.toml or ~/.codex/.env. In Codex CLI, you should see a /status tab that looks like this:

In the Codex App, you can use the GPT-5.5 model through Amazon Bedrock inference.

Things to know
Let me share some important technical details that I think you’ll find useful.
medium effort. Start GPT-5.4 with effort set explicitly rather than relying on its none default.
Now available
OpenAI GPT models and Codex on Amazon Bedrock are available today: the GPT-5.5 model in the US East (Ohio) Region, and the GPT-5.4 model in the US East (Ohio) and US West (Oregon) Regions. Check the full list of Regions for future updates. To learn more, visit the OpenAI on Amazon Bedrock page and the Amazon Bedrock pricing page.
Give GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock a try today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.
— Channy
In my last Week in Review post, I shared what I’d been hearing from customers in the AI-Driven Development Lifecycle (AI-DLC) workshops I’ve been delivering. Last week I was back at it, this time in Denver for a two-day AI-DLC workshop, where I helped facilitate 17 teams to deliver nearly 20 separate use cases in just two days. The pace of acceleration that AI-DLC unlocks—especially when paired with tools like Claude Code on Amazon Bedrock—is fundamentally changing how businesses operate. Traditional roles within software development teams are collapsing into smaller, AI-augmented squads, and the paradigm shift is beginning to take place right in front of us. To learn more about how to utilize various AI tools, visit the GitHub repository of AI-DLC workflow.
This shift is also reshaping how AWS account teams (solutions architects, customer solutions managers, and technical account managers) collaborate with customers. It’s becoming less about handing off advisory design documents and more about building alongside them in real time. It’s a genuinely exciting moment to be in the middle of the change, and this week’s headline launch — Anthropic’s most capable model yet, now on AWS — is going to push that pace even further.
Now, let’s get into this week’s AWS news…
Headlines
Claude Opus 4.8 on AWS — Anthropic’s most capable generally available model is now accessible through both Amazon Bedrock and the Claude Platform on AWS. Opus 4.8 is built for agentic coding, knowledge work, and extended autonomous task execution — it sustains longer autonomous sessions with deeper reasoning, recovers from errors, and synthesizes information across lengthy documents. For coding workloads, it reads codebases like an engineer, plans before it edits, and holds context across long sessions. On Amazon Bedrock, you get AWS-managed features like Guardrails, Knowledge Bases, and data residency; on the Claude Platform on AWS, you get Anthropic’s native APIs unified with AWS billing. To learn more, visit the deep-dive blog post.
Last week’s launches
Here are some launches and updates from this past week that caught my attention:
For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.
Other AWS news
Here are some additional posts and resources that you might find interesting:
For a full list of AWS blog posts, be sure to keep an eye on the AWS Blogs page.
Learn more about AWS, browse and join upcoming AWS-led in-person and virtual events, startup events, and developer-focused events as well as AWS Summits and AWS Community Days. Join the AWS Builder Center to connect with builders, share solutions, and access content that supports your development.
That’s all for this week. Check back next Monday for another Weekly Roundup!
-Micah
Today, we’re announcing the next generation of AWS Resilience Hub with a significantly expanded experience that brings together a new application model, dependency discovery assessment, generative AI-powered failure mode analysis, modular resilience policies, and organization-wide reporting.
Organizations running hundreds of applications share a common challenge: availability is a top concern, yet there is no consistent way to set resilience goals, measure progress, or prove compliance across a portfolio. Teams set different standards, use different tools, and struggle to exchange information about whether applications actually meet expectations.
The next generation of AWS Resilience Hub changes this by giving Site Reliability Engineers (SREs) and development teams a structured way to align on resilience policy expectations, help application teams achieve them, and demonstrate compliance through testing. With integration into AWS Organizations, teams can now evaluate resilience at scale, identify failure modes, discover hidden dependencies, and report on progress across the enterprise.
The next generation of Resilience Hub walks you through your resilience journey, with the following concepts built in to help you along the way.
The next generation of AWS Resilience Hub in action
To get started, you configure a resilience policy, set up your first system and service, run a failure mode assessment, review the results, and implement the findings.
Before you begin, you should set up the invoker IAM role, which grants Resilience Hub read-only access to your AWS resources, along with cross-account roles (if you are not using AWS Organizations) or service-linked roles (SLRs) with AWS Organizations. Resilience Hub also integrates with AWS Organizations to enable organization-wide resilience management from a single delegated administrator account. This eliminates the need to log in to individual accounts to assess resilience posture across your enterprise. For prerequisite details, see the AWS Resilience Hub User Guide.
To configure a resilience policy, choose Create policy in the Policies menu of the AWS Resilience Hub console. Enter a policy name and description, then choose resilience requirements. For example, you can create a reusable policy for multi-Region disaster recovery used in financial applications, including a 99.95% availability SLO, a 15-minute RTO, a 5-minute RPO, and a disaster recovery approach that aligns with your RTO and RPO requirements.
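To make the example policy concrete, it can help to turn the availability SLO into a downtime budget. The sketch below is plain arithmetic and is not part of Resilience Hub. The 30-day window is an assumption chosen for illustration, and comparing the budget with the RTO is only a rough sanity check, since RTO is a recovery objective rather than a measure of total downtime.

```python
def downtime_budget_minutes(slo_percent, period_days):
    """Minutes of allowed downtime for an availability SLO over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - slo_percent) / 100


# Values from the example policy; the 30-day window is an assumption.
SLO_PERCENT = 99.95
RTO_MINUTES = 15

budget = downtime_budget_minutes(SLO_PERCENT, period_days=30)
full_recoveries = int(budget // RTO_MINUTES)

print(f"Downtime budget over 30 days: {budget:.1f} minutes")
print(f"Recoveries at the full RTO that fit in the budget: {full_recoveries}")
```

With these inputs the budget is about 21.6 minutes, so a single recovery that uses the full 15-minute RTO consumes most of a 30-day allowance. That is one way to judge whether the SLO and RTO in a policy are consistent with each other.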
If you choose data recovery requirements, you can define the data recovery time objective for restoring from backups for each service associated with this policy.

To create your first system representing your business application, choose Create a system in the Systems menu. Optionally, you can enable AWS Organizations account access for this system.

Now you can create a service that represents a deployable unit, like one of your microservices, and associate it with your system, and tell Resilience Hub where to find your resources. Enter a service name, for example, stock-exchange-service, choose your resilience policy and invoker AWS IAM role name. You can choose service Regions, service resources such as your resource tags, AWS CloudFormation stack, Terraform state file location, or Amazon EKS cluster and namespace.
When you enable dependency discovery for this service, AWS examines your VPC query logs for the VPCs associated with the resources in your service. You can disable this feature anytime from the dependency discovery settings in the service details page.

With the service created and a policy applied, you can run your first assessment. Choose Run failure mode assessment in your service page and wait for the assessment to complete.

During the assessment, Resilience Hub assumes your invoker role, reads resources from your configured input sources, identifies parent-child relationships, queries the application topology service to map connections between resources, and builds a topology showing data flow, containment, and permissions.
By choosing Service topology, you can see service resources grouped by service functions in the graph, table, or JSON format.

By choosing Failure mode guidance, you can add assertions used to guide the agents while performing the failure mode assessment. Assertions are either generated by the agent or added by users. You can update them to improve assessment accuracy.

Once the assessment is complete, you can review findings and recommendations in the Assessment tab of your service page. Each finding tells you what the failure mode is, why it matters for your architecture, how to fix it, and which policy requirement it relates to.

You can choose Mark as resolved after you implement the recommendation, or Mark as irrelevant if the finding doesn’t apply to your use case.
If you’re an existing Resilience Hub customer, Resilience Hub provides migration APIs to simplify the transition of your previous applications. These APIs convert your previous assessment policies to new resilience policies and map your previous applications to the new model, for example by mapping multiple related applications to one system with multiple services.
For more information about new features, visit the AWS Resilience Hub User Guide.
Now available
The next generation of AWS Resilience Hub is now generally available in AWS commercial Regions where Resilience Hub is available. For Regional availability and the future roadmap, visit AWS Capabilities by Region.
Resilience Hub uses a new service-based pricing model. Pricing includes two failure mode assessments per month for services, and optionally automated dependency assessment. You can try AWS Resilience Hub free. For pricing details, visit the AWS Resilience Hub pricing page.
Give the new AWS Resilience Hub a try in the Resilience Hub console and send feedback to AWS re:Post for Resilience Hub or through your usual AWS Support contacts.
— Channy
Today, we’re announcing the next generation of Amazon OpenSearch Serverless, a fully managed search and vector engine designed for customers building AI agents. The next generation of OpenSearch Serverless scales from zero to thousands of requests per second and back to zero when idle, offering up to 60% cost savings compared to the cost of OpenSearch Service clusters provisioned for peak capacity.
The next generation of OpenSearch Serverless creates resources in seconds and scales capacity up to 20 times faster than the previous generation. With instant resource creation and native integrations with AI development platforms like Vercel and Kiro, you can deploy production-ready search and vector backends for your AI agents in minutes without managing infrastructure.
The next generation of OpenSearch Serverless in action
To get started with the next generation of OpenSearch Serverless, choose Create collection in the Serverless menu in the Amazon OpenSearch Service console.

Create a NextGen collection with instant auto scaling and scale-to-zero for cost optimization. At launch, the only supported collection types are full-text search and vector search. If you want to use the existing OpenSearch Serverless infrastructure, choose Switch to Classic.
Choose Express create, the fastest way to create a collection. No configuration is required: the default settings and matching security policies are applied automatically. Some configuration options can be changed later.

When you choose Create collection, OpenSearch Serverless will provision resources in seconds.
You can also create an OpenSearch Serverless collection with the AWS Command Line Interface (AWS CLI) or AWS SDKs. Here is a sample CLI command to create a collection group.
aws opensearchserverless create-collection-group \
--name channy-nextgen-group \
--standby-replicas ENABLED \
--generation NEXTGEN \
--description "My NextGen collection group" \
--capacity-limits '{
"maxIndexingCapacityInOCU": 10,
"maxSearchCapacityInOCU": 10,
"minIndexingCapacityInOCU": 0,
"minSearchCapacityInOCU": 0
}' \
--region "us-east-1"
Now, you can create a collection that inherits the generation from its parent collection group. Supported collection types: SEARCH and VECTORSEARCH.
aws opensearchserverless create-collection \
--name channy-nextgen-collection \
--type SEARCH \
--collection-group-name channy-nextgen-group \
--standby-replicas ENABLED \
--description "My collection in NextGen group" \
--region "us-east-1"
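If you script this step, a small helper can assemble the capacity limits and the CLI arguments for the collection group before anything is sent. The sketch below only builds the argument list, using the flags shown in the command above. The check that minimums do not exceed maximums is my own assumption rather than a documented service rule, and the function names are hypothetical.

```python
import json
import shlex


def capacity_limits(max_indexing, max_search, min_indexing=0, min_search=0):
    """Build the --capacity-limits payload; minimums of 0 allow scale-to-zero."""
    # Assumption: a minimum above its maximum is a configuration mistake.
    if min_indexing > max_indexing or min_search > max_search:
        raise ValueError("minimum capacity cannot exceed maximum capacity")
    return {
        "maxIndexingCapacityInOCU": max_indexing,
        "maxSearchCapacityInOCU": max_search,
        "minIndexingCapacityInOCU": min_indexing,
        "minSearchCapacityInOCU": min_search,
    }


def create_collection_group_argv(name, limits, region):
    """Return the AWS CLI argument list for creating a NextGen collection group."""
    return [
        "aws", "opensearchserverless", "create-collection-group",
        "--name", name,
        "--standby-replicas", "ENABLED",
        "--generation", "NEXTGEN",
        "--capacity-limits", json.dumps(limits),
        "--region", region,
    ]


argv = create_collection_group_argv(
    "channy-nextgen-group", capacity_limits(10, 10), "us-east-1"
)
# Print a shell-safe version of the command for review before running it.
print(shlex.join(argv))
```

To execute the command, pass `argv` to `subprocess.run(argv, check=True)`. Building a list instead of a shell string avoids quoting problems with the JSON payload.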
To learn more about managing the next generation of OpenSearch Serverless, visit the Amazon OpenSearch Serverless documentation.
Building your agents faster with OpenSearch Serverless
To support building production-ready agent applications in Vercel, you can now create a new OpenSearch collection or connect your existing OpenSearch Serverless collection within the Vercel console. Create a search backend in seconds and add features on-demand as your application grows. To learn more, visit AWS for Vercel.

You can go from idea to working prototype in minutes using Claude Code, Cursor, and Kiro. OpenSearch Agent Skills provide a repository of skills that bring OpenSearch intelligence directly into your agent. Each skill encapsulates domain knowledge, best practices, and multi-step execution logic for a specific workflow–so your agent not only gets results, but understands how they were achieved. You can also use the OpenSearch Launchpad in Kiro Powers to accelerate search applications with guided, end-to-end architecture planning.

Now available
The next generation of Amazon OpenSearch Serverless is generally available today in all AWS commercial Regions where Amazon OpenSearch Serverless is currently available.
The next generation of OpenSearch Serverless charges for the compute you use in OpenSearch Compute Units (OCUs) for indexing, search, and GPU acceleration. You are charged separately for storage in GB-month. For more information, see Amazon OpenSearch Service Pricing.
Give it a try and send feedback to the AWS re:Post for Amazon OpenSearch Service or through your usual AWS Support contacts.
— Channy