Wednesday, June 10, 2026

Now available: Amazon EC2 M9g and M9gd instances powered by new AWS Graviton5 processors

AWS Graviton processors have improved steadily across generations, with each iteration delivering advances in compute performance, price-performance, and energy efficiency. At re:Invent 2025, we announced Amazon EC2 M9g, the first Graviton5-powered instances, in preview. Since then, customers have tested M9g across a wide range of workloads and shared their results. ClickHouse saw a 36% performance boost compared to M8g, with zero code changes. Honeycomb achieved 36% better throughput per core compared to Graviton4, across a 6-month A/B test of production observability workloads. HubSpot deployed M9g for MySQL databases and saw query duration drop by up to 60%. Today, M9g instances are generally available, alongside the new M9gd instances for customers who need high-speed, low-latency local NVMe SSD storage. Both are powered by Graviton5, the most powerful and most energy efficient processor AWS has ever built.

While many Arm-based instances have been introduced across the industry, no one comes close to the breadth and depth of the AWS Graviton footprint. After five generations of custom silicon and eight years of continuous investment, Graviton powers over 350 instance types serving more than 120,000 customers, from startups to large enterprises, a robust ISV partner ecosystem, and a broad set of managed services. You can use Graviton for a broad variety of workloads, including web applications, microservices, analytics, databases, machine learning (ML) inference, electronic design automation (EDA), gaming, and video encoding. As workloads grow more compute-intensive and data-driven, many have asked for more processing power, along with greater network and storage bandwidth to move more data and complete workloads faster. We’ve also designed these instances to efficiently package compute, memory, and I/O to maximize energy utilization.

As AI shifts from answering questions to taking actions, running code, using tools, evaluating results, and orchestrating multi-step tasks, the demand for CPU compute is growing rapidly. Graviton5 is built for this shift. With 192 cores, a 5x larger L3 cache, up to 33% lower inter-core latency, and DDR5 memory delivering high bandwidth, Graviton5 helps agents spend less time waiting on CPU-bound steps, process more instructions, handle large numbers of concurrent environments, and keep accelerators moving.

Meta is deploying Graviton at scale starting with tens of millions of cores to support its agentic AI efforts, making Meta one of the largest Graviton customers in the world. Agentic AI workloads, including real-time reasoning, code generation, and the orchestration of multi-step tasks, are CPU-intensive and benefit from the higher compute performance, larger caches, higher memory bandwidth, and core density in Graviton5.

What’s new in M9g and M9gd
Built on the sixth-generation AWS Nitro System, M9g instances are powered by AWS Graviton5 processors that deliver higher compute performance, larger caches, and improved memory and I/O scalability compared to Graviton4 processors. Graviton5 offers up to 25% better compute performance compared to Graviton4-based instances, with up to 35% faster performance for web applications, up to 35% for machine learning inference, and up to 30% for databases. As the first CPU in the AWS fleet to support the latest generation of PCIe Gen6 and DDR5-8800 memory, AWS Graviton5 instances deliver the fastest memory of any processor instances in the cloud, and 5 times more L3 cache compared to the previous generation. These improvements also come with better energy efficiency, helping you meet sustainability targets without compromising capability.

Networking and storage bandwidth have been expanded to keep pace with compute growth. M9g and M9gd instances offer up to 15% higher network bandwidth and 20% higher Amazon Elastic Block Store (Amazon EBS) bandwidth on average across sizes, with up to twice the network bandwidth for the largest instance size. M9g and M9gd instances also support Instance Bandwidth Configuration (IBC), a feature that helps you adjust the allocation of bandwidth between Amazon EBS and Amazon Virtual Private Cloud (Amazon VPC) networking for an Amazon EC2 instance by up to 25%. IBC can help optimize performance for workloads with specific bandwidth requirements, such as database read and write performance, query processing, and logging. These enhancements support faster data movement and improved throughput for workloads that rely on high I/O performance.

Security and isolation are foundational requirements for running workloads in the cloud. Within the Nitro System, the AWS Nitro Hypervisor is designed to isolate instances from each other as well as AWS operators. With M9g and M9gd instances we are raising the bar on security even further with the introduction of Nitro Isolation Engine. Nitro Isolation Engine is an enhancement to the Nitro System, which enforces isolation of instances and harnesses formal verification to provide assurances of isolation with mathematical precision. Nitro Isolation Engine is a purpose-built component that is responsible for enforcing isolation between virtual machines, including mediation of all access to virtual machine memory, CPU register state, and I/O devices through a minimal set of APIs. Nitro Isolation Engine leverages formal verification, a technique to mathematically demonstrate that the hardware or software behaves as intended, and not just in specific test cases. This intensive verification technique establishes Nitro as the first formally verified cloud hypervisor, pioneering a new standard for mathematically proven cloud security.

M9g instances provide one vCPU for every four GiB of memory and are well suited for a broad range of general-purpose workloads, including application servers, microservices, midsize data stores, gaming servers, caching fleets, containerized applications, large-scale Java applications, code repositories, web applications, and agentic AI.

For workloads that need high-speed, low-latency local storage, M9gd instances provide up to 11.4 TB of NVMe SSD storage and 30% higher IOPS and storage performance compared to Graviton4-based M8gd instances. M9gd instances are well suited for general-purpose workloads that require a balance of compute and memory with high-speed, low-latency local storage, including application servers, microservices, gaming servers, midsize key-value data stores, caching fleets, data logging, media processing, batch and log processing, and applications that need temporary storage such as caches and scratch files.

Here are the key specifications across the family:

M9g | vCPUs | Memory (GiB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps)
medium | 1 | 4 | Up to 15 | Up to 12
large | 2 | 8 | Up to 15 | Up to 12
xlarge | 4 | 16 | Up to 15 | Up to 12
2xlarge | 8 | 32 | Up to 17 | Up to 12
4xlarge | 16 | 64 | Up to 17 | Up to 12
8xlarge | 32 | 128 | 17 | 12
12xlarge | 48 | 192 | 25 | 18
16xlarge | 64 | 256 | 34 | 24
24xlarge | 96 | 384 | 50 | 36
48xlarge | 192 | 768 | 100 | 72
metal-48xl | 192 | 768 | 100 | 72
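
The one-vCPU-per-four-GiB ratio can be checked directly against the M9g table. Here is a minimal sketch in Python. The dictionary only restates the vCPU and memory columns from the table, and the names M9G_SIZES and memory_per_vcpu are illustrative, not part of any AWS SDK.

```python
# vCPU and memory (GiB) columns restated from the M9g table.
M9G_SIZES = {
    "medium": (1, 4),
    "large": (2, 8),
    "xlarge": (4, 16),
    "2xlarge": (8, 32),
    "4xlarge": (16, 64),
    "8xlarge": (32, 128),
    "12xlarge": (48, 192),
    "16xlarge": (64, 256),
    "24xlarge": (96, 384),
    "48xlarge": (192, 768),
    "metal-48xl": (192, 768),
}


def memory_per_vcpu(size: str) -> float:
    """Return GiB of memory per vCPU for the given M9g size."""
    vcpus, memory_gib = M9G_SIZES[size]
    return memory_gib / vcpus


for size in M9G_SIZES:
    print(f"{size}: {memory_per_vcpu(size):.0f} GiB per vCPU")
```

Because the ratio is constant, choosing a size comes down to picking whichever of vCPU count or memory is the tighter constraint for your workload.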

M9gd instances include local NVMe SSD storage. The table below shows the instance storage for each size. Compute, memory, network, and EBS bandwidth specifications are the same as M9g.

M9gd | vCPUs | Memory (GiB) | Instance storage (GB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps)
medium | 1 | 4 | 1 x 59 NVMe SSD | Up to 15 | Up to 12
large | 2 | 8 | 1 x 118 NVMe SSD | Up to 15 | Up to 12
xlarge | 4 | 16 | 1 x 237 NVMe SSD | Up to 15 | Up to 12
2xlarge | 8 | 32 | 1 x 475 NVMe SSD | Up to 17 | Up to 12
4xlarge | 16 | 64 | 1 x 950 NVMe SSD | Up to 17 | Up to 12
8xlarge | 32 | 128 | 1 x 1900 NVMe SSD | 17 | 12
12xlarge | 48 | 192 | 3 x 950 NVMe SSD | 25 | 18
16xlarge | 64 | 256 | 1 x 3800 NVMe SSD | 34 | 24
24xlarge | 96 | 384 | 3 x 1900 NVMe SSD | 50 | 36
48xlarge | 192 | 768 | 3 x 3800 NVMe SSD | 100 | 72
metal-48xl | 192 | 768 | 3 x 3800 NVMe SSD | 100 | 72
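
To see how the "up to 11.4 TB" figure relates to the M9gd table, you can multiply the disk count by the per-disk capacity for each size. The sketch below restates the instance storage column; the names M9GD_STORAGE and total_storage_gb are illustrative only, and the conversion assumes decimal units (1 TB = 1,000 GB).

```python
# (number of NVMe SSDs, capacity per SSD in GB), restated from the M9gd table.
M9GD_STORAGE = {
    "medium": (1, 59),
    "large": (1, 118),
    "xlarge": (1, 237),
    "2xlarge": (1, 475),
    "4xlarge": (1, 950),
    "8xlarge": (1, 1900),
    "12xlarge": (3, 950),
    "16xlarge": (1, 3800),
    "24xlarge": (3, 1900),
    "48xlarge": (3, 3800),
    "metal-48xl": (3, 3800),
}


def total_storage_gb(size: str) -> int:
    """Return the total local NVMe capacity in GB for the given M9gd size."""
    disks, gb_per_disk = M9GD_STORAGE[size]
    return disks * gb_per_disk


largest = max(M9GD_STORAGE, key=total_storage_gb)
# Assumes decimal units: 1 TB = 1,000 GB.
print(f"{largest}: {total_storage_gb(largest) / 1000} TB")
```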

Now available
M9g and M9gd instances are available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Frankfurt) Regions. M9g and M9gd instances are available for purchase through Savings Plans, On-Demand, Spot Instances, Dedicated Instances, or Dedicated Hosts. For more information, visit Amazon EC2 pricing.

To get started with M9g and M9gd instances, several resources are available. The AWS Graviton Getting Started Guide is a technical guide covering how to build, run, and optimize workloads on Graviton-based instances. The Graviton Savings Dashboard helps you track and measure the cost savings from running workloads on Graviton-based instances. And AWS Transform is an AI-powered service that automates code transformations for migrating Java applications from x86 to Graviton-based Amazon EC2 instances, handling compatibility analysis, automated recompilation, dependency updates, and validation.

To learn more about Graviton-based instances, visit AWS Graviton Processors or Level up your compute with AWS Graviton.

— Esra

from AWS News Blog https://ift.tt/pdM9o18
via IFTTT

Tuesday, June 9, 2026

Anthropic Claude Fable 5 on AWS: Mythos-class capabilities with built-in safeguards now available

Today, we’re announcing the availability of Claude Fable 5 on Amazon Bedrock and Claude Platform on AWS. Claude Fable 5 makes Mythos-level capabilities available to all customers, with strong safeguards designed to make it safe for broader use. Fable 5 is state-of-the-art on nearly all tested benchmarks and delivers exceptional performance in software engineering, knowledge work tasks, and vision – built for ambitious, long running work.

With Claude Fable 5 on Bedrock, you can build within your existing AWS environment and scale inference workloads. You can also use Claude Fable 5 through the Claude Platform on AWS, giving you Anthropic’s native platform experience.

According to Anthropic, Claude Fable 5 represents a step-change in what you can accomplish with AI models. Here is what makes this model different:

  • Long-running, asynchronous execution — Claude Fable 5 handles complex tasks that previous models could not sustain, executing coding and knowledge work tasks for extended periods without intervention.
  • Advanced vision capabilities — Claude Fable 5 understands diagrams, charts, and tables nested in files and PDFs. This opens up research and document-heavy work in finance, legal, analytics, architecture, and gaming. In coding, the model implements designs with high fidelity and uses vision to critique its output against goals.
  • Proactive self-verification — The model self-updates skills based on learnings and develops its own harnesses and evaluations.

Claude Fable 5 includes safeguards that limit its performance in specific areas where misuse risk is elevated. Harmful prompts related to cybersecurity, biology, chemistry, and health fall back to receive a response from Opus 4.8 instead. Anthropic is able to expand access to nearly all of Claude Fable 5’s state-of-the-art capabilities by developing more powerful safeguards. The same model without these limits is Claude Mythos 5 and it will only be available to a small group of vetted customers.

Claude Fable 5 model in action
You can use Claude Fable 5 in both Amazon Bedrock and Claude Platform on AWS. This post covers how to access and use the model on Amazon Bedrock. For guidance on the Claude Platform on AWS, visit the documentation to learn more.

To get started with Amazon Bedrock, you can currently access the model only programmatically, using the Anthropic Messages API to call the bedrock-runtime or bedrock-mantle endpoints through the Anthropic SDK. You can also keep using the Invoke and Converse APIs on bedrock-runtime through the AWS Command Line Interface (AWS CLI) and AWS SDKs. Console support is coming soon.

To access the Claude Fable 5 model, you must opt into data sharing by using the Data Retention API and setting provider_data_sharing before you can invoke the model. There is no console user interface for this setting at launch.

curl -X PUT https://bedrock-mantle.us-east-1.api.aws/v1/data_retention \
  -H "x-api-key: <your-bedrock-api-key>" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "provider_data_share" }'

This mode allows Amazon Bedrock to retain and share your inference data with model providers per their requirements. Anthropic requires 30-day retention of inputs and outputs, as well as human review. To learn more, see the Amazon Bedrock abuse detection documentation.
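
If you prefer to make the opt-in call from Python rather than curl, the same request could be sketched with the standard library. The URL, header, and body come from the curl command in this post; the function names build_opt_in_request and send_opt_in are illustrative, and I haven't verified the shape of the response, so the sketch only returns the HTTP status code.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this post.
DATA_RETENTION_URL = "https://bedrock-mantle.us-east-1.api.aws/v1/data_retention"


def build_opt_in_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that opts into data sharing."""
    body = json.dumps({"mode": "provider_data_share"}).encode("utf-8")
    return urllib.request.Request(
        DATA_RETENTION_URL,
        data=body,
        method="PUT",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


def send_opt_in(api_key: str) -> int:
    """Send the opt-in request and return the HTTP status code."""
    with urllib.request.urlopen(build_opt_in_request(api_key)) as response:
        return response.status
```

Separating the request construction from the network call makes it easy to inspect the request before anything is sent.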

Let’s start with the Anthropic SDK for Python, using the Messages API on the bedrock-mantle endpoint. First, install the Anthropic SDK.

pip install anthropic

Here is sample Python code to call the Claude Fable 5 model:

import anthropic

client = anthropic.Anthropic(
    base_url="https://bedrock-mantle.us-east-1.api.aws/anthropic",
    api_key="<your-bedrock-api-key>",  # replace with your Bedrock API key
)

# Send a single user message to Claude Fable 5 through the Messages API.
message = client.messages.create(
    model="anthropic.claude-fable-5",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions",
        },
    ],
)

# Print the text of the first content block in the response.
print(message.content[0].text)

To learn more, check out Anthropic Messages API code examples and notebook examples for multiple use cases and a variety of programming languages.
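
In practice, you probably don't want the API key hardcoded in your source file. Here is a small sketch that reads it from an environment variable and builds the client settings. The variable name BEDROCK_API_KEY and the helper mantle_client_settings are hypothetical, and the URL pattern for Regions other than us-east-1 is an assumption extrapolated from the endpoint shown in the sample code.

```python
import os


def mantle_client_settings(region: str = "us-east-1") -> dict:
    """Build keyword arguments for anthropic.Anthropic(...).

    Assumption: other Regions follow the same URL pattern as us-east-1.
    """
    return {
        "base_url": f"https://bedrock-mantle.{region}.api.aws/anthropic",
        # BEDROCK_API_KEY is a hypothetical variable name chosen for this sketch.
        "api_key": os.environ["BEDROCK_API_KEY"],
    }


# Usage (requires `pip install anthropic`):
#   import anthropic
#   client = anthropic.Anthropic(**mantle_client_settings())
```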

You can also use Claude Fable 5 with the Invoke API and Converse API on the bedrock-runtime endpoint. Here’s an example that calls the Converse API for a unified multi-model experience using the AWS SDK for Python (Boto3):

import boto3 
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1") 
response = bedrock_runtime.converse( 
    modelId="us.anthropic.claude-fable-5", 
    messages=[ 
        { 
            "role": "user", 
            "content": [ 
                { 
                    "text": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions." 
                } 
            ] 
        } 
    ], 
    inferenceConfig={ 
        "maxTokens": 4096 
    } 
) 
print(response["output"]["message"]["content"][0]["text"]) 

To learn more, visit code examples that show how to use Amazon Bedrock Runtime with AWS SDKs.
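
The Converse example prints only the first content block. If you also want token counts and the stop reason, a small helper like the one below could pull them out of the response dictionary. The helper name summarize_converse_response is illustrative; the keys it reads (output, usage, stopReason) follow the Converse API response shape as I understand it, so check the Boto3 documentation before relying on them.

```python
def summarize_converse_response(response: dict) -> dict:
    """Collect the text and token usage from a Converse API response dict."""
    blocks = response["output"]["message"]["content"]
    # Join every text block; blocks without a "text" key are skipped.
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "stop_reason": response.get("stopReason"),
    }
```

You would pass it the value returned by bedrock_runtime.converse(...) in the example above.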

Things to know
Let me share some important technical details that I think you’ll find useful.

  • Model access — Claude Fable 5 access is gradually expanding for all AWS accounts. If your account doesn’t have access yet, it will be enabled soon depending on your Bedrock usage. If you want to get access to this model quickly, contact your usual AWS Support contacts.
  • Pricing — When a harmful prompt is routed to Opus 4.8 instead of Fable 5, you pay only Opus prices. If a request is blocked mid-conversation, initial tokens are charged at Fable rates and subsequent tokens at Opus rates. To learn more, visit the Amazon Bedrock pricing page.
  • Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.
  • Claude Mythos 5 on Bedrock (Limited Preview) – You can also use Anthropic’s most capable model for cybersecurity and life sciences, including vulnerability discovery, drug design, and biodefense screening. Access is currently limited due to the dual-use nature of these domains. To learn more, visit the model card documentation.
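
To make the mid-conversation pricing rule concrete, here is a rough sketch of the arithmetic: tokens processed before the block are billed at the Fable rate, and tokens after it at the Opus rate. The function name and the per-million-token rates are placeholders I made up for illustration; they are not real prices, so check the Amazon Bedrock pricing page for actual figures.

```python
TOKENS_PER_MILLION = 1_000_000


def blocked_conversation_cost(
    fable_tokens: int,
    opus_tokens: int,
    fable_rate_per_million: float,
    opus_rate_per_million: float,
) -> float:
    """Cost of a conversation that switched from Fable to Opus partway through."""
    fable_cost = fable_tokens / TOKENS_PER_MILLION * fable_rate_per_million
    opus_cost = opus_tokens / TOKENS_PER_MILLION * opus_rate_per_million
    return fable_cost + opus_cost


# Placeholder rates for illustration only, not real prices.
print(blocked_conversation_cost(2_000_000, 1_000_000, 10.0, 5.0))
```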

Now available
Anthropic’s Claude Fable 5 model is available today on Amazon Bedrock in the US East (N. Virginia) and Europe (Stockholm) Regions; check the full list of Regions for future updates. Claude Fable 5 is also available on the Claude Platform on AWS in North America, South America, Europe, and Asia Pacific.

Give Claude Fable 5 a try with the Amazon Bedrock APIs, in the Claude Platform on AWS, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy



from AWS News Blog https://ift.tt/kFhpPZm
via IFTTT

Monday, June 8, 2026

AWS Weekly Roundup: BYOM for Amazon RDS for SQL Server, AWS IoT Device SDK for Swift, and more (June 8, 2026)

This week, the AWS IoT Device SDK for Swift reached general availability. As a member of the Swift Server Workgroup (SSWG), this one caught my attention. The SDK brings production-ready MQTT 5 connectivity, Device Shadow, Jobs, and fleet provisioning to Swift developers on macOS, iOS, tvOS, and Linux.

Swift on IoT and Edge devices, an AI generated illustration

I’m curious to see what you will build with it. Swift on the server has matured over the past few years, and now it reaches IoT devices too. This connects to a broader trend of running Swift at the edge. WendyOS, for example, is an open-source operating system for physical AI that offers first-class Swift support for deploying apps to NVIDIA Jetson and Raspberry Pi hardware. Between server-side Swift, IoT, and edge computing, the language is showing up in places that would have surprised most people a few years ago.

Now, let’s get into this week’s AWS news.

Headlines
Amazon RDS for SQL Server supports Bring Your Own Media — Customers who migrate SQL Server applications from on-premises environments can now reuse their existing Microsoft SQL Server licenses, including Software Assurance, through Microsoft’s License Mobility program on Amazon RDS. BYOM is integrated with AWS License Manager for tracking license usage and compliance. Read more.

Amazon Cognito now supports multi-Region replication — You can now synchronize user and machine identity data, including credentials, user pool configurations, and federation setups, to a secondary user pool in a standby Region in near real-time. In the event of a disruption in the primary Region, signed-in users continue accessing their applications without re-authenticating, and registered users can sign in with their existing credentials. Multi-Region replication is available as an add-on for user pools in Essentials or Plus feature tiers across 16 Regions. Read more.

GPT-5.5, GPT-5.4, and Codex from OpenAI are now generally available on Amazon Bedrock — You can now use GPT-5.5 and GPT-5.4 in production workloads on Amazon Bedrock and build with Codex for AI-powered software development, with the same security, governance, and operational controls you already use across AWS. GPT-5.5 is the most capable model from OpenAI, excelling at agentic coding, data analysis, and multi-step autonomous tasks. Codex is available through the Codex App, the Codex CLI, and IDE integrations with Visual Studio Code, JetBrains, and Xcode. Pricing matches OpenAI first-party rates, and usage counts toward existing AWS commitments. Read more.

Last week’s launches
Here are some launches and updates from this past week that caught my attention:

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.

Upcoming AWS events
Learn more about AWS, browse and join upcoming AWS-led in-person and virtual events, startup events, and developer-focused events as well as AWS Summits and AWS Community Days. Join the AWS Builder Center to connect with builders, share solutions, and access content that supports your development.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

from AWS News Blog https://ift.tt/0ozwdZE
via IFTTT

Friday, June 5, 2026

Try the new console experience in Amazon Bedrock, optimized for Anthropic- and OpenAI-compatible APIs

Today, we’re announcing a new console experience in Amazon Bedrock for you to experiment, iterate, and scale with the latest AI models on Amazon Bedrock’s next-generation inference engine built for high performance, reliability, and security. This console has a refreshed workflow optimized for bedrock-mantle endpoint, which supports the latest GPT, Claude, and open-weight models with the OpenAI Responses API, OpenAI Chat Completions API, and the Anthropic Messages API.

The new console experience makes it simple to find the right model and move quickly from evaluation to production.

  • New model card – You can browse the full model catalog and compare models side by side on capabilities, modality support, context window, and applicable service quotas in a single view, removing the need to stitch together documentation and limit calculators.
  • Project-based work – You can make a project to run evaluations and review usage insights in one streamlined workflow that mirrors the lifecycle of building a generative AI application.
  • Live documentation – You can use project-aware live documentation: code samples, SDK snippets, and API references are automatically prefilled with your project variables. You can copy a snippet straight from the console into your application and run it without modification.

How to get started
You can try the new experience by choosing Try the Bedrock Mantle Console from within the Amazon Bedrock console, or by using the new console link directly.

The project-based dashboard shows inference requests and errors over a recent date range, recently used models, and the project list. You can create a project, assign models, configure API keys, and start making inference requests in minutes.

A new model catalog shows the latest GPT, Claude, and open-weight models that are supported on the bedrock-mantle engine. You can see details of features, tokens, input/output, pricing information, and Regional availability. You can also compare up to 3 models in a single view.

When you choose the project dashboard, you can see the models used in the project, the distribution of your token usage such as total token usage, token usage per minute, inference requests per minute, and tokens per inference request. This can inform your model selection, prompt optimization, and workload consistency decisions.
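
If you want to reproduce these usage metrics from your own request logs, the arithmetic is simple. The sketch below assumes a hypothetical log format (one token count per request) and a window length you supply; it illustrates the metrics named above and is not how the console computes them.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One inference request from a hypothetical application log."""
    tokens: int  # input plus output tokens for the request


def usage_summary(records: list[RequestRecord], window_minutes: float) -> dict:
    """Compute dashboard-style usage metrics over a time window."""
    total_tokens = sum(record.tokens for record in records)
    request_count = len(records)
    return {
        "total_tokens": total_tokens,
        "tokens_per_minute": total_tokens / window_minutes,
        "requests_per_minute": request_count / window_minutes,
        "tokens_per_request": total_tokens / request_count if request_count else 0.0,
    }
```

For example, three requests of 100, 300, and 200 tokens over a two-minute window give 600 total tokens, 300 tokens per minute, 1.5 requests per minute, and 200 tokens per request.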

To start evaluating, you can select up to 3 models and compare their responses to the same prompt side by side.

To build your application in the project, choose Getting started. You can migrate existing code, build a new app with the Anthropic or OpenAI SDK, or connect an AI coding assistant to Bedrock.

Choose the API & SDK, your SDK (either Anthropic or OpenAI), your preferred programming language, and your authentication method. The console then shows environment settings that you can run in your terminal for a quick test or save to a .env file for your application. You can also send your first request with sample code snippets to verify your setup.

When you choose Clients, you can select the AI coding agent source such as Claude Code, Cline, Codex, Cursor, or OpenCode that you want to connect to the bedrock-mantle engine. It provides instructions on how to install the AI agent, use your AWS IAM credentials or use a Bedrock API key, set environment variables, and route requests from each AI agent through Bedrock.

To learn about Anthropic- and OpenAI-compatible APIs, choose Live API docs. You can choose Anthropic API Protocol for access to Claude model features like the Messages API or OpenAI API Protocol for access to features like Responses API.

For example, when you choose OpenAI Responses API, it retrieves a model response with the given model ID. These API references are automatically prefilled with the project’s selected model ID, Region, bedrock-mantle endpoint URL, and API key reference, and they update in place as you change models or settings.

You can also choose the existing Bedrock console to manage fully managed features such as Agents, Knowledge Bases, Guardrails, and fine-tuning, or to use the InvokeModel and Converse APIs on the bedrock-runtime endpoint.

Now available
The new console experience is available in all AWS Regions where the bedrock-mantle endpoint is offered: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Sydney, Tokyo), Europe (Frankfurt, Ireland, London, Milan, Stockholm), and South America (São Paulo). Check the full list of Regions for future updates.

Give the new console experience a try in the new Amazon Bedrock console and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy



from AWS News Blog https://ift.tt/WrgSaJo
via IFTTT

Wednesday, June 3, 2026

Improve your application resilience with Amazon Cognito multi-Region replication

As a developer advocate working with web and mobile application developers, I’ve often heard about the need to maintain consistent user authentication in the unlikely event of a regional service interruption. The increasing use of agentic AI, microservices, automation, and service accounts has sparked a similar need for machine-to-machine authentication. Today, I’m excited to share two important updates to Amazon Cognito: multi-Region replication for improved resilience, and support for customer managed keys for more encryption control.

Many applications rely on Amazon Cognito to handle user and machine-to-machine authentication, and to manage user profiles. When building for high availability, having consistent data across different AWS Regions is a key approach, and until now, achieving that consistency came with significant challenges. Engineering teams spent significant time building and maintaining custom replication solutions to synchronize configurations across Regions. Manual export and import of user data between Regions created security risks from potential data exposure and introduced opportunities for data inconsistencies. During regional transitions, end users experienced disruptions like forced password resets and re-authentication. For machine-to-machine communications, teams had to create new app clients in the secondary region, which meant reconfiguring their applications and updating OAuth-protected resources to accept access tokens issued by the new regional issuer. These challenges made it difficult to maintain uninterrupted operations across Regions.

With multi-Region replication, Amazon Cognito automatically maintains a synchronized copy of your user data and machine secrets in a secondary AWS Region of your choice. The replication flows in one direction, from your primary Region to the secondary Region. This includes user profiles, credentials, and pool configurations. The secondary Region operates in read-only mode, focusing on maintaining authentication capabilities. Existing sessions continue uninterrupted.

When you need to direct traffic to the secondary Region, your existing users can continue signing in with their existing credentials without disruption, and currently signed-in users remain authenticated because both regions recognize access tokens issued by either region. Multi-Region replication supports all authentication methods, including federated sign-in through social providers (Amazon, Google, Apple, Facebook), Security Assertion Markup Language (SAML) and OpenID Connect (OIDC) integrations, and API authorization flows. This approach maintains availability for both customer-facing applications and machine-to-machine communications in your backend services. While authentication continues without interruption, operations like new user registration or profile updates are not available during failover.
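
Because the secondary Region is read-only, your application may want to hide or disable write operations while it is being served from the replica. Here is a minimal sketch of such a guard. The operation names and the function is_operation_available are hypothetical labels for this example, not Amazon Cognito API names.

```python
# Operation names are hypothetical labels used by this sketch.
ALWAYS_AVAILABLE = {"sign_in", "validate_token"}
PRIMARY_ONLY = {"sign_up", "update_profile"}


def is_operation_available(operation: str, serving_from_secondary: bool) -> bool:
    """Return True if the operation can be offered to users right now."""
    if operation in PRIMARY_ONLY:
        # Registration and profile updates are not available during failover.
        return not serving_from_secondary
    if operation in ALWAYS_AVAILABLE:
        return True
    raise ValueError(f"Unknown operation: {operation}")
```

A front end could call this before rendering a sign-up form, showing a "try again later" message instead while traffic is directed to the secondary Region.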

Before configuring multi-Region replication, you must configure a multi-Region customer managed key stored in AWS Key Management Service (AWS KMS) to encrypt your user data at rest. These keys provide consistent encryption across Regions while giving you control over your encryption strategy.

How this works in practice
I start this demo with an existing Cognito user pool in the us-west-2 (Oregon) Region. I want to configure replication to us-east-1 (Northern Virginia). I also have a customer managed key replicated in these two Regions.

Configuring multi-Region replication is just three steps. The AWS Management Console guides me through the steps: set up a custom key for encryption, configure multi-region OIDC endpoints, and configure the replication itself.

First, I set up a custom AWS KMS key to encrypt the data at rest.

Cognito Multi-Region replication - initial state

I select the custom key I created. I also update the key policy to allow Amazon Cognito to access and use the key. The console shows the correct IAM policy statements to add to my key policy.

Cognito Multi-Region replication - select CMK

The console confirms when the custom key is selected and correctly configured.

Cognito Multi-Region replication - confirm CMK

Second, I follow the console instructions to configure the OIDC issuer type. On Step 2 – optional, I choose Configure.

Cognito Multi-Region replication - configure multi region OIDC 1

I make sure to update my client applications with these new endpoints. This is a required change that will need a redeployment of server-side applications and an update submission for mobile apps on the App Store and Google Play. If I don’t update the endpoints, my users will experience disruptions because requests to the old endpoints will no longer be routed correctly.

On the next screen, I select Updated. I take note of the new URLs. I confirm the changes and choose Change issuer type.

Cognito Multi-Region replication - configure multi region OIDC 2

Finally, I select the target Region for replication. Only Regions where the custom encryption key is replicated are available for selection. After having chosen the target Region, I choose Create.

Cognito Multi-Region replication - start the replication process

The service prepares the replication. The time needed depends on the amount of data in the user pool.

When the replicated user pool is ready, I manually Activate it.

Cognito Multi-Region replication - replication process is complete

The replication status becomes Active. The replica is now ready to receive traffic.

Cognito Multi-Region replication - active

Additional configurations
The console helps me to keep track of additional configurations I have to plan. When I’m using Lambda functions for custom authentication flows or SMS or email notifications, I must also deploy and configure these resources in the new Region.

Similarly, log streaming or AWS WAF configuration must be manually configured in the target Region before I start directing authentication traffic to it.

Cognito Multi-Region replication - task list

Health checks and failover
Both primary and secondary regional endpoints remain active and ready to serve your traffic at all times. To monitor system health and manage failovers, you design a strategy that aligns with your application’s specific requirements and security posture. You can implement health checks to monitor the status of authentication services in your primary Region and define criteria for when to initiate failover. These checks might look for error rates, latency patterns, or specific service alerts.

When your monitoring system detects issues meeting your failover criteria, you can redirect traffic to the secondary Region through DNS updates. This approach gives you control over the failover process while maintaining security. Consider testing your failover strategy during off-peak hours by redirecting a small portion of traffic to verify that authentication continues working as expected in the secondary Region.
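
As an illustration of what failover criteria might look like, here is a minimal sketch that evaluates a window of health check samples against an error rate threshold and a 95th percentile latency threshold. The thresholds, the HealthSample type, and the function name are assumptions for this example; you should choose criteria that match your own application and security posture.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """Result of one health check against the primary Region (hypothetical shape)."""
    ok: bool
    latency_ms: float


def should_fail_over(
    samples: list[HealthSample],
    max_error_rate: float = 0.05,
    max_p95_latency_ms: float = 1000.0,
) -> bool:
    """Return True when the sample window meets the failover criteria."""
    if not samples:
        return False  # no data: do not fail over on an empty window
    error_rate = sum(1 for sample in samples if not sample.ok) / len(samples)
    latencies = sorted(sample.latency_ms for sample in samples)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(latencies) + 99) // 100
    p95_latency = latencies[rank - 1]
    return error_rate > max_error_rate or p95_latency > max_p95_latency_ms
```

The function only makes the decision. Redirecting traffic, for example through a DNS update, stays a separate step that you control.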

When using managed login and federation with custom domains, you can also use the built-in traffic routing feature by providing an Amazon Route 53 health check ID.

Pricing and availability
Multi-Region replication is available today as an add-on feature for Amazon Cognito customers using the Essentials or Plus tier. For user authentication, the add-on costs $0.0045 per monthly active user per replica Region for Essentials tier customers and $0.006 per monthly active user per replica Region for Plus tier customers. For machine-to-machine (M2M) authentication, the add-on is a 30% charge on top of the standard volume-based pricing for successful tokens issued. For detailed pricing information, see Amazon Cognito pricing.
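
To estimate the add-on cost, you can apply these rates directly. The sketch below uses the per-user rates and the 30% M2M surcharge quoted above for a single replica Region. The function names are illustrative, and the standard M2M cost is an input you supply from your own bill.

```python
from decimal import Decimal

# Add-on rates per monthly active user for one replica Region, from this post.
USER_ADDON_RATE = {
    "essentials": Decimal("0.0045"),
    "plus": Decimal("0.006"),
}
# M2M add-on: 30% on top of the standard volume-based pricing.
M2M_SURCHARGE = Decimal("0.30")


def user_addon_cost(monthly_active_users: int, tier: str) -> Decimal:
    """Monthly user authentication add-on cost in USD for one replica Region."""
    return USER_ADDON_RATE[tier] * monthly_active_users


def m2m_addon_cost(standard_m2m_cost: Decimal) -> Decimal:
    """Monthly M2M add-on cost in USD, given the standard M2M token cost."""
    return standard_m2m_cost * M2M_SURCHARGE


print(user_addon_cost(100_000, "essentials"))
print(user_addon_cost(100_000, "plus"))
```

With 100,000 monthly active users, the add-on works out to $450 per month on the Essentials tier and $600 per month on the Plus tier.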

Multi-Region replication is available in the following Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Stockholm), and South America (São Paulo).

Any of these Regions can be used as the source or the destination for the replication.

Support for customer managed keys is available for the Essentials and Plus tiers. It is available in the following Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, New Zealand, Osaka, Seoul, Singapore, Sydney, Thailand, Tokyo), Canada (Central), Canada West (Calgary), Europe (Frankfurt, Ireland, London, Milan, Paris, Spain, Stockholm, Zurich), Israel (Tel Aviv), Mexico (Central), South America (São Paulo), and AWS GovCloud (US-East, US-West).

From my conversations with customers, maintaining business continuity during regional incidents while meeting security requirements is a high priority. Multi-Region replication provides the capability to build more resilient applications without managing complex replication logic yourself. The automatic synchronization of user data and configurations reduces operational overhead while maintaining security.

For customers in regulated industries, the new support for customer managed keys provides additional control over data encryption. You can now use your own encryption keys to protect user data at rest, helping you meet regulatory requirements in industries like healthcare and financial services.

To get started with multi-Region replication and customer managed key encryption, visit the Amazon Cognito console or see the documentation for detailed setup instructions. I look forward to hearing how you use this feature to strengthen your application architecture.

— seb

from AWS News Blog https://ift.tt/fAOgLsD
via IFTTT

Monday, June 1, 2026

Get started with OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock

As we previewed in What’s Next with AWS 2026, we’re announcing the general availability of OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock, giving you access to frontier models and a coding agent for software development.

According to OpenAI, GPT-5.5 and GPT-5.4 models are excellent for coding, reasoning, agentic workflows, and complex professional work. You can use GPT-5.5 for the hardest customer workloads and GPT-5.4 for the best price-performance. You can call them through the Responses API on Amazon Bedrock’s next-generation inference engine, which is built for high performance, reliability, and security.

Codex is the OpenAI coding agent for AI-powered software development. According to OpenAI, more than 4 million developers use Codex every week to write, refactor, debug, test, and validate code across large codebases. With GPT-5.5 powering inference, Codex introduces a new class of intelligence optimized for complex, long-horizon developer workflows. You can use the Codex App, the Codex CLI, and IDE integrations with Visual Studio Code, JetBrains, and Xcode, with all model inference routed through the Responses API on Amazon Bedrock.

For customers with data residency requirements, all processing stays within the Bedrock Region you select. You pay per token with no seat licenses and no per-developer commitments.

GPT-5.5 and GPT-5.4 models on Bedrock in action
You can access the models programmatically by using the OpenAI Responses API to call the bedrock-mantle endpoints, either through the OpenAI SDK or with command-line tools such as curl.

Let’s start with the OpenAI SDK for Python. First, install the SDK.

pip install -U openai

Set the environment variables for authentication.

export OPENAI_BASE_URL="https://bedrock-mantle.us-east-2.api.aws/openai/v1"
export OPENAI_API_KEY="<BEDROCK_API_KEY>"
export BEDROCK_OPENAI_MODEL_ID="openai.gpt-5.5"

Here is sample Python code that calls the GPT-5.5 model on Bedrock:

import os
from openai import OpenAI
 
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)
 
response = client.responses.create(
    model=os.environ["BEDROCK_OPENAI_MODEL_ID"],
    input=[
        {
            "role": "developer",
            "content": "You are a software engineer with excellent AWS cloud knowledge. Be concise and practical.",
        },
        {
            "role": "user",
            "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions.",
        },
    ],
    reasoning={"effort": "medium"},
    text={"verbosity": "low"},
)
 
print(response.output_text)

You can also call the model endpoint directly using curl.

curl "$OPENAI_BASE_URL/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai.gpt-5.5",
    "input": [
      {
        "role": "developer",
        "content": "You are a software engineer with excellent AWS cloud knowledge."
      },
      {
        "role": "user",
        "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions."
      }
    ],
    "reasoning": {"effort": "medium"},
    "text": {"verbosity": "low"}
  }'

Use the Responses API when you want model-managed multi-turn state, need hosted tools, function tools, or richer tool orchestration, or run background or long-running work. To learn more, visit the OpenAI Cookbook Responses examples.
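If you prefer not to install the SDK, the curl request above can be approximated with the Python standard library alone. This is a minimal sketch: the helper name `build_responses_request` is my own, and it only builds the request so you can inspect it before sending anything.

```python
import json
import urllib.request


def build_responses_request(base_url, api_key, model, user_prompt, effort="medium"):
    """Build (but do not send) a POST request that mirrors the curl call above."""
    body = {
        "model": model,
        "input": [{"role": "user", "content": user_prompt}],
        "reasoning": {"effort": effort},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_responses_request(
    base_url="https://bedrock-mantle.us-east-2.api.aws/openai/v1",
    api_key="<BEDROCK_API_KEY>",  # placeholder, as in the export above
    model="openai.gpt-5.5",
    user_prompt="Summarize the trade-offs of multi-Region deployments.",
)
print(request.get_method(), request.full_url)
```

To send it, pass the object to `urllib.request.urlopen(request)` with a real API key and parse the JSON response body.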

Using OpenAI Codex with GPT-5.5 on Amazon Bedrock
You can download the Codex CLI, the Codex App, or the Codex VS Code extension and get started with Bedrock for model inference. Codex supports two Bedrock authentication pathways: an Amazon Bedrock API key or the AWS SDK credential chain. If you set AWS_BEARER_TOKEN_BEDROCK, Codex uses it first; otherwise, Codex falls back to the AWS SDK credential chain.
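The precedence described above can be pictured with a few lines of Python. This is only an illustration of the ordering, not Codex source code. The returned labels are hypothetical, and treating an empty variable as unset is my assumption.

```python
import os


def select_bedrock_auth(env=None):
    """Return which authentication pathway would apply for an environment."""
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only token counts as "not set".
    token = env.get("AWS_BEARER_TOKEN_BEDROCK", "").strip()
    if token:
        return "bedrock-api-key"
    return "aws-sdk-credential-chain"


print(select_bedrock_auth({"AWS_BEARER_TOKEN_BEDROCK": "example-token"}))
print(select_bedrock_auth({"AWS_PROFILE": "dev"}))
```

The first call reports the API-key pathway and the second reports the SDK credential chain, because no bearer token is present.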

Set AWS_BEARER_TOKEN_BEDROCK in the environment that Codex will read:

export AWS_BEARER_TOKEN_BEDROCK=<your-bedrock-api-key>

Then, configure your preferred Region and set the model ID to openai.gpt-5.5 in ~/.codex/config.toml, which is required for Bedrock API-key authentication. You can also choose openai.gpt-5.4, openai.gpt-oss-120b, or openai.gpt-oss-20b. For the desktop app or VS Code extension, put any environment variables the app needs in ~/.codex/.env.

model = "openai.gpt-5.5"
model_provider = "amazon-bedrock"
[model_providers.amazon-bedrock.aws]
region = "us-east-2"

Restart the desktop app or VS Code extension after changing ~/.codex/config.toml or ~/.codex/.env. In the Codex CLI, the /status tab should reflect this configuration.

In the Codex App, you can use the GPT-5.5 model through Amazon Bedrock inference.

Things to know
Let me share some important technical details that I think you’ll find useful.

  • Model latency: OpenAI model information positions GPT-5.5 as fast and GPT-5.4 as medium speed, but customer-perceived latency depends on reasoning effort, output length, tool calls, background mode, Region, quotas, throttling, prompt size, and cache hits. Start GPT-5.5 at medium effort. For GPT-5.4, set the effort explicitly rather than relying on its default of none.
  • Scaling and capacity: Bedrock’s new inference engine is designed to rapidly provision and serve capacity across many different models. When accepting requests, we prioritize keeping steady-state workloads running, and ramp usage and capacity rapidly in response to changes in demand. During periods of high demand, requests are queued rather than rejected.
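One way to follow the latency advice is to never leave the reasoning effort implicit. The helper below is a hypothetical sketch: it returns a value you could pass as the `reasoning` argument in the Python sample earlier, using medium as the starting point for GPT-5.5 and refusing to guess for any other model.

```python
def reasoning_options(model_id, effort=None):
    """Return a `reasoning` argument with the effort always set explicitly."""
    # Starting point suggested in the latency note above; other models need a caller choice.
    starting_effort = {"openai.gpt-5.5": "medium"}
    chosen = effort or starting_effort.get(model_id)
    if chosen is None:
        raise ValueError(f"Set the reasoning effort explicitly for {model_id}")
    return {"effort": chosen}


print(reasoning_options("openai.gpt-5.5"))
print(reasoning_options("openai.gpt-5.4", effort="low"))
```

Calling it for openai.gpt-5.4 without an effort raises an error, which surfaces the decision at development time instead of silently using the default.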

Now available
OpenAI GPT models and Codex on Amazon Bedrock are available today: the GPT-5.5 model in the US East (Ohio) Region, and the GPT-5.4 model in the US East (Ohio) and US West (Oregon) Regions. Check the full list of Regions for future updates. To learn more, visit the OpenAI on Amazon Bedrock page and the Amazon Bedrock pricing page.

Give GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock a try today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Channy



from AWS News Blog https://ift.tt/tO38mBx
via IFTTT

AWS Weekly Roundup: Claude Opus 4.8 on AWS, Aurora MySQL with Kiro Powers, and more (June 1, 2026)

In my last Week in Review post, I shared what I’d been hearing from customers in the AI-Driven Development Lifecycle (AI-DLC) workshops I’ve been delivering. Last week I was back at it, this time in Denver, where I helped facilitate a two-day AI-DLC workshop in which 17 teams delivered nearly 20 separate use cases. The pace of acceleration that AI-DLC unlocks, especially when paired with tools like Claude Code on Amazon Bedrock, is fundamentally changing how businesses operate. Traditional roles within software development teams are collapsing into smaller, AI-augmented squads, and the paradigm shift is beginning to take place right in front of us. To learn more about how to use various AI tools, visit the GitHub repository of the AI-DLC workflow.

This shift is also reshaping how AWS account teams (solutions architects, customer solutions managers, and technical account managers) collaborate with customers. It’s becoming less about handing off advisory design documents and more about building alongside them in real time. It’s a genuinely exciting moment to be in the middle of the change, and this week’s headline launch — Anthropic’s most capable model yet, now on AWS — is going to push that pace even further.

Now, let’s get into this week’s AWS news…

Headlines
Claude Opus 4.8 on AWS — Anthropic’s most capable generally available model is now accessible through both Amazon Bedrock and the Claude Platform on AWS. Opus 4.8 is built for agentic coding, knowledge work, and extended autonomous task execution — it sustains longer autonomous sessions with deeper reasoning, recovers from errors, and synthesizes information across lengthy documents. For coding workloads, it reads codebases like an engineer, plans before it edits, and holds context across long sessions. On Amazon Bedrock, you get AWS-managed features like Guardrails, Knowledge Bases, and data residency; on the Claude Platform on AWS, you get Anthropic’s native APIs unified with AWS billing. To learn more, visit the deep-dive blog post.

Last week’s launches
Here are some launches and updates from this past week that caught my attention:

  • Introducing the next generation of AWS Resilience Hub — A reimagined Resilience Hub gives SREs and developers a unified framework to define resilience standards, evaluate applications against them, and demonstrate compliance across an entire portfolio. It introduces modular resilience policies (covering service-level objectives (SLOs), multi-AZ/Region DR, and data recovery), business-oriented application modeling, generative AI-powered assessments aligned with the Well-Architected and Resilience Analysis Frameworks, and automatic dependency discovery via DNS query log analysis. Integration with AWS Organizations enables organization-wide resilience management from a single delegated administrator account.
  • Introducing the next generation of Amazon OpenSearch Serverless for building agentic AI applications — Amazon OpenSearch Serverless is now a fully managed search and vector engine purpose-built for agentic AI applications. It scales from zero to thousands of requests per second—roughly 20x faster than the prior generation—delivers up to 60% cost savings versus peak-provisioned clusters, and adds GPU acceleration plus new SEARCH and VECTORSEARCH collection types. Native integrations with Vercel, Kiro, Claude Code, and Cursor through OpenSearch Agent Skills make it straightforward to plug into your agent stack.
  • New assessment capabilities in AWS Transform — AWS Transform expands with new tools to help you build migration business cases and evaluate TCO before moving workloads to AWS. You can ingest data from RVTools exports, CMDB data, the AWS Transform discovery tool, and third-party discovery tools, then run what-if scenarios across region, utilization, and service mapping for EC2, FSx, S3, SQL Server on EC2, and virtual desktops. The release also adds Agentic Readiness Analysis (ARA) and Modernization Analysis (MODA), which scan code repositories in 5 to 30 minutes per repo to surface severity-tagged findings with file-level evidence and AWS-mapped remediation guidance.
  • Amazon Aurora MySQL with Kiro Powers — Aurora MySQL now integrates with Kiro Powers, drawing from a curated repository of pre-packaged MCP servers, steering files, and hooks validated by Kiro partners. Developers can execute both data plane tasks (queries, schema management) and control plane tasks (cluster management) in natural language, with dynamic guidance for Aurora MySQL Serverless scaling, RDS-to-Aurora migration, and replication setup. The companion Database Blog post explains how the agent produces the API calls, SQL, and configuration for you to review and run — available via one-click install from the Kiro IDE or webpage.
  • Amazon WorkSpaces Applications now supports Windows Desktop OS — You can now bring your own Windows Desktop licenses to Amazon WorkSpaces Applications and stream full Windows desktops and applications from AWS-hosted dedicated hardware. BYOL eliminates OS fees (you pay only for compute and streaming infrastructure), supports eligible Microsoft 365 Apps for enterprise, and gives users a matching experience between local and remote environments — same workflows, shortcuts, and navigation in both.

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.

Other AWS news
Here are some additional posts and resources that you might find interesting:

For a full list of AWS blog posts, be sure to keep an eye on the AWS Blogs page.

Learn more about AWS, browse and join upcoming AWS-led in-person and virtual events, startup events, and developer-focused events as well as AWS Summits and AWS Community Days. Join the AWS Builder Center to connect with builders, share solutions, and access content that supports your development.

That’s all for this week. Check back next Monday for another Weekly Roundup!

-Micah



from AWS News Blog https://ift.tt/u2blE91
via IFTTT

Thursday, May 28, 2026

Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey

Today, we’re announcing the next generation of AWS Resilience Hub with a significantly expanded experience that brings together a new application model, dependency discovery assessment, generative AI-powered failure mode analysis, modular resilience policies, and organization-wide reporting.

Organizations running hundreds of applications share a common challenge: availability is a top concern, yet there is no consistent way to set resilience goals, measure progress, or prove compliance across a portfolio. Teams set different standards, use different tools, and struggle to exchange information about whether applications actually meet expectations.

The next generation of AWS Resilience Hub changes this by giving Site Reliability Engineers (SREs) and development teams a structured way to align on resilience policy expectations, help application teams achieve them, and demonstrate compliance through testing. With integration into AWS Organizations, teams can now evaluate resilience at scale, identify failure modes, discover hidden dependencies, and report on progress across the enterprise.

The next generation of Resilience Hub walks you through your resilience journey with the following built-in concepts.

  • Resilience policy: You can define your resilience expectations through modular, composable requirements. Rather than choosing a single rigid policy type, you construct policies by selecting the requirements that matter to your application, such as service level objective (SLO), multi-AZ and multi-Region disaster recovery, and data recovery requirements.
  • Business-level understanding: You can use new application modeling through critical end-user paths that map directly to business outcomes. Systems represent a business application, user journeys describe critical business paths, and services are the deployable units comprising AWS resources, code, and observability. Resilience Hub automatically discovers and maps them into a topology showing how resources connect.
  • AI failure mode assessments: You can run generative AI-powered assessments that analyze your services against your defined resilience policies, AWS Well-Architected best practices, and the AWS Resilience Analysis Framework. These assessments identify potential failure modes and provide actionable recommendations.
  • Dependency discovery assessment: You can automatically discover AWS services, internal endpoints, and third-party endpoints that your services depend on. This dependency assessment uses DNS query log analysis to identify dependencies you may not know about—including unexpected cross-region calls or critical third-party dependencies.
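To picture how systems, user journeys, and services relate, here is a small data-model sketch. The class and field names are my own shorthand for the concepts above, not the Resilience Hub API, and the example names (other than stock-exchange-service, which appears later in this post) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A deployable unit with one resilience policy attached."""
    name: str
    policy_name: str
    resources: list[str] = field(default_factory=list)


@dataclass
class UserJourney:
    """A critical business path that crosses one or more services."""
    name: str
    service_names: list[str] = field(default_factory=list)


@dataclass
class System:
    """A business application made of services and user journeys."""
    name: str
    services: dict[str, Service] = field(default_factory=dict)
    journeys: list[UserJourney] = field(default_factory=list)

    def services_on_journey(self, journey_name):
        for journey in self.journeys:
            if journey.name == journey_name:
                return [self.services[n] for n in journey.service_names]
        raise KeyError(journey_name)


system = System(name="trading-platform")
system.services["stock-exchange-service"] = Service(
    name="stock-exchange-service",
    policy_name="financial-multi-region-dr",
    resources=["cloudformation-stack/stock-exchange"],
)
system.journeys.append(UserJourney("place-order", ["stock-exchange-service"]))
print([s.name for s in system.services_on_journey("place-order")])
```

Modeling journeys as lists of service names keeps the business view (which paths matter) separate from the deployment view (what each service contains).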

The next generation of AWS Resilience Hub in action
To get started, you configure a resilience policy, set up your first system and service, run a failure mode assessment, review the results, and implement the findings.

Before you begin, you should set up the invoker IAM role, which grants Resilience Hub read-only access to your AWS resources, cross-account roles (if not using AWS Organizations), or service-linked roles (SLRs) with AWS Organizations. Resilience Hub also integrates with AWS Organizations to enable organization-wide resilience management from a single delegated administrator account. This eliminates the need to log in to individual accounts to assess resilience posture across your enterprise. For prerequisite details, visit the AWS Resilience Hub User Guide.

To configure a resilience policy, choose Create policy in the Policies menu in the AWS Resilience Hub console. Enter a policy name and description, and choose resilience requirements. For example, you can create a reusable policy for multi-Region disaster recovery used in financial applications, including a 99.95% availability SLO, a 15-minute RTO and 5-minute RPO for multi-Region disaster recovery, and a disaster recovery approach that aligns with your RTO and RPO requirements.
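It can help to translate an availability SLO into a downtime budget when you choose these numbers. The sketch below does that arithmetic for the 99.95% example. The 30-day month and 365-day year are my assumptions about the measurement window, which your own SLO definition may set differently.

```python
from datetime import timedelta


def downtime_budget(availability_percent, period):
    """Return the downtime allowed by an availability SLO over `period`."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return period * unavailable_fraction


monthly = downtime_budget(99.95, timedelta(days=30))
yearly = downtime_budget(99.95, timedelta(days=365))
rto = timedelta(minutes=15)

print("Monthly budget:", monthly)
print("Yearly budget:", yearly)
print("One full-RTO outage fits in the monthly budget:", rto <= monthly)
```

A 99.95% SLO leaves 0.05% of the window, which is about 21.6 minutes in a 30-day month and about 4.4 hours in a year. A single outage that lasts the full 15-minute RTO would use most of one month's budget.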

If you choose data recovery requirements, you can define the data recovery time objective for restoring from backups for each service associated with this policy.

To create your first system representing your business application, choose Create a system in the Systems menu. Optionally, you can enable AWS Organizations account access for this system.

Now you can create a service that represents a deployable unit, such as one of your microservices, associate it with your system, and tell Resilience Hub where to find your resources. Enter a service name (for example, stock-exchange-service), and choose your resilience policy and invoker AWS IAM role name. You can choose service Regions and service resources, such as your resource tags, AWS CloudFormation stack, Terraform state file location, or Amazon EKS cluster and namespace.

When you enable dependency discovery for this service, AWS examines your VPC query logs for the VPCs associated with the resources in your service. You can disable this feature anytime from the dependency discovery settings in the service details page.
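To give a feel for what DNS-based dependency discovery involves, here is a simplified sketch that classifies queried hostnames. It is not how Resilience Hub implements the feature. The input is a plain list of hostnames (real query logs carry more fields), and the Region detection is a heuristic that assumes endpoints shaped like service.region.amazonaws.com.

```python
from collections import Counter


def classify_dependencies(query_names, home_region):
    """Group queried hostnames into AWS, cross-Region AWS, and other endpoints."""
    summary = {"aws": Counter(), "cross_region": Counter(), "other": Counter()}
    for name in query_names:
        host = name.rstrip(".").lower()
        if host.endswith(".amazonaws.com"):
            summary["aws"][host] += 1
            labels = host.split(".")
            # Heuristic: the label before "amazonaws.com" is often the Region.
            region = labels[-3] if len(labels) >= 4 else None
            if region and region != home_region and region.count("-") >= 2:
                summary["cross_region"][host] += 1
        else:
            # Internal and third-party endpoints both land here in this sketch.
            summary["other"][host] += 1
    return summary


queries = [
    "dynamodb.us-east-1.amazonaws.com.",
    "sqs.eu-west-1.amazonaws.com.",
    "sqs.eu-west-1.amazonaws.com.",
    "api.payments.example.com.",
]
result = classify_dependencies(queries, home_region="us-east-1")
print(dict(result["cross_region"]))
print(dict(result["other"]))
```

In this hypothetical sample, the two SQS lookups in eu-west-1 are flagged as cross-Region calls from a service that runs in us-east-1, and the payments hostname shows up as a non-AWS dependency.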

With the service created and a policy applied, you can now run your first assessment. Choose Run failure mode assessment in your service page and wait for the assessment to complete.

During the assessment, Resilience Hub assumes your invoker role, reads resources from your configured input sources, identifies parent-child relationships, queries the application topology service to map connections between resources, and builds a topology showing data flow, containment, and permissions.

By choosing Service topology, you can see service resources grouped by service functions in the graph, table, or JSON format.

By choosing Failure mode guidance, you can add assertions used to guide the agents while performing the failure mode assessment. Assertions are either generated by the agent or added by users. You can update them to improve assessment accuracy.

Once the assessment is complete, you can review findings and recommendations in the Assessment tab of your service page. Each finding tells you what the failure mode is, why it matters for your architecture, how to fix it, and which policy requirement it relates to.

You can choose Mark as resolved after you implement the recommendation, or Mark as irrelevant if the finding doesn’t apply to your use case.

If you’re an existing Resilience Hub customer, Resilience Hub provides migration APIs to simplify the transition of your previous applications. These APIs convert your previous assessment policies to new resilience policies and map your previous applications to the new model, for example by mapping multiple related applications to one system with multiple services.

For more information about new features, visit the AWS Resilience Hub User Guide.

Now available
The next generation of AWS Resilience Hub is now generally available in AWS commercial Regions where Resilience Hub is available. For Regional availability and the future roadmap, visit the AWS Capabilities by Region.

Resilience Hub uses a new service-based pricing model. Pricing includes two failure mode assessments per month for services, and optionally automated dependency assessment. You can try AWS Resilience Hub free. For pricing details, visit the AWS Resilience Hub pricing page.

Give the new AWS Resilience Hub a try in the Resilience Hub console and send feedback to AWS re:Post for Resilience Hub or through your usual AWS Support contacts.

Channy



from AWS News Blog https://ift.tt/43ysDrI
via IFTTT

Introducing the next generation of Amazon OpenSearch Serverless for building your agentic AI applications

Today, we’re announcing the next generation of Amazon OpenSearch Serverless, a fully managed search and vector engine designed for customers building AI agents. The next generation of OpenSearch Serverless scales from zero to thousands of requests per second and back to zero when idle, offering up to 60% cost savings compared to the cost of OpenSearch Service clusters provisioned for peak capacity.

The next generation of OpenSearch Serverless creates resources in seconds and scales capacity up to 20 times faster than the previous generation. With instant resource creation and native integrations with AI development platforms like Vercel and Kiro, you can deploy production-ready search and vector backends for your AI agents in minutes without managing infrastructure.

The next generation of OpenSearch Serverless in action
To get started with the next generation of OpenSearch Serverless, choose Create collection in the Serverless menu in the Amazon OpenSearch Service console.

Create a NextGen collection with instant auto scaling and scale-to-zero for cost optimization. At launch, the only supported collection types are full-text search and vector search. If you want to use the existing OpenSearch Serverless infrastructure, choose Switch to Classic.

Choose Express create, the fastest way to create a collection. No configuration is required: the default settings and matching security policies are applied automatically. Some configuration options can be changed later.

When you choose Create collection, OpenSearch Serverless will provision resources in seconds.

You can also create an OpenSearch Serverless collection with the AWS Command Line Interface (AWS CLI) or AWS SDKs. Here is a sample CLI command to create a collection group.

aws opensearchserverless create-collection-group \
    --name channy-nextgen-group \
    --standby-replicas ENABLED \
    --generation NEXTGEN \
    --description "My NextGen collection group" \
    --capacity-limits '{
        "maxIndexingCapacityInOCU": 10,
        "maxSearchCapacityInOCU": 10,
        "minIndexingCapacityInOCU": 0,
        "minSearchCapacityInOCU": 0
    }' \
    --region "us-east-1"
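If you script this command, it can be convenient to generate the --capacity-limits value rather than hand-edit the JSON. The helper below is a hypothetical sketch that uses the same four keys as the command above. The validation rule (non-negative minimums that do not exceed the maximums) is my assumption about sensible input, not a documented constraint.

```python
import json


def capacity_limits_json(max_indexing, max_search, min_indexing=0, min_search=0):
    """Validate OCU limits and return the JSON string for --capacity-limits."""
    for label, low, high in (
        ("indexing", min_indexing, max_indexing),
        ("search", min_search, max_search),
    ):
        # Assumed rule: 0 <= min <= max for each capacity type.
        if low < 0 or high < low:
            raise ValueError(f"invalid {label} capacity range: {low}..{high}")
    return json.dumps({
        "maxIndexingCapacityInOCU": max_indexing,
        "maxSearchCapacityInOCU": max_search,
        "minIndexingCapacityInOCU": min_indexing,
        "minSearchCapacityInOCU": min_search,
    })


limits = capacity_limits_json(max_indexing=10, max_search=10)
print(limits)
```

Leaving both minimums at 0 matches the sample command and the scale-to-zero behavior described earlier in this post.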

Now, you can create a collection that inherits the generation from its parent collection group. The supported collection types are SEARCH and VECTORSEARCH.

aws opensearchserverless create-collection \
    --name channy-nextgen-collection \
    --type SEARCH \
    --collection-group-name channy-nextgen-group \
    --standby-replicas ENABLED \
    --description "My collection in NextGen group" \
    --region "us-east-1"

To learn more about managing the next generation of OpenSearch Serverless, visit the Amazon OpenSearch Serverless documentation.

Building your agents faster with OpenSearch Serverless
To support building production-ready agent applications in Vercel, you can now create a new OpenSearch collection or connect your existing OpenSearch Serverless collection within the Vercel console. Create a search backend in seconds and add features on-demand as your application grows. To learn more, visit AWS for Vercel.

You can go from idea to working prototype in minutes using Claude Code, Cursor, and Kiro. OpenSearch Agent Skills provide a repository of skills that bring OpenSearch intelligence directly into your agent. Each skill encapsulates domain knowledge, best practices, and multi-step execution logic for a specific workflow, so your agent not only gets results but also understands how they were achieved. You can also use the OpenSearch Launchpad in Kiro Powers to accelerate search applications with guided, end-to-end architecture planning.

Now available
The next generation of Amazon OpenSearch Serverless is generally available today in all AWS commercial Regions where Amazon OpenSearch Serverless is currently available.

The next generation of OpenSearch Serverless charges for the compute you use in OpenSearch Compute Units (OCUs) for indexing, search, and GPU acceleration. You are charged separately for storage in GB-month. For more information, see Amazon OpenSearch Service Pricing.

Give it a try and send feedback to the AWS re:Post for Amazon OpenSearch Service or through your usual AWS Support contacts.

Channy



from AWS News Blog https://ift.tt/ign2UFO
via IFTTT