Monday, August 11, 2025

AWS Weekly Roundup: OpenAI models, Automated Reasoning checks, Amazon EVS, and more (August 11, 2025)

AWS Summits in the northern hemisphere have mostly concluded, but the fun and learning haven’t stopped for those of us in other parts of the globe. Community members, customers, partners, and colleagues enjoyed a day of learning and networking last week at the AWS Summit Mexico City and the AWS Summit Jakarta.

Last week’s launches
These are the launches from last week that caught my attention:

  • OpenAI open weight models on AWS — OpenAI open weight models (gpt-oss-120b and gpt-oss-20b) are now available on AWS. These open weight models excel at coding, scientific analysis, and mathematical reasoning, with performance comparable to leading alternatives.
  • Amazon Elastic VMware Service — Amazon Elastic VMware Service (Amazon EVS), a new AWS service that lets you run VMware Cloud Foundation (VCF) environments directly within your Amazon Virtual Private Cloud (Amazon VPC), is now generally available.
  • Automated Reasoning checks — Automated Reasoning checks, a new Amazon Bedrock Guardrails policy that was previewed during AWS re:Invent, is now generally available. Automated Reasoning checks helps you validate the accuracy of content generated by foundation models (FMs) against domain knowledge. Read more in Danilo’s post on how this can help prevent factual errors caused by AI hallucinations.
  • Multi-Region application recovery service — In this post, Sébastien writes about the announcement of Amazon Application Recovery Controller (ARC) Region switch, a fully managed, highly available capability that enables organizations to plan, practice, and orchestrate Region switches with confidence, eliminating the uncertainty around cross-Region recovery operations.

Additional updates
I thought these projects, blog posts, and news items were also interesting:

Upcoming AWS events
Keep an eye out and be sure to sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS’s flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Coming up soon are the summits in São Paulo (August 13) and Johannesburg (August 20).

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Australia (August 15), Adria (September 5), Baltic (September 10), Aotearoa (September 18), and South Africa (September 20).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Veliswa.




Wednesday, August 6, 2025

Minimize AI hallucinations and deliver up to 99% verification accuracy with Automated Reasoning checks: Now available

Today, I’m happy to share that Automated Reasoning checks, a new Amazon Bedrock Guardrails policy that we previewed during AWS re:Invent, is now generally available. Automated Reasoning checks helps you validate the accuracy of content generated by foundation models (FMs) against domain knowledge. This can help prevent factual errors due to AI hallucinations. The policy uses mathematical logic and formal verification techniques to validate accuracy, checking AI responses against definitive rules and parameters derived from your domain knowledge.

This approach is fundamentally different from probabilistic reasoning methods, which deal with uncertainty by assigning probabilities to outcomes. In fact, Automated Reasoning checks delivers up to 99% verification accuracy, providing provable assurance in detecting AI hallucinations while also assisting with ambiguity detection when the output of a model is open to more than one interpretation.

With general availability, you get the following new features:

  • Support for large documents in a single build, up to 80K tokens – Process extensive documentation; we found this can add up to 100 pages of content
  • Simplified policy validation – Save your validation tests and run them repeatedly, making it easier to maintain and verify your policies over time
  • Automated scenario generation – Create test scenarios automatically from your definitions, saving time and effort while helping make coverage more comprehensive
  • Enhanced policy feedback – Provide natural language suggestions for policy changes, simplifying the way you can improve your policies
  • Customizable validation settings – Adjust confidence score thresholds to match your specific needs, giving you more control over validation strictness

Let’s see how this works in practice.

Creating Automated Reasoning checks in Amazon Bedrock Guardrails
To use Automated Reasoning checks, you first encode rules from your knowledge domain into an Automated Reasoning policy, then use the policy to validate generated content. For this scenario, I’m going to create a mortgage approval policy to safeguard an AI assistant evaluating who can qualify for a mortgage. It is important that the predictions of the AI system do not deviate from the rules and guidelines established for mortgage approval. These rules and guidelines are captured in a policy document written in natural language.

In the Amazon Bedrock console, I choose Automated Reasoning from the navigation pane to create a policy.

I enter the name and description of the policy and upload the PDF of the policy document. The name and description are just metadata and do not contribute to building the Automated Reasoning policy. I describe the source content to add context on how it should be translated into formal logic. For example, I explain how I plan to use the policy in my application, including sample Q&A from the AI assistant.

Console screenshot.

When the policy is ready, I land on the overview page, showing the policy details and a summary of the tests and definitions. I choose Definitions from the dropdown to examine the Automated Reasoning policy, which is made up of the rules, variables, and types created to translate the natural language policy into formal logic.

The Rules describe how variables in the policy are related and are used when evaluating the generated content. For example, in this case, they capture which thresholds to apply and how certain decisions are made. For traceability, each rule has its own unique ID.

Console screenshot.

The Variables represent the main concepts at play in the original natural language documents. Each variable is involved in one or more rules. Variables make complex structures easier to understand. For this scenario, some of the rules need to look at the down payment or the credit score.

Console screenshot.

Custom Types are created for variables that are neither boolean nor numeric, for example, variables that can only assume a limited number of values. In this case, there are two types of mortgage described in the policy: insured and conventional.

Console screenshot.

Now we can assess the quality of the initial Automated Reasoning policy through testing. I choose Tests from the dropdown. Here I can manually enter a test, consisting of input (optional) and output, such as a question and its possible answer from the interaction of a customer with the AI assistant. I then set the expected result from the Automated Reasoning check. The expected result can be valid (the answer is correct), invalid (the answer is not correct), or satisfiable (the answer could be true or false depending on specific assumptions). I can also assign a confidence threshold for the translation of the query/content pair from natural language to logic.

Before I enter tests manually, I use the option to automatically generate a scenario from the definitions. This is the easiest way to validate a policy and (unless you’re an expert in logic) should be the first step after the creation of the policy.

For each generated scenario, I provide an expected validation to say if it is something that can happen (satisfiable) or not (invalid). If not, I can add an annotation that can then be used to update the definitions. For a more advanced understanding of the generated scenario, I can show the formal logic representation of a test using SMT-LIB syntax.

Console screenshot.

After using the generate scenario option, I enter a few tests manually. For these tests, I set different expected results: some are valid, because they follow the policy; some are invalid, because they violate the policy; and some are satisfiable, because their result depends on specific assumptions.

Console screenshot.

Then, I choose Validate all tests to see the results. All tests passed in this case. Now, when I update the policy, I can use these tests to validate that the changes didn’t introduce errors.

Console screenshot.

For each test, I can look at the findings. If a test doesn’t pass, I can look at the rules that created the contradiction and made the test fail against the expected result. Using this information, I can decide whether to add an annotation to improve the policy or to correct the test.

Console screenshot.

Now that I’m satisfied with the tests, I can create a new Amazon Bedrock guardrail (or update an existing one) that uses up to two Automated Reasoning policies to check the validity of the responses of the AI assistant. All six policies offered by Guardrails are modular and can be used together or separately. For example, Automated Reasoning checks can be used with other safeguards such as content filtering and contextual grounding checks. The guardrail can be applied to models served by Amazon Bedrock or to any third-party model (such as OpenAI and Google Gemini) via the ApplyGuardrail API. I can also use the guardrail with an agent framework such as Strands Agents, including agents deployed using Amazon Bedrock AgentCore.

Console screenshot.
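
As an illustration of how this fits into an application, here’s a minimal sketch that validates a model answer with the ApplyGuardrail API using the AWS SDK for Python (Boto3). The guardrail ID, version, and the question/answer text are placeholders for the guardrail created above; adjust them to your own setup.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholders: the ID and version of the guardrail created in the console
GUARDRAIL_ID = "my-guardrail-id"
GUARDRAIL_VERSION = "1"

question = "Can I qualify for a mortgage with a credit score of 610?"
answer = "Yes, any credit score above 600 qualifies for a conventional mortgage."

# Validate the generated answer against the guardrail,
# including its Automated Reasoning policy
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=GUARDRAIL_ID,
    guardrailVersion=GUARDRAIL_VERSION,
    source="OUTPUT",
    content=[
        {"text": {"text": question, "qualifiers": ["query"]}},
        {"text": {"text": answer}},
    ],
)

# "GUARDRAIL_INTERVENED" means at least one policy flagged the content;
# the assessments list contains the detailed findings
print(response["action"])
print(response["assessments"])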

Now that we’ve seen how to set up a policy, let’s look at how Automated Reasoning checks are used in practice.

Customer case study – Utility outage management systems
When the lights go out, every minute counts. That’s why utility companies are turning to AI solutions to improve their outage management systems. We collaborated with PwC on a solution in this space. Using Automated Reasoning checks, utilities can streamline operations through:

  • Automated protocol generation – Creates standardized procedures that meet regulatory requirements
  • Real-time plan validation – Ensures response plans comply with established policies
  • Structured workflow creation – Develops severity-based workflows with defined response targets

At its core, this solution combines intelligent policy management with optimized response protocols. Automated Reasoning checks are used to assess AI-generated responses. When a response is found to be invalid or satisfiable, the result of the Automated Reasoning check is used to rewrite or enhance the answer.
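
Here’s a minimal sketch of that kind of feedback loop (not the actual PwC implementation); the guardrail and model IDs are placeholders, and the sketch simply feeds the guardrail findings back to the model for a revised answer.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def validated_answer(question: str, draft_answer: str) -> str:
    # Check the draft answer against the guardrail that contains
    # the Automated Reasoning policy
    check = bedrock_runtime.apply_guardrail(
        guardrailIdentifier="my-guardrail-id",  # placeholder
        guardrailVersion="1",
        source="OUTPUT",
        content=[{"text": {"text": draft_answer}}],
    )
    if check["action"] == "NONE":
        return draft_answer

    # If the check intervened, ask the model to rewrite the answer,
    # passing the validation findings as additional context
    revision = bedrock_runtime.converse(
        modelId="amazon.nova-pro-v1:0",  # placeholder model ID
        messages=[{
            "role": "user",
            "content": [{
                "text": f"Rewrite this answer so that it complies with the policy.\n"
                        f"Question: {question}\nDraft answer: {draft_answer}\n"
                        f"Validation findings: {check['assessments']}"
            }],
        }],
    )
    return revision["output"]["message"]["content"][0]["text"]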

This approach demonstrates how AI can transform traditional utility operations, making them more efficient, reliable, and responsive to customer needs. By combining mathematical precision with practical requirements, this solution sets a new standard for outage management in the utility sector. The result is faster response times, improved accuracy, and better outcomes for both utilities and their customers.

In the words of Matt Wood, PwC’s Global and US Commercial Technology and Innovation Officer:

“At PwC, we’re helping clients move from AI pilot to production with confidence—especially in highly regulated industries where the cost of a misstep is measured in more than dollars. Our collaboration with AWS on Automated Reasoning checks is a breakthrough in responsible AI: mathematically assessed safeguards, now embedded directly into Amazon Bedrock Guardrails. We’re proud to be AWS’s launch collaborator, bringing this innovation to life across sectors like pharma, utilities, and cloud compliance—where trust isn’t a feature, it’s a requirement.”

Things to know
Automated Reasoning checks in Amazon Bedrock Guardrails is generally available today in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris).

With Automated Reasoning checks, you pay based on the amount of text processed. For more information, see Amazon Bedrock pricing.

To learn more, and build secure and safe AI applications, see the technical documentation and the GitHub code samples. Follow this link for direct access to the Amazon Bedrock console.

The videos in this playlist include an introduction to Automated Reasoning checks, a deep dive presentation, and hands-on tutorials to create, test, and refine a policy. In the second video in the playlist, my colleague Wale provides a nice intro to the capability.

Danilo




Tuesday, August 5, 2025

OpenAI open weight models now available on AWS

AWS is committed to bringing you the most advanced foundation models (FMs) in the industry, continuously expanding our selection to include groundbreaking models from leading AI innovators so that you always have access to the latest advancements to drive your business forward.

Today, I am happy to announce the availability of two new OpenAI models with open weights in Amazon Bedrock and Amazon SageMaker JumpStart. OpenAI gpt-oss-120b and gpt-oss-20b models are designed for text generation and reasoning tasks, offering developers and organizations new options to build AI applications with complete control over their infrastructure and data.

These open weight models excel at coding, scientific analysis, and mathematical reasoning, with performance comparable to leading alternatives. Both models support a 128K context window and provide adjustable reasoning levels (low/medium/high) to match your specific use case requirements. The models support external tools to enhance their capabilities and can be used in an agentic workflow, for example, using a framework like Strands Agents.

With Amazon Bedrock and Amazon SageMaker JumpStart, AWS gives you the freedom to innovate with access to hundreds of FMs from leading AI companies, including OpenAI open weight models. With our comprehensive selection of models, you can match your AI workloads to the perfect model every time.

Through Amazon Bedrock, you can seamlessly experiment with different models, mix and match capabilities, and switch between providers without rewriting code—turning model choice into a strategic advantage that helps you continuously evolve your AI strategy as new innovations emerge. At launch, these new models are available in Amazon Bedrock via an OpenAI compatible endpoint. You can point the OpenAI SDK to this endpoint or use the Bedrock InvokeModel and Converse APIs.
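
For example, here’s a minimal sketch that calls the gpt-oss-120b model through the Bedrock Converse API with the AWS SDK for Python (Boto3). The system prompt used to request a reasoning level is an assumption based on the gpt-oss model card, so check the model documentation for the exact mechanism.

import boto3

# The model is available in the US West (Oregon) Region at launch
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock_runtime.converse(
    modelId="openai.gpt-oss-120b-1:0",
    # Assumption: the reasoning level can be requested through the system prompt
    system=[{"text": "Reasoning: high"}],
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the trade-offs between a savings account and a certificate of deposit."}],
    }],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.7},
)

print(response["output"]["message"]["content"][0]["text"])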

With SageMaker JumpStart, you can quickly evaluate, compare, and customize models for your use case. You can then deploy the original or the customized model in production with the SageMaker AI console or using the SageMaker Python SDK.

Let’s see how these work in practice.

Getting started with OpenAI open weight models in Amazon Bedrock
In the Amazon Bedrock console, I choose Model access from the Configure and learn section of the navigation pane. Then, I navigate to the two listed OpenAI models on this page and request access.

Console screenshot

Now that I have access, I use the Chat/Test playground to test and evaluate the models. I select OpenAI as the category and then the gpt-oss-120b model.

Console screenshot

Using this model, I run the following sample prompt:

A family has $5,000 to save for their vacation next year. They can place the money in a savings account earning 2% interest annually or in a certificate of deposit earning 4% interest annually but with no access to the funds until the vacation. If they need $1,000 for emergency expenses during the year, how should they divide their money between the two options to maximize their vacation fund?

This prompt generates an output that includes the chain of thought used to produce the result.

I can use these models with the OpenAI SDK by configuring the API endpoint (base URL) and using an Amazon Bedrock API key for authentication. For example, I set these environment variables to use the US West (Oregon) AWS Region endpoint (us-west-2) and my Amazon Bedrock API key:

export OPENAI_API_KEY="<my-bedrock-api-key>"
export OPENAI_BASE_URL="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

Now I invoke the model using the OpenAI Python SDK.

from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = OpenAI()

response = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Hello, how are you?"
    }],
    model="openai.gpt-oss-120b-1:0",
    stream=True
)

# Print the streamed chunks as they arrive
for item in response:
    print(item)

To build an AI agent, I can choose any framework that supports the Amazon Bedrock API or the OpenAI API. For example, here’s the starting code for Strands Agents using the Amazon Bedrock API:

from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

# Use the OpenAI open weight model served by Amazon Bedrock
model = BedrockModel(
    model_id="openai.gpt-oss-120b-1:0"
)

# Create an agent with a calculator tool it can call when needed
agent = Agent(
    model=model,
    tools=[calculator]
)

agent("Tell me the square root of 42 ^ 3")

I save the code (app.py file), install the dependencies, and run the agent locally:

pip install strands-agents strands-agents-tools
python app.py

When I am satisfied with the agent, I can deploy it in production using the capabilities offered by Amazon Bedrock AgentCore, including a fully managed serverless runtime and memory and identity management.

Getting started with OpenAI open weight models in Amazon SageMaker JumpStart
In the Amazon SageMaker AI console, you can use OpenAI open weight models in SageMaker Studio. The first time I do this, I need to set up a SageMaker domain. There are options to set it up for a single user (simpler) or an organization. For these tests, I use a single user setup.

In the SageMaker JumpStart model view, I have access to a detailed description of the gpt-oss-120b and gpt-oss-20b models.

I choose the gpt-oss-20b model and then deploy it. In the next steps, I select the instance type and the initial instance count. After a few minutes, the deployment creates an endpoint that I can invoke from SageMaker Studio or by using any of the AWS SDKs.
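
As an illustration, here’s a minimal sketch that invokes the deployed endpoint with the AWS SDK for Python (Boto3). The endpoint name is a placeholder and the request payload is an assumption (an OpenAI-style messages format), so check the model’s JumpStart example notebook for the exact schema.

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Placeholder: the endpoint name assigned when deploying gpt-oss-20b from JumpStart
ENDPOINT_NAME = "jumpstart-gpt-oss-20b-endpoint"

# Assumption: the container accepts an OpenAI-style chat payload
payload = {
    "messages": [
        {"role": "user", "content": "Explain compound interest in two sentences."}
    ],
    "max_tokens": 256,
}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read()))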

To learn more, visit GPT OSS models from OpenAI are now available on SageMaker JumpStart in the AWS Artificial Intelligence Blog.

Things to know
The new OpenAI open weight models are now available in Amazon Bedrock in the US West (Oregon) AWS Region, while Amazon SageMaker JumpStart supports these models in US East (Ohio, N. Virginia) and Asia Pacific (Mumbai, Tokyo).

Each model comes equipped with full chain-of-thought output capabilities, providing you with detailed visibility into the model’s reasoning process. This transparency is particularly valuable for applications requiring high levels of interpretability and validation. These models give you the freedom to modify, adapt, and customize them to your specific needs. This flexibility allows you to fine-tune the models for your unique use cases, integrate them into your existing workflows, and even build upon them to create new, specialized models tailored to your industry or application.

Security and safety are built into the core of these models, with comprehensive evaluation processes and safety measures in place. The models maintain compatibility with the standard GPT-4 tokenizer.

Both models can be used in your preferred environment, whether that’s through the serverless experience of Amazon Bedrock or the extensive machine learning (ML) development capabilities of SageMaker JumpStart. For information about the costs associated with using these models and services, visit the Amazon Bedrock pricing and Amazon SageMaker AI pricing pages.

To learn more, see the parameters for the models and the chat completions API in the Amazon Bedrock documentation.

Get started today with OpenAI open weight models on AWS in the Amazon Bedrock console or the Amazon SageMaker AI console.

Danilo




Introducing Amazon Elastic VMware Service for running VMware Cloud Foundation on AWS

Today, we’re announcing the general availability of Amazon Elastic VMware Service (Amazon EVS), a new AWS service that lets you run VMware Cloud Foundation (VCF) environments directly within your Amazon Virtual Private Cloud (Amazon VPC). With Amazon EVS, you can deploy fully functional VCF environments in just hours using a guided workflow, while running your VMware workloads on qualified Amazon Elastic Compute Cloud (Amazon EC2) bare metal instances and seamlessly integrating with AWS services such as Amazon FSx for NetApp ONTAP.

Many organizations running VMware workloads on premises want to move to the cloud to benefit from improved scalability, reliability, and access to cloud services, but migrating these workloads often requires substantial changes to applications and infrastructure configurations. Amazon EVS lets customers continue using their existing VMware expertise and tools without having to re-architect applications or change established practices, thereby simplifying the migration process while providing access to AWS’s scale, reliability, and broad set of services.

With Amazon EVS, you can run VMware workloads directly in your Amazon VPC. This gives you full control over your environments while being on AWS infrastructure. You can extend your on-premises networks and migrate workloads without changing IP addresses or operational runbooks, reducing complexity and risk.

Key capabilities and features

Amazon EVS delivers a comprehensive set of capabilities designed to streamline your VMware workload migration and management experience. The service enables seamless workload migration without the need for replatforming or changing hypervisors, which means you can maintain your existing infrastructure investments while moving to AWS. Through an intuitive, guided workflow on the AWS Management Console, you can efficiently provision and configure your EVS environments, significantly reducing the complexity to migrate your workloads to AWS.

With Amazon EVS, you can deploy a fully functional VCF environment running on AWS in a few hours. This process eliminates many of the manual steps and potential configuration errors that often occur during traditional deployments. Furthermore, with Amazon EVS you can optimize your virtualization stack on AWS. Because the VCF environment runs inside your VPC, you have full administrative access to the environment and the associated management appliances. You also have the ability to integrate third-party solutions, from external storage such as Amazon FSx for NetApp ONTAP or Pure Cloud Block Store to backup solutions such as Veeam Backup and Replication.

The service also gives you the ability to self-manage or work with AWS Partners to build, manage, and operate your environments. This provides you with flexibility to match your approach with your overall goals.

Setting up a new VCF environment

Organizations can streamline their setup process by ensuring they have all the necessary prerequisites in place ahead of creating a new VCF environment. These prerequisites include having an active AWS account, configuring the appropriate AWS Identity and Access Management (IAM) permissions, and setting up an Amazon VPC with sufficient CIDR space and two Route Server endpoints, each with its own peer. Additionally, customers will need to have their VMware Cloud Foundation license keys ready, secure Amazon EC2 capacity reservations specifically for i4i.metal instances, and plan their VLAN subnet information.

To help ensure a smooth deployment process, we’ve provided a Getting started hub, which you can access from the EVS homepage, as well as a comprehensive guide in our documentation. By following these preparation steps, you can avoid potential setup delays and ensure a successful environment creation.

Screenshots of EVS onboarding

Let’s walk through the process of setting up a new VCF environment using Amazon EVS.

Screenshots of EVS onboarding

You will need to provide your Site ID, which is allocated by Broadcom when purchasing VCF licenses, along with your license keys. To ensure a successful initial deployment, you should verify you have sufficient licensing coverage for a minimum of 256 cores. This translates to at least four i4i.metal instances, with each instance providing 64 physical cores.

This licensing requirement helps you maintain optimal performance and ensures your environment meets the necessary infrastructure specifications. By confirming these requirements upfront, you can avoid potential deployment delays and ensure a smooth setup process.

Screenshots of EVS onboarding

Once you have provided all the required details, you will be prompted to specify your host details. These are the underlying Amazon EC2 instances that your VCF environment will be deployed on.

Screenshots of EVS onboarding

Once you have filled out details for each of your host instances, you will need to configure your networking and management appliance DNS details. For further information on how to create a new VCF environment on Amazon EVS, follow the documentation here.

Screenshots of EVS onboarding

After you have created your VCF environment, you will be able to look over all of the host and configuration details through the AWS Console.

Additional things to know

Amazon EVS currently supports VCF version 5.2.1 and runs on i4i.metal instances. Future releases will expand VCF versions, licensing options, and more instance type support to provide even more flexibility for your deployments.

Amazon EVS provides flexible storage options. Your Amazon EVS local Instance storage is powered by VMware’s vSAN solution, which pools local disks across multiple ESXi hosts into a single distributed datastore. To scale your storage, you can leverage external Network File System (NFS) or iSCSI-based storage solutions. For example, Amazon FSx for NetApp ONTAP is particularly well-suited for use as an NFS datastore or shared block storage over iSCSI.

Additionally, Amazon EVS makes connecting your on-premises environments to AWS simple. You can connect from your on-premises vSphere environment to Amazon EVS using a Direct Connect connection or a VPN that terminates at a transit gateway. Amazon EVS also manages the underlying connectivity from your VLAN subnets into your VMs.

AWS provides comprehensive support for all AWS services deployed by Amazon EVS, handling direct customer support while engaging with Broadcom for advanced support needs. Customers must maintain AWS Business Support on accounts running the service.

Availability and pricing

Amazon EVS is now generally available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), and Asia Pacific (Tokyo) AWS Regions, with additional Regions coming soon. Pricing is based on the Amazon EC2 instances and AWS resources you use, with no minimum fees or upfront commitments.

To learn more, visit the Amazon EVS product page.




Monday, August 4, 2025

AWS Weekly Roundup: Amazon DocumentDB, AWS Lambda, Amazon EC2, and more (August 4, 2025)

This week brings an array of innovations spanning from generative AI capabilities to enhancements of foundational services. Whether you’re building AI-powered applications, managing databases, or optimizing your cloud infrastructure, these updates help you build more advanced, robust, and flexible applications.

Last week’s launches
Here are the launches that got my attention last week:

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS’s flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Mexico City (August 6) and Jakarta (August 7).

AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Australia (August 15), Adria (September 5), Baltic (September 10), and Aotearoa (September 18).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Danilo




Friday, August 1, 2025

Introducing Amazon Application Recovery Controller Region switch: A multi-Region application recovery service

As a developer advocate at AWS, I’ve worked with many enterprise organizations that operate critical applications across multiple AWS Regions. A key concern they often share is the lack of confidence in their Region failover strategy—whether it will work when needed, whether all dependencies have been identified, and whether their teams have practiced the procedures enough. Traditional approaches often leave them uncertain about their readiness for a Region switch.

Today, I’m excited to announce Amazon Application Recovery Controller (ARC) Region switch, a fully managed, highly available capability that enables organizations to plan, practice, and orchestrate Region switches with confidence, eliminating the uncertainty around cross-Region recovery operations. Region switch helps you orchestrate recovery for your multi-Region applications on AWS. It gives you a centralized solution to coordinate and automate recovery tasks across AWS services and accounts when you need to switch your application’s operations from one AWS Region to another.

Many customers deploy business-critical applications across multiple AWS Regions to meet their availability requirements. When an operational event impacts an application in one Region, switching operations to another Region involves coordinating multiple steps across different AWS services, such as compute, databases, and DNS. This coordination typically requires building and maintaining complex scripts that need regular testing and updates as applications evolve. Additionally, orchestrating and tracking the progress of Region switches across multiple applications and providing evidence of successful recovery for compliance purposes often involves manual data gathering.

Region switch is built on a Regional data plane architecture, where Region switch plans are executed from the Region being activated. This design eliminates dependencies on the impacted Region during the switch, providing a more resilient recovery process since the execution is independent of the Region you’re switching from.

Building a recovery plan with ARC Region switch
With ARC Region switch, you can create recovery plans that define the specific steps needed to switch your application between Regions. Each plan contains execution blocks that represent actions on AWS resources. At launch, Region switch supports nine types of execution blocks:

  • ARC Region switch plan execution block–Lets you orchestrate the order in which multiple applications switch to the Region you want to activate by referencing other Region switch plans.
  • Amazon EC2 Auto Scaling execution block–Scales Amazon EC2 compute resources in your target Region by matching a specified percentage of your source Region’s capacity.
  • ARC routing controls execution block–Changes routing control states to redirect traffic using DNS health checks.
  • Amazon Aurora global database execution block–Performs database failover with potential data loss or switchover with zero data loss for Aurora Global Database.
  • Manual approval execution block–Adds approval checkpoints in your recovery workflow where team members can review and approve before proceeding.
  • Custom Action AWS Lambda execution block–Adds custom recovery steps by executing Lambda functions in either the activating or deactivating Region.
  • Amazon Route 53 health check execution block–Lets you specify which Regions your application’s traffic will be redirected to during failover. When executing your Region switch plan, the Amazon Route 53 health check state is updated and traffic is redirected based on your DNS configuration.
  • Amazon Elastic Kubernetes Service (Amazon EKS) resource scaling execution block–Scales Kubernetes pods in your target Region during recovery by matching a specified percentage of your source Region’s capacity.
  • Amazon Elastic Container Service (Amazon ECS) resource scaling execution block–Scales ECS tasks in your target Region by matching a specified percentage of your source Region’s capacity.

Region switch continually validates your plans by checking resource configurations and AWS Identity and Access Management (IAM) permissions every 30 minutes. During execution, Region switch monitors the progress of each step and provides detailed logs. You can view execution status through the Region switch dashboard and at the bottom of the execution details page.

To help you balance cost and reliability, Region switch offers flexibility in how you prepare your standby resources. You can configure the desired percentage of compute capacity to target in your destination Region during recovery using Region switch scaling execution blocks. For critical applications expecting surge traffic during recovery, you might choose to scale beyond 100 percent capacity, and setting a lower percentage can help achieve faster overall execution times. However, it’s important to note that using one of the scaling execution blocks does not guarantee capacity, and actual resource availability depends on the capacity in the destination Region at the time of recovery. To facilitate the best possible outcomes, we recommend regularly testing your recovery plans and maintaining appropriate Service Quotas in your standby Regions.

ARC Region switch includes a global dashboard you can use to monitor the status of Region switch plans across your enterprise and Regions. Additionally, there’s a Regional executions dashboard that only displays executions within the current console Region. This dashboard is designed to be highly available across each Region so it can be used during operational events.

Region switch allows resources to be hosted in an account that is separate from the account that contains the Region switch plan. If the plan uses resources from an account that is different from the account that hosts the plan, then Region switch uses the executionRole to assume the crossAccountRole to access those resources. Additionally, Region switch plans can be centralized and shared across multiple accounts using AWS Resource Access Manager (AWS RAM), enabling efficient management of recovery plans across your organization.

Let’s see how it works
Let me show you how to create and execute a Region switch plan. There are three parts in this demo. First, I create a Region switch plan. Then, I define a workflow. Finally, I configure the triggers.

Step 1: Create a plan

I navigate to the Application Recovery Controller section of the AWS Management Console. I choose Region switch in the left navigation menu. Then, I choose Create Region switch plan.

ARC Region switch - 1

After I give a name to my plan, I specify a Multi-Region recovery approach (active/passive or active/active). In Active/Passive mode, two application replicas are deployed into two Regions, with traffic routed into the active Region only. The replica in the passive Region can be activated by executing the Region switch plan.

Then, I select the Primary Region and Standby Region. Optionally, I can enter a Desired recovery time objective (RTO). The service will use this value to provide insight into how long Region switch plan executions take in relation to my desired RTO.

ARC Region switch - create plan

I enter the Plan execution IAM role. This is the role that allows Region switch to call AWS services during execution. I make sure the role I choose has permissions to be invoked by the service and contains the minimum set of permissions allowing ARC to operate. Refer to the IAM permissions section of the documentation for the details.

ARC Region switch - create plan 2

Step 2: Create a workflow

When the two Plan evaluation status notifications are green, I create a workflow. I choose Build workflows to get started.


ARC Region switch - status

Plans enable you to build specific workflows that will recover your applications using Region switch execution blocks. You can build workflows with execution blocks that run sequentially or in parallel to orchestrate the order in which multiple applications or resources recover into the activating Region. A plan is made up of these workflows that allow you to activate or deactivate a specific Region.

For this demo, I use the graphical editor to create the workflow. But you can also define the workflow in JSON. This format is better suited for automation or when you want to store your workflow definition in a source code management system (SCMS) and your infrastructure as code (IaC) tools, such as AWS CloudFormation.

ARC - define workflows

I can alternate between the Design and the Code views by selecting the corresponding tab next to the Workflow builder title. The JSON view is read-only. I designed the workflow with the graphical editor and I copied the JSON equivalent to store it alongside my IaC project files.

ARC - define workflows as code

Region switch launches an evaluation to validate your recovery strategy every 30 minutes. It regularly checks that all actions defined in your workflows will succeed when executed. This proactive validation assesses various elements, including IAM permissions and resource states across accounts and Regions. By continually monitoring these dependencies, Region switch helps ensure your recovery plans remain viable and identifies potential issues before they impact your actual switch operations.

However, just as an untested backup is not a reliable backup, an untested recovery plan cannot be considered truly validated. While continuous evaluation provides a strong foundation, we strongly recommend regularly executing your plans in test scenarios to verify their effectiveness, understand actual recovery times, and ensure your teams are familiar with the recovery procedures. This hands-on testing is essential for maintaining confidence in your disaster recovery strategy.

Step 3: Create a trigger

A trigger defines the conditions to activate the workflows just created. It’s expressed as a set of CloudWatch alarms. Alarm-based triggers are optional. You can also use Region switch with manual triggers.

From the Region switch page in the console, I choose the Triggers tab and choose Add triggers.

ARC - Trigger

For each Region defined in my plan, I choose Add trigger to define the triggers that will activate the Region.

ARC - Trigger 2

Finally, I choose the alarms and their state (OK or Alarm) that Region switch will use to trigger the activation of the Region.

ARC - Trigger 3

I’m now ready to test the execution of the plan to switch Regions using Region switch. It’s important to execute the plan from the Region I’m activating (the target Region of the workflow) and use the data plane in that specific Region.

Here is how to execute a plan using the AWS Command Line Interface (AWS CLI):

aws arc-region-switch start-plan-execution \
--plan-arn arn:aws:arc-region-switch::111122223333:plan/resource-id \
--target-region us-west-2 \
--action activate

Pricing and availability
Region switch is available in all commercial AWS Regions at $70 per month per plan. Each plan can include up to 100 execution blocks, or you can create parent plans to orchestrate up to 25 child plans.

Having seen firsthand the engineering effort that goes into building and maintaining multi-Region recovery solutions, I’m thrilled to see how Region switch will help automate this process for our customers. To get started with ARC Region switch, visit the ARC console and create your first Region switch plan. For more information about Region switch, visit the Amazon Application Recovery Controller (ARC) documentation. You can also reach out to your AWS account team with questions about using Region switch for your multi-Region applications.

I look forward to hearing about how you use Region switch to strengthen your multi-Region applications’ resilience.

— seb
