AI with Claude on AWS: From Code to Orchestration

1. Why Bedrock? The enterprise case for routing Claude through AWS

When you call claude from your terminal, requests go to Anthropic’s API by default. For personal or small-team use that is fine. For enterprise or regulated environments it often is not, for three reasons:

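Data residency and security. Bedrock processes requests inside AWS, and with a VPC interface endpoint (AWS PrivateLink) inference traffic does not cross the public internet. Prompts and completions are handled under your AWS account's controls, which satisfies many data-protection requirements without any custom proxy work.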

IAM governance. Every Bedrock call is an IAM action. That means you get per-user CloudTrail audit logs, service-control policies, and cost allocation by team — the same controls you already apply to every other AWS service.

Unified billing. One AWS invoice. No separate Anthropic subscription to manage, no token quotas siloed from your cloud budget.

The trade-off is a small amount of extra latency relative to calling the Anthropic API directly (a simple Sonnet query typically completes in 2–5 s end to end) and the overhead of managing AWS credentials. For most enterprise scenarios, that is an easy trade.

2. Switching Claude Code to Bedrock

Claude Code speaks to Bedrock through two environment variables. No code changes, no SDKs to swap.

export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1

Or persist them in ~/.claude/settings.json so every session picks them up automatically:

{
  "awsAuthRefresh": "aws login",
  "env": {
    "AWS_REGION": "us-east-1",
    "CLAUDE_CODE_USE_BEDROCK": "1"
  }
}

awsAuthRefresh tells Claude Code to re-run aws login whenever credentials expire — useful with IAM Identity Center (SSO) sessions that rotate every few hours.
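
Before starting a session it can help to confirm that the settings file really contains both required keys. The following is a minimal sketch under the settings layout shown above; `missing_bedrock_settings` is a hypothetical helper name and is not part of Claude Code.

```python
import json

# The two variables this lesson relies on for Bedrock routing.
REQUIRED_ENV = ("CLAUDE_CODE_USE_BEDROCK", "AWS_REGION")


def missing_bedrock_settings(settings: dict) -> list[str]:
    """Return the required env keys that are absent or empty in a settings dict."""
    env = settings.get("env", {})
    return [key for key in REQUIRED_ENV if not env.get(key)]


# Example input mirroring the ~/.claude/settings.json shown above.
example = json.loads("""
{
  "awsAuthRefresh": "aws login",
  "env": {"AWS_REGION": "us-east-1", "CLAUDE_CODE_USE_BEDROCK": "1"}
}
""")

print(missing_bedrock_settings(example))        # []
print(missing_bedrock_settings({"env": {}}))    # ['CLAUDE_CODE_USE_BEDROCK', 'AWS_REGION']
```

To check a real file, load it with `json.load(open(path))` and pass the result to the same function.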

Authentication options

Method	When to use
`aws login` (AWS CLI v2.32+)	Local development, interactive
`aws sso login --profile <name>`	IAM Identity Center / federated identity
Instance profile / ECS task role / Lambda execution role	EC2, ECS containers, Lambda functions
`AWS_BEARER_TOKEN_BEDROCK` env var	Quick tests; no per-user CloudTrail audit
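
For the IAM-based methods in the table, a quick way to see which identity your credentials resolve to is an STS `get_caller_identity` call. The sketch below takes the STS client as a parameter so it can be exercised without AWS access; `describe_caller` is a hypothetical helper name. It does not apply to the bearer-token method, which does not go through STS.

```python
def describe_caller(sts_client) -> dict:
    """Summarise the identity behind the current AWS credentials."""
    identity = sts_client.get_caller_identity()
    arn = identity["Arn"]
    # ARNs look like arn:aws:sts::<account>:assumed-role/<role>/<session>
    # or arn:aws:iam::<account>:user/<name>; the sixth field is the resource.
    resource = arn.split(":", 5)[5]
    kind = resource.split("/", 1)[0]  # e.g. "assumed-role" or "user"
    return {"account": identity["Account"], "arn": arn, "kind": kind}


# Usage with real credentials (assumes boto3 is installed):
#   import boto3
#   print(describe_caller(boto3.client("sts")))
```

An `assumed-role` result is what you would expect from SSO sessions, instance profiles, and task roles; a `user` result usually indicates long-lived access keys.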

IAM permissions required

Your IAM principal needs at minimum:

{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
    "arn:aws:bedrock:*:*:inference-profile/*"
  ]
}

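InvokeModelWithResponseStream is required for streaming responses; without it Claude Code can appear to hang with no error. If you use cross-region inference profile IDs (the us. prefix), the Resource list must cover both the inference profile and the underlying foundation model in every region the profile can route to.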
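
If you manage several regions or accounts, the policy statement can be generated rather than hand-edited. This is a sketch, not an official template: `bedrock_invoke_policy` is a hypothetical helper, and the inference-profile resource pattern is included only when an account ID is supplied.

```python
import json


def bedrock_invoke_policy(regions, account_id=None, model_pattern="anthropic.claude-*"):
    """Build an IAM policy document that allows invoking Claude models."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_pattern}"
        for region in regions
    ]
    if account_id:
        # Cross-region inference profiles are a separate resource type.
        resources += [
            f"arn:aws:bedrock:{region}:{account_id}:inference-profile/*"
            for region in regions
        ]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": resources,
        }],
    }


print(json.dumps(bedrock_invoke_policy(["us-east-1", "us-west-2"], "123456789012"), indent=2))
```

Listing regions explicitly keeps the policy narrower than a wildcard, at the cost of updating it when you add a region.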

Choosing a model tier

Use /model inside Claude Code to switch on the fly. For programmatic defaults:

{
  "env": {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL":  "us.anthropic.claude-haiku-4-5-20251001-v1:0"
  }
}

Haiku 4.5 is roughly 3× cheaper than Sonnet 4.5 at list price (earlier Haiku generations were closer to 12× cheaper) and handles routine edits, linting, and short Q&A well. Sonnet is the default sweet spot. Opus is best reserved for architecture decisions and complex multi-file refactors. A dual-model strategy (Haiku for background tasks, Sonnet for interactive work) can cut token costs by 40–60%, depending on how much traffic Haiku can absorb.
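
The savings from a dual-model split are simple arithmetic once you know the price ratio and the share of tokens you can route to the cheaper tier. The sketch below uses placeholder per-million-token prices that are assumptions for illustration only; substitute current figures from the Bedrock pricing page.

```python
def blended_cost(total_mtok: float, haiku_share: float,
                 sonnet_price: float, haiku_price: float) -> float:
    """Cost in USD when a share of tokens goes to Haiku and the rest to Sonnet."""
    haiku_mtok = total_mtok * haiku_share
    sonnet_mtok = total_mtok - haiku_mtok
    return haiku_mtok * haiku_price + sonnet_mtok * sonnet_price


# Placeholder input prices in USD per million tokens (assumptions, not quotes).
SONNET_PRICE = 3.00
HAIKU_PRICE = 1.00

baseline = blended_cost(100, 0.0, SONNET_PRICE, HAIKU_PRICE)  # everything on Sonnet
mixed = blended_cost(100, 0.6, SONNET_PRICE, HAIKU_PRICE)     # 60% routed to Haiku
savings = 1 - mixed / baseline

print(f"baseline=${baseline:.2f} mixed=${mixed:.2f} savings={savings:.0%}")
# baseline=$300.00 mixed=$180.00 savings=40%
```

With these placeholder prices, routing 60% of tokens to Haiku saves 40%; a larger price gap or a larger Haiku share raises that figure.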

3. Team conventions: CLAUDE.md and Agent Skills

Switching to Bedrock gives you infrastructure. The next step is encoding your team’s knowledge so every developer benefits from the same institutional context.

CLAUDE.md at scale

Claude Code reads CLAUDE.md from the project root, from parent directories above it, and from the user-level ~/.claude/CLAUDE.md. For a team on Bedrock, a shared project CLAUDE.md becomes the single source of truth for how Claude should behave on this codebase:

# CLAUDE.md

## AWS Environment
- Region: us-east-1 (primary), us-west-2 (DR)
- Account IDs: prod=123456789012 / staging=234567890123
- Deploy via CDK: `cdk deploy --all`

## Conventions
- Lambda functions: Python 3.12, arm64, 512 MB default
- IaC: CDK TypeScript (no CloudFormation raw templates)
- Secrets: Secrets Manager only — never env vars in Lambda config
- Naming: `{team}-{service}-{env}` (e.g. platform-auth-prod)

## Must-not-touch
- Do not modify the shared VPC stack (vpc-stack.ts) without infra-team review

Commit this file. Every developer who clones the repo gets the same Claude context automatically, without any personal setup.
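
Because the file is shared, a small CI check can stop it from drifting. The sketch below assumes the three section headings used in the example above are the ones your team wants to require; `missing_sections` is a hypothetical helper.

```python
# Section names taken from the example CLAUDE.md above; adjust to your own file.
REQUIRED_SECTIONS = ("AWS Environment", "Conventions", "Must-not-touch")


def missing_sections(claude_md: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required '## ' headings that do not appear in the given text."""
    headings = {
        line[3:].strip()
        for line in claude_md.splitlines()
        if line.startswith("## ")
    }
    return [name for name in required if name not in headings]


sample = """# CLAUDE.md

## AWS Environment
- Region: us-east-1

## Conventions
- IaC: CDK TypeScript
"""

print(missing_sections(sample))   # ['Must-not-touch']
```

In CI you would read the committed file and fail the job when the returned list is non-empty.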

Agent Skills

Agent Skills are modular, reusable capability folders that live in .claude/skills/ (or a shared package your team installs); each one is described by a SKILL.md file. Where CLAUDE.md holds project context, Skills hold domain expertise: deep knowledge of a particular technology or workflow, packaged for reuse across projects.

A typical AWS team might maintain Skills for:

CDK patterns — preferred constructs, tagging standards, stack naming
Bedrock AgentCore — how to register, version and invoke agents
Serverless cost ops — how to profile Lambda cold starts, set memory correctly, use Graviton

Skills compose with CLAUDE.md: CLAUDE.md says what the project is, Skills provide expertise on how to build things correctly within it. Together they replace the implicit knowledge that usually lives in a senior engineer’s head.
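
To see which Skills a repository ships, you can scan the skills directory. This sketch assumes each skill is a folder containing a SKILL.md whose front matter has simple `name:` and `description:` lines; it parses only that subset by hand rather than using a YAML library, and both function names are hypothetical.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict:
    """Parse simple 'key: value' pairs from a leading '---' front-matter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def list_skills(root: Path) -> list[dict]:
    """Return name/description metadata for every <root>/<skill>/SKILL.md."""
    skills = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        skills.append({
            # Fall back to the folder name when no name is declared.
            "name": meta.get("name", skill_file.parent.name),
            "description": meta.get("description", ""),
        })
    return skills


# Usage: list_skills(Path(".claude/skills"))
```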

Figure: how CLAUDE.md (project context), Agent Skills (domain expertise), and Bedrock (infrastructure) layer to give Claude a team-aware, enterprise-governed workspace.

4. Calling Claude from Lambda: the two APIs

Once Claude Code is handling developer workflows, the next step is putting Claude into production AWS workloads. Lambda is the most common entry point.

The Converse API (recommended)

The Bedrock Converse API is model-agnostic: the same request shape works with Claude, Titan, Llama, and other Bedrock models that support it. Use it for new code.

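import json  # used by the InvokeModel example further down
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Placeholder input; in a Lambda this would come from the event payload.
ticket_text = 'Customer cannot reset their password after the latest update.'

response = client.converse(
    # Sonnet 4.5 is invoked through a cross-region inference profile (us. prefix)
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    messages=[{
        'role': 'user',
        'content': [{'text': 'Summarize this support ticket: ' + ticket_text}]
    }],
    inferenceConfig={'maxTokens': 512, 'temperature': 0.1}
)

# The reply is a list of content blocks; the first one holds the text.
answer = response['output']['message']['content'][0]['text']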
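
Inside a Lambda, the same call is easier to test if the client is passed in rather than created inline. The sketch below wraps the Converse call in a function and builds a handler around it; the `ticket_text` event field and both function names are assumptions for illustration.

```python
def summarize_ticket(client, ticket_text: str,
                     model_id: str = "us.anthropic.claude-sonnet-4-5-20250929-v1:0") -> str:
    """Send one user turn through the Converse API and return the reply text."""
    response = client.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": "Summarize this support ticket: " + ticket_text}],
        }],
        inferenceConfig={"maxTokens": 512, "temperature": 0.1},
    )
    blocks = response["output"]["message"]["content"]
    # Join every text block; non-text blocks (such as tool use) are skipped.
    return "".join(block["text"] for block in blocks if "text" in block)


def make_handler(client):
    """Build a Lambda handler bound to a client created once per execution environment."""
    def handler(event, context):
        return {"summary": summarize_ticket(client, event["ticket_text"])}
    return handler


# In the Lambda module (boto3 ships with the Python runtime):
#   import boto3
#   handler = make_handler(boto3.client("bedrock-runtime"))
```

Creating the client outside the handler lets warm invocations reuse the connection, and the injected client makes the function testable with a stub.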

The InvokeModel API (Claude-specific)

invoke_model gives you direct access to Claude’s native request format. Use it when you want to send the Anthropic Messages body unchanged, or when a Claude feature is easier to reach in the native format than through Converse (extended thinking, certain tool-use patterns).

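response = client.invoke_model(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'messages': [{'role': 'user', 'content': ticket_text}],
        # max_tokens must exceed the thinking budget (minimum budget: 1,024 tokens)
        'max_tokens': 4096,
        'thinking': {'type': 'enabled', 'budget_tokens': 2000}
    })
)
result = json.loads(response['body'].read())
# result['content'] holds thinking blocks followed by text blocks
answer = next(b['text'] for b in result['content'] if b['type'] == 'text')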
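
With extended thinking enabled, the parsed response contains reasoning blocks as well as the final answer, so code that only takes the first block may pick up the wrong one. A minimal sketch for separating the two, assuming the Messages-format block types `thinking` and `text`; `split_thinking` is a hypothetical helper.

```python
def split_thinking(result: dict) -> tuple[str, str]:
    """Separate reasoning text from the final answer in a Messages-format response."""
    thinking_parts, answer_parts = [], []
    for block in result.get("content", []):
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
    return "\n".join(thinking_parts), "".join(answer_parts)


# Hand-written sample in the shape of a parsed response body.
sample = {
    "content": [
        {"type": "thinking", "thinking": "The ticket is about login.", "signature": "abc"},
        {"type": "text", "text": "User cannot log in after a password reset."},
    ]
}

reasoning, answer_text = split_thinking(sample)
print(answer_text)   # User cannot log in after a password reset.
```

Logging the reasoning separately from the answer keeps it out of anything shown to end users.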

Lambda configuration for Bedrock calls

Key settings for a production Lambda that calls Bedrock:

Setting	Recommended value	Reason
Runtime	Python 3.12, arm64	Cost, performance
Memory	512 MB (tune up if slow)	Bedrock calls are I/O-bound, not CPU-bound
Timeout	60–300 s	Long reasoning chains can take 30+ s
Execution role	`bedrock:InvokeModel` + `bedrock:InvokeModelWithResponseStream`	Minimum required
Retry policy	Exponential backoff, max 3	Bedrock throttles at burst limits
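
The retry row can be sketched as a small wrapper with exponential backoff and jitter. boto3 already retries some errors on its own, so treat this as an illustration of the pattern rather than a required layer; the set of retryable error codes and the helper names are assumptions.

```python
import random
import time

# Error codes treated as transient here (an assumption; tune for your workload).
RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException", "ModelTimeoutException"}


def error_code(exc: Exception) -> str:
    """Read the AWS error code from a botocore ClientError-style exception."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def call_with_backoff(fn, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying transient errors with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if error_code(exc) not in RETRYABLE_CODES or attempt == max_attempts:
                raise
            # Wait a random time between 0 and base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


# Usage: call_with_backoff(lambda: client.converse(**request))
```

The `sleep` parameter exists so tests can record delays instead of actually waiting.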

Check your understanding

1. Which IAM permission is commonly missed and causes Claude Code to silently freeze when connected to Bedrock?

Claude Code uses streaming responses by default. Without bedrock:InvokeModelWithResponseStream the CLI receives no data and appears to hang. bedrock:InvokeModel alone is not sufficient.

2. What is the primary advantage of the Bedrock Converse API over InvokeModel?

The Converse API abstracts model-specific request formats. The same code works with Claude, Titan, Llama, and other Bedrock models — useful when you want to swap models without rewriting integration code.

5. Orchestration with Step Functions

A single Lambda call handles stateless, single-turn Claude interactions well. When you need multi-step workflows — document ingestion, human review gates, parallel analysis branches, retry on failure — AWS Step Functions is the right tool.

Why Step Functions beats custom retry logic in Lambda

Concern	DIY in Lambda	Step Functions
Retry on throttle	Custom sleep loop	Built-in `Retry` with exponential backoff
Parallel tasks	Thread pools, complexity	`Parallel` state, managed
Human approval gate	Polling loop or SQS	`.waitForTaskToken` callback pattern, built in
Audit trail	CloudWatch logs	Full execution history in console
Timeout per step	One Lambda timeout	Per-state timeouts

A minimal document-analysis state machine

{
  "Comment": "Analyse a document with Claude via Bedrock",
  "StartAt": "FetchDocument",
  "States": {
    "FetchDocument": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:fetch-doc",
      "Next": "AnalyseWithClaude"
    },
    "AnalyseWithClaude": {
      "Type": "Task",
      "Resource": "arn:aws:states:::bedrock:invokeModel",
      "Parameters": {
        "ModelId": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "Body": {
          "anthropic_version": "bedrock-2023-05-31",
          "messages": [{"role": "user", "content.$": "States.Format('Analyse this document: {}', $.documentText)"}],
          "max_tokens": 1024
        }
      },
      "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2}],
      "Next": "StoreResult"
    },
    "StoreResult": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:store-result",
      "End": true
    }
  }
}

Step Functions has a native arn:aws:states:::bedrock:invokeModel integration — you can call Claude directly from a state definition without writing a Lambda wrapper for that step.
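
Typos in state names are a common reason a definition like this is rejected at deploy time. The sketch below is a deliberately partial structural check: it looks only at StartAt, Next, and End, and does not inspect Choice rules or Catch targets. `check_state_machine` is a hypothetical helper, not an AWS tool.

```python
def check_state_machine(definition: dict) -> list[str]:
    """Return a list of structural problems in an ASL definition (empty if none found)."""
    states = definition.get("States", {})
    problems = []
    if definition.get("StartAt") not in states:
        problems.append(f"StartAt {definition.get('StartAt')!r} is not a defined state")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            problems.append(f"{name}: Next {target!r} is not a defined state")
        terminal = state.get("End") is True or state.get("Type") in ("Succeed", "Fail")
        # Choice states route through their rules, so they need neither Next nor End.
        if target is None and not terminal and state.get("Type") != "Choice":
            problems.append(f"{name}: has neither Next nor End")
    return problems


# A cut-down version of the workflow above with one deliberate typo.
broken = {
    "StartAt": "FetchDocument",
    "States": {
        "FetchDocument": {"Type": "Task", "Next": "AnalyzeWithClaude"},
        "AnalyseWithClaude": {"Type": "Task", "End": True},
    },
}

print(check_state_machine(broken))
```

Running it on the broken example reports that FetchDocument points at a state that does not exist.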

When to choose what

Single Lambda — one-turn response, user-facing latency matters, stateless
Lambda + Step Functions — multi-step pipeline, retries, parallel branches, human gates
Bedrock Agents — open-ended tool-use where the model itself decides which steps to run and in what order
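
For the parallel-branch case in the middle option, fan-out over a list of documents is usually expressed as a Map state. The sketch below builds one as a Python dict so it can be merged into a larger definition; the `$.documents` input path, the `text` field on each item, and the `Summarise` next state are all assumptions for illustration.

```python
import json


def fan_out_state(model_id: str, max_concurrency: int = 10) -> dict:
    """Build a Map state that sends each item of $.documents to Claude."""
    return {
        "Type": "Map",
        "ItemsPath": "$.documents",          # assumed: a list of {"text": ...} objects
        "MaxConcurrency": max_concurrency,   # caps parallel Bedrock calls
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "AnalyseOne",
            "States": {
                "AnalyseOne": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::bedrock:invokeModel",
                    "Parameters": {
                        "ModelId": model_id,
                        "Body": {
                            "anthropic_version": "bedrock-2023-05-31",
                            "messages": [{
                                "role": "user",
                                "content.$": "States.Format('Analyse this document: {}', $.text)",
                            }],
                            "max_tokens": 1024,
                        },
                    },
                    "End": True,
                }
            },
        },
        "ResultPath": "$.analyses",          # collected results land here
        "Next": "Summarise",                 # hypothetical fan-in state
    }


print(json.dumps(fan_out_state("us.anthropic.claude-sonnet-4-5-20250929-v1:0"), indent=2))
```

Keeping MaxConcurrency modest helps stay under Bedrock throttling limits; inline mode suits moderate batch sizes, while very large batches may call for distributed mode.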

Check your understanding

1. You need Claude to analyse 200 documents in parallel and fan the results into a single summary. Which AWS pattern fits best?

Step Functions Map states (or Parallel, for a fixed set of branches) manage fan-out natively, collecting results before transitioning to the next state. A for-loop in Lambda is limited by that function's timeout and has no built-in fan-in.

2. Step Functions has a native integration resource for Bedrock (arn:aws:states:::bedrock:invokeModel). What does this mean in practice?

The optimised Bedrock integration in Step Functions lets you invoke Claude directly from a Task state. No Lambda wrapper is needed for that step, which reduces cold-start latency and infrastructure to maintain.

3. Which component is most appropriate when you need Claude to autonomously decide which tools to call and in what order across an open-ended task?

Bedrock Agents let Claude reason about which action to take next at each step. Lambda and Step Functions implement fixed, pre-determined sequences; Agents are the right choice when the flow is dynamic and model-driven.

6. Cost control and production readiness

Running Claude at scale on Bedrock requires attention to a few cost and reliability levers.

Prompt caching

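Bedrock supports prompt caching for Claude models. Claude Code uses it automatically; in your own API calls you opt in by marking cache checkpoints in the request. Repeated prefixes (system prompts, CLAUDE.md content, tool definitions) are then billed at a reduced rate, cutting the cost of those input tokens by up to 90% when the same prefix is reused before the cache entry expires.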
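
In your own Converse calls, a cache checkpoint marks where the reusable prefix ends. The sketch below builds such a request and computes a hit ratio from the response's usage block; it assumes `inputTokens` counts only uncached tokens, and both function names are hypothetical. Prefixes below a model-specific minimum length are not cached.

```python
def cached_converse_request(model_id: str, system_prompt: str, user_text: str) -> dict:
    """Build Converse keyword arguments with a cache checkpoint after the system prompt."""
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # content before this point is cacheable
        ],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 512},
    }


def cache_hit_ratio(usage: dict) -> float:
    """Share of input tokens served from cache, from a Converse response's usage block."""
    read = usage.get("cacheReadInputTokens", 0)
    total = read + usage.get("cacheWriteInputTokens", 0) + usage.get("inputTokens", 0)
    return read / total if total else 0.0


# Usage (assumes a boto3 bedrock-runtime client named client):
#   response = client.converse(**cached_converse_request(model_id, long_system_prompt, question))
#   print(cache_hit_ratio(response["usage"]))
```

A ratio near zero on repeated calls suggests the prefix is changing between requests or is too short to cache.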

.claudeignore

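Just like .gitignore, a .claudeignore file lists paths Claude Code should skip when it searches a project. Support for this file can vary by Claude Code version, so confirm it is honoured in yours; permissions.deny rules for Read in settings.json are an alternative way to block specific paths. Excluding node_modules/, dist/, *.log, and large data files keeps context windows focused and tokens cheap.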

# .claudeignore
node_modules/
dist/
cdk.out/
*.log
*.parquet
__pycache__/

CloudWatch alarms for token spend

Bedrock publishes usage metrics to CloudWatch in the AWS/Bedrock namespace. Set alarms on InputTokenCount and OutputTokenCount (and on InvocationLatency or InvocationThrottles for reliability), and add an AWS Budgets alert on daily Bedrock spend so runaway agent loops don’t generate surprise bills.
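
A token alarm can be created with CloudWatch's `put_metric_alarm`. The sketch below only builds the keyword arguments so they can be reviewed or tested first; the hourly threshold, the SNS topic, and the exact `ModelId` dimension value are assumptions, so check the metric's dimensions in the CloudWatch console before relying on it.

```python
import re


def token_alarm_kwargs(model_id: str, hourly_output_token_limit: int, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for an hourly output-token ceiling."""
    safe_name = re.sub(r"[^A-Za-z0-9]+", "-", model_id).strip("-")
    return {
        "AlarmName": f"bedrock-output-tokens-{safe_name}",
        "Namespace": "AWS/Bedrock",
        "MetricName": "OutputTokenCount",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 3600,                      # one-hour buckets
        "EvaluationPeriods": 1,
        "Threshold": float(hourly_output_token_limit),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic should not trigger the alarm
        "AlarmActions": [sns_topic_arn],
    }


# Usage (assumes boto3 and an existing SNS topic):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**token_alarm_kwargs(model_id, 2_000_000, topic_arn))
```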

Dual-model strategy

Assign model tiers to task types in your team CLAUDE.md:

## Model selection policy
- Routine edits, lint fixes, docs: use Haiku (cheapest)
- Interactive coding sessions: use Sonnet (default)
- Architecture review, complex refactors: use Opus

Teams that apply this policy consistently can cut token costs by 40–60% compared with always using Sonnet.

Build it yourself

Follow these exact steps to reproduce it yourself · estimated time: ~25 min

Prerequisites

AWS CLI v2.32+ installed and configured (`aws --version`)
Claude Code installed (`claude --version`)
An IAM user or role with `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream` permissions
Amazon Bedrock enabled in your chosen region (us-east-1 or us-west-2 recommended)
A code project to experiment in

Step 1 — Enable model access in Bedrock

Open the Amazon Bedrock console, go to Model access, and request access to the Claude models you plan to use (Sonnet is a good starting point). Access is usually granted within a few minutes.

Step 2 — Configure Claude Code for Bedrock

Create or update ~/.claude/settings.json:

{
  "awsAuthRefresh": "aws sso login --profile your-profile-name",
  "env": {
    "AWS_REGION": "us-east-1",
    "CLAUDE_CODE_USE_BEDROCK": "1"
  }
}

Replace the awsAuthRefresh value with the command you use to refresh credentials (aws login for the new auth flow), or omit the key if you use long-lived access keys.

Verify it works:

claude --version
# Start a session and run a quick test
claude
# Inside the session:
# What AWS region am I configured to use?

Step 3 — Bootstrap project memory

Navigate to your code project and initialise CLAUDE.md:

cd your-project
claude

Inside the session:

/init

Review the generated CLAUDE.md, then add AWS-specific context:

## AWS Environment
- Region: us-east-1
- Deploy: cdk deploy (CDK TypeScript)
- Lambda runtime: python3.12, arm64
- Secrets: Secrets Manager only

Commit the file so all teammates get the same context.

Step 4 — Add a .claudeignore

cat > .claudeignore << 'EOF'
node_modules/
dist/
cdk.out/
*.log
__pycache__/
.venv/
EOF

Step 5 — Call Claude from a Lambda (test locally first)

Create test_bedrock.py:

import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

response = client.converse(
    modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    messages=[{
        'role': 'user',
        'content': [{'text': 'Say hello and confirm you are running on Amazon Bedrock.'}]
    }],
    inferenceConfig={'maxTokens': 128}
)

print(response['output']['message']['content'][0]['text'])

python3 test_bedrock.py

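Expected output: a short greeting confirming Bedrock access. If you see AccessDeniedException, check that model access has been granted (Step 1) and that your IAM policy allows bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream on both the inference profile and the underlying foundation model.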
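
Because Claude Code depends on streaming, it is worth exercising the streaming permission as well. The sketch below collects text from a ConverseStream event iterator; it is written against a plain iterable so it can be tried without AWS access, and `collect_stream_text` is a hypothetical helper.

```python
def collect_stream_text(events) -> str:
    """Concatenate text deltas from a ConverseStream event iterator."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)


# Hand-written events in the shape the stream yields.
sample_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "from Bedrock."}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]

print(collect_stream_text(sample_events))   # Hello from Bedrock.

# With real credentials (assumes the client from test_bedrock.py):
#   response = client.converse_stream(modelId=..., messages=[...])
#   print(collect_stream_text(response["stream"]))
```

If the non-streaming test passes but this one returns AccessDeniedException, the streaming action is the missing permission.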

Step 6 — Create a minimal Step Functions workflow

Save workflow.json:

{
  "Comment": "Hello Bedrock from Step Functions",
  "StartAt": "AskClaude",
  "States": {
    "AskClaude": {
      "Type": "Task",
      "Resource": "arn:aws:states:::bedrock:invokeModel",
      "Parameters": {
        "ModelId": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "Body": {
          "anthropic_version": "bedrock-2023-05-31",
          "messages": [{"role": "user", "content": "What is the capital of France? Answer in one word."}],
          "max_tokens": 16
        }
      },
      "End": true
    }
  }
}

Create and run the state machine:

aws stepfunctions create-state-machine \
  --name hello-bedrock \
  --definition file://workflow.json \
  --role-arn arn:aws:iam::YOUR_ACCOUNT:role/StepFunctionsBedrockRole \
  --region us-east-1

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:YOUR_ACCOUNT:stateMachine:hello-bedrock \
  --region us-east-1

Check the execution result in the Step Functions console. You should see Claude’s response in the output of the AskClaude state.
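
The same check can be scripted instead of done in the console. This sketch polls `describe_execution` until the run reaches a terminal status; the Step Functions client is passed in so the function can be tested with a stub, and `wait_for_execution` is a hypothetical helper.

```python
import json
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(sfn_client, execution_arn: str, poll_seconds: float = 2.0,
                       max_polls: int = 60, sleep=time.sleep):
    """Poll an execution until it finishes; return (status, parsed output or None)."""
    for _ in range(max_polls):
        result = sfn_client.describe_execution(executionArn=execution_arn)
        if result["status"] in TERMINAL_STATUSES:
            # Output is a JSON string and is only present on success.
            output = json.loads(result["output"]) if result.get("output") else None
            return result["status"], output
        sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} did not finish after {max_polls} polls")


# Usage (assumes boto3 and the ARN returned by start-execution):
#   import boto3
#   status, output = wait_for_execution(boto3.client("stepfunctions"), execution_arn)
```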

Expected result: a working end-to-end path from developer laptop → Bedrock-powered Claude Code → production Lambda/Step Functions workflow, all governed by IAM and billed to a single AWS account. If any step fails, check IAM permissions first — they are the most common blocker at each layer.

Where to go next

Watch the session recording from Code w/ Claude 2026 (London) — Antonio Rodriguez walks through a live Bedrock setup.
Read the Claude Code on Amazon Bedrock quick setup guide on AWS Builder Center for the latest model IDs and auth options.
Explore the aws-skills GitHub repo for community-built Agent Skills covering CDK, serverless, and Bedrock AgentCore.
Continue with Mastering Claude Code to go deeper on CLAUDE.md, slash commands, and headless automation.
