
Introducing Code Review by Claude Code

Code Review dispatches a team of agents on every pull request to catch the bugs that skims miss. When a PR is opened, agents search for bugs in parallel, verify each one to filter out false positives, and post only confirmed issues as inline comments.

This lesson is original educational writing based on this video by Anthropic (published March 9, 2026). All credit for the original content goes to the creators.

#claude-code #code-review #agents

1. The problem with how code review actually works

Pull request review is one of the most important quality gates in software development and one of the most consistently underdone. In theory, reviewers carefully read every changed line, trace execution paths, check edge cases, verify security assumptions, and catch subtle logic bugs. In practice, most PR reviews are fast scans that catch only the obvious issues, the kind of mistake that is easy to spot at a glance. The bugs that slip through are the ones that require tracing through call chains, holding multiple pieces of context in mind simultaneously, or understanding the security implications of an API usage pattern.

The reason human reviewers miss these bugs is not lack of skill; it is the economics of review time. A reviewer who spends twenty minutes deeply analyzing every PR will create a bottleneck that slows the team to a crawl. The sustainable approach is the quick scan: check the structure, read the obvious parts, approve if nothing jumps out. This is rational but imperfect, and the imperfection compounds over thousands of PRs into the security vulnerabilities, performance regressions, and logic bugs that become production incidents.

Code Review by Claude Code is designed for exactly this gap. It does not try to replace human reviewers, who understand product intent, team conventions, architectural direction, and organizational context that no automated system fully captures. What it does is take on the expensive, parallelizable work of systematically searching the changed code for bugs, so that human reviewers can spend their time on the higher-level questions that require human judgment.

2. The multi-agent architecture: search, verify, post

[Figure: a "PR Opened" event (trigger: on pull_request) fans out to four agents (Security, Logic Bugs, Performance, Edge Cases), whose findings pass through a Verification Step that filters false positives and posts only confirmed bugs.]
Code Review dispatches multiple agents to search for bugs in parallel, then runs a verification step to filter false positives before posting only confirmed issues as inline PR comments.

When a pull request is opened, Code Review triggers automatically and dispatches multiple specialized agents to examine the changed code in parallel. Each agent focuses on a different class of issues: security vulnerabilities, logic bugs, performance problems, edge case failures. Running these searches in parallel rather than sequentially means the entire review completes in roughly the time the slowest single agent takes, not the sum of all of them. That parallelism is what makes thorough automated review feasible on every PR without creating a time bottleneck.
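The timing argument above can be illustrated with a small simulation. This is only a sketch: the agent names mirror the four bug classes in this lesson, the delays are made-up stand-ins for analysis time, and nothing here reflects how Code Review is actually implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical agents and simulated analysis times in seconds (assumptions).
AGENT_DELAYS = {
    "security": 0.20,
    "logic": 0.15,
    "performance": 0.10,
    "edge_cases": 0.05,
}


def run_agent(name: str) -> str:
    """Stand-in for one review agent; sleeping simulates its analysis."""
    time.sleep(AGENT_DELAYS[name])
    return f"{name}: done"


def review_sequential() -> float:
    """Run the agents one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for name in AGENT_DELAYS:
        run_agent(name)
    return time.perf_counter() - start


def review_parallel() -> float:
    """Run all agents at once and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(AGENT_DELAYS)) as pool:
        list(pool.map(run_agent, AGENT_DELAYS))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {review_sequential():.2f}s")  # close to the sum of delays
    print(f"parallel:   {review_parallel():.2f}s")    # close to the longest delay
```

The sequential run costs about the sum of the delays, while the parallel run costs about the longest one, which is why adding another specialized agent need not lengthen the review.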

The agents do not simply pattern-match against known bad code signatures the way linters do. They read the changed code in context, follow data flows, trace function calls into the broader codebase, and reason about what the code does and whether it does it correctly. This context-aware analysis is what allows Code Review to catch bugs that require holding multiple pieces of information in mind at once: the kind of reasoning that is fast for humans who are deeply familiar with a codebase but slow and inconsistent for reviewers who first saw the code minutes ago.

The verification step is the most important architectural decision in the feature. After the search agents produce their candidate findings, a separate verification step evaluates each one independently before anything gets posted to the PR. The verification agent asks: is this actually a bug, or is this a false positive triggered by a pattern that looks suspicious but is actually correct in this context? Only findings that pass verification get posted.

This two-stage approach (search, then verify) is what makes the feature useful in practice. Automated code analysis tools have a history of generating so many false positives that developers learn to ignore them. If Code Review posted every potential issue that search agents flagged, reviewers would stop reading the comments within a week. By filtering aggressively before posting, the feature maintains the signal quality that keeps humans engaged with the output.
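The search-then-verify shape can be sketched in a few lines. Everything named here (`Finding`, `review`, the toy searcher and verifier) is hypothetical and chosen for illustration; the real agents reason about code with a model rather than with string checks.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """A candidate issue (hypothetical structure, not the product's schema)."""
    path: str
    line: int
    category: str
    message: str


def review(
    source: str,
    searchers: Iterable[Callable[[str], list[Finding]]],
    verify: Callable[[str, Finding], bool],
) -> list[Finding]:
    """Stage 1 collects candidates; stage 2 keeps only verified ones."""
    candidates = [f for search in searchers for f in search(source)]
    return [f for f in candidates if verify(source, f)]


def search_division(source: str) -> list[Finding]:
    """Toy searcher: flag every line that contains a division."""
    return [
        Finding("example.py", n, "edge_case", "possible division by zero")
        for n, text in enumerate(source.splitlines(), start=1)
        if "/" in text
    ]


def verify_division(source: str, finding: Finding) -> bool:
    """Toy verifier: reject the finding when the divisor is a nonzero literal."""
    text = source.splitlines()[finding.line - 1]
    return re.search(r"/\s*[1-9]\d*\b", text) is None


SOURCE = "half = total / 2\nmean = total / count\n"

if __name__ == "__main__":
    for f in review(SOURCE, [search_division], verify_division):
        print(f"{f.path}:{f.line} [{f.category}] {f.message}")
```

The searcher flags both lines, but only the division by `count` survives verification, so one comment is posted instead of two. Keeping the verifier separate from the searchers is what lets the search stage stay aggressive without flooding the PR.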

3. What Code Review finds that other tools miss

Linters are good at enforcing rules: this function should have a return type annotation, this variable name violates naming conventions, this import is unused. These rules are mechanical and can be checked with pattern matching. They catch a real class of issues but not the interesting ones.

Code Review targets a different class: semantic bugs that require understanding program behavior, not just structure. Security vulnerabilities are a canonical example: an SQL injection vulnerability is not a syntactic pattern (the code may be perfectly valid JavaScript), it is a semantic error where user input reaches a database query without sanitization. Detecting it requires tracing data flow from an input source through function calls to a database call β€” exactly the kind of multi-step reasoning that agent-based analysis handles well.
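A small Python illustration of the point, using the standard-library `sqlite3` module. The table, function names, and payload are invented for this sketch. Note that neither unsafe function looks alarming on its own; the flaw only shows when the input is traced from caller to query.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def build_filter(name: str) -> str:
    # User input is interpolated into SQL text here, one call away from the query.
    return f"name = '{name}'"


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Syntactically valid, but the WHERE clause contains unsanitized input.
    sql = f"SELECT name, secret FROM users WHERE {build_filter(name)}"
    return conn.execute(sql).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # A parameterized query keeps the input as data, never as SQL text.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


PAYLOAD = "nobody' OR '1'='1"

if __name__ == "__main__":
    db = make_db()
    print(len(find_user_unsafe(db, PAYLOAD)))  # every row leaks
    print(len(find_user_safe(db, PAYLOAD)))    # no rows match
```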

Logic bugs are another class linters cannot reach. A function that returns the wrong value under a specific combination of conditions looks perfectly valid syntactically. Catching it requires reasoning about what the function is supposed to do (intent), what it actually does (behavior), and whether those match (correctness). Code Review agents can read the surrounding context β€” docstrings, variable names, calling code, tests β€” to build a model of intent and compare it against the implementation.
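As a hypothetical example of an intent-versus-behavior mismatch, consider a shipping rule whose docstring and implementation disagree. The function and the pricing rule are invented for illustration.

```python
def shipping_cost(subtotal: float, is_member: bool) -> float:
    """Return 0.0 for members or for orders of 50.00 or more; otherwise 5.0."""
    # Bug: the docstring says "or", but the condition requires both.
    if is_member and subtotal >= 50.0:
        return 0.0
    return 5.0


def shipping_cost_fixed(subtotal: float, is_member: bool) -> float:
    """Return 0.0 for members or for orders of 50.00 or more; otherwise 5.0."""
    if is_member or subtotal >= 50.0:
        return 0.0
    return 5.0


if __name__ == "__main__":
    # A member with a small order should ship free according to the docstring.
    print(shipping_cost(10.0, True))        # 5.0, contradicting the stated intent
    print(shipping_cost_fixed(10.0, True))  # 0.0
```

A linter has nothing to object to in the first version. Spotting the bug means comparing the stated intent in the docstring against what the condition actually does.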

Performance problems that require cross-function analysis are a third category: for example, a database query issued inside a loop that was called from several levels up in the call stack. No single function looks wrong in isolation; the problem only appears when you trace the execution path. Agents that search the broader codebase context around the changed code are well-positioned to find these.
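The query-in-a-loop pattern can be shown with `sqlite3` and a wrapper that counts queries. The schema and all names below are assumptions made for this sketch.

```python
import sqlite3


class CountingDB:
    """Thin wrapper around an in-memory database that counts queries."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )
        self.conn.executemany(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            [(1, 10.0), (1, 5.0), (2, 7.5), (3, 20.0)],
        )

    def query(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def customer_total(db: CountingDB, customer_id: int) -> float:
    # Fine in isolation: one query per call.
    rows = db.query(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    )
    return rows[0][0]


def report_slow(db: CountingDB, customer_ids: list) -> dict:
    # Also fine in isolation, but it hides one query per customer.
    return {cid: customer_total(db, cid) for cid in customer_ids}


def report_fast(db: CountingDB, customer_ids: list) -> dict:
    # One grouped query replaces the per-customer loop.
    if not customer_ids:
        return {}
    placeholders = ",".join("?" for _ in customer_ids)
    rows = db.query(
        "SELECT customer_id, SUM(total) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        tuple(customer_ids),
    )
    totals = dict(rows)
    return {cid: totals.get(cid, 0) for cid in customer_ids}
```

With three customers, `report_slow` issues three queries and `report_fast` issues one for the same result. The cost of the slow version grows with the number of customers, which is why it tends to surface only under production load.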

4. Setting up and integrating Code Review

Integration with existing PR workflows is designed to be minimal. Code Review hooks into standard CI/CD through a GitHub Action (or equivalent for other CI systems): add the action to your workflow file, configure your Anthropic API credentials, and define the scope of review (which file types, which directories, which bug classes to check). When a PR is opened, the action fires automatically.

The output is inline PR comments, which means findings appear directly on the relevant lines in GitHub’s or your code review tool’s standard PR interface. Reviewers see Code Review’s findings alongside their own observations and can respond to them, dismiss them with a note, or create issues from them, using the same workflow they already use for human reviewer comments.

Configuration tuning is where teams often invest time after initial setup. The default configuration catches a broad range of issues, but teams frequently want to focus on specific bug classes (prioritizing security issues for a security-sensitive product), exclude specific directories (third-party code, generated code), or adjust the threshold for what gets posted. The goal is calibrating signal quality for your specific codebase and team norms.
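To make the three tuning knobs concrete, here is a sketch of how scope, bug-class focus, and a posting threshold could combine into a single decision. `ReviewConfig` and its fields are hypothetical and do not correspond to the product's actual configuration options; consult the official documentation for those.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ReviewConfig:
    """Hypothetical settings; the field names are illustrative only."""
    exclude: list[str] = field(
        default_factory=lambda: ["vendor/*", "*_generated.py"]
    )
    categories: set[str] = field(
        default_factory=lambda: {"security", "logic"}
    )
    min_confidence: float = 0.8


def in_scope(path: str, config: ReviewConfig) -> bool:
    """A file is in scope unless it matches an exclude pattern."""
    return not any(fnmatchcase(path, pattern) for pattern in config.exclude)


def should_post(
    path: str, category: str, confidence: float, config: ReviewConfig
) -> bool:
    """Post only in-scope findings of a selected class above the threshold."""
    return (
        in_scope(path, config)
        and category in config.categories
        and confidence >= config.min_confidence
    )
```

For example, with the defaults above, a security finding in `src/app.py` at confidence 0.9 would be posted, while the same finding under `vendor/` or at confidence 0.5 would not.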

Check your understanding


  1. Why does Code Review use a two-stage architecture (search agents + verification step) rather than just posting everything the search agents find?

  2. What is the key technical advantage of running multiple specialized agents in parallel rather than sequentially?

  3. What class of bugs can Code Review find that traditional linters cannot?

  4. How do Code Review findings appear in a developer's existing workflow?

Build it yourself

Follow these exact steps to reproduce it yourself

  1. Check access requirements: Code Review by Claude Code requires a Claude API key or a Claude Code subscription with access to the feature. Verify your account has access before starting setup.
  2. Add the GitHub Action: Create or update .github/workflows/code-review.yml with the Claude Code Review action. The basic configuration triggers on pull_request events and passes your Anthropic API key as a secret.
  3. Configure your API secret: Add your Anthropic API key to your repository’s GitHub Actions secrets as ANTHROPIC_API_KEY. Never hardcode API keys in workflow files.
  4. Define review scope: Configure which files and directories to include. Start broad (all changed files) and narrow if you find you are getting noise from areas where automated review adds less value (auto-generated code, test fixtures, migrations).
  5. Run on a real PR: Open a PR with a meaningful code change and observe the Code Review output. Note which findings are genuine bugs and which are false positives; this calibration feedback informs configuration tuning.
  6. Tune for your codebase: Adjust the bug class focus and posting threshold based on your initial calibration. Security-sensitive codebases may want to prioritize security findings; performance-critical services may want more emphasis on the performance agent.
  7. Establish team norms: Decide how your team treats Code Review comments: required to address before merge, advisory only, or must dismiss with a reason if not fixing. Clear norms prevent comments from being silently ignored.
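
For the calibration in steps 5 and 6, one simple approach is to record each finding you triage as genuine or false positive and compute the share of genuine findings per bug class. The sketch below assumes you keep that record yourself as (category, is_real_bug) pairs; the sample data is made up.

```python
from collections import defaultdict


def precision_by_category(triaged: list) -> dict:
    """Return the fraction of genuine findings for each bug class.

    `triaged` is a list of (category, is_real_bug) pairs recorded by hand.
    """
    totals = defaultdict(int)
    real = defaultdict(int)
    for category, is_real_bug in triaged:
        totals[category] += 1
        real[category] += int(is_real_bug)
    return {category: real[category] / totals[category] for category in totals}


# Made-up triage log from a handful of PRs.
TRIAGED = [
    ("security", True),
    ("security", True),
    ("security", False),
    ("performance", False),
    ("performance", True),
]

if __name__ == "__main__":
    for category, precision in precision_by_category(TRIAGED).items():
        print(f"{category}: {precision:.2f}")
```

A class with low precision on your codebase is a candidate for a stricter threshold or a narrower scope, while a class with high precision may deserve to block merges under your team norms.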
