How to automate code reviews with Claude Code

April 12, 2026 · 10 min read

Code review is one of the most time-consuming parts of software development. It requires consistency, attention to detail across multiple categories, and the discipline to apply the same standards on the tenth PR of the day as on the first. Claude Code is good at all three of those things.

This is not about replacing the human reviewer. It is about offloading the mechanical layer of review (the pattern checks, the security gotchas, the style violations) so that humans can focus on the judgment calls: architecture decisions, team conventions, product direction. Claude handles what scales poorly. You handle what requires context.

Here is a complete workflow for setting this up, from defining your review standards to running parallel reviews on large pull requests to catching issues before they leave your machine.

Start with your review standards

Claude Code follows rules you define. Without explicit standards, it will review code against general best practices, which is better than nothing but not tailored to your project. With a review rules file, it applies your team's actual standards on every run.

Create a file at .claude/rules/review.md. This file lives in your repository alongside your other Claude rules. Structure it by review category:

# Code Review Standards

## Security
- Flag any SQL queries built with string concatenation or f-strings (use parameterized queries)
- Flag any secrets, tokens, or API keys hardcoded in source files
- Flag use of subprocess with shell=True
- Flag disabled SSL/TLS verification (verify=False, rejectUnauthorized: false)
- Check that user input is validated before use in file paths or system calls

## Performance
- Flag N+1 query patterns in ORM code (queries inside loops)
- Flag missing database indexes on columns used in WHERE clauses or JOIN conditions
- Flag loading full objects when only a subset of fields is needed
- Flag synchronous calls inside async functions where await is appropriate

## Patterns and architecture
- Flag circular imports
- Flag functions longer than 50 lines without clear justification
- Flag repeated logic that should be extracted into a shared utility
- Flag direct mutation of props or function arguments
- Check that error handling is present for all external calls (API, database, filesystem)

## Style and consistency
- Check that new functions have docstrings or JSDoc comments if the rest of the file uses them
- Flag any TODO or FIXME comments added in this diff
- Check that test coverage exists for new public functions
- Flag test files with no assertions

Keep the rules concrete. "Avoid N+1 queries" gives Claude something to look for. "Write clean code" does not. Every rule should be something Claude can either find or not find in the diff.
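To make "concrete" tangible, here is a small sketch of what the first Security rule is looking for, using Python's built-in sqlite3 module. The users table and both function names are made up for illustration; they do not come from any particular project.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Would be flagged: the query text is built with an f-string
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # Passes the rule: the value is bound as a query parameter
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

# Hypothetical in-memory table with two users
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the injected OR matches every row
print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```

The difference between the two functions is visible in the diff itself, which is what makes the rule checkable.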

Add a pointer to this file in your CLAUDE.md. CLAUDE.md is loaded at the start of every session, so the pointer ensures Claude knows to read the rules before any review:

# CLAUDE.md (excerpt)

## Code Review
When asked to review code, always read .claude/rules/review.md first and apply
all rules in that file. Report findings by category. For each issue, include the
file path, line reference, the rule it violates, and a concrete fix.

With this in place, you can ask Claude to review any file or diff and get back structured, consistent output every time.

Running a review on a pull request

For a basic review of a pull request, paste the diff into a prompt like this:

Review the changes in this pull request against our review standards in
.claude/rules/review.md. The diff is below.

Focus on:
1. Security issues (blocking)
2. Performance problems (blocking if they affect production paths)
3. Pattern violations (warn)
4. Style inconsistencies (inform only)

For each finding, state: file, approximate line, rule violated, suggested fix.
If there are no issues in a category, say so explicitly.

[paste git diff or describe the PR]

The explicit priority order matters. It tells Claude which findings to flag as blocking versus informational. Without it, a missing docstring gets the same weight as a SQL injection vector.

You can also point Claude at a branch directly:

Review the diff between main and feature/user-auth. Apply .claude/rules/review.md.
Focus on the authentication flow in src/auth/ and any database queries that touch
the users table.
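If you would rather assemble the prompt than paste the diff by hand, a small helper can do it. This is a minimal sketch, assuming git is on your PATH; branch_diff, build_review_prompt, and PRIORITIES are illustrative names, not part of Claude Code.

```python
import subprocess

# Priority order copied from the prompt above
PRIORITIES = [
    ("Security issues", "blocking"),
    ("Performance problems", "blocking if they affect production paths"),
    ("Pattern violations", "warn"),
    ("Style inconsistencies", "inform only"),
]

def branch_diff(base: str, head: str) -> str:
    # Three-dot diff: changes on head since it diverged from base
    result = subprocess.run(
        ["git", "diff", f"{base}...{head}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def build_review_prompt(diff: str, rules_path: str = ".claude/rules/review.md") -> str:
    # Number the priorities so the blocking categories come first
    focus = "\n".join(
        f"{i}. {name} ({level})" for i, (name, level) in enumerate(PRIORITIES, start=1)
    )
    return (
        f"Review the changes below against our review standards in {rules_path}.\n\n"
        f"Focus on:\n{focus}\n\n"
        "For each finding, state: file, approximate line, rule violated, suggested fix.\n"
        "If there are no issues in a category, say so explicitly.\n\n"
        f"{diff}"
    )
```

A call such as build_review_prompt(branch_diff("main", "feature/user-auth")) would produce the prompt text for the branch example above.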

Parallel reviews for larger pull requests

When a PR touches multiple layers of the stack, a single linear review gets messy. The reviewer switches context between API logic, database queries, and frontend components, and details get missed in the transitions.

Parallel subagents fix this. Each subagent focuses on one layer and applies the relevant subset of your review rules. The main agent collects findings and produces a unified report.

Here is how to set this up for a full-stack PR:

This PR adds a new user authentication flow. It touches:
- src/api/auth.py (backend API endpoints)
- src/db/migrations/0012_add_sessions_table.sql (database migration)
- src/frontend/components/LoginForm.tsx (frontend component)

Spawn three parallel read-only subagents:

Subagent 1 — Backend security review:
Read src/api/auth.py. Apply the Security and Patterns sections of
.claude/rules/review.md. Focus specifically on: token handling, session
management, input validation, and error handling for auth failure cases.
Return findings as JSON: {file, line, severity, rule, suggestion}.

Subagent 2 — Database review:
Read src/db/migrations/0012_add_sessions_table.sql and any models that
reference the sessions table. Apply the Performance section of
.claude/rules/review.md. Check: indexes on foreign keys, query patterns that
will hit the sessions table, and whether the migration is reversible.
Return findings as JSON: {file, line, severity, rule, suggestion}.

Subagent 3 — Frontend review:
Read src/frontend/components/LoginForm.tsx. Apply the Security and Style
sections of .claude/rules/review.md. Check: client-side validation, token
storage approach (localStorage vs httpOnly cookies), error state handling.
Return findings as JSON: {file, line, severity, rule, suggestion}.

After all three finish, combine their findings. Group by severity: blocking,
warn, inform. Deduplicate if the same issue appears in multiple layers.

Each subagent starts with a clean context and focuses on its domain. The main agent sees the full picture only at the aggregation step, which keeps the review sharp rather than diluted.

The read-only constraint matters here. You want the subagents to report findings, not start making changes mid-review.
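The aggregation step is simple enough to sketch in code if you want to post-process the subagent output yourself. This sketch assumes each subagent returns a JSON array of objects with the keys named in the prompt (file, line, severity, rule, suggestion); the array wrapper and the sample findings are assumptions for illustration.

```python
import json
from collections import defaultdict

SEVERITY_ORDER = ["blocking", "warn", "inform"]

def combine_findings(reports: list[str]) -> dict[str, list[dict]]:
    """Merge JSON reports, drop duplicates, and group by severity."""
    seen = set()
    grouped = defaultdict(list)
    for report in reports:
        for finding in json.loads(report):
            # The same file, line, and rule reported twice counts as one issue
            key = (finding["file"], finding["line"], finding["rule"])
            if key in seen:
                continue
            seen.add(key)
            grouped[finding["severity"].lower()].append(finding)
    # Known severities first, in priority order, then anything unexpected
    levels = SEVERITY_ORDER + sorted(set(grouped) - set(SEVERITY_ORDER))
    return {level: grouped[level] for level in levels if level in grouped}

# Made-up sample reports: both subagents noticed the same query
sql_issue = {"file": "src/api/auth.py", "line": 47, "severity": "blocking",
             "rule": "SQL parameterization", "suggestion": "Use a parameterized query"}
index_issue = {"file": "src/db/migrations/0012_add_sessions_table.sql", "line": 3,
               "severity": "warn", "rule": "Missing index",
               "suggestion": "Add an index on the foreign key column"}
backend = json.dumps([sql_issue])
database = json.dumps([sql_issue, index_issue])

combined = combine_findings([backend, database])
print({level: len(items) for level, items in combined.items()})
# {'blocking': 1, 'warn': 1}
```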

Pre-commit hooks for catch-it-before-it-ships

Reviewing after a PR is open is fine. Catching issues before they leave your local machine is better. Claude Code's hooks system lets you run a review at commit time.

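Hooks live in your .claude/settings.json file. A PreToolUse hook runs before Claude executes a tool call, and the Bash matcher below makes it fire before every Bash command Claude runs, not only commits. The hook script therefore has to pick out git commit commands itself and let everything else through. Note that this covers commits Claude makes during a session; commits you type into your own terminal do not pass through Claude Code hooks.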

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "python .claude/scripts/pre_commit_review.py"
          }
        ]
      }
    ]
  }
}

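The script at .claude/scripts/pre_commit_review.py does the actual work. It reads the pending tool call from stdin, ignores anything that is not a git commit, and exits with code 2 to block the commit when the review reports blocking issues:

import json
import os
import subprocess
import sys

def get_staged_diff():
    # Staged changes only, with three lines of context per hunk
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=3"],
        capture_output=True, text=True
    )
    return result.stdout

def run_review(diff):
    # Read review rules
    with open(".claude/rules/review.md") as f:
        rules = f.read()

    prompt = f"""
You are doing a pre-commit security and pattern check.
Review the following staged diff against these rules:

{rules}

STAGED DIFF:
{diff}

Output ONLY issues with severity BLOCKING. If there are none, output exactly:
REVIEW PASSED

For each blocking issue: file | line | rule | fix
"""
    # Keep a copy of the prompt on disk for debugging
    os.makedirs(".tmp", exist_ok=True)
    with open(".tmp/pre_commit_prompt.txt", "w") as f:
        f.write(prompt)

    # Assumption: the claude CLI is on PATH. -p runs a single prompt
    # non-interactively; the full prompt is piped in on stdin.
    result = subprocess.run(
        ["claude", "-p", "Follow the review instructions provided on stdin."],
        input=prompt, capture_output=True, text=True
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Claude Code sends the pending tool call as JSON on stdin
    try:
        event = json.load(sys.stdin)
    except ValueError:
        event = {}
    command = event.get("tool_input", {}).get("command", "")
    if "git commit" not in command:
        sys.exit(0)  # Not a commit, let the command run

    diff = get_staged_diff()
    if not diff.strip():
        sys.exit(0)  # Nothing staged, skip

    findings = run_review(diff)
    if "REVIEW PASSED" in findings:
        sys.exit(0)

    # Exit code 2 blocks the tool call; stderr is passed back to Claude
    print(findings or "Pre-commit review produced no output", file=sys.stderr)
    sys.exit(2)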

This is intentionally lightweight. The pre-commit hook should only block on hard issues, not warn on every style nit. Your review rules file has the distinction built in: check the Security and blocking Performance rules at commit time, leave the Style and Patterns sections for the PR review.
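One way to send only the commit-time sections is to slice the rules file by its "## " headings before building the prompt. This is a minimal sketch; select_sections is a hypothetical helper, and it assumes the headings match the ones in the rules file shown earlier.

```python
def select_sections(rules_text: str, wanted: set[str]) -> str:
    """Keep only the '## ' sections whose heading text is in `wanted`."""
    kept, keep = [], False
    for line in rules_text.splitlines():
        if line.startswith("## "):
            # A new section starts; decide whether to keep it
            keep = line[3:].strip() in wanted
        if keep:
            kept.append(line)
    return "\n".join(kept).strip()

# Shortened sample of the rules file, for illustration only
rules = """# Code Review Standards

## Security
- Flag use of subprocess with shell=True

## Style and consistency
- Flag any TODO or FIXME comments added in this diff
"""

print(select_sections(rules, {"Security", "Performance"}))
# ## Security
# - Flag use of subprocess with shell=True
```

Sections that are requested but missing from the file are skipped silently, so a renamed heading would quietly drop its rules from the commit-time check.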

If the hook overhead becomes noticeable, scope it to specific file patterns. Running a full review on every commit to a documentation file is wasteful. Trigger it for *.py, *.ts, and *.sql files, and skip everything else.
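One way to scope the hook is to look at the staged file names before building a prompt and exit early when none of them match. The sketch below assumes git is on your PATH; staged_files, files_to_review, and REVIEWED_SUFFIXES are illustrative names.

```python
import subprocess
from pathlib import PurePosixPath

# Extensions worth a commit-time review; adjust to your stack
REVIEWED_SUFFIXES = {".py", ".ts", ".sql"}

def staged_files() -> list[str]:
    # Names of files with staged changes, one per line
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

def files_to_review(paths: list[str]) -> list[str]:
    # Git reports paths with forward slashes, so PurePosixPath is enough
    return [p for p in paths if PurePosixPath(p).suffix.lower() in REVIEWED_SUFFIXES]
```

Note that a suffix match on .ts does not include .tsx files, so add ".tsx" to the set if your frontend components should be reviewed at commit time too.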

What to review and what not to

Claude Code is good at finding certain categories of issues and unreliable for others. Understanding this distinction keeps your review rules useful rather than noisy.

Where Claude adds consistent value:

  • Security patterns. SQL injection, hardcoded secrets, insecure defaults, missing input validation. These are pattern-matching problems. Claude is fast and consistent at them.
  • Performance anti-patterns. N+1 queries, missing indexes on obvious join columns, loading full records when projections would work. Again, pattern matching.
  • Error handling gaps. External calls with no error handling, exceptions swallowed silently, missing fallbacks for nullable values.
  • Style consistency. Whether new code follows the conventions already present in the file. Claude reads the existing code before reviewing, so it notices when something diverges.
  • Test coverage gaps. New public functions with no corresponding test, or test files with no assertions.

Where Claude is less reliable:

  • Long-term architectural decisions. Whether to add another microservice or extend the monolith. Claude can summarize tradeoffs, but the call depends on your team's specific constraints and direction.
  • Business logic correctness. Claude does not know your domain rules. It can check that the code does what it appears to do, but not whether that is the right thing to do.
  • Team convention judgment calls. Whether a naming choice fits the team's emerging style. This requires knowing what the team is moving toward, not just what it has done.

Keep your review rules in the first category. If you find yourself writing rules that require business context to evaluate, those belong in the human review, not the automated layer.

Structuring the output

A review that dumps a wall of text is not useful. Ask for structured output and you will get something you can actually act on.

The JSON format from the parallel review example above works well. You can also use a simpler table format for smaller reviews:

Return findings in this format:

| Severity | File | Line | Rule | Suggested fix |
|----------|------|------|------|---------------|
| BLOCKING | src/api/auth.py | 47 | SQL parameterization | Replace f-string with parameterized query |
| WARN | src/api/auth.py | 83 | Error handling | Wrap external call in try/except |

If a category has no findings, include a row: | PASS | - | - | [category name] | No issues found |

The explicit "PASS" row for empty categories is useful. It tells you Claude looked and found nothing, rather than leaving you wondering whether it skipped a category.
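Structured output also makes the review machine-readable. The sketch below parses a table in the format above into dictionaries and checks for blocking rows. The function names are hypothetical, and it assumes no cell contains a literal "|" character.

```python
def parse_findings_table(table: str) -> list[dict]:
    """Turn a pipe-delimited findings table into a list of row dicts."""
    rows, header = [], None
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = [c.lower() for c in cells]
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        rows.append(dict(zip(header, cells)))
    return rows

def has_blocking(rows: list[dict]) -> bool:
    return any(r["severity"].upper() == "BLOCKING" for r in rows)

sample = """
| Severity | File | Line | Rule | Suggested fix |
|----------|------|------|------|---------------|
| BLOCKING | src/api/auth.py | 47 | SQL parameterization | Replace f-string with parameterized query |
| WARN | src/api/auth.py | 83 | Error handling | Wrap external call in try/except |
| PASS | - | - | Style and consistency | No issues found |
"""

rows = parse_findings_table(sample)
print(len(rows), has_blocking(rows))  # 3 True
```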

Keeping your review rules current

Your review rules file is a living document. The first version covers the obvious categories. Over time, you add rules based on issues that slipped through, patterns that keep appearing, and conventions your team agrees on.

After each sprint retrospective or postmortem, ask: what bugs or review comments came up repeatedly? If something showed up twice in human reviews, it belongs in the automated layer. Add it to .claude/rules/review.md.

For teams, treat the rules file like any other configuration. Pull requests to add or change a rule, a brief discussion, then merge. The rules become a shared artifact that encodes what your team has learned about its own codebase.
