Why Rules Are the Secret Weapon of AI Code Review

How structured rules transform AI code review from inconsistent suggestions into deterministic, predictable results

December 20, 2025
10 min read

If you've ever tried using ChatGPT or Claude for code review, you've probably experienced this: the same code gets different feedback depending on how you phrase your question, what conversation came before, or what mood the AI seems to be in that day. That inconsistency isn't just annoying — it's a fundamental problem that makes AI-powered code review unreliable for production use.

At diffray, we solved this with structured rules. Let me show you why they're the key to making AI code review actually work.

The Problem: Context Dilution Kills AI Performance

When you dump an entire PR into an LLM with a prompt like "review this code," several things go wrong:

Signal drowns in noise

The AI tries to check security, performance, testing, documentation, style — everything at once. The result? Shallow analysis that misses critical issues.

The "lost middle" effect

Research consistently shows that information in the middle of long contexts gets ignored. Your most important file might be buried where the AI can't effectively process it.

Inconsistent results

Run the same review twice, get different findings. That's not acceptable for a tool teams rely on.

False positive explosion

Without focus, the AI flags generic patterns that aren't actually problems in your codebase. Studies show context-dump approaches produce 60-80% false positive rates.

The Solution: Structured Rules

A structured rule isn't just a prompt — it's a complete specification that tells the AI exactly what to look for, where to look, and how to report findings.

Here's what a rule looks like:

rules:
  - id: sec_sql_injection
    agent: security
    title: "SQL injection via string concatenation"
    description: "User input concatenated into SQL queries allows attackers to execute arbitrary SQL"
    importance: 9

    match:
      file_glob:
        - "src/api/**/*.ts"
        - "src/routes/**/*.ts"
        - "!**/*.test.ts"
      content_regex:
        - "query.*\$\{|\+.*query"

    checklist:
      - "Find all places where user input is used in SQL queries"
      - "Check if parameterized queries or prepared statements are used"
      - "Verify that string concatenation is NOT used for query building"

    examples:
      bad: |
        const query = `SELECT * FROM users WHERE id = ${userId}`;
        await db.execute(query);
      good: |
        const query = 'SELECT * FROM users WHERE id = ?';
        await db.execute(query, [userId]);

This structure enables three critical capabilities that make AI code review actually work.

1. Deterministic Results Through Precise Targeting

Notice the match section? That pattern matching determines whether the rule even runs.

Without pattern matching, reviewing a 50-file PR means analyzing all 50 files for every possible issue. With pattern matching:

Without pattern matching

  • 50 files × all rules = massive prompt
  • ~500,000 tokens per review
  • Result: Slow, expensive, unfocused

With pattern matching

  • 5 API files × API security rules
  • ~50,000 tokens per review
  • Result: Fast, cheap, precise

That's roughly 90% token savings through precise pattern matching.

If a PR has 5 API files containing SQL queries, only those 5 files get sent for SQL injection checks. The AI sees exactly what it needs to evaluate — nothing more.

This precision eliminates the randomness that plagued earlier AI code review attempts. Same code + same rules = same findings. Every time.
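
To make the mechanics concrete, here's a minimal sketch of this kind of pre-filtering in TypeScript. The RuleMatch shape and the helper names (globToRegExp, pathMatches, filesForRule) are illustrative assumptions, not diffray's actual implementation:

type RuleMatch = {
  file_glob: string[];
  content_regex?: string[];
};

// Convert a simple glob like "src/api/**/*.ts" into a RegExp.
function globToRegExp(glob: string): RegExp {
  const source = glob
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*\*\//g, '(?:.*/)?')       // "**/" spans any directory depth
    .replace(/\*\*/g, '.*')
    .replace(/\*/g, '[^/]*')              // "*" stays within one path segment
    .replace(/\?/g, '[^/]');
  return new RegExp(`^${source}$`);
}

// Globs apply in order; "!" patterns subtract files matched so far.
function pathMatches(path: string, globs: string[]): boolean {
  let matched = false;
  for (const glob of globs) {
    if (glob.startsWith('!')) {
      if (globToRegExp(glob.slice(1)).test(path)) matched = false;
    } else if (globToRegExp(glob).test(path)) {
      matched = true;
    }
  }
  return matched;
}

// Only files passing both the glob and content filters reach the AI.
function filesForRule(
  match: RuleMatch,
  files: { path: string; content: string }[],
): { path: string; content: string }[] {
  return files.filter((file) => {
    if (!pathMatches(file.path, match.file_glob)) return false;
    const regexes = match.content_regex;
    return !regexes || regexes.some((r) => new RegExp(r).test(file.content));
  });
}

Because this filtering runs as ordinary code before any model call, it costs nothing in tokens and behaves identically on every run.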

2. Multi-Agent Architecture: Right Expert for Each Problem

Here's where it gets interesting. diffray doesn't use one AI to review everything — it uses 31 specialized agents, each focused on what they do best:

  • Security Expert: vulnerabilities, auth, data exposure
  • Performance Specialist: N+1 queries, memory leaks, bottlenecks
  • Bug Hunter: null pointers, race conditions, edge cases
  • React Agent: hooks patterns, lifecycle, state management
  • TypeScript Agent: type safety, generics, strict mode patterns
  • Architecture Advisor: design patterns, coupling, scalability

Each agent has a specialized system prompt with domain expertise. A security agent knows OWASP Top 10. A React agent understands hooks rules. This specialization dramatically improves detection quality.

But here's the key: rules determine which agent handles what.

- id: react_useeffect_cleanup
  agent: react              # Handled by React specialist
  match:
    file_glob: ["**/*.tsx"]
    content_regex: ["useEffect"]
  checklist:
    - "Check if useEffect returns a cleanup function"
    - "Verify event listeners and subscriptions are cleaned up"

This rule only goes to the React agent, only for TSX files, only when useEffect is present. The agent receives focused context about exactly one thing — not 50 different concerns competing for attention.
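
In sketch form, that routing is just a group-by on each rule's agent field. The function below is illustrative, not diffray's internals:

// Group matched rules by their agent field so each specialist
// receives only its own concerns.
function routeRules<R extends { agent: string }>(rules: R[]): Map<string, R[]> {
  const byAgent = new Map<string, R[]>();
  for (const rule of rules) {
    const bucket = byAgent.get(rule.agent) ?? [];
    bucket.push(rule);
    byAgent.set(rule.agent, bucket);
  }
  return byAgent;
}

// byAgent.get('react') would hold react_useeffect_cleanup and any
// other React rules, and nothing from security or performance.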

3. Context Curation: The Secret to Not Diluting AI Attention

Modern LLMs can handle 200k+ tokens. But research shows practical performance hits a ceiling around 25-30k tokens for complex reasoning. Beyond that, you're paying for tokens the model can't effectively use.

diffray's context management system ensures every agent receives precisely the information it needs:

Agent                     Receives                                   Excludes
Security Expert           Auth flows, API endpoints, data handling   UI components, styling
Performance Specialist    Hot paths, loops, data structures          Documentation, configs
React Agent               Components, hooks, state management        Backend code, SQL

Each agent gets exactly what it needs to do its job. Nothing more. The result: focused attention, better findings, fewer false positives.
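
One way to picture the budgeting is a simple greedy packer: spend a fixed token budget on the most relevant files first. The estimateTokens heuristic, the relevance score, and the 30,000-token ceiling below are illustrative assumptions, not diffray constants:

// Illustrative ceiling based on the effective-reasoning range above.
const TOKEN_BUDGET = 30_000;

// Rough heuristic: ~4 characters per token for typical source code.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack the most relevant files into the budget; everything
// else is left out rather than diluting the agent's attention.
function buildContext(
  files: { path: string; content: string; relevance: number }[],
): string[] {
  const included: string[] = [];
  let used = 0;
  for (const file of [...files].sort((a, b) => b.relevance - a.relevance)) {
    const cost = estimateTokens(file.content);
    if (used + cost > TOKEN_BUDGET) continue; // does not fit, skip it
    used += cost;
    included.push(file.path);
  }
  return included;
}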

4. Flexibility: Project and Team-Specific Rules

This is where rules really shine for teams. Every codebase has conventions that generic tools don't know about:

Your internal libraries:

- id: use_internal_http_client
  agent: consistency
  title: "Use internal HTTP client wrapper"
  match:
    file_glob: ["src/**/*.ts"]
    content_regex: ["\\bfetch\\(", "axios\\."]
  checklist:
    - "Find raw fetch() or axios calls"
    - "Check if code should use httpClient from @/lib/http-client"
  examples:
    bad: |
      const response = await fetch('/api/users');
    good: |
      import { httpClient } from '@/lib/http-client';
      const data = await httpClient.get('/api/users');

Your domain-specific patterns:

- id: money_use_decimal
  agent: bugs
  title: "Use Decimal for monetary values"
  match:
    content_regex: ["(price|amount|total)\\s*:\\s*(number|float)"]
  checklist:
    - "Find monetary fields using float/number types"
    - "Verify storage uses cents (integer) or Decimal type"

Your compliance requirements:

- id: pii_logging_check
  agent: compliance
  title: "Never log PII directly"
  tags: [compliance-gdpr, compliance-hipaa]
  match:
    content_regex: ["log.*(email|phone|ssn|password)"]

Drop these in .diffray/rules/ and they're active on the next PR. No infrastructure changes, no tool updates — just add YAML and your AI reviewer learns your standards.

Why YAML? Structure Enables Intelligence

You might wonder why we use structured YAML instead of just writing prompts. The structure enables capabilities that free-form prompts can't:

Pattern Matching and Filtering

The match section is processed before anything goes to the AI. This happens in code, not in prompts — it's deterministic, fast, and accurate.

Semantic Organization

Rules have id, agent, importance, tags. This allows filtering ("Only run security rules on this PR"), prioritization ("Show critical issues first"), and reporting ("How many security vs. quality issues?").

Version Control and Review

Rules are code. They live in your repo, go through PR review, have git history. When someone asks "why does the AI flag this?", the answer is in a YAML file anyone can read.

Cross-Phase Usage

Different parts of a rule are used in different phases: match for file filtering (no AI needed), checklist + examples for AI review, id + importance for deduplication and prioritization.

A flat prompt can't provide this separation of concerns.
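
As a rough sketch of what that separation buys you, here's how the non-AI phases could consume the same YAML. It assumes the js-yaml package and the rules: layout from the first example; loadRules and the directory handling are illustrative:

import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { load } from 'js-yaml';

type Rule = {
  id: string;
  agent: string;
  importance?: number;
  tags?: string[];
};

// Read every YAML file under .diffray/rules/ and collect its rules: list.
function loadRules(dir = '.diffray/rules'): Rule[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith('.yaml') || name.endsWith('.yml'))
    .flatMap((name) => {
      const doc = load(readFileSync(join(dir, name), 'utf8')) as { rules?: Rule[] };
      return doc?.rules ?? [];
    });
}

// "Only run security rules on this PR", most important first,
// without a single model call.
const securityRules = loadRules()
  .filter((rule) => rule.agent === 'security')
  .sort((a, b) => (b.importance ?? 0) - (a.importance ?? 0));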

The Result: AI Code Review That Actually Works

When you combine structured rules with multi-agent architecture and intelligent context management, you get:

  • Deterministic results: same code, same rules, same findings
  • Focused analysis: each agent does one thing exceptionally well
  • Low false positives: precise matching eliminates noise
  • Team customization: add your standards without infrastructure changes

This is what makes the difference between "AI code review" as a demo and AI code review as infrastructure your team actually relies on.

See It In Action

Install diffray and open a PR. It's free for public repos and includes a generous free tier for private repos.
