Why Rules Are the Secret Weapon of AI Code Review
How structured rules transform AI code review from inconsistent suggestions into deterministic, predictable results
If you've ever tried using ChatGPT or Claude for code review, you've probably experienced this: the same code gets different feedback depending on how you phrase your question, what conversation came before, or what mood the AI seems to be in that day. That inconsistency isn't just annoying — it's a fundamental problem that makes AI-powered code review unreliable for production use.
At diffray, we solved this with structured rules. Let me show you why they're the key to making AI code review actually work.
The Problem: Context Dilution Kills AI Performance
When you dump an entire PR into an LLM with a prompt like "review this code," several things go wrong:
Signal drowns in noise
The AI tries to check security, performance, testing, documentation, style — everything at once. The result? Shallow analysis that misses critical issues.
The "lost middle" effect
Research consistently shows that information in the middle of long contexts gets ignored. Your most important file might be buried where the AI can't effectively process it.
Inconsistent results
Run the same review twice, get different findings. That's not acceptable for a tool teams rely on.
False positive explosion
Without focus, the AI flags generic patterns that aren't actually problems in your codebase. Studies show context-dump approaches produce 60-80% false positive rates.
The Solution: Structured Rules
A structured rule isn't just a prompt — it's a complete specification that tells the AI exactly what to look for, where to look, and how to report findings.
Here's what a rule looks like:
```yaml
rules:
  - id: sec_sql_injection
    agent: security
    title: "SQL injection via string concatenation"
    description: "User input concatenated into SQL queries allows attackers to execute arbitrary SQL"
    importance: 9
    match:
      file_glob:
        - "src/api/**/*.ts"
        - "src/routes/**/*.ts"
        - "!**/*.test.ts"
      content_regex:
        - "query.*\\$\\{|\\+.*query"
    checklist:
      - "Find all places where user input is used in SQL queries"
      - "Check if parameterized queries or prepared statements are used"
      - "Verify that string concatenation is NOT used for query building"
    examples:
      bad: |
        const query = `SELECT * FROM users WHERE id = ${userId}`;
        await db.execute(query);
      good: |
        const query = 'SELECT * FROM users WHERE id = ?';
        await db.execute(query, [userId]);
```

This structure enables three critical capabilities that make AI code review actually work.
1. Deterministic Results Through Precise Targeting
Notice the match section? That's pattern matching that determines when the rule even runs.
Without pattern matching, reviewing a 50-file PR means analyzing all 50 files for every possible issue. With pattern matching:
Without pattern matching
- 50 files × all rules = massive prompt
- ~500,000 tokens per review
- Result: Slow, expensive, unfocused
With pattern matching
- 5 API files × API security rules
- ~50,000 tokens per review
- Result: Fast, cheap, precise
That's roughly a 90% token saving through precise pattern matching.
If a PR has 5 API files containing SQL queries, only those 5 files get sent for SQL injection checks. The AI sees exactly what it needs to evaluate — nothing more.
This precision eliminates the randomness that plagued earlier AI code review attempts. Same code + same rules = same findings. Every time.
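To make the filtering concrete, here is a minimal sketch of what a pre-filter stage could look like. It is not diffray's implementation: the `RuleMatch` shape and `ruleApplies` name are illustrative, and it assumes the `minimatch` glob package.

```typescript
// Illustrative pre-filter: decides, in plain code, whether a rule applies to a
// file before any tokens are spent on the LLM. Not diffray's actual code.
import { minimatch } from "minimatch";

interface RuleMatch {
  file_glob?: string[];     // globs; a leading "!" marks an exclusion
  content_regex?: string[]; // rule fires if any regex matches the file content
}

function ruleApplies(path: string, content: string, match: RuleMatch): boolean {
  if (match.file_glob) {
    const includes = match.file_glob.filter((g) => !g.startsWith("!"));
    const excludes = match.file_glob
      .filter((g) => g.startsWith("!"))
      .map((g) => g.slice(1));
    const included = includes.length === 0 || includes.some((g) => minimatch(path, g));
    if (!included || excludes.some((g) => minimatch(path, g))) return false;
  }
  if (match.content_regex && !match.content_regex.some((re) => new RegExp(re).test(content))) {
    return false;
  }
  return true;
}

// Only files that pass this check are ever included in an agent's prompt.
```

Because this stage runs as ordinary code, it behaves identically on every run, which is what anchors the determinism claim above.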
2. Multi-Agent Architecture: Right Expert for Each Problem
Here's where it gets interesting. diffray doesn't use one AI to review everything — it uses 31 specialized agents, each focused on what they do best:
- Security Expert: vulnerabilities, auth, data exposure
- Performance Specialist: N+1 queries, memory leaks, bottlenecks
- Bug Hunter: null pointers, race conditions, edge cases
- React Agent: hooks patterns, lifecycle, state management
- TypeScript Agent: type safety, generics, strict mode patterns
- Architecture Advisor: design patterns, coupling, scalability
Each agent has a specialized system prompt with domain expertise. A security agent knows OWASP Top 10. A React agent understands hooks rules. This specialization dramatically improves detection quality.
But here's the key: rules determine which agent handles what.
```yaml
- id: react_useeffect_cleanup
  agent: react # Handled by React specialist
  match:
    file_glob: ["**/*.tsx"]
    content_regex: ["useEffect"]
  checklist:
    - "Check if useEffect returns a cleanup function"
    - "Verify event listeners and subscriptions are cleaned up"
```

This rule only goes to the React agent, only for TSX files, only when useEffect is present. The agent receives focused context about exactly one thing — not 50 different concerns competing for attention.
3. Context Curation: The Secret to Not Diluting AI Attention
Modern LLMs can handle 200k+ tokens. But research shows practical performance hits a ceiling around 25-30k tokens for complex reasoning. Beyond that, you're paying for tokens the model can't effectively use.
diffray's context management system ensures every agent receives precisely the information it needs:
| Agent | Receives | Excludes |
|---|---|---|
| Security Expert | Auth flows, API endpoints, data handling | UI components, styling |
| Performance Specialist | Hot paths, loops, data structures | Documentation, configs |
| React Agent | Components, hooks, state management | Backend code, SQL |
Each agent gets exactly what it needs to do its job. Nothing more. The result: focused attention, better findings, fewer false positives.
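As a rough sketch of budget-aware curation: everything here is an assumption for illustration (the 25k budget, the 4-characters-per-token estimate, and the `buildAgentContext` and `FileDiff` names), not diffray's implementation.

```typescript
// Hypothetical context assembler: include only files relevant to this agent,
// and stop before exceeding the budget where reasoning quality degrades.
const TOKEN_BUDGET = 25_000; // assumed effective ceiling for complex reasoning
const estimateTokens = (s: string) => Math.ceil(s.length / 4); // crude heuristic

interface FileDiff {
  path: string;
  diff: string;
}

function buildAgentContext(
  files: FileDiff[],
  isRelevant: (path: string) => boolean
): string {
  const chunks: string[] = [];
  let used = 0;
  for (const file of files) {
    if (!isRelevant(file.path)) continue; // e.g. skip UI files for the security agent
    const cost = estimateTokens(file.diff);
    if (used + cost > TOKEN_BUDGET) break; // don't dilute attention past the budget
    chunks.push(`--- ${file.path} ---\n${file.diff}`);
    used += cost;
  }
  return chunks.join("\n\n");
}
```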
4. Flexibility: Project and Team-Specific Rules
This is where rules really shine for teams. Every codebase has conventions that generic tools don't know about:
Your internal libraries:
```yaml
- id: use_internal_http_client
  agent: consistency
  title: "Use internal HTTP client wrapper"
  match:
    file_glob: ["src/**/*.ts"]
    content_regex: ["\\bfetch\\(", "axios\\."]
  checklist:
    - "Find raw fetch() or axios calls"
    - "Check if code should use httpClient from @/lib/http-client"
  examples:
    bad: |
      const response = await fetch('/api/users');
    good: |
      import { httpClient } from '@/lib/http-client';
      const data = await httpClient.get('/api/users');
```

Your domain-specific patterns:
```yaml
- id: money_use_decimal
  agent: bugs
  title: "Use Decimal for monetary values"
  match:
    content_regex: ["(price|amount|total)\\s*:\\s*(number|float)"]
  checklist:
    - "Find monetary fields using float/number types"
    - "Verify storage uses cents (integer) or Decimal type"
```

Your compliance requirements:
```yaml
- id: pii_logging_check
  agent: compliance
  title: "Never log PII directly"
  tags: [compliance-gdpr, compliance-hipaa]
  match:
    content_regex: ["log.*(email|phone|ssn|password)"]
```

Drop these in .diffray/rules/ and they're active on the next PR. No infrastructure changes, no tool updates — just add YAML and your AI reviewer learns your standards.
Why YAML? Structure Enables Intelligence
You might wonder why we use structured YAML instead of just writing prompts. The structure enables capabilities that free-form prompts can't:
Pattern Matching and Filtering
The match section is processed before anything goes to the AI. This happens in code, not in prompts — it's deterministic, fast, and accurate.
Semantic Organization
Rules have id, agent, importance, tags. This allows filtering ("Only run security rules on this PR"), prioritization ("Show critical issues first"), and reporting ("How many security vs. quality issues?").
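A sketch of what metadata-driven filtering could look like: the `RuleMeta` shape mirrors the fields above, while the `selectRules` name and selection logic are illustrative assumptions.

```typescript
// Illustrative metadata filter: pick the rules for one agent, highest
// importance first. Field names follow the YAML examples above.
interface RuleMeta {
  id: string;
  agent: string;
  importance: number;
  tags?: string[];
}

function selectRules(rules: RuleMeta[], agent: string, minImportance = 0): RuleMeta[] {
  return rules
    .filter((r) => r.agent === agent && r.importance >= minImportance)
    .sort((a, b) => b.importance - a.importance);
}

// "Only run security rules on this PR":
// const active = selectRules(allRules, "security");
```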
Version Control and Review
Rules are code. They live in your repo, go through PR review, have git history. When someone asks "why does the AI flag this?", the answer is in a YAML file anyone can read.
Cross-Phase Usage
Different parts of a rule are used in different phases: match for file filtering (no AI needed), checklist + examples for AI review, id + importance for deduplication and prioritization.
A flat prompt can't provide this separation of concerns.
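For the last phase, a hedged sketch of how `id` and `importance` might drive deduplication and prioritization after the AI phase; the `Finding` shape and `dedupeAndRank` name are assumptions, not diffray's API.

```typescript
// Hypothetical post-processing pass: rule metadata drives deduplication and
// ordering without any further LLM calls.
interface Finding {
  ruleId: string;
  importance: number;
  file: string;
  line: number;
  message: string;
}

function dedupeAndRank(findings: Finding[]): Finding[] {
  const seen = new Set<string>();
  const unique = findings.filter((f) => {
    const key = `${f.ruleId}:${f.file}:${f.line}`; // same rule at same spot = duplicate
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return unique.sort((a, b) => b.importance - a.importance); // critical issues first
}
```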
The Result: AI Code Review That Actually Works
When you combine structured rules with multi-agent architecture and intelligent context management, you get:
- Deterministic results: same code, same rules, same findings
- Focused analysis: each agent does one thing exceptionally well
- Low false positives: precise matching eliminates noise
- Team customization: add your standards without infrastructure changes
This is what makes the difference between "AI code review" as a demo and AI code review as infrastructure your team actually relies on.
See It In Action
Install diffray and open a PR. It's free for public repos and includes a generous free tier for private repos.