Every Mistake Becomes a Rule:
How diffray Learns from Your Feedback
Why AI code review without feedback learning is just an expensive noise generator
Boris Cherny, the creator of Claude Code, recently revealed his workflow, and one phrase from his thread exploded across the developer community: "Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next time."
Product leader Aakash Gupta summarized it perfectly: "Every mistake becomes a rule." The longer a team works with the AI, the smarter the AI becomes.
This is exactly the philosophy diffray is built on. Today, we'll show you how it works under the hood.
The Problem: Context Pollution Kills Review Quality
Before we talk about rules, we need to understand the main technical challenge of AI code review — context pollution.
Anthropic's research shows that LLMs, like humans, lose focus as the context window fills up. Corrections accumulate, side discussions pile up, outdated tool outputs linger. The result is predictable:
- False positives: AI finds "problems" that don't exist
- Hallucinations: imaginary bugs and non-existent patterns
- Goal drift: reviews become progressively less relevant
JetBrains Research (December 2025) quantified this: agent contexts grow so rapidly that they become expensive, yet don't deliver significantly better task performance. More context ≠ better results.
The Solution: Specialized Subagents with Isolated Context
Boris Cherny uses subagents as "automated encapsulations of the most common workflows." His philosophy:
"Reliability comes from specialization plus constraint"
Instead of one omniscient reviewer, his code review command spawns multiple parallel agents with distinct responsibilities.
One of those responsibilities is adversarial review, and it is crucial: secondary agents challenge findings from the first pass, eliminating false positives through structured skepticism.
The result, in Cherny's words: "finds all the real issues without the false ones."
How It Works Technically
When the main agent delegates to a subagent, a fresh context window spawns containing only the task description and relevant parameters. The subagent may explore extensively—consuming tens of thousands of tokens searching through code—but returns only a condensed summary of 1,000-2,000 tokens.
This preserves the primary agent's focus while enabling deep analysis.
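As a rough sketch of that pattern (the function names and message format are illustrative, not Claude Code's or diffray's actual API), the key point is that the subagent starts from a fresh message list and only a condensed summary flows back to the parent:

```python
def call_llm(messages: list[dict]) -> str:
    """Stand-in for a chat-completion call; a real implementation would hit an LLM API."""
    return "stubbed model output"

def run_subagent(task: str, params: dict) -> str:
    # Fresh context window: only the task description and relevant
    # parameters, none of the parent agent's conversation history.
    messages = [
        {"role": "system", "content": "You are a focused code-review subagent."},
        {"role": "user", "content": f"{task}\n\nParameters: {params}"},
    ]
    # The subagent may explore extensively here (tool calls, file reads),
    # consuming tens of thousands of tokens the parent never sees.
    transcript = call_llm(messages)
    # Only a condensed summary (roughly 1,000-2,000 tokens) is returned.
    condensed = call_llm([
        {"role": "system", "content": "Condense the review below to its substantive findings."},
        {"role": "user", "content": transcript},
    ])
    return condensed

# The parent agent keeps its own context small: it sees only the summary.
# summary = run_subagent("Review this diff for SQL injection risks", {"diff": diff_text})
```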
At diffray, we use 10 specialized agents, each focused on a specific domain: security, performance, code style, architectural patterns, and more. Each agent operates in an isolated context and returns only substantive findings.
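Purely as an illustration of that split (the actual agent lineup and prompts are diffray internals, and these names are made up), the roster can be pictured as a mapping from domain to a narrowly scoped task:

```python
# Hypothetical roster: each entry becomes one isolated subagent run.
REVIEW_AGENTS = {
    "security":     "Flag injection, auth, and secret-handling issues in the diff.",
    "performance":  "Flag N+1 queries, hot-loop allocations, and blocking I/O.",
    "code_style":   "Flag deviations from the project's lint and naming rules.",
    "architecture": "Flag violations of documented architectural decisions.",
}

# findings = {name: run_subagent(task, {"diff": diff_text})
#             for name, task in REVIEW_AGENTS.items()}
```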
Rule Crafting: Turning Feedback into Knowledge
Now for the main event. Subagents solve the context problem. But how do you make AI learn from your corrections?
The CLAUDE.md Pattern
In Claude Code, teams maintain a CLAUDE.md file in their repository—a kind of "constitution" for the project. The file is automatically loaded into context at every session.
But there's a critical limitation. HumanLayer research shows that Claude Code's system prompt already contains ~50 instructions, and frontier LLMs reliably follow only 150-200 instructions total. Instruction-following quality decreases uniformly as count increases.
This means: you can't just dump 500 rules and expect magic.
Three Levels of Knowledge
Effective rules encode knowledge at three levels:
WHAT (Project Map)

```markdown
## Tech Stack
- Backend: Python 3.11, FastAPI, SQLAlchemy
- Frontend: React 18, TypeScript, TailwindCSS
- DB: PostgreSQL 15
```

WHY (Architectural Decisions)

```markdown
## Why We DON'T Use ORM for Complex Queries
History: ORM generated N+1 queries in reports.
Decision: Raw SQL for analytics, ORM only for CRUD.
```

HOW (Processes)

```markdown
## Before Committing
- Run `make lint` — must pass with no errors
- Run `make test` — coverage must not drop
```

The Problem with Manual Approaches
Manual rule maintenance works... as long as your team is small and disciplined. In reality:
- Developers forget to update rules
- Rules go stale faster than code
- Implicit conventions stay implicit
- Tribal knowledge dies when key people leave
How diffray Automates Rule Crafting
diffray flips the process on its head. Instead of manually writing rules, you just give feedback on reviews.
The Learning Loop
Step 1: You Give Feedback
Gave a thumbs-down to a diffray comment? Replied "this isn't a bug, it's intentional"? Ignored a recommendation? diffray captures it all.
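Conceptually, each of those interactions becomes a structured event; the field names below are illustrative, not diffray's real schema:

```python
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    comment_id: str        # the diffray review comment being reacted to
    kind: str              # "thumbs_up" | "thumbs_down" | "reply" | "ignored"
    reply_text: str | None # e.g. "this isn't a bug, it's intentional"
    pr_url: str
    file_path: str

event = FeedbackEvent(
    comment_id="c_123",
    kind="reply",
    reply_text="this isn't a bug, it's intentional",
    pr_url="https://example.com/acme/app/pull/42",
    file_path="src/api/users.py",
)
```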
Step 2: Pattern Extraction
diffray analyzes: what exactly was wrong? Was it a false alarm (code is correct), inapplicable context (rule doesn't apply here), or project-specific convention (that's how we do it here)?
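A minimal sketch of that triage, assuming the three buckets above; in practice the classification would be model-driven rather than keyword-based:

```python
from enum import Enum

class FeedbackPattern(Enum):
    FALSE_ALARM = "false_alarm"            # the flagged code is actually correct
    INAPPLICABLE_CONTEXT = "inapplicable"  # valid rule, wrong context
    PROJECT_CONVENTION = "convention"      # "that's how we do it here"

def classify(comment_text: str, reply_text: str | None) -> FeedbackPattern:
    # Keyword checks stand in here only to make the buckets concrete.
    reply = (reply_text or "").lower()
    if "intentional" in reply or "not a bug" in reply:
        return FeedbackPattern.FALSE_ALARM
    if "legacy" in reply or "test" in reply:
        return FeedbackPattern.INAPPLICABLE_CONTEXT
    return FeedbackPattern.PROJECT_CONVENTION
```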
Step 3: Rule Generation
Based on the pattern, diffray formulates a rule that specifies the scope (which files/directories), what to suppress or enforce, and why. The rule is linked to the original feedback for traceability.
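A generated rule might carry exactly those pieces: scope, action, rationale, and a link back to the feedback that produced it. The shape below is an assumption for illustration, not diffray's internal format:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    scope: list[str]     # glob patterns for the files/directories the rule covers
    action: str          # "suppress" or "enforce"
    check: str           # which finding category it applies to
    reason: str          # why the rule exists, in the team's own words
    source_feedback: list[str] = field(default_factory=list)  # traceability

rule = Rule(
    scope=["src/reports/**"],
    action="suppress",
    check="orm-raw-sql-warning",
    reason="Raw SQL is the documented convention for analytics queries.",
    source_feedback=["c_123"],
)
```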
Step 4: Validation
Before applying the rule, diffray runs it against historical PRs. How many comments would have been suppressed? How many of those were actual false positives? The rule is applied only if it improves accuracy.
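One way to picture that validation, assuming historical comments have already been labeled by past feedback (the names and the 0.9 threshold are assumptions):

```python
def should_apply(rule_matches: list[bool], was_false_positive: list[bool]) -> bool:
    """Replay a candidate rule against historical review comments.

    rule_matches[i]       -- would the rule have suppressed comment i?
    was_false_positive[i] -- did the team mark comment i as a false positive?
    """
    suppressed = [fp for m, fp in zip(rule_matches, was_false_positive) if m]
    if not suppressed:
        return False  # the rule changes nothing
    # Apply only if the overwhelming majority of what it silences was noise,
    # i.e. it improves precision without hiding real findings.
    precision_of_suppression = sum(suppressed) / len(suppressed)
    return precision_of_suppression >= 0.9

# Example: the rule would have silenced 12 comments, 11 of them false positives.
# should_apply([True] * 12 + [False] * 50, [True] * 11 + [False] * 51)  -> True
```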
Types of Rules in diffray
Suppression Rules
"Don't flag X in context Y" — silence specific warnings in legacy code, test files, or generated code.
Enforcement Rules
"Always check for Z" — ensure critical patterns like SQL parameterization or auth checks are never missed.
Context Rules
"Consider the specifics" — adjust priority based on file type, decorators, or surrounding code patterns.
Terminology Rules
"We call it this" — teach diffray your domain vocabulary so it understands your codebase better.
Practical Example: From Annoyance to Rule
Imagine: diffray leaves a comment on your PR:
Warning (Performance): Using `any` reduces type safety. Consider explicit typing.
You know this is a legacy module scheduled for rewrite next quarter. Fixing types now would be a waste of time.
You reply: "This is legacy, typing will be addressed during Q2 refactoring"
What happens next:
- diffray notes the context: the file lives in src/legacy/, and there's a TODO with a date
- It creates a suppression rule for src/legacy/** with an expiration date (Q2)
- On future PRs touching src/legacy/, diffray stays silent about types

But importantly: the rule isn't permanent. The expiration date means that after Q2, diffray will start checking types in that directory again.
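Rendered as a rule, that outcome could look roughly like the following (shape and field names are again illustrative):

```python
# A hypothetical rendering of the rule diffray derives from that exchange.
legacy_typing_rule = {
    "action": "suppress",
    "check": "type-safety",           # covers comments like the `any` warning above
    "scope": ["src/legacy/**"],
    "reason": "Legacy module; typing will be addressed during Q2 refactoring.",
    "expires": "Q2",                  # after Q2, type checks in src/legacy/ resume
}
```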
The Metric: Reducing False Positive Rate
The key measure of AI code review effectiveness is false positive rate. How many comments out of 100 were useless?
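The arithmetic is simple:

```python
def false_positive_rate(total_comments: int, unhelpful_comments: int) -> float:
    """Share of review comments the team judged useless (flagged non-issues)."""
    return unhelpful_comments / total_comments

false_positive_rate(100, 42)  # 0.42, squarely in the typical baseline range below
```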
Typical industry benchmarks:
- 40-60%: baseline AI review false positives
- 25-35%: with manual rules
- 8-13%: diffray with learned rules
How we achieve this:
- Context isolation through subagents prevents drift
- Agent specialization improves accuracy in each domain
- Learning from feedback eliminates recurring false positives
- Rule validation prevents overfitting
Getting Started: Three Steps
Step 1: Connect diffray to Your Repository
Integration takes 5 minutes via GitHub App or GitLab webhook.
Step 2: Just Work
For the first 2-3 weeks, diffray operates in learning mode. It studies your project structure, your PR patterns, and your reviewers' comment style.
Step 3: Give Feedback
Don't silently ignore diffray comments. Give thumbs-up to useful ones, thumbs-down to useless ones, reply to debatable ones.
Every interaction makes diffray smarter. After a month, you'll have a personalized AI reviewer that knows your conventions better than a new developer after onboarding.
Conclusion: AI That Grows with Your Team
The philosophy of "every mistake becomes a rule" isn't just a catchy phrase. It's an architectural principle that separates toy tools from production-ready solutions.
diffray is built on three pillars:
- Subagents with isolated context, for accuracy without pollution
- Rule crafting from feedback, for learning without manual work
- Validation on history, for confidence in improvements
The result: AI code review that gets better with every PR. Not because the model was updated, but because it learns from your team.
Start Teaching Your AI Reviewer Today
Install diffray and open a PR. It's free for public repos and includes a generous free tier for private repos.