"If modern LLMs can handle 200k tokens, why not just send the diff with relevant context and let the model figure it out? What's the point of all this agent complexity?"
A prompt can only see what you send it. For meaningful code review, you need context from across your entire codebase — imports, dependencies, related files, tests, conventions.
Research shows that dumping more context into an LLM can actively harm performance, a problem often called "context dilution."
10-20%: performance drop from too many documents
U-curve: info in the middle of the context gets "lost"
60-80%: false positive rate in context-dump tools
Agents don't just "read prompts better." They actively investigate your codebase, as in the sketch after this list:
Fetch only relevant files on-demand, not dump everything upfront
"I suspect a type mismatch" → search callers → confirm with static analysis
Follow leads across files, dig deeper when something looks suspicious
Run linters, type checkers, and analyzers to verify findings with real data
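Here is a minimal sketch of that kind of targeted check. It is not diffray's internal code: the grep pattern, the getUser symbol, and the src/ layout are assumptions, but the shape is "find the callers, then let the compiler confirm":

```typescript
// Illustrative only: symbol name, paths, and commands are assumptions.
import { execSync } from "node:child_process";

function findCallers(symbol: string): string[] {
  // Locate call sites with grep instead of loading the whole repo into the prompt.
  const out = execSync(`grep -rn "${symbol}(" src/ || true`).toString();
  return out.split("\n").filter(Boolean);
}

function confirmWithTypeChecker(): string[] {
  // Run tsc and keep only type errors: real evidence rather than speculation.
  const out = execSync("npx tsc --noEmit --pretty false || true").toString();
  return out.split("\n").filter((line) => line.includes("error TS"));
}

// 1. Hypothesis: callers of getUser() may break after the signature change.
const callers = findCallers("getUser");
// 2. Verification: the compiler confirms or refutes the hypothesis.
const errors = confirmWithTypeChecker();
console.log({ suspectedCallSites: callers.length, confirmedErrors: errors.length });
```

Two cheap, verifiable steps replace one expensive, noisy dump of the whole repository.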
A prompt sees what you give it.
An agent finds what it needs.
The difference between useful review and noise isn't how much context you have — it's having the right context
Before review starts, we build a map of how files connect — imports, exports, type definitions, and call chains
Each agent receives only the context relevant to its task — security agent gets auth flows, not UI styling
Agents fetch additional context only when needed — following leads without upfront overload
Core context (diff, types) stays resident; surrounding context (callers, tests) loaded as needed
200k tokens of everything — diff, full files, random dependencies...
Focused chunks — diff + direct dependencies + relevant patterns (sketched below)
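As a rough illustration of core-plus-surrounding context, here is a naive sketch. The regex-based import scan and the directory layout are placeholder assumptions, not how diffray builds its map:

```typescript
// Minimal sketch: build a naive import map, then pick only the changed file
// plus its direct dependencies as review context.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

type ImportGraph = Map<string, string[]>;

function buildImportGraph(dir: string): ImportGraph {
  const graph: ImportGraph = new Map();
  for (const file of readdirSync(dir, { recursive: true })) {
    if (!file.endsWith(".ts")) continue;
    const source = readFileSync(join(dir, file), "utf8");
    // Very rough: capture the module specifier of each import statement.
    const imports = [...source.matchAll(/from\s+["'](.+?)["']/g)].map((m) => m[1]);
    graph.set(file, imports);
  }
  return graph;
}

function focusedContext(graph: ImportGraph, changedFile: string): string[] {
  // Core context: the changed file. Surrounding context: its direct imports.
  return [changedFile, ...(graph.get(changedFile) ?? [])];
}
```

Instead of 200k tokens of everything, the prompt gets the changed file and the handful of modules it actually touches.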
A single LLM call reviewing code has fundamental limitations
Limited to the diff you provide
No iteration or verification
Blind to dependencies and context
No way to validate claims
Attention spread thin across all concerns
Typical output: "Make sure callers are updated"
An agent, by contrast:
Navigates your entire project
Follows leads, digs deeper
Understands imports and dependencies
Runs static analyzers to confirm
Each agent specializes in one area
Typical output: "3 call sites have type mismatches at lines 45, 89, 112"
The difference is between speculation and investigation.
An agent is an AI system that can think, act, and verify
Read files, search code, run static analyzers
Choose what to investigate based on findings
Follow leads, verify hypotheses, dig deeper
Validate reasoning against real data (a schematic loop is sketched below)
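In code, that loop is roughly the following. It is a schematic, not diffray's implementation: the tool set, the step limit, and the Model type stand in for whatever model and tools are actually wired up:

```typescript
// Schematic think/act/verify loop. All names here are placeholders.
type AgentStep = { tool?: string; input?: string; finding?: string };
type Model = (history: string[]) => Promise<AgentStep>;
type Tool = (input: string) => Promise<string>;

// Placeholder tools: real ones would read files, grep the repo, and run analyzers.
const tools: Record<string, Tool> = {
  readFile: async (path) => `contents of ${path}`,
  searchCode: async (query) => `matches for ${query}`,
  runAnalyzer: async (cmd) => `output of ${cmd}`,
};

async function reviewLoop(diff: string, model: Model): Promise<string> {
  const history = [`Review this diff:\n${diff}`];
  for (let step = 0; step < 10; step++) {
    const next = await model(history);        // think: decide the next action
    if (next.finding) return next.finding;    // stop once a finding is verified
    const tool = tools[next.tool ?? ""];
    if (!tool) break;
    const observation = await tool(next.input ?? ""); // act: gather evidence
    history.push(`Observation: ${observation}`);      // verify: feed real data back in
  }
  return "No verified findings.";
}
```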
When diffray reviews your PR, agents don't just "look at the diff". They:
Follow imports to understand how changed code affects the entire system
Examine tests, configs, and documentation for context
Run static analysis to confirm suspected issues actually exist
Look up type definitions, API contracts, and conventions
Consider a function signature change in a PR (sketched below):
"This changes the return type, make sure callers are updated"
Generic advice. No specifics.
→ "Found 3 breaking changes: src/api/users.ts:45, src/hooks/useAuth.ts:89, src/utils/validate.ts:112"
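To make that concrete, here is a hypothetical version of such a change. The types are invented for illustration; only the idea carries over: a return type changes, and an un-updated caller stops compiling.

```typescript
// Hypothetical before/after; not the actual code behind the finding above.
type User = { id: string; name: string };
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// After the PR: getUser now wraps its result instead of returning User directly.
async function getUser(id: string): Promise<Result<User>> {
  return { ok: true, value: { id, name: "Ada" } };
}

// A caller that still assumes the old shape (think src/hooks/useAuth.ts:89)
// no longer type-checks, because `name` does not exist on Result<User>.
async function greet(id: string): Promise<string> {
  const user = await getUser(id);
  // @ts-expect-error caller not yet updated to unwrap the Result
  return `Hello, ${user.name}`;
}
```

Running the type checker across callers is exactly the kind of evidence behind a finding like the one above.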
To truly understand changes, you need to see how they fit into the entire codebase
New function formatUserName() added
Looks syntactically correct
No obvious bugs in these 20 lines
Verdict: "LGTM" — but completely missing the bigger picture
This function duplicates utils/names.ts:formatName()
Existing function handles edge cases this one misses
3 other files already use the existing utility
This breaks the naming convention in /docs/CONVENTIONS.md
Verdict: "Consider using existing formatName() from utils/names.ts"
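A minimal reconstruction of that scenario (the actual code in the PR and in utils/names.ts is assumed, not quoted):

```typescript
// Existing utility in utils/names.ts, already used elsewhere in the codebase.
export function formatName(first: string, last?: string): string {
  // Handles the edge cases the new copy misses: missing last name, stray whitespace.
  const full = [first, last].filter(Boolean).join(" ").trim();
  return full.length > 0 ? full : "Unknown";
}

// New function added in the PR: syntactically fine, but a narrower duplicate.
export function formatUserName(first: string, last: string): string {
  return `${first} ${last}`; // breaks on a missing last name or empty strings
}
```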
Is the developer reinventing the wheel? Does a similar solution already exist in the codebase?
Do these changes follow established patterns? Or introduce a conflicting approach?
How do these changes affect the rest of the system? What depends on the modified code?
Are team conventions and documented standards being followed?
A diff shows you what changed. Full codebase context shows you whether it should have.
Powerful foundations enabling true multi-agent collaboration
Every review goes through a multi-phase pipeline, each phase optimized for its purpose (a schematic runner is sketched after the phase list)
Clone: Fetch repo & checkout PR
Data Prep: Build dependency graph
Summarize: LLM summarizes changes
Triage: Route files to agents
Rules: Load & filter rules
Review: Parallel agent analysis
Dedupe: Merge & rescore
Validation: Verify & rescore
Report: Generate PR comments
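A schematic runner for those phases might look like the sketch below. Every name in it is a placeholder; only the phase order comes from the pipeline above.

```typescript
// Schematic only: phase implementations are omitted, and the context fields
// are assumptions that mirror the phase list above.
type Finding = { file: string; line: number; message: string; score: number };

interface ReviewContext {
  prUrl: string;
  repoPath?: string;                         // set by Clone
  dependencyGraph?: Map<string, string[]>;   // built during Data Prep
  summary?: string;                          // produced by Summarize
  assignments?: Map<string, string[]>;       // Triage: agent name -> files
  findings: Finding[];                       // added by Review, trimmed by Dedupe/Validation
}

type Phase = { name: string; run(ctx: ReviewContext): Promise<ReviewContext> };

async function runPipeline(prUrl: string, phases: Phase[]): Promise<Finding[]> {
  let ctx: ReviewContext = { prUrl, findings: [] };
  for (const phase of phases) {
    ctx = await phase.run(ctx); // Clone -> Data Prep -> ... -> Report, in order
  }
  return ctx.findings;
}
```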
The Result
A multi-agent system that combines AI reasoning with concrete code analysis — delivering accurate, verified findings instead of speculation.
See how investigation beats speculation. Try diffray free on your next PR.