Multi-Agent Architecture

Why Multiple Agents?

Most AI code review tools send your code to an LLM with "review this" and hope for the best. diffray uses specialized agents that investigate, verify, and validate — like a team of expert reviewers.

Fair Question

"Can't a Single Prompt Do This?"

"If modern LLMs can handle 200k tokens, why not just send the diff with relevant context and let the model figure it out? What's the point of all this agent complexity?"

The Fundamental Problem: Your Codebase Doesn't Fit

A prompt can only see what you send it. For meaningful code review, you need context from across your entire codebase — imports, dependencies, related files, tests, conventions.

  • Average codebase: 100k-500k+ lines
  • LLM context window: ~200k tokens max
  • Practical performance ceiling: ~25-30k tokens

Even If It Fit — It Wouldn't Work

Research shows that dumping more context into an LLM actively harms performance. This is called "context dilution."

  • 10-20%: performance drop from too many documents
  • U-curve: information in the middle of the context gets "lost"
  • 60-80%: false positive rate in context-dump tools

Read the research: Why Curated Context Beats Context Volume →

What Agents Actually Provide

Agents don't just "read prompts better." They actively investigate your codebase:

Selective Context Retrieval

Fetch only relevant files on-demand, not dump everything upfront

Hypothesis Verification

"I suspect a type mismatch" → search callers → confirm with static analysis

Iterative Investigation

Follow leads across files, dig deeper when something looks suspicious

Tool Integration

Run linters, type checkers, and analyzers to verify findings with real data
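
Concretely, selective retrieval means the model calls tools instead of receiving a dump. Here is a minimal sketch; the tool names and shapes (`readFile`, `searchCode`) are illustrative assumptions, not diffray's actual interface:

```ts
import { readFile } from "node:fs/promises";

// Illustrative tool surface an agent can call mid-investigation.
type Tool = (args: string) => Promise<string>;

const tools: Record<string, Tool> = {
  // Fetch one file, only once the agent decides it is relevant.
  readFile: (path) => readFile(path, "utf8"),

  // Turn a hypothesis ("who calls this?") into a query. A real
  // implementation would delegate to ripgrep or a code index.
  searchCode: async (pattern) =>
    `stub: matches for ${JSON.stringify(pattern)}`,
};

// Context enters the conversation one piece at a time,
// instead of 200k tokens up front.
async function act(step: { tool: string; args: string }) {
  return tools[step.tool](step.args);
}
```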

A prompt sees what you give it.

An agent finds what it needs.

Precision Over Volume

Curated Context Management

The difference between useful review and noise isn't how much context you have — it's having the right context

How diffray Curates Context

Dependency Graph Analysis

Before review starts, we build a map of how files connect — imports, exports, type definitions, and call chains

Smart Filtering

Each agent receives only the context relevant to its task — security agent gets auth flows, not UI styling

On-Demand Retrieval

Agents fetch additional context only when needed — following leads without upfront overload

Layered Context

Core context (diff, types) stays resident; surrounding context (callers, tests) loaded as needed
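
As a sketch of how that layering might work in code (the `ImportGraph` interface and function names below are hypothetical, not diffray's internals):

```ts
// Hypothetical import-graph index built during data prep.
interface ImportGraph {
  importsOf(file: string): string[];    // what this file depends on
  importersOf(file: string): string[];  // who depends on this file
}

// Layer 1: the diff plus its direct dependencies stays resident
// for the whole review.
function coreContext(changedFiles: string[], graph: ImportGraph): Set<string> {
  const core = new Set(changedFiles);
  for (const file of changedFiles) {
    for (const dep of graph.importsOf(file)) core.add(dep);
  }
  return core;
}

// Layer 2: callers and tests are looked up only when an agent
// follows a lead, never preloaded.
function surroundingContext(file: string, graph: ImportGraph): string[] {
  return graph.importersOf(file);
}
```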

Context Dump Approach

200k tokens of everything — diff, full files, random dependencies...

  • Signal drowns in noise
  • Important details in the "lost middle"
  • Attention spread across irrelevant code

Curated Context Approach

Focused chunks — diff + direct dependencies + relevant patterns

  • Every token serves a purpose
  • Critical info stays in focus
  • Full attention on what matters

Learn more about our AI engines →

The Problem with "Just Ask the LLM"

A single LLM call reviewing code has fundamental limitations

Single LLM Call

  • Sees only what you send: limited to the diff you provide
  • One-shot generation: no iteration or verification
  • Can't follow imports: blind to dependencies and context
  • Hallucinations go unchecked: no way to validate claims
  • Fixed context window: attention spread thin across all concerns
  • Generic advice: "Make sure callers are updated"

Agent-Based System

  • Explores codebase autonomously: navigates your entire project
  • Iterative analysis: follows leads, digs deeper
  • Navigates project structure: understands imports and dependencies
  • Validates with real tools: runs static analyzers to confirm
  • Focused attention: each agent specializes in one area
  • Specific findings: "3 call sites have type mismatches at lines 45, 89, 112"

It's the difference between speculation and investigation.

What Makes an Agent Different?

An agent is an AI system that can think, act, and verify

Use Tools

Read files, search code, run static analyzers

Make Decisions

Choose what to investigate based on findings

Iterate

Follow leads, verify hypotheses, dig deeper

Self-Correct

Validate reasoning against real data
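
Stripped to its skeleton, that think/act/verify cycle might look like the loop below. The `llm` and `runTool` callbacks are stand-ins for the real model and tool layer; this is a sketch of the control flow, not diffray's implementation:

```ts
// One agent turn is either a tool request (act) or a finding (conclude).
type Step =
  | { kind: "tool"; name: string; args: string }
  | { kind: "finding"; text: string; verified: boolean };

async function agentLoop(
  llm: (transcript: string[]) => Promise<Step>,
  runTool: (name: string, args: string) => Promise<string>,
  maxSteps = 20,
): Promise<string[]> {
  const transcript: string[] = [];
  const findings: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await llm(transcript); // think: decide what to do next
    if (step.kind === "tool") {
      // act: gather evidence, then feed it back into the next turn
      const result = await runTool(step.name, step.args);
      transcript.push(`${step.name}(${step.args}) => ${result}`);
      continue;
    }
    // verify: keep only findings the agent backed with tool output;
    // self-correct: send unverified claims back for another look
    if (step.verified) findings.push(step.text);
    else transcript.push(`rejected unverified claim: ${step.text}`);
  }
  return findings;
}
```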

What diffray Agents Actually Do

When diffray reviews your PR, agents don't just "look at the diff"

Trace Dependencies

Follow imports to understand how changed code affects the entire system

Check Related Files

Examine tests, configs, and documentation for context

Verify Assumptions

Run static analysis to confirm suspected issues actually exist

Cross-Reference

Look up type definitions, API contracts, and conventions

Real Example

Consider a function signature change in a PR:

Single LLM approach

"This changes the return type, make sure callers are updated"

Generic advice. No specifics.

Agent approach
  1. Searches for all usages of this function
  2. Identifies 3 call sites with type mismatches
  3. Checks if tests cover these scenarios
  4. Reports specific files and line numbers

→ "Found 3 breaking changes: src/api/users.ts:45, src/hooks/useAuth.ts:89, src/utils/validate.ts:112"

Full Codebase Awareness

The Diff Is Not Enough

To truly understand changes, you need to see how they fit into the entire codebase

What a diff-only review sees

  • New function formatUserName() added
  • Looks syntactically correct
  • No obvious bugs in these 20 lines

Verdict: "LGTM" — but completely missing the bigger picture

What a codebase-aware agent sees

  • This function duplicates utils/names.ts:formatName()
  • Existing function handles edge cases this one misses
  • 3 other files already use the existing utility
  • This breaks the naming convention in /docs/CONVENTIONS.md

Verdict: "Consider using existing formatName() from utils/names.ts"

What diffray agents check beyond the diff:

Duplicate Detection

Is the developer reinventing the wheel? Does a similar solution already exist in the codebase?

Pattern Consistency

Do these changes follow established patterns? Or introduce a conflicting approach?

Impact Analysis

How do these changes affect the rest of the system? What depends on the modified code?

Convention Adherence

Are team conventions and documented standards being followed?

A diff shows you what changed. Full codebase context shows you whether it should have.
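
Duplicate detection, for example, can start from something as simple as token overlap between the new function's name and existing exports. A toy sketch (a real check would compare ASTs or embeddings; `exportIndex` is a hypothetical name-to-file map):

```ts
// Split an identifier into lowercase word tokens:
// "formatUserName" becomes { format, user, name }.
function tokens(name: string): Set<string> {
  return new Set(
    name
      .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter(Boolean),
  );
}

// Flag existing exports sharing two or more tokens with the new helper.
function similarExports(newName: string, exportIndex: Map<string, string>): string[] {
  const wanted = tokens(newName);
  const hits: string[] = [];
  for (const [name, file] of exportIndex) {
    if (name === newName) continue;
    const shared = [...tokens(name)].filter((t) => wanted.has(t)).length;
    if (shared >= 2) hits.push(`${name} (${file})`);
  }
  return hits;
}

// similarExports("formatUserName", index) would surface
// "formatName (utils/names.ts)", prompting the reuse suggestion above.
```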

The Context Dilution Problem

A single LLM reviewing all aspects of code simultaneously faces a fundamental problem: context dilution.

As it tries to check security, performance, bugs, and style all at once, its attention spreads thin. The more concerns it juggles, the more likely it is to miss issues.

diffray's solution: Specialized agents, each with its own narrow focus. Like having a team of specialists vs. one generalist trying to do everything.

Each Agent:

Curated Context

Starts with precisely gathered, focused context — only the relevant files, dependencies, and patterns for its specific task

Stays Focused

One job, done thoroughly — security agent only looks for vulnerabilities, never drifts to styling

Goes Deep

Can spend full context on its specialty — not splitting attention across 10 different concerns

Never Forgets

Doesn't lose track mid-review — every rule, every check, every time, without exception

Never Tires

50th PR of the day gets the same attention as the first — no fatigue, no rushing, no shortcuts
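
One way to express that specialization in code: each agent is a narrow charter plus a context filter applied at triage. All names below are illustrative assumptions, not diffray's configuration format:

```ts
interface AgentSpec {
  name: string;
  focus: string;                         // the one job this agent does
  wantsFile: (path: string) => boolean;  // smart filtering: claim only relevant files
}

const agents: AgentSpec[] = [
  {
    name: "security",
    focus: "auth flows, injection, secret handling",
    wantsFile: (p) => /auth|session|crypto|middleware/i.test(p),
  },
  {
    name: "performance",
    focus: "hot paths, N+1 queries, allocations",
    wantsFile: (p) => /db|query|queue|worker/i.test(p),
  },
  // ...the remaining specialists, each equally narrow
];

// Triage routes each changed file only to the agents that claim it,
// so no agent's context is diluted by files outside its specialty.
function route(changedFiles: string[]) {
  return agents.map((a) => ({
    agent: a.name,
    files: changedFiles.filter(a.wantsFile),
  }));
}
```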

9 Specialized Agents

Meet the Review Team

Security, Performance, Bugs, Architecture, Testing, and more — each agent brings deep expertise to its domain. See exactly what each one does.

The Engines Behind diffray

Powerful foundations enabling true multi-agent collaboration

Core Engine

  • Latest Anthropic models (Haiku, Sonnet, Opus)
  • Task-matched model selection
  • Intelligent file search
  • Built-in task management

Tooling Engine

  • Static analyzer integration
  • Hypothesis verification
  • Concrete tool output
  • Dramatically reduced false positives

Multi-Agent Architecture

  • Parallel agent execution
  • Shared codebase context
  • Finding deduplication
  • Cross-agent validation
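
A sketch of how parallel execution and deduplication could fit together (the string-keyed dedup here is a simplification; per the pipeline below, the real merge step also rescores):

```ts
interface Finding {
  file: string;
  line: number;
  message: string;
}

// Run every specialist concurrently against the shared context,
// then merge their findings and drop duplicates.
async function runAgents(
  agents: Array<() => Promise<Finding[]>>,
): Promise<Finding[]> {
  const results = await Promise.all(agents.map((run) => run()));
  const seen = new Set<string>();
  return results.flat().filter((f) => {
    const key = `${f.file}:${f.line}:${f.message}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```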

The Phased Review Pipeline

Every review goes through a multi-phase pipeline, each phase optimized for its purpose

  1. Clone: fetch repo & checkout PR
  2. Data Prep: build dependency graph
  3. Summarize: LLM summarizes changes
  4. Triage: route files to agents
  5. Rules: load & filter rules
  6. Review: parallel agent analysis
  7. Dedupe: merge & rescore
  8. Validation: verify & rescore
  9. Report: generate PR comments
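
Stitched together, the nine phases read as one orchestration function. Everything below is an illustrative sketch: the phase implementations are injected, and the names mirror the list above rather than diffray's internals:

```ts
interface Phases {
  clone(prUrl: string): Promise<{ root: string; diff: string }>;
  buildGraph(root: string): Promise<unknown>;
  summarize(diff: string): Promise<string>;
  triage(diff: string, graph: unknown): Promise<Array<{ agent: string; files: string[] }>>;
  loadRules(root: string): Promise<string[]>;
  review(route: { agent: string; files: string[] }, rules: string[]): Promise<string[]>;
  dedupe(findings: string[]): string[];
  validate(findings: string[], root: string): Promise<string[]>;
  report(findings: string[], summary: string): string;
}

async function runPipeline(p: Phases, prUrl: string): Promise<string> {
  const repo = await p.clone(prUrl);                    // 1. Clone
  const graph = await p.buildGraph(repo.root);          // 2. Data Prep
  const summary = await p.summarize(repo.diff);         // 3. Summarize
  const routes = await p.triage(repo.diff, graph);      // 4. Triage
  const rules = await p.loadRules(repo.root);           // 5. Rules
  const raw = await Promise.all(                        // 6. Review (parallel agents)
    routes.map((route) => p.review(route, rules)),
  );
  const merged = p.dedupe(raw.flat());                  // 7. Dedupe: merge & rescore
  const verified = await p.validate(merged, repo.root); // 8. Validation: verify with tools
  return p.report(verified, summary);                   // 9. Report: PR comments
}
```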

The Result

A multi-agent system that combines AI reasoning with concrete code analysis — delivering accurate, verified findings instead of speculation.

Free Resource

AI Code Review Playbook

Data-driven insights from 50+ research sources. Why developers spend 5-6 hours weekly on review, why AI-generated code needs more scrutiny, and how to implement AI tools developers actually trust.

Experience the Difference Agents Make

See how investigation beats speculation. Try diffray free on your next PR.

Free 14-day trial
No credit card required
2-minute setup