Adversarial Review · A Beginner's Tour

You wrote a plan.
You're about to execute Monday.

You're 90% sure it's right. The 10% you missed is what this skill is for: a structured way to point a fresh pair of eyes at your decision before it's too late to undo.

Skill · adversarial-review
Built · 29 April 2026
For · architecture docs · ADRs · RFCs · migration plans

The Problem

You can't proofread your own assumptions.

When you write a multi-section plan over multiple sessions — building up context, making trade-offs, rejecting alternatives — your brain absorbs the constraints invisibly. By the time you finish, you can't see them anymore. They feel like reality.

This is the same reason you can't proofread your own writing. The author of a sentence already knows what it means, so their eye glides over the missing word. A fresh reader stops cold at it.

The fix sounds easy: just ask someone else to review it.

But here's where it gets messy. Most "fresh eyes" reviews fail in predictable ways:

A bad adversarial review produces noise that wastes a fresh chat and erodes trust in the verdict. — from the synthesis behind this skill

So this skill is essentially a structured charter for fresh-eyes review — designed to filter out the five common failure modes above and force the reviewer to deliver grounded, prioritized findings the author can actually act on.

A Metaphor

Think of it as a pre-flight checklist.

A pilot can fly the plane they designed. But before takeoff, they still walk around the aircraft with a checklist — touching every surface, looking for the thing their familiarity made invisible.

That's adversarial review. The reviewer isn't smarter than you. They just bring fresh attention plus a checklist of failure modes that experience has shown are easy to miss. They're allowed to question anything — but only with grounded evidence, not vibes.

The Theory

Four traditions, blended.

This skill isn't invented from scratch. It borrows the strongest ideas from four older review traditions and fuses them into one method. Here's where each piece comes from.

Pre-mortem

Imagine the project has already failed. Now work backwards: why did it fail? The cognitive trick is "prospective hindsight" — your brain finds flaws more easily when it assumes failure than when it forecasts risk.

Gary Klein · Harvard Business Review · 2007

Steelman

Before you criticize an idea, restate it so well its author would say "yes, exactly." Then list what you agree with. Only then are you allowed to attack. Forces real understanding before fault-finding.

Daniel Dennett · Rapoport's Rules · Intuition Pumps

ADR Review

Architecture Decision Records are immutable. You don't edit a decision after it's made — you supersede it with a new decision that cites the old one. This stops endless re-litigation of settled debates.

Michael Nygard · Documenting Architecture Decisions

Red Team

Adopt the attacker's mindset. Don't ask "will this work?" — ask "how would I break this?" Borrowed from security research: focus on attack surfaces, edge cases, and the failure modes that survive a happy-path test.

Bruce Schneier · Google SRE · Microsoft STRIDE

The blend

Pre-mortem gives the cognitive frame ("imagine it failed"). Steelman makes the critique credible. ADR review prevents Groundhog Day debates. Red team focuses attention on real attack surfaces. None alone is enough; together they cover each other's blind spots.

Anatomy

Three reviewers, one moderator.

For any non-trivial doc (more than ~150 lines), the skill spawns three independent reviewers in fresh chats. Each has a distinct lens. A moderator chat — usually the original author — integrates the three reports.

The Decision · architecture doc · ADR · migration plan · RFC
Reviewer A · Grounding pressure-test · Top-of-doc decisions · Foundational claims · Pending-revisions log
Reviewer B · Execution risk, edge-case hunt · Implementation steps · Migration / rollout · Gotchas, edge cases
Reviewer C · Cold eyes, no prior context · Whole doc, fresh · Catches absorbed assumptions
Moderator · aggregate · dedupe · accept/defer/reject

Three lenses look at the doc independently. The moderator integrates findings — accept, defer, or reject with reason.

The reason for three is practical: a single reviewer reading a 300-line doc loses attention. Three reviewers with smaller scopes stay sharp. Their findings naturally overlap on the most important issues — that overlap is the strongest signal you have.

If the doc is shorter (under ~150 lines) or single-topic, the skill collapses to one reviewer with all three lenses in their charter.
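
The sizing rule above can be sketched as a small function. This is an illustration only, not the skill's actual implementation: the function name and the `single_topic` flag are hypothetical, and because the source gives the threshold as "~150", the behavior at exactly 150 lines is an assumption.

```python
def reviewer_count(line_count: int, single_topic: bool = False,
                   threshold: int = 150) -> int:
    """Return how many reviewers to spawn for a doc (hypothetical helper).

    Docs longer than the threshold get three reviewers (A/B/C).
    Shorter or single-topic docs collapse to one reviewer who carries
    all three lenses in a single charter.
    """
    if single_topic or line_count <= threshold:
        return 1
    return 3


if __name__ == "__main__":
    for lines, single in [(300, False), (120, False), (300, True)]:
        print(lines, single, reviewer_count(lines, single_topic=single))
```

Treat the threshold as a judgment call rather than a hard cutoff; the tilde in the source suggests as much.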

The Steelman Gate

Climb the ladder before you can criticize.

Every reviewer must go through Rapoport's four steps — in order — before they're allowed to raise any objection. This single rule does more to filter noise than any other constraint in the skill.

1 · Re-express the decision
Restate it so accurately, so charitably, that the original author would say "yes, exactly."

2 · Points of agreement
List what's right about the decision. The parts that hold up under scrutiny.

3 · What I learned
Name something this decision taught you. Genuine intellectual updating, not flattery.

4 · Now you may critique.
Only after climbing the first three rungs.

If a reviewer can't climb rung 1 — can't restate the decision in a way the author would endorse — they don't understand it well enough to critique it. The steelman acts as a competency check.

It also blocks the "strawman" failure mode where a reviewer attacks a version of the plan that doesn't exist. If you've publicly restated the plan as the author understands it, you can no longer pretend it says something it doesn't.

Why this works

Forcing the reviewer to first say what's right about a decision puts them in a stance of generosity. From there, the critiques that survive feel less like reflexive contrarianism and more like genuine concerns. The author is also vastly more likely to act on findings from a reviewer who clearly understood the plan than from one who skimmed it.

The Grounding Rule

Five objection forms — banned.

Every objection must cite a file:line, a measurable fact, or a documented constraint. If it can't pass that test, it doesn't get raised. The five forms below are specifically forbidden because they masquerade as critique while smuggling in vibes.

1 · "I worry that…"
Pure speculation. No grounding. The reviewer's anxiety is not evidence. If the worry is real, name the specific failure mode and cite where it's documented.

2 · "What if X happens?"
Banned unless X is a documented failure mode in cited evidence. "What if the database melts?" Sure, but when has it? Cite the incident, the postmortem, or the load number.

3 · "I don't think this will scale."
Banned without a load number. "Scale" is a magic word that means nothing without a denominator. 100 RPS? 100k users? Cite the actual count or it's noise.

4 · "Have you considered Y instead?"
If Y appears in the doc's rejected-alternatives section, this is automatically out of scope. The reviewer didn't read carefully. Re-litigating settled debates wastes everyone's time.

5 · "This feels wrong."
Discarded. Feelings aren't actionable. If something is genuinely wrong, the reviewer can find the file:line where the wrongness lives. If they can't, the feeling stays in their head.

Better fewer than mixed

The charter explicitly tells reviewers: better to return 4 grounded findings than 12 mixed. Quality over quantity. A grounded finding is a tool the author can use. A speculation is a chore the author has to refute.
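
As a rough illustration, a first-pass lint for the grounding rule might look like the sketch below. Everything here is an assumption for illustration: the regex patterns, the `path.ext:line` citation shape, and the function name are hypothetical, and a phrase match only flags an objection for a human to check. The rule also accepts measurable facts and documented constraints as grounding, which a pattern match cannot verify.

```python
import re

# Hypothetical patterns approximating the five banned objection forms.
BANNED_FORMS = {
    "I worry that": r"\bI worry that\b",
    "What if X happens?": r"\bwhat if\b",
    "I don't think this will scale": r"\bI don['’]t think this will scale\b",
    "Have you considered Y?": r"\bhave you considered\b",
    "This feels wrong": r"\bthis feels wrong\b",
}

# Assumed citation shape: a path with an extension, a colon, a line number.
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+")


def grounding_issues(objection: str) -> list[str]:
    """Return reasons an objection needs a second look (empty list = passes)."""
    issues = [
        f"banned form: {label}"
        for label, pattern in BANNED_FORMS.items()
        if re.search(pattern, objection, flags=re.IGNORECASE)
    ]
    if not FILE_LINE.search(objection):
        issues.append("no file:line citation found")
    return issues


if __name__ == "__main__":
    print(grounding_issues("I worry that the cache will break"))
    print(grounding_issues("docs/plan.md:42 drops the index before the backfill ends"))
```

The file name `docs/plan.md` is made up for the demo. A lint like this is best used to prompt the reviewer to rewrite an objection, not to reject it automatically.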

Severity Ladder

Four tiers, sorted from "stop everything" to "fine."

Every finding gets one of four labels. The reviewer's table must be sorted Fatal first, Bikeshed last — so if the author has 60 seconds, they read the most important findings first.

Fatal · Block. Stop.
Critical · Same-PR fix.
Important · Fold into plan.
Bikeshed · Note & ignore.

What each tier means

Fatal — Cannot proceed. Data loss, un-revertible breakage, or contradicts a project rule (CLAUDE.md, safety). Edit the doc and re-circulate before any execution.

Critical — Proceeds but leaves a known bug or violates a documented invariant. Must be fixed in the same migration / PR, not deferred.

Important — Non-blocking but should be resolved during the migration: ambiguity, missing verification step, undocumented assumption.

Bikeshed — "I'd have done it differently but it's fine." Cosmetic. Note and move on. Bikesheds aren't bad — they just shouldn't outrank substance.
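
The required ordering of the findings table can be sketched with an ordered enum. This is a minimal illustration under assumptions: the `Severity` class, the tuple layout, and the sample findings are hypothetical, not part of the skill.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower value = more severe, so an ascending sort puts Fatal first."""
    FATAL = 0
    CRITICAL = 1
    IMPORTANT = 2
    BIKESHED = 3


def sort_findings(findings):
    """Sort (severity, text) pairs Fatal first, Bikeshed last.

    sorted() is stable, so findings within one tier keep the order
    in which the reviewer wrote them.
    """
    return sorted(findings, key=lambda finding: finding[0])


if __name__ == "__main__":
    table = [
        (Severity.BIKESHED, "Rename the config key"),
        (Severity.FATAL, "Step 4 drops the table before the backup runs"),
        (Severity.IMPORTANT, "No verification step after the backfill"),
    ]
    for severity, text in sort_findings(table):
        print(f"{severity.name.title():<10} {text}")
```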

Anti-Cheat Mechanics

Two clauses block the most common cheats.

Even with a good charter, reviewers find ways to dodge real critique. The skill bakes in two clauses that catch the two most common dodges.

The minimum-finding floor

Every reviewer must surface at least 3 distinct findings — even if all are Important or below. If they can't, they must explicitly write:

After grounded review of {scope},
I could not surface 3 findings at
or above the Important tier. The
decision appears robust.

Stops the "looks good, ship it" empty-calorie report.

The anti-bikeshed clause

If a reviewer's table has zero Fatal/Critical findings AND more than 5 Bikesheds, the moderator may return the report and require a second pass focused on grounded high-severity issues.

Stops Parkinson's Law of Triviality — debate over the bike shed instead of the nuclear reactor.
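
Both clauses reduce to simple counts over a reviewer's table. The sketch below is one possible reading, assuming findings arrive as a list of tier labels; the function name, the `has_exit_clause` flag, and the returned status strings are hypothetical.

```python
from collections import Counter

MIN_FINDINGS = 3    # minimum-finding floor from the charter
MAX_BIKESHEDS = 5   # "more than 5 Bikesheds" triggers the anti-bikeshed clause


def report_status(severities: list[str], has_exit_clause: bool = False) -> str:
    """Classify a reviewer report against the two anti-cheat clauses.

    Returns "incomplete" when the floor is missed without the explicit
    exit clause, "may_return" when the moderator is allowed to request a
    second pass, and "ok" otherwise.
    """
    counts = Counter(severities)
    if len(severities) < MIN_FINDINGS and not has_exit_clause:
        return "incomplete"
    high_severity = counts["Fatal"] + counts["Critical"]
    if high_severity == 0 and counts["Bikeshed"] > MAX_BIKESHEDS:
        return "may_return"
    return "ok"


if __name__ == "__main__":
    print(report_status(["Important", "Bikeshed"]))
    print(report_status(["Bikeshed"] * 6))
    print(report_status(["Critical"] + ["Bikeshed"] * 6))
```

Note that the anti-bikeshed clause says the moderator may return the report, so "may_return" is a prompt for the moderator's judgment rather than an automatic rejection.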

The full anti-pattern table

Beyond those two, the charter explicitly names seven failure modes and pairs each with a mitigation:

Anti-pattern · How the charter blocks it
Suggesting rejected alternatives · Reviewer must read rejected-alternatives sections first. Banned objection #4.
Inventing constraints · CLAUDE.md is mandatory grounding. Self-hosted bias rule from verify-before-denying.md.
Steelmanning out of existence · Minimum-finding floor (3+) or explicit exit clause.
Going beyond scope · Scope is explicit by section/line. Out-of-scope goes in appendix, not table.
Priority inversion (bikeshed first) · Required ordering: Fatal → Critical → Important → Bikeshed.
Groundhog Day (re-litigating) · Pending-revisions log is mandatory grounding. Charter forbids re-raising tracked issues.
Bikeshedding · Severity tiers + minimum-finding floor force engagement with substance.

In Practice

How you actually run one.

Here's the end-to-end flow. The skill itself is durable and reusable; you fill in the placeholders for the specific doc you're reviewing.

1. Trigger the skill

Just say what you want, in plain English:

The skill activates automatically on those phrases.

2. Spawn the reviewers

The skill produces three filled-in launch prompts — one per reviewer (A/B/C). You paste each into a fresh Claude chat. Each chat starts cold, with no context contamination. The placeholder structure looks like this:

# Adversarial Review — Reviewer A: Grounding

You are Reviewer A: Grounding pressure-test.

## Scope
You are reviewing {DOC_TO_REVIEW},
scoped to: {REVIEWER_SCOPE}

## Grounding files (READ FIRST, IN ORDER)
{GROUNDING_FILES}

## Your job
Run an adversarial review of your scope.
Read .claude/skills/adversarial-review/SKILL.md
before you start. Then:

  1. Steelman preface (Rapoport's 4 rules)
  2. Surface findings (minimum 3, or exit clause)
  3. Apply the grounding rule — every objection
     cites file:line, fact, or documented constraint
  4. Output ONE table sorted Fatal → Critical
     → Important → Bikeshed
  5. Add out-of-scope appendix

## Output
Write to: {OUTPUT_PATH}

## Time budget
~30 turns.
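
Filling the placeholders is plain string substitution. The sketch below uses an abbreviated copy of the template and Python's `str.format`; the function name, the numbered-list rendering of grounding files, and every path in the demo are hypothetical.

```python
# Abbreviated copy of the launch prompt; only the placeholder lines are kept.
TEMPLATE = """\
## Scope
You are reviewing {DOC_TO_REVIEW},
scoped to: {REVIEWER_SCOPE}

## Grounding files (READ FIRST, IN ORDER)
{GROUNDING_FILES}

## Output
Write to: {OUTPUT_PATH}
"""


def fill_prompt(doc: str, scope: str, grounding_files: list[str],
                output_path: str) -> str:
    """Substitute the four placeholders for one reviewer."""
    # Numbering the files makes the "in order" requirement explicit.
    numbered = "\n".join(
        f"{i}. {path}" for i, path in enumerate(grounding_files, start=1)
    )
    return TEMPLATE.format(
        DOC_TO_REVIEW=doc,
        REVIEWER_SCOPE=scope,
        GROUNDING_FILES=numbered,
        OUTPUT_PATH=output_path,
    )


if __name__ == "__main__":
    print(fill_prompt(
        doc="docs/migration-plan.md",                  # hypothetical path
        scope="sections 1-3 (top-of-doc decisions)",   # hypothetical scope
        grounding_files=["CLAUDE.md", "docs/pending-revisions.md"],
        output_path="reviews/reviewer-a.md",           # hypothetical path
    ))
```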

3. Each reviewer runs in parallel

They read their grounding files, write the steelman preface, then produce the findings table. They do not edit the original doc — they only write findings. They do not propose alternative architectures — they find risks; you choose the response.

4. The moderator integrates

This is usually you (or the original author chat). Aggregate the three findings tables, dedupe across reviewers (higher severity wins), and mark each finding as one of: accept, defer, reject with reason, or already pending.

"Bare reject" is not allowed. You must say why. This forces honesty and prevents convenient dismissal.

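The moderator's two mechanical jobs, deduping with "higher severity wins" and refusing a bare reject, can be sketched as follows. This assumes findings carry a dedupe key such as the cited file:line, which the source does not specify; the `Finding` class and function names are hypothetical. The four states are the ones named in the closing section of this tour.

```python
from dataclasses import dataclass

RANK = {"Fatal": 0, "Critical": 1, "Important": 2, "Bikeshed": 3}
STATES = {"accept", "defer", "reject", "already pending"}


@dataclass
class Finding:
    key: str        # assumed dedupe key, e.g. the cited file:line
    severity: str   # one of the four tiers in RANK
    reviewer: str   # "A", "B", or "C"


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per key; when reviewers overlap, higher severity wins."""
    best: dict[str, Finding] = {}
    for finding in findings:
        kept = best.get(finding.key)
        if kept is None or RANK[finding.severity] < RANK[kept.severity]:
            best[finding.key] = finding
    return sorted(best.values(), key=lambda f: RANK[f.severity])


def disposition(state: str, reason: str = "") -> tuple[str, str]:
    """Record the moderator's decision; a reject without a reason is refused."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    if state == "reject" and not reason.strip():
        raise ValueError("bare reject is not allowed; give a reason")
    return state, reason


if __name__ == "__main__":
    merged = dedupe([
        Finding("plan.md:88", "Important", "A"),
        Finding("plan.md:88", "Critical", "B"),
        Finding("plan.md:120", "Bikeshed", "C"),
    ])
    for finding in merged:
        print(finding)
```
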
5. Archive

Move the integrated findings table to findings-{date}.md next to the doc. Future reviews must read it as a grounding file. To re-raise a settled finding, a future reviewer must cite the original finding ID and present new evidence — same rule as ADR supersession.
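
Deriving the archive path is a one-liner with `pathlib`. The source does not specify the date format in findings-{date}.md, so the ISO form (YYYY-MM-DD) below is an assumption, as are the function name and the sample path.

```python
from datetime import date
from pathlib import Path


def archive_path(doc_path: str, on: date) -> Path:
    """Return the findings file path next to the reviewed doc.

    The ISO date format is an assumption; the source only says
    findings-{date}.md.
    """
    return Path(doc_path).with_name(f"findings-{on.isoformat()}.md")


if __name__ == "__main__":
    # Hypothetical doc path, for illustration only.
    print(archive_path("docs/migration-plan.md", date(2026, 4, 29)))
```

ISO dates have the side benefit that archived findings files sort chronologically by name.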

The whole cycle

Trigger → Spawn 3 → Each reviews independently → Moderator integrates → Doc edited → Findings archived. Start to finish, the cycle typically takes ~2 hours of wall-clock time, with much of the work running in parallel.

Knowing When

When this skill earns its keep — and when it's overkill.

Situation · Recommendation
Multi-section architecture / migration / RFC doc · Use this
Decision is hard to revert ("we execute Monday") · Use this
Doc was authored across multiple sessions (assumptions absorbed) · Use this
Doc has rejected-alternatives sections you want respected · Use this
Single-decision yes/no that fits in a paragraph · Just ask
You want a feature wishlist or alternative architectures · Just ask
The doc isn't written yet · Just ask

The skill has overhead: spawning three chats, integrating findings, archiving. That overhead is worth it when the cost of a missed assumption is high — when execution is hard to undo, when the doc is long enough to absorb context, when settled debates are at risk of being re-opened.

For a quick "does this make sense?" check, just ask. The skill is for high-stakes review.

A Closing Thought

Why structure matters more than effort.

The intuitive way to improve a review is to try harder — be more thorough, be more skeptical, look for more flaws. But effort without structure produces mostly noise. The reviewer who tries hardest often produces the worst report — twelve speculations and no grounded findings.

What this skill encodes is a different bet: structure outperforms effort. A reviewer constrained by the steelman gate, the grounding rule, the severity ladder, and the minimum-finding floor will produce a better report in 30 turns than an unconstrained reviewer in 100. The constraints don't limit thinking — they direct it toward the kind of finding that's actually useful.

The same logic applies to the moderator. The "accept / defer / reject with reason / already pending" classification looks like bureaucracy, but it's the cheapest way to prevent the most expensive failure mode in review: the finding that gets nodded at and forgotten. If every finding must end in one of four states, none can quietly disappear.

The constraints don't limit thinking — they direct it toward the kind of finding that's actually useful.

That's the whole skill in one sentence. Everything else is plumbing.