You're 90% sure it's right. The 10% you missed is what this skill is for — a structured way to send a fresh pair of eyes at your decision before it's too late to undo.
When you write a multi-section plan over multiple sessions — building up context, making trade-offs, rejecting alternatives — your brain absorbs the constraints invisibly. By the time you finish, you can't see them anymore. They feel like reality.
This is the same reason you can't proofread your own writing. The author of a sentence already knows what it means, so their eye glides over the missing word. A fresh reader stops cold at it.
The fix sounds easy: just ask someone else to review it.
But here's where it gets messy. Most "fresh eyes" reviews fail in predictable ways:
So this skill is essentially a structured charter for fresh-eyes review — designed to filter out the five common failure modes above and force the reviewer to deliver grounded, prioritized findings the author can actually act on.
A pilot can fly the plane they designed. But before takeoff, they still walk around the aircraft with a checklist — touching every surface, looking for the thing their familiarity made invisible.
That's adversarial review. The reviewer isn't smarter than you. They just bring fresh attention plus a checklist of failure modes that experience has shown are easy to miss. They're allowed to question anything — but only with grounded evidence, not vibes.
This skill isn't invented from scratch. It borrows the strongest ideas from four older review traditions and fuses them into one method. Here's where each piece comes from.
Imagine the project has already failed. Now work backwards: why did it fail? The cognitive trick is "prospective hindsight" — your brain finds flaws more easily when it assumes failure than when it forecasts risk.
Before you criticize an idea, restate it so well its author would say "yes, exactly." Then list what you agree with. Only then are you allowed to attack. Forces real understanding before fault-finding.
Architecture Decision Records are immutable. You don't edit a decision after it's made — you supersede it with a new decision that cites the old one. This stops endless re-litigation of settled debates.
Adopt the attacker's mindset. Don't ask "will this work?" — ask "how would I break this?" Borrowed from security research: focus on attack surfaces, edge cases, and the failure modes that survive a happy-path test.
Pre-mortem gives the cognitive frame ("imagine it failed"). Steelman makes the critique credible. ADR review prevents Groundhog Day debates. Red team focuses attention on real attack surfaces. None alone is enough; together they cover each other's blind spots.
For any non-trivial doc (more than ~150 lines), the skill spawns three independent reviewers in fresh chats. Each has a distinct lens. A moderator chat — usually the original author — integrates the three reports.
The reason for three is practical: a single reviewer reading a 300-line doc loses attention. Three reviewers with smaller scopes stay sharp. Their findings naturally overlap on the most important issues — that overlap is the strongest signal you have.
If the doc is shorter (under ~150 lines) or single-topic, the skill collapses to one reviewer with all three lenses in their charter.
Every reviewer must go through Rapoport's four steps — in order — before they're allowed to raise any objection. This single rule does more to filter noise than any other constraint in the skill.
If a reviewer can't climb rung 1 — can't restate the decision in a way the author would endorse — they don't understand it well enough to critique it. The steelman acts as a competency check.
It also blocks the "strawman" failure mode where a reviewer attacks a version of the plan that doesn't exist. If you've publicly restated the plan as the author understands it, you can no longer pretend it says something it doesn't.
Forcing the reviewer to first say what's right about a decision puts them in a stance of generosity. From there, the critiques that survive feel less like reflexive contrarianism and more like genuine concerns. The author is also vastly more likely to act on findings from a reviewer who clearly understood the plan than from one who skimmed it.
Every objection must cite a file:line, a measurable fact, or a documented constraint. If it can't pass that test, it doesn't get raised. The five forms below are specifically forbidden because they masquerade as critique while smuggling in vibes.
The charter explicitly tells reviewers: better to return 4 grounded findings than 12 mixed. Quality over quantity. A grounded finding is a tool the author can use. A speculation is a chore the author has to refute.
Every finding gets one of four labels. The reviewer's table must be sorted Fatal first, Bikeshed last — so if the author has 60 seconds, they read the most important findings first.
Fatal — Cannot proceed. Data loss, un-revertible breakage, or contradicts a project rule (CLAUDE.md, safety). Edit the doc and re-circulate before any execution.
Critical — Proceeds but leaves a known bug or violates a documented invariant. Must be fixed in the same migration / PR, not deferred.
Important — Non-blocking but should be resolved during the migration: ambiguity, missing verification step, undocumented assumption.
Bikeshed — "I'd have done it differently but it's fine." Cosmetic. Note and move on. Bikesheds aren't bad — they just shouldn't outrank substance.
Even with a good charter, reviewers find ways to dodge real critique. The skill bakes in two clauses that catch the two most common dodges.
Every reviewer must surface at least 3 distinct findings — even if all are Important or below. If they can't, they must explicitly write:
After grounded review of {scope},
I could not surface 3 findings at
or above the Important tier. The
decision appears robust.
Stops the "looks good, ship it" empty-calorie report.
If a reviewer's table has zero Fatal/Critical findings AND more than 5 Bikesheds, the moderator may return the report and require a second pass focused on grounded high-severity issues.
Stops Parkinson's Law of Triviality — debate over the bike shed instead of the nuclear reactor.
Beyond those two, the charter explicitly names seven failure modes and pairs each with a mitigation:
| Anti-pattern | How the charter blocks it |
|---|---|
| Suggesting rejected alternatives | Reviewer must read rejected-alternatives sections first. Banned objection #4. |
| Inventing constraints | CLAUDE.md is mandatory grounding. Self-hosted bias rule from verify-before-denying.md. |
| Steelmanning out of existence | Minimum-finding floor (3+) or explicit exit clause. |
| Going beyond scope | Scope is explicit by section/line. Out-of-scope goes in appendix, not table. |
| Priority inversion (bikeshed first) | Required ordering — Fatal → Critical → Important → Bikeshed. |
| Groundhog Day (re-litigating) | Pending-revisions log is mandatory grounding. Charter forbids re-raising tracked issues. |
| Bikeshedding | Severity tiers + minimum-finding floor force engagement with substance. |
Here's the end-to-end flow. The skill itself is durable and reusable; you fill in the placeholders for the specific doc you're reviewing.
Just say what you want, in plain English:
The skill activates automatically on those phrases.
The skill produces three filled-in launch prompts — one per reviewer (A/B/C). You paste each into a fresh Claude chat. Each chat starts cold, with no context contamination. The placeholder structure looks like this:
# Adversarial Review — Reviewer A: Grounding You are Reviewer A: Grounding pressure-test. ## Scope You are reviewing {DOC_TO_REVIEW}, scoped to: {REVIEWER_SCOPE} ## Grounding files (READ FIRST, IN ORDER) {GROUNDING_FILES} ## Your job Run an adversarial review of your scope. Read .claude/skills/adversarial-review/SKILL.md before you start. Then: 1. Steelman preface (Rapoport's 4 rules) 2. Surface findings (minimum 3, or exit clause) 3. Apply the grounding rule — every objection cites file:line, fact, or documented constraint 4. Output ONE table sorted Fatal → Critical → Important → Bikeshed 5. Add out-of-scope appendix ## Output Write to: {OUTPUT_PATH} ## Time budget ~30 turns.
They read their grounding files, write the steelman preface, then produce the findings table. They do not edit the original doc — they only write findings. They do not propose alternative architectures — they find risks; you choose the response.
This is usually you (or the original author chat). Aggregate the three findings tables, dedupe across reviewers (higher severity wins), and for each finding mark one of:
pending-revisions.md"Bare reject" is not allowed. You must say why. This forces honesty and prevents convenient dismissal.
Move the integrated findings table to findings-{date}.md next to the doc. Future reviews must read it as a grounding file. To re-raise a settled finding, a future reviewer must cite the original finding ID and present new evidence — same rule as ADR supersession.
Trigger → Spawn 3 → Each reviews independently → Moderator integrates → Doc edited → Findings archived. From start to finish, typically ~2 hours of wall-clock time, much of it in parallel.
| Situation | Use this | Just ask |
|---|---|---|
| Multi-section architecture / migration / RFC doc | ✓ | |
| Decision is hard to revert ("we execute Monday") | ✓ | |
| Doc was authored across multiple sessions (assumptions absorbed) | ✓ | |
| Doc has rejected-alternatives sections you want respected | ✓ | |
| Single-decision yes/no that fits in a paragraph | ✓ | |
| You want a feature wishlist or alternative architectures | ✓ | |
| The doc isn't written yet | ✓ |
The skill has overhead: spawning three chats, integrating findings, archiving. That overhead is worth it when the cost of a missed assumption is high — when execution is hard to undo, when the doc is long enough to absorb context, when settled debates are at risk of being re-opened.
For a quick "does this make sense?" check, just ask. The skill is for high-stakes review.
The intuitive way to improve a review is to try harder — be more thorough, be more skeptical, look for more flaws. But effort without structure produces mostly noise. The reviewer who tries hardest often produces the worst report — twelve speculations and no grounded findings.
What this skill encodes is a different bet: structure outperforms effort. A reviewer constrained by the steelman gate, the grounding rule, the severity ladder, and the minimum-finding floor will produce a better report in 30 turns than an unconstrained reviewer in 100. The constraints don't limit thinking — they direct it toward the kind of finding that's actually useful.
The same logic applies to the moderator. The "accept / defer / reject with reason / already pending" classification looks like bureaucracy, but it's the cheapest way to prevent the most expensive failure mode in review: the finding that gets nodded at and forgotten. If every finding must end in one of four states, none can quietly disappear.
That's the whole skill in one sentence. Everything else is plumbing.