Retro · methodology study

Zcash Orchard counterfeiting bug

Re-ran our pipeline against the 2021 introducing commit. Honest receipt.

Disclosed: 2026-05-29 · Taylor Hornby (Shielded Labs)
Patched: 2026-06-01 · halo2 commit d8e48efd
Retro target: zcash/halo2 @ cc9dd205 (2021-06-05)
Reviewer stack: claude-opus-4-7 + gpt-5

The bug

On May 29, 2026, Taylor Hornby — security engineer at Shielded Labs, working with a custom audit harness he calls zcash-full-stack-auditor paired with Claude Opus 4.8 — disclosed a critical counterfeiting vulnerability in the Zcash Orchard pool. The defect: in the variable-base scalar-multiplication gadget (ecc::chip::mul), the per-row base coordinates (x_p, y_p) were written with assign_advice instead of copy_advice. The constancy gate (q_mul_2) kept the base constant across rows, but never anchored it to the actual base argument. A prover could supply B' ≠ base and the proof still verified — producing [scalar] B' instead of [scalar] base. Unlimited undetectable counterfeit ZEC.
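
The mechanism can be shown with a deliberately tiny model. This is a hypothetical sketch, not halo2 or Orchard code:

- The elliptic-curve group is replaced by integers mod a small prime P, so "[k]B" is just k·B mod P.
- Each row witnesses its own copy of the base (x_p).
- `Row`, `witness`, `verify`, `output` and the `anchor_base` flag are names invented for illustration.

It shows that a constancy gate alone lets the prover pick any constant base, while a single copy constraint on the first row closes the gap.

```rust
// Toy model of the variable-base mul defect. NOT halo2 code: the "group" is
// integers mod a small prime P, so [k]B is simply k*B mod P.
// Assumes all base values are < P (keeps the u64 arithmetic overflow-free).
const P: u64 = 1_000_003;

/// One witness row per scalar bit, most significant bit first.
pub struct Row {
    pub x_p: u64, // base value this row uses; freely witnessed, like an advice cell
    pub acc: u64, // running accumulator after this row's double-and-add step
}

/// Prover: fills every row with `base_used`, honestly or not.
pub fn witness(bits: &[bool], base_used: u64) -> Vec<Row> {
    let mut acc = 0u64;
    bits.iter()
        .map(|&bit| {
            acc = (2 * acc + if bit { base_used } else { 0 }) % P;
            Row { x_p: base_used, acc }
        })
        .collect()
}

/// Verifier. `anchor_base` models copy_advice: a copy constraint tying the
/// first row's x_p to the caller's real `base` cell.
pub fn verify(bits: &[bool], base: u64, trace: &[Row], anchor_base: bool) -> bool {
    if trace.is_empty() || trace.len() != bits.len() {
        return false;
    }
    if anchor_base && trace[0].x_p != base {
        return false; // the fix: the base is pinned to the actual argument
    }
    let mut prev_acc = 0u64;
    for (i, row) in trace.iter().enumerate() {
        // Constancy gate (the role q_mul_2 plays): x_p_cur - x_p_next = 0.
        if let Some(next) = trace.get(i + 1) {
            if next.x_p != row.x_p {
                return false;
            }
        }
        // Transition gate: acc_i = 2 * acc_{i-1} + bit_i * x_p.
        let expected = (2 * prev_acc + if bits[i] { row.x_p } else { 0 }) % P;
        if row.acc != expected {
            return false;
        }
        prev_acc = row.acc;
    }
    true
}

/// The value the circuit outputs: the last accumulator.
pub fn output(trace: &[Row]) -> u64 {
    trace.last().map_or(0, |r| r.acc)
}
```

Take k = 13 (bits 1101) and a real base of 42. A trace built on a forged base of 7 satisfies every gate when `anchor_base` is false, and it outputs 91 ([13]·7) instead of 546 ([13]·42). Setting `anchor_base` to true rejects it. In halo2, the analogous anchor is a permutation-argument copy constraint such as copy_advice.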

The defect was present from Orchard activation in May 2022 until the emergency fix on June 1, 2026. Four years.

The question

Would AntFleet's two-model unanimous review — Claude Opus 4.7 + GPT-5, generalist prompt, the same gate that runs on every PR — have caught Taylor's bug at the original 2021 introducing commit?

And how much closer does a thin halo2-specialist prompt wrapper get us, without a custom audit harness or a model upgrade?

Methodology

Four runs: two prompts × two reviewer models. All four used the same files, providers, temperature and gate. The retro target is the original PR that introduced the gadget, cc9dd205 on zcash/halo2 (June 5, 2021). Both bug surfaces are present in that commit.

Generalist prompt — the unmodified spike prompt used in production AntFleet review. Asks for findings in eight standard categories (correctness, security, concurrency, etc.). No halo2-specific lens.
Specialist prompt — the generalist prompt prefixed with a ~50-line halo2 context block describing five circuit-soundness defect classes (under-constrained advice cells, constraint-scope gaps, missing chip-boundary anchoring, lookup omission, conditional-gate fallback). Names the class of defect; does not name the specific bug, file, or symbol.

All four runs were blind: the prompt label was scrubbed of any reference to the bug, its mechanism or the disclosure date. The repo name (zcash/halo2) and the commit SHA were visible, since a real reviewer would see those.
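
The two prompt mechanics above can be sketched under stated assumptions:

- The specialist wrapper is a plain string prefix on the generalist prompt (the blank-line separator is assumed).
- Blinding is enforced by scanning the run label against a deny-list.

`specialist_prompt`, `leaked_terms` and the `LEAK_TERMS` entries are hypothetical. The study does not publish its actual scrubber. The deny-list terms are drawn from this writeup.

```rust
/// Hypothetical deny-list: strings that would unblind a run if they reached
/// the prompt (bug-specific terms, the specific symbol, the disclosure date).
const LEAK_TERMS: &[&str] = &[
    "2026-05-29",
    "hornby",
    "under-constrained base",
    "copy_advice",
    "counterfeit",
];

/// Specialist prompt = halo2 context block prefixed to the unmodified
/// generalist prompt (separator is an assumption).
pub fn specialist_prompt(halo2_context: &str, generalist: &str) -> String {
    format!("{}\n\n{}", halo2_context, generalist)
}

/// Returns every deny-listed term found in `label`, case-insensitively,
/// in deny-list order. An empty result means the label passes the check.
pub fn leaked_terms(label: &str) -> Vec<&'static str> {
    let lower = label.to_lowercase();
    LEAK_TERMS
        .iter()
        .copied()
        .filter(|t| lower.contains(*t))
        .collect()
}
```

A label like the contaminated first run's (see Caveats) would trip two terms. The introducing commit's own title trips none.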

Results

	Opus 4.7	GPT-5
Generalist (production)	PARTIAL Missed Taylor's specific defect. Flagged an adjacent counterfeit-class bug in the same gadget, via a different mechanism.	PARTIAL Missed Taylor's specific defect. Flagged CRITICAL: “Variable-base mul does not constrain the input scalar to the decomposed bits (q_mul_decompose_var never enabled; scalar cell never used)”. A different counterfeit-class soundness bug in the same code.
+ halo2 specialist wrapper (blind)	PARTIAL CRITICAL: “Scalar input to variable-base scalar multiplication is never copy-constrained into the recovery row”. Caught the bug class but localized to `process_lsb` instead of the main `double_and_add` loop. Recall lift over generalist, wrong location.	CLASS HIT “Base point argument is not copy-constrained into x_p/y_p columns (missing anchoring across regions).” Named the missing copy-constraint mechanism on the base coordinates — the defect class Taylor's bug exploits. Mechanism and fix direction align with the June 1 commit; not a direct pin of the `copy_advice` call site.

The class hit, verbatim

GPT-5, specialist wrapper, blind. No mention of the bug, the disclosure date, or Taylor in the prompt.

Base point argument is not copy-constrained into x_p/y_p columns (missing anchoring across regions) (critical)
“The base point passed into mul is a pair of advice cells from another region (witness_point). Throughout incomplete and complete addition, the code repeatedly assigns x_p/y_p from base.x.value()/base.y.value() using assign_advice, but never copy-constrains those cells to the original base cells via the permutation argument. The only enforced property is that x_p,y_p remain constant across rows (x_p_cur - x_p_next = 0, etc.), which does not tie them to the input base. A malicious prover can therefore choose an internal base Q unrelated to the provided base B, and satisfy all internal gates; the resulting point is then [k]Q, not [k]B, but the proof still verifies. This is a classic missing chip-boundary anchoring defect.”
Recommendation:
- In the first row where x_p/y_p are assigned for incomplete and complete addition, use copy(...) to copy-constrain base.x and base.y into the x_p and y_p columns, respectively, and then enforce constancy across rows as currently done.
- Alternatively, add explicit region.constrain_equal calls tying the initial x_p/y_p cells to base.x()/base.y().
- Audit all occurrences of assigning base coordinates (including process_lsb and complete addition) to ensure at least one row copy-constrains from the provided base cells in each sub-region.

Compare to the actual fix commit d8e48efd (Daira-Emma Hopwood, 2026-05-31):

“The coordinates were written with assign_advice, and the constancy chain reached neither the doubling-row nor the complete-addition base anchors. A prover could therefore run the incomplete loop against a free constant B' ≠ base... Anchor the base by copy_advice-ing it into the first incomplete row.”

What we claim

Production AntFleet probably would not have pinned Taylor's specific bug at the 2021 PR. Both providers, generalist prompt, blind: neither identified the missing copy_advice on the base coordinates in the main loop. Both surfaced adjacent counterfeit-class defects in the same file.
A 50-line halo2 specialist wrapper, blind, puts the right defect class on one reviewer's radar. GPT-5 with the wrapper names the missing copy-constraint mechanism and recommends the fix direction that matches the June 2026 patch — not a direct pin of the copy_advice call site, but the class, exploit path, and fix all point at the right thing. No model upgrade, no custom harness, no agentic loop. Opus 4.7 with the same wrapper also catches the bug class but localizes to process_lsb rather than the main loop — partial recall lift.
The unanimous AND-gate is the right knob for PR-time noise control and the wrong knob for deep, targeted audits. A finding surfaced by only one reviewer is dropped at the gate. That happens even when both reviewers flagged real, but different, soundness bugs in the same gadget. A specialist reviewer is what AntFleet will build next.
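
The gate trade-off can be seen in a small sketch. It assumes each reviewer's output has already been reduced to a set of normalized finding keys. The keys and both functions are hypothetical; this study does not describe how the production gate matches findings. Under the unanimous AND-gate, the specialist run's two real but different findings cancel out. A union gate would keep both, at the cost of passing more single-reviewer noise on ordinary PRs.

```rust
use std::collections::BTreeSet;

/// Unanimous AND-gate: keep a finding only if every reviewer reported it.
pub fn unanimous<'a>(reviews: &[BTreeSet<&'a str>]) -> BTreeSet<&'a str> {
    let mut iter = reviews.iter();
    match iter.next() {
        None => BTreeSet::new(),
        // Intersect the first reviewer's findings with each remaining reviewer's.
        Some(first) => iter.fold(first.clone(), |acc, r| acc.intersection(r).copied().collect()),
    }
}

/// Any-reviewer (union) gate: keep a finding if at least one reviewer reported it.
pub fn any_reviewer<'a>(reviews: &[BTreeSet<&'a str>]) -> BTreeSet<&'a str> {
    reviews.iter().flat_map(|r| r.iter().copied()).collect()
}
```

Feeding it the specialist run's two findings as made-up keys gives an empty set under `unanimous` and both findings under `any_reviewer`:

- Opus 4.7: `"scalar-not-copied-into-recovery-row"`
- GPT-5: `"base-not-copied-into-x_p-y_p"`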

What we do NOT claim

Parity with Taylor's audit harness. Taylor used Opus 4.8 (released the day before discovery), an agentic loop, multiple targeted prompts, and full Zcash protocol context. AntFleet is a continuous diff-time generalist gate. Those are different products solving different problems; this study does not collapse that distinction.
That AntFleet would have prevented the four-year window. The 2021 reviewers were the world's best halo2 cryptographers, and they missed this. Saying “a 2021-deployed AntFleet would have stopped it” requires counterfactuals we can't honestly defend. The model tier in 2021 was different, and the install pattern would have been different. A HIGH-severity soundness finding also still depends on a human acting on it.
That the specialist wrapper is general-purpose. It was written knowing which defect class we wanted to measure. A fairer test: run the wrapper against halo2 commits known not to contain under-constraint bugs and measure the false-positive rate. We have not done that yet.

Caveats

n=1 per cell. Provider non-determinism could flip individual findings between runs. Re-running the blind specialist 3–5× would tell us whether GPT-5's class hit is stable.
Single commit. We picked the commit that introduced the gadget. Whether the gate would surface the bug on a random subsequent refactor PR that doesn't touch the buggy lines is a different question.
Specialist prompt is post-hoc. The five defect classes named in the wrapper were chosen knowing what bug class was present. The block is bug-class-fair, not bug-fair.
Opus 4.7, not 4.8. Taylor used 4.8. Re-running with 4.8 on both prompts is the obvious next step.
A first run was contaminated. The original baseline label contained “under-constrained base (disclosed 2026-05-29)”. That string landed in the prompt's feature title, and both providers nailed the bug, with Anthropic's reasoning explicitly citing the public disclosure. Discarded; the blind re-runs above are the actual data.
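
To make the n=1 caveat concrete, here is a sketch that treats each repeated run as an independent hit/miss trial (independence across provider runs is an assumption). It computes a Wilson score interval for the hit rate. `wilson_interval` is a hypothetical helper, not part of the pipeline.

```rust
/// Wilson score interval for a binomial proportion.
/// `z` is the normal quantile (1.96 for roughly 95% coverage).
/// Bounds are clamped to [0, 1] to absorb floating-point rounding.
pub fn wilson_interval(hits: u32, runs: u32, z: f64) -> (f64, f64) {
    assert!(runs > 0 && hits <= runs, "need runs > 0 and hits <= runs");
    let n = runs as f64;
    let p = hits as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half = (z / denom) * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt();
    ((center - half).max(0.0), (center + half).min(1.0))
}
```

At z = 1.96, a single hit in a single run is consistent with a true hit rate as low as about 0.21. Five hits in five runs would raise the lower bound to about 0.57. This is why the 3–5× re-run matters before reading much into one cell.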

Sources & artifacts

Shielded Labs: The Orchard Counterfeiting Vulnerability and Next Steps — Wilcox, McGee, Hornby
Fix commit d8e48efd on zcash/halo2 — “Anchor variable-base scalar-mul incomplete-addition base”
Introducing commit cc9dd205 on zcash/halo2 — “chip::mul.rs: Implement variable-base scalar mul instruction.” (2021-06-05, therealyingtong)
Blind baseline evidence: data/retro/zcash-orchard-counterfeit-2026-05.blind-baseline.json — prompt SHA 9294adda4b11cd8c…
Blind specialist evidence: data/retro/zcash-orchard-counterfeit-2026-05.blind-specialist.json — prompt SHA e068c5e2ef03d444…

Scanned 2026-06-06 · claude-opus-4-7 + gpt-5 · unanimous-gate pipeline, same providers as production