ICML 2026 Desk-Rejects 497 Papers Over Reviewer LLM Policy Violations
Summary
- ICML 2026 desk-rejected 497 papers (~2% of submissions) over reviewer LLM policy violations
- A novel watermarking technique detected 795 LLM-written reviews from 506 reviewers
- All flagged reviews were manually verified; no generic AI-text detectors were used
- The enforcement action against peer-review LLM violations is unprecedented in scale for an ML conference
Details
497 papers desk-rejected (~2% of all submissions)
Papers were rejected not for their own flaws but because their designated reciprocal reviewers violated the LLM usage policy those reviewers had explicitly agreed to follow. The penalty falls on authors, not reviewers, making this a significant and controversial enforcement outcome.
795 LLM-generated reviews detected from 506 unique reviewers
This represents roughly 1% of all reviews submitted to ICML 2026. All detections were manually verified by humans, meaning the watermarking signal was used to flag candidates, not to auto-decide outcomes.
Watermarking detection: hidden LLM instructions embedded in submission PDFs
The detection method worked by watermarking submitted PDFs with concealed prompt-like instructions. When a reviewer fed the paper to an LLM to generate a review, those hidden instructions subtly influenced the LLM's output in detectable ways. The technique is novel at this scale, but it was publicly known for most of the review period, making it relatively easy for a motivated bad actor to circumvent.
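ICML has not published its exact implementation, but the general mechanism can be illustrated. The sketch below (an assumption, not the conference's actual method) builds a minimal PDF by hand in which a hidden line is drawn with text rendering mode 3 (`3 Tr`), the PDF operator for invisible text: viewers display nothing at that position, yet text extraction, typically the first step of an LLM review pipeline, still recovers the string. The function name and the example strings are hypothetical.

```python
def make_watermarked_pdf(visible: str, hidden: str) -> bytes:
    """Build a one-page PDF with a visible line and an invisible watermark line.

    Caveat: `visible` and `hidden` must be ASCII without unescaped parentheses,
    since they are placed directly inside PDF literal strings.
    """
    # Content stream: one visible line, then a line in render mode 3 ("3 Tr"),
    # which rasterizes to nothing but survives text extraction.
    content = (
        f"BT /F1 12 Tf 72 720 Td ({visible}) Tj ET\n"
        f"BT /F1 1 Tf 3 Tr 72 10 Td ({hidden}) Tj ET\n"
    ).encode("latin-1")

    # The five indirect objects of a minimal PDF: catalog, page tree,
    # page, content stream, and a standard Type1 font.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each object, needed for the xref table
    for i, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % i + body + b"\nendobj\n"

    # Cross-reference table: fixed-width 20-byte entries, then the trailer.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (
        b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
        % (len(objects) + 1, xref_pos)
    )
    return bytes(out)
```

In a real attack surface, the hidden line would carry an instruction that nudges an LLM toward a statistically detectable phrasing pattern rather than an overt command, which is what makes the resulting reviews flaggable for later manual verification.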
Two-policy framework: Policy A (no LLM) and Policy B (LLMs allowed for comprehension and polish)
Reviewers self-selected their policy. Only those who chose Policy A, or indicated willingness to follow either policy, were assigned to Policy A papers. Violations were therefore unambiguous breaches of a commitment the reviewer had personally made, not accidental rule confusion.
51 reviewers with >50% of reviews flagged were fully removed from the reviewer pool
All reviews by these 51 individuals were deleted. Area chairs overseeing affected papers must now recruit replacement reviewers, creating downstream delays and logistical strain on the program.
ICML explicitly declined to judge review quality or reviewer intent
The enforcement action is framed purely as a policy compliance issue. This framing sidesteps the thorny question of whether LLM-assisted reviews are better or worse than human-only reviews, focusing instead on the integrity of commitments made within the peer-review system.
Enforcement action at scale sets a precedent for ML conference governance
While other venues have issued policies, ICML 2026 is among the first to operationalize detection and enforce consequences at this scale. The action may pressure other conferences to develop comparable enforcement infrastructure.
Detected violations likely represent only a fraction of actual non-compliance
With thousands of reviewers and a watermarking method that was publicly known for most of the review period, any reviewer who took even modest care to evade the watermark would not appear in these figures. The conference's own acknowledgment that the technique is easy to circumvent suggests the problem may be substantially larger than the numbers captured.
What This Means
ICML 2026 has taken mass enforcement action on LLM use in peer review, desk-rejecting nearly 500 papers and removing over 50 reviewers whose submissions violated policies they personally agreed to. The action exposes a genuine integrity problem: reviewers who explicitly opted into a no-LLM policy used LLMs anyway at measurable scale, and the authors of the reviewed papers paid the price. The detection method — watermarking PDFs to catch LLM-generated reviews — is creative but acknowledged to be easily defeated, suggesting true non-compliance is likely higher than what was caught. As AI-generated text becomes harder to distinguish from human writing, the academic community will need more robust structural solutions than self-selected honor policies.
