ICML 2026 Desk-Rejects 497 Papers Over Reviewer LLM Policy Violations
Summary
- ICML 2026 desk-rejected 497 papers (~2% of submissions) over reviewer LLM policy violations
- A novel watermarking technique detected 795 LLM-written reviews from 506 reviewers
- All flagged reviews were manually verified; no generic AI-text detectors were used
- The enforcement action against peer-review LLM violations is unprecedented in scale for an ML conference
Details
497 papers desk-rejected (~2% of all submissions)
Papers were rejected not for their own flaws but because their designated reciprocal reviewers violated the LLM usage policy those reviewers had explicitly agreed to follow. The penalty falls on authors, not reviewers, making this a significant and controversial enforcement outcome.
795 LLM-generated reviews detected from 506 unique reviewers
This represents roughly 1% of all reviews submitted to ICML 2026. All detections were manually verified by humans, meaning the watermarking signal was used to flag candidates, not to auto-decide outcomes.
Watermarking detection: hidden LLM instructions embedded in submission PDFs
The detection method worked by watermarking submitted PDFs with concealed prompt-like instructions. When a reviewer fed the paper to an LLM to generate a review, those hidden instructions subtly influenced the LLM's output in detectable ways. The technique is novel at this scale, but it was publicly known for most of the review period, making it relatively easy for a motivated bad actor to circumvent.
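ICML has not published its exact implementation, but the general mechanism can be illustrated. The sketch below (an assumption, not the conference's actual method) builds a minimal PDF by hand in which a hidden line is drawn with text rendering mode 3 (`3 Tr`), the PDF operator for invisible text: viewers display nothing at that position, yet text extraction, typically the first step of an LLM review pipeline, still recovers the string. The function name and the example strings are hypothetical.

```python
def make_watermarked_pdf(visible: str, hidden: str) -> bytes:
    """Build a one-page PDF with a visible line and an invisible watermark line.

    Caveat: `visible` and `hidden` must be ASCII without unescaped parentheses,
    since they are placed directly inside PDF literal strings.
    """
    # Content stream: one visible line, then a line in render mode 3 ("3 Tr"),
    # which rasterizes to nothing but survives text extraction.
    content = (
        f"BT /F1 12 Tf 72 720 Td ({visible}) Tj ET\n"
        f"BT /F1 1 Tf 3 Tr 72 10 Td ({hidden}) Tj ET\n"
    ).encode("latin-1")

    # The five indirect objects of a minimal PDF: catalog, page tree,
    # page, content stream, and a standard Type1 font.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each object, needed for the xref table
    for i, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % i + body + b"\nendobj\n"

    # Cross-reference table: fixed-width 20-byte entries, then the trailer.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (
        b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
        % (len(objects) + 1, xref_pos)
    )
    return bytes(out)
```

In a real attack surface, the hidden line would carry an instruction that nudges an LLM toward a statistically detectable phrasing pattern rather than an overt command, which is what makes the resulting reviews flaggable for later manual verification.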
Two-policy framework: Policy A (no LLM) and Policy B (LLMs allowed for comprehension and polish)
Reviewers self-selected their policy. Only those who chose Policy A, or indicated willingness to follow either policy, were assigned to Policy A papers. Violations were therefore unambiguous breaches of a commitment the reviewer had personally made, not accidental rule confusion.
51 reviewers with >50% of reviews flagged were fully removed from the reviewer pool
All reviews by these 51 individuals were deleted. Area chairs overseeing affected papers must now recruit replacement reviewers, creating downstream delays and logistical strain on the program.
ICML explicitly declined to judge review quality or reviewer intent
The enforcement action is framed purely as a policy compliance issue. This framing sidesteps the thorny question of whether LLM-assisted reviews are better or worse than human-only reviews, focusing instead on the integrity of commitments made within the peer-review system.
Enforcement action at scale sets a precedent for ML conference governance
While other venues have issued policies, ICML 2026 is among the first to operationalize detection and enforce consequences at this scale. The action may pressure other conferences to develop comparable enforcement infrastructure.
Detected violations likely represent only a fraction of actual non-compliance
With thousands of reviewers and a watermarking method that was publicly known for most of the review period, any reviewer who took even modest care to evade the watermark would not appear in these figures. The conference's own acknowledgment that the technique is easy to circumvent suggests the problem may be substantially larger than the numbers captured.
What This Means
ICML 2026 has taken mass enforcement action on LLM use in peer review, desk-rejecting nearly 500 papers and removing over 50 reviewers whose submissions violated policies they personally agreed to. The action exposes a genuine integrity problem: reviewers who explicitly opted into a no-LLM policy used LLMs anyway at measurable scale, and the authors of the reviewed papers paid the price. The detection method — watermarking PDFs to catch LLM-generated reviews — is creative but acknowledged to be easily defeated, suggesting true non-compliance is likely higher than what was caught. As AI-generated text becomes harder to distinguish from human writing, the academic community will need more robust structural solutions than self-selected honor policies.
