A Better Newspaper

## Overview Research published under the title 'Alignment Whack-a-Mole' demonstrates that finetuning large language models (LLMs) can reactivate the ability to recall copyrighted book content that alignment training had suppressed (GitHub: cauchy221/Alignment-Whack-a-Mole-Code, April 2026). The finding has significant implications for AI copyright liability, content licensing, and the robustness of alignment techniques. ## The Research Finding The research reportedly shows that standard alignment or RLHF-style training can suppress an LLM's tendency to reproduce copyrighted material, but that subsequent finetuning — even on unrelated tasks — may reactivate this recall capability (GitHub, April 2026). The 'whack-a-mole' framing reflects the difficulty of permanently suppressing capabilities once they are encoded in model weights. ## Legal and Strategic Implications - **Copyright liability chain**: If finetuning by a downstream customer reactivates copyrighted content recall, questions of liability may shift from the foundation model provider to the finetuner — or remain with the original trainer. - **Enterprise AI risk**: Companies deploying finetuned models for enterprise applications face potential copyright exposure if the finetuning process inadvertently unlocks suppressed memorized content. - **Alignment robustness**: The finding challenges the assumption that alignment training produces durable behavioral changes, with implications beyond copyright for safety and security-relevant capability suppression. - **Regulatory implications**: Regulators examining AI copyright compliance (EU AI Act, US copyright litigation) may need to account for the instability of alignment-based content suppression. ## Connection to Existing Issues This research is directly relevant to the broader AI governance narrative tracked in AI Governance Divergence: Restriction, Restriction Contestation & Liability Vacuum, and to ongoing copyright litigation involving foundation model training data. ## Outlook This finding is likely to be cited in ongoing and future copyright litigation against AI developers, and may prompt foundation model providers to revise their terms around downstream finetuning practices.

Finetuning-Induced Copyright Recall Vulnerability in LLMs ('Alignment Whack-a-Mole')