A Better Newspaper

## Finetuning-Induced Verbatim Content Recall in LLMs

### Overview

Narrowly finetuned language models reportedly memorize implanted content and can reproduce it verbatim, creating audit, liability, and intellectual property risks for deploying organizations (arXiv:2605.25902, 2025). The challenge is that detecting what a deployed model has been taught, without access to its weights or training data, has until recently been an open problem.

### Technical Developments

Contrastive Decoding Diffing (CDD) is a proposed "model diffing" technique that compares the outputs of a base model and a finetuned model to recover content encoded during finetuning (arXiv:2605.25902). Unlike prior approaches such as the Activation Difference Lens (ADL), CDD operates in a black-box or near-black-box manner, without requiring access to model internals, which makes it applicable to auditing externally deployed models.

### Strategic and Legal Implications

- **Copyright liability**: If finetuned models memorize and reproduce copyrighted training content, deploying organizations may face infringement exposure of the kind litigated in *Kadrey v. Meta* and related cases.
- **Trade secret risk**: Proprietary documents used in enterprise finetuning could be reconstructed by adversaries with API access if CDD-style techniques become widely available.
- **Third-party model auditing**: Legal counsel and compliance teams may use model diffing to audit vendor-supplied finetuned models for undisclosed training content.
- **Regulatory disclosure**: Emerging AI transparency rules may mandate disclosure of finetuning data sources; CDD-style tools could be used to verify or challenge such disclosures.

### Connection to Backdoor Risks

Finetuning memorization also intersects with training-time security: adversaries could implant content or behaviors during finetuning that are later recoverable or triggerable (arXiv:2605.19262). The same audit techniques that detect memorization may also detect backdoors.
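
The base-versus-finetuned comparison described under Technical Developments can be illustrated with a toy example. The sketch below applies generic contrastive decoding to a single next-token step; it is not confirmed to be the scoring rule used in arXiv:2605.25902. It assumes the auditor can obtain per-token log-probabilities from both models (the "near-black-box" setting), and the names `contrastive_scores`, `pick_token`, `alpha`, and `plausibility` are hypothetical, as are the toy distributions.

```python
import math

def contrastive_scores(base_logprobs, tuned_logprobs, alpha=1.0, plausibility=0.1):
    """Score tokens by how strongly the finetuned model prefers them over the base.

    Both arguments map token -> log-probability for the same context.
    This is a generic contrastive-decoding rule, used here as an assumption.
    """
    # Keep only tokens the finetuned model itself considers reasonably likely,
    # so that rare tokens with tiny base probability do not dominate the score.
    cutoff = max(tuned_logprobs.values()) + math.log(plausibility)
    scores = {}
    for token, lp_tuned in tuned_logprobs.items():
        if lp_tuned < cutoff:
            continue
        # Tokens missing from the base distribution get a small floor probability.
        lp_base = base_logprobs.get(token, math.log(1e-9))
        scores[token] = lp_tuned - alpha * lp_base
    return scores

def pick_token(base_logprobs, tuned_logprobs, **kwargs):
    """Return the token with the highest contrastive score."""
    scores = contrastive_scores(base_logprobs, tuned_logprobs, **kwargs)
    return max(scores, key=scores.get)

# Toy next-token distributions (hypothetical numbers).
base = {"the": math.log(0.6), "a": math.log(0.3), "zebra": math.log(0.1)}
tuned = {"the": math.log(0.5), "a": math.log(0.1), "zebra": math.log(0.4)}

greedy_choice = max(tuned, key=tuned.get)    # what the finetuned model alone would emit
contrastive_choice = pick_token(base, tuned)  # what the diff between the models highlights
```

In this toy case ordinary greedy decoding from the finetuned model still picks a common token, while the contrastive score surfaces the token whose probability rose most after finetuning. Repeating the step token by token is one plausible way such a diff could surface implanted text.
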
### Status

CDD is at the research stage as of mid-2025. Commercial audit tools based on model diffing have not yet been reported.