Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
[ICLR'25 Poster]
Martin Kuo*, Jingyang Zhang*, Jianyi Zhang, Minxue Tang, Louis DiValentin, Aolin Ding, Jingwei Sun, William Chen, Amin Hass, Tianlong Chen, Yiran Chen, Hai Li

*Equal Contribution

Affiliations: Center for Computational Evolutionary Intelligence, Duke University; Center for Advanced AI, Accenture; Accenture; University of North Carolina at Chapel Hill; Cary Academy
Figure 1: The flowchart illustrates our method, Proactive Privacy Amnesia (PPA). All examples presented in the flowchart are real instances from the LLaMA2-7b experiments.
With the rise of large language models (LLMs), a growing body of research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with model utility. In this paper, inspired by studies of amnesia in cognitive science, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. The mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, and then implanting suitable substitute memories to maintain the LLM's functionality. We evaluate our method across multiple models on protecting common PII, such as phone numbers and physical addresses, against prevalent PII-targeted attacks, demonstrating its superiority over existing defensive techniques. The results show that PPA eliminates the risk of phone number exposure entirely (100%) and significantly reduces the risk of physical address exposure by 9.8%–87.6%, all while maintaining comparable model utility.
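To make the mechanism concrete, the following is a minimal PyTorch/Transformers sketch of the three stages just described: sensitivity analysis to pick key tokens, forgetting via gradient ascent, and memory implanting via ordinary fine-tuning. The model checkpoint, the hyperparameters, the low-NLL heuristic for selecting key tokens, and the phone numbers are all illustrative assumptions, not the paper's released implementation.

```python
# Hedged sketch of the PPA pipeline: select key PII tokens, forget them,
# then implant a substitute. Assumptions are flagged in comments below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative settings

def pii_token_nll(prefix: str, pii: str) -> torch.Tensor:
    """Per-token negative log-likelihood of the PII tokens given the prefix."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    pii_ids = tok(pii, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, pii_ids], dim=1)
    logp = torch.log_softmax(model(ids).logits[:, :-1], dim=-1)
    nll = -logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return nll[0, prefix_ids.size(1) - 1:]  # entries for the PII tokens only

prefix = "John Doe's phone number is "
true_pii, substitute = "030-1234-5678", "555-0100"  # both numbers are synthetic

# Stage 1 -- sensitivity analysis: flag the k most strongly memorized tokens
# (approximated here by lowest NLL; the paper uses its memorization factor).
with torch.no_grad():
    key_idx = torch.topk(-pii_token_nll(prefix, true_pii), k=2).indices

# Stage 2 -- forgetting: gradient *ascent* (negated loss) on key tokens only.
for _ in range(10):
    loss = -pii_token_nll(prefix, true_pii)[key_idx].mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 3 -- memory implanting: ordinary fine-tuning on a substitute value so
# the model still answers fluently without revealing the real PII.
for _ in range(10):
    loss = pii_token_nll(prefix, substitute).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```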
Our contributions are as follows:
● We propose PPA, a novel method that protects a person's PII in LLMs while maintaining model
performance.
● We evaluate PPA against input rephrasing, probing, and soft prompt attacks. PPA effectively
safeguards phone numbers and physical addresses, with only a marginal drop in model
performance.
● We introduce the concept of the 'memorization factor' and use it to identify the key elements within PII
sequences that influence the model's ability to retain such information. This notion drives our
sensitivity analysis and is supported by theoretical justification (a toy sketch follows this list).
● PPA is a flexible method: the balance between defense capability and model performance can be
adjusted by changing the number of key elements to be forgotten.
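As a rough illustration of how the memorization factor can rank tokens, here is a hedged, self-contained sketch. The paper's exact definition may differ; here we treat the factor as the jump in the model's confidence from one PII token to the next, and the per-token probabilities are hypothetical values, not measured ones.

```python
# Hedged sketch of a "memorization factor" computation. Assumption: the
# factor for token t_i is the jump in the model's per-token confidence
# after t_i is observed -- a large jump marks t_i as a key element.
import torch

def memorization_factor(probs: torch.Tensor) -> torch.Tensor:
    """probs[i] = p(t_i | prefix, t_1..t_{i-1}) for each PII token.

    Returns MF[i] = probs[i+1] - probs[i]: how much emitting token i
    boosts the predictability of the next token.
    """
    return probs[1:] - probs[:-1]

# Hypothetical per-token probabilities for a memorized phone number.
probs = torch.tensor([0.05, 0.30, 0.92, 0.95, 0.97])
mf = memorization_factor(probs)
key = torch.argmax(mf).item()
print(mf, key)  # the 0.30 -> 0.92 jump flags token index 1 as the key element
```

Forgetting only the top-ranked elements, rather than the whole sequence, is what gives PPA the defense/utility knob described in the last bullet above.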
Table 2: Comparative analysis of phone-number defense strategies against various attacks in
the Enron Email experiment. PPA defends all users' phone numbers while achieving model
performance comparable to the fine-tuned model.
Table 3: Comparative analysis of physical-address defense strategies against various attacks
in the Enron Email experiment. PPA achieves the best trade-off between defense capability and
model performance.
BibTex
@inproceedings{
kuo2025proactive,
title={Proactive Privacy Amnesia for Large Language Models: Safeguarding {PII} with Negligible Impact on Model Utility},
author={Martin Kuo and Jingyang Zhang and Jianyi Zhang and Minxue Tang and Louis DiValentin and Aolin Ding and Jingwei Sun and William Chen
and Amin Hass and Tianlong Chen and Yiran Chen and Hai Li},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=io8uRPYktn}
}
This website is licensed under the MIT License.