Silent Guardian: Protecting Text From Malicious Exploitation by Large Language Models
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-09-06, DOI: 10.1109/tifs.2024.3455775
Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Yuang Qi, Weiming Zhang, Nenghai Yu
The rapid development of large language models (LLMs) has yielded impressive success in various downstream tasks. However, the vast potential and remarkable capabilities of LLMs also raise new security and privacy concerns when they are exploited for nefarious purposes, owing to their open-endedness. For example, LLMs may be used to plagiarize or imitate writing, thereby infringing the copyright of the original content, or to generate indiscriminate fake information based on a given source text. In some cases, LLMs can even analyze text from the Internet to infer personal private information. Unfortunately, previous text protection research could not foresee the emergence of powerful LLMs and is no longer effective in this new context. To bridge this gap, we introduce Silent Guardian (SG), a text protection mechanism against LLMs, which causes LLMs to refuse to generate responses when they receive protected text, preventing malicious use of the text at its source. Specifically, we first propose the concept of Truncation Protection Examples (TPEs). By carefully modifying the text to be protected, a TPE induces the LLM to sample the end token first, thus directly terminating the interaction. In addition, to efficiently construct TPEs in the discrete space of text data, we propose a novel optimization algorithm called Super Tailored Protection (STP), which is not only highly efficient but also maintains the semantic consistency of the text during the optimization process. A comprehensive experimental evaluation demonstrates that SG can effectively protect the target text under various configurations, achieving a protection success rate of almost 100% in some cases. Notably, SG also exhibits relatively good transferability and robustness, making its application in practical scenarios possible. Our code is available at https://github.com/weiyezhimeng/Silent-Guardian.
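The core idea described above — modifying a text so that the model's most likely first output is the end-of-sequence token — can be sketched with a toy greedy substitution loop. This is an illustrative stand-in only: `toy_eos_score` is a made-up scorer replacing a real LLM's P(EOS | text), and the loop ignores the semantic-consistency constraint that the paper's STP algorithm enforces.

```python
# Toy sketch of truncation-protection-style optimization: greedily
# substitute one token at a time to maximize a scorer's estimate of
# the probability that the model emits EOS as its first output token.
# The scorer is a stand-in for a real LLM's next-token distribution.
from typing import Callable, List


def toy_eos_score(tokens: List[str]) -> float:
    """Stand-in for P(EOS | tokens): here, the fraction of '*' markers."""
    return tokens.count("*") / len(tokens)


def greedy_protect(tokens: List[str],
                   vocab: List[str],
                   score: Callable[[List[str]], float],
                   max_iters: int = 10) -> List[str]:
    """Greedy coordinate substitution: repeatedly try replacing each
    position with each candidate token, keep any swap that increases
    the EOS score, and stop once no swap improves it."""
    best = list(tokens)
    best_score = score(best)
    for _ in range(max_iters):
        improved = False
        for i in range(len(best)):
            for cand in vocab:
                trial = best[:i] + [cand] + best[i + 1:]
                s = score(trial)
                if s > best_score:
                    best, best_score, improved = trial, s, True
        if not improved:
            break
    return best


protected = greedy_protect(["the", "quick", "fox"], ["*", "the"], toy_eos_score)
print(protected)  # ['*', '*', '*'] — every swap raised the toy EOS score
```

In the real setting, the candidate set would come from gradient information over the model's embedding space and the objective would jointly preserve semantic similarity to the original text; this sketch only captures the discrete hill-climbing skeleton.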

Updated: 2024-09-06