Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution,arXiv - CS - Computation and Language



arXiv - CS - Computation and Language, Pub Date: 2022-10-31, DOI: arxiv-2210.17004
Aiwei Liu, Honghai Yu, Xuming Hu, Shu'ang Li, Li Lin, Fukun Ma, Yawen Yang, Lijie Wen

We propose the first character-level white-box adversarial attack method against transformer models. The intuition behind our method comes from the observation that words are split into subtokens before being fed into transformer models, and that substituting one subtoken with a close one has an effect similar to a character modification. Our method consists of three main steps. First, a gradient-based method is adopted to find the most vulnerable words in the sentence. Second, we split the selected words into subtokens, which replace the original tokenization produced by the transformer tokenizer. Finally, we use an adversarial loss to guide the substitution of attachable subtokens, introducing the Gumbel-softmax trick to ensure gradient propagation. We also introduce visual and length constraints into the optimization process to minimize character modifications. Extensive experiments on both sentence-level and token-level tasks demonstrate that our method outperforms previous attack methods in terms of success rate and edit distance. Furthermore, human evaluation verifies that our adversarial examples preserve their original labels.
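The Gumbel-softmax trick mentioned in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the candidate subtokens, their logits, and the function name `gumbel_softmax` are hypothetical and chosen only for illustration. The sketch shows the sampling step alone, in which a discrete choice among candidate subtokens is replaced by a relaxed, near one-hot weight vector.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample over the given logits.

    Generic Gumbel-softmax sampling; an illustrative sketch, not the
    paper's code.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate subtokens and scores (assumed values).
candidates = ["##ing", "##lng", "##1ng"]
logits = [2.0, 0.5, 0.1]

rng = random.Random(0)  # fixed seed for reproducibility
weights = gumbel_softmax(logits, temperature=0.5, rng=rng)
chosen = candidates[weights.index(max(weights))]
print(chosen, [round(w, 3) for w in weights])
```

In an automatic-differentiation framework, weights like these would typically be used to mix the candidate embeddings, so that the gradient of an adversarial loss can reach the logits. Lower temperatures push the weights closer to a hard one-hot choice.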


Updated: 2022-11-01
