Disagreement, AI alignment, and bargaining
Philosophical Studies (IF 1.1) Pub Date: 2024-11-18, DOI: 10.1007/s11098-024-02224-5
Harry R. Lloyd

New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, medicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly 'aligned' with some form of 'alignment target.' One danger of this strategy is that, depending on the alignment target chosen, our AIs might optimise for objectives that reflect the values of only a certain subset of society, and that do not take into account alternative views about what constitutes desirable and safe behaviour for AI agents. In response to this problem, several AI ethicists have suggested alignment targets that are designed to be sensitive to widespread normative disagreement amongst the relevant stakeholders. Authors inspired by voting theory have suggested that AIs should be aligned with the verdicts of actual or simulated 'moral parliaments' whose members represent the normative views of the relevant stakeholders. Other authors inspired by decision theory and the philosophical literature on moral uncertainty have suggested that AIs should maximise socially expected choiceworthiness. In this paper, I argue that both of these proposals face several important problems. In particular, they fail to select attractive 'compromise options' in cases where such options are available. I go on to propose and defend an alternative, bargaining-theoretic alignment target, which avoids the problems associated with the voting- and decision-theoretic approaches.
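The contrast between the decision-theoretic and bargaining-theoretic proposals can be made concrete with a toy sketch. The scores, the equal stakeholder weights, and the use of the Nash bargaining solution with worst-case disagreement points below are illustrative assumptions rather than details taken from the paper; they merely show how a weighted-sum rule can pass over a compromise option that a product-of-gains rule selects.

```python
# Toy illustration (not from the paper): two stakeholder groups score three
# candidate AI policies. The numbers and the choice of the Nash bargaining
# rule are assumptions made purely for illustration.

options = ["A", "B", "C"]  # C is the intuitive "compromise option"

# Hypothetical choiceworthiness scores assigned by each stakeholder group.
scores = {
    "group_1": {"A": 10.0, "B": 0.0, "C": 4.0},
    "group_2": {"A": 0.0, "B": 10.0, "C": 4.0},
}
weights = {"group_1": 0.5, "group_2": 0.5}  # equal social weight

# (1) Maximise socially expected choiceworthiness: weighted sum across groups.
def expected_choiceworthiness(option):
    return sum(weights[g] * scores[g][option] for g in scores)

mec_choice = max(options, key=expected_choiceworthiness)

# (2) Nash bargaining solution: maximise the product of each group's gain over
# its disagreement payoff (here, its worst-case score across the options).
disagreement = {g: min(scores[g].values()) for g in scores}

def nash_product(option):
    product = 1.0
    for g in scores:
        product *= scores[g][option] - disagreement[g]
    return product

nash_choice = max(options, key=nash_product)

print("Expected-choiceworthiness pick:", mec_choice)  # "A" (an extreme option)
print("Nash bargaining pick:", nash_choice)           # "C" (the compromise)
```

In this toy case the weighted sum ranks the extreme options A and B (5 each) above the compromise C (4), whereas the product of gains over each group's worst case (16 for C, 0 for A and B) selects the compromise; whether this captures the paper's own argument is left to the full text.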



