Online learning in sequential Bayesian persuasion: Handling unknown priors
Artificial Intelligence (IF 5.1) Pub Date: 2024-11-06, DOI: 10.1016/j.artint.2024.104245
Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver so as to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events and thus has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures, which is crucial for designing efficient learning algorithms. Next, we prove a negative result that also applies to the non-sequential case: no learning algorithm can be persuasive with high probability. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting, where the sender observes the realizations of all possible random events, we provide an algorithm with Õ(√T) regret for both the sender and the receiver. In the bandit-feedback setting, where the sender only observes the realizations of the random events actually occurring in the SDM problem, we design an algorithm that, given an α ∈ [1/2, 1] as input, guarantees Õ(T^α) and Õ(T^{max{α, 1−α/2}}) regrets for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is tight for α ∈ [1/2, 2/3].
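To make the bandit-feedback trade-off concrete, the following worked evaluation (an illustrative reading of the bounds stated above, not text from the paper) checks the two regret exponents at the endpoints of the tight range:

\[
\text{sender regret } \tilde{O}(T^{\alpha}),
\qquad
\text{receiver regret } \tilde{O}\!\big(T^{\max\{\alpha,\;1-\alpha/2\}}\big)
\]
\[
\alpha \le \tfrac{2}{3} \;\Rightarrow\; \max\{\alpha,\, 1-\alpha/2\} = 1-\alpha/2,
\qquad
\alpha=\tfrac{1}{2}:\ \big(T^{1/2},\, T^{3/4}\big),
\quad
\alpha=\tfrac{2}{3}:\ \big(T^{2/3},\, T^{2/3}\big)
\]

On [1/2, 2/3], increasing α worsens the sender's bound while improving the receiver's, so the two bounds trace a genuine trade-off; for α > 2/3 the max equals α, both bounds degrade together, and such choices are dominated by α = 2/3, which is consistent with the lower bound certifying tightness only on [1/2, 2/3].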
Updated: 2024-11-06