Journal of Combinatorial Optimization (IF 0.9), Pub Date: 2024-12-04, DOI: 10.1007/s10878-024-01240-9. Xin Sun, Tiande Guo, Congying Han, Hongyang Zhang
In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with a stochastic monotone k-submodular reward function under full-bandit feedback. In this setting, the decision-maker selects a super arm composed of multiple base arms in each round and then receives its k-submodular reward. The k-submodularity extends the problem's applicability to scenarios characterized by diverse options. We present two simple greedy algorithms for two budget constraints (total size and individual size) and provide theoretical upper bounds on their regret. For the total size budget, the proposed algorithm achieves a \(\frac{1}{2}\)-regret upper bounded by \(\tilde{\mathcal{O}}\left(T^{\frac{2}{3}}(kn)^{\frac{1}{3}}B\right)\), where T is the time horizon, n is the number of base arms, and B denotes the budget. For the individual size budget, the proposed algorithm achieves a \(\frac{1}{3}\)-regret with the same upper bound. Moreover, we conduct numerical experiments on these two algorithms to empirically demonstrate their effectiveness.
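To make the total size setting concrete, the following is a minimal Python sketch of a greedy procedure in the explore-then-commit style, which is a common design for full-bandit problems and is consistent with a \(T^{\frac{2}{3}}\)-type bound. It is an illustration under stated assumptions, not the authors' algorithm: the names `etc_greedy_total_size`, `make_additive_env`, and `samples_per_candidate` are hypothetical, the number of samples per candidate is left as a free tuning parameter, and the toy environment uses an additive reward with non-negative weights (a simple special case of a monotone k-submodular function) plus Gaussian noise.

```python
import random
from typing import Callable, Dict, List, Tuple

# A super arm is modelled as a mapping: base arm index -> type in {0, ..., k-1}.
Assignment = Dict[int, int]


def etc_greedy_total_size(
    n: int,
    k: int,
    budget: int,
    horizon: int,
    play: Callable[[Assignment], float],
    samples_per_candidate: int,
) -> Tuple[Assignment, float]:
    """Explore-then-commit greedy sketch for a total size budget.

    Only the noisy reward of the whole played super arm is observed
    (full-bandit feedback). Returns the final super arm and the total
    reward collected over at most `horizon` rounds.
    """
    chosen: Assignment = {}
    rounds_used = 0
    total_reward = 0.0

    for _ in range(budget):
        best_pair, best_mean = None, float("-inf")
        for arm in range(n):
            if arm in chosen:
                continue  # each base arm receives at most one type
            for typ in range(k):
                candidate = dict(chosen)
                candidate[arm] = typ
                observed = 0.0
                for _ in range(samples_per_candidate):
                    if rounds_used >= horizon:
                        return chosen, total_reward  # horizon exhausted
                    reward = play(candidate)
                    observed += reward
                    total_reward += reward
                    rounds_used += 1
                mean = observed / samples_per_candidate
                if mean > best_mean:
                    best_pair, best_mean = (arm, typ), mean
        if best_pair is None:
            break  # no unassigned base arm is left
        chosen[best_pair[0]] = best_pair[1]

    # Commit phase: play the greedily built super arm until the horizon.
    while rounds_used < horizon:
        total_reward += play(chosen)
        rounds_used += 1
    return chosen, total_reward


def make_additive_env(
    weights: List[List[float]], noise_std: float, seed: int
) -> Callable[[Assignment], float]:
    """Toy environment (assumption): reward = sum of weights[arm][type] + noise."""
    rng = random.Random(seed)

    def play(assignment: Assignment) -> float:
        mean = sum(weights[arm][typ] for arm, typ in assignment.items())
        return mean + rng.gauss(0.0, noise_std)

    return play


if __name__ == "__main__":
    toy_weights = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]]
    env = make_additive_env(toy_weights, noise_std=0.1, seed=0)
    arm_set, reward = etc_greedy_total_size(3, 2, 2, 2000, env, 50)
    print(arm_set, round(reward, 2))
```

Each greedy step compares the empirical mean rewards of all super arms obtained by adding one (base arm, type) pair to the current selection, so exploration costs at most `budget * n * k * samples_per_candidate` rounds. An individual size variant would additionally skip any type whose own budget is already used up.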
Greedy algorithms for stochastic monotone k-submodular maximization under full-bandit feedback