Automatically designing counterfactual regret minimization algorithms for solving imperfect-information games
Artificial Intelligence (IF 5.1), Pub Date: 2024-10-11, DOI: 10.1016/j.artint.2024.104232
Kai Li, Hang Xu, Haobo Fu, Qiang Fu, Junliang Xing

Strategic decision-making in imperfect-information games is an important problem in artificial intelligence. Counterfactual regret minimization (CFR), a family of iterative algorithms, has been the workhorse for solving these types of games since its inception. In recent years, a series of novel CFR variants have been proposed, significantly improving the convergence rate of vanilla CFR. However, most of these new variants are hand-designed by researchers through trial and error, often based on different motivations, which generally requires a tremendous amount of effort and insight. This work proposes AutoCFR, a systematic framework that meta-learns novel CFR algorithms through evolution, easing the burden of manual algorithm design. We first design a search language that is rich enough to represent various CFR variants. We then exploit a scalable regularized evolution algorithm with a set of acceleration techniques to efficiently search over the combinatorial space of algorithms defined by this language. The learned novel CFR algorithm can generalize to new imperfect-information games not seen during training and performs on par with or better than existing state-of-the-art CFR variants. In addition to superior empirical performance, we also theoretically show that the learned algorithm converges to an approximate Nash equilibrium. Extensive experiments across diverse imperfect-information games highlight the scalability, extensibility, and generalizability of AutoCFR, establishing it as a general-purpose framework for solving imperfect-information games.
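For readers unfamiliar with CFR, the following is a minimal sketch of regret matching, the standard per-decision update that vanilla CFR applies at each information set. It is not AutoCFR's search language or the learned algorithm from the paper; the function names `regret_matching` and `update_regrets` are illustrative assumptions. CFR variants differ mainly in how they accumulate regrets and turn them into the next strategy.

```python
from typing import List


def regret_matching(cumulative_regrets: List[float]) -> List[float]:
    """Turn cumulative regrets into a strategy (hypothetical helper name).

    Actions are played in proportion to their positive cumulative regret;
    if no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(
    cumulative_regrets: List[float],
    action_values: List[float],
    strategy: List[float],
) -> List[float]:
    """Add this iteration's instantaneous regrets (hypothetical helper name).

    The instantaneous regret of an action is its value minus the expected
    value of the strategy that was actually played.
    """
    expected = sum(s * v for s, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regrets, action_values)]


if __name__ == "__main__":
    regrets = [0.0, 0.0, 0.0]
    strategy = regret_matching(regrets)           # uniform at the start
    regrets = update_regrets(regrets, [1.0, 0.0, -1.0], strategy)
    print(regret_matching(regrets))               # next-iteration strategy
```

In a full CFR implementation these two steps run at every information set on every iteration, with counterfactual values in place of the fixed `action_values` used here.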

Updated: 2024-10-11