Computational Mechanisms Underlying Motivation to Earn Symbolic Reinforcers
Journal of Neuroscience ( IF 4.4 ) Pub Date : 2024-06-12 , DOI: 10.1523/jneurosci.1873-23.2024
Diana C. Burk , Craig Taswell , Hua Tang , Bruno B. Averbeck

Reinforcement learning is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, trials since the last delivery of primary reinforcer, etc.), drives motivation. We constructed a Markov decision process model that computes the value of task states given task features, which we then correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to values of task states during the tokens task (n = 5 monkeys, three males and two females). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in the state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.
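To illustrate the kind of state-value computation the abstract describes, here is a minimal, hypothetical sketch of value iteration over a toy token-accumulation MDP. This is not the authors' actual model: the token cap (`CAP`), win probability (`P_WIN`), discount factor (`GAMMA`), and reward magnitude are assumed for illustration only. The sketch captures the key qualitative point: state value grows as accumulated tokens approach the exchange for primary (fluid) reward.

```python
# Hypothetical sketch of value iteration on a token-accumulation MDP.
# States are token counts 0..CAP; each trial earns a token with
# probability P_WIN; at CAP, tokens are exchanged for a primary
# (fluid) reward and the count resets to zero.
# All parameter values below are assumptions, not the paper's.

CAP = 6       # tokens required before exchange (assumed)
P_WIN = 0.7   # probability a trial earns a token (assumed)
GAMMA = 0.9   # temporal discount factor (assumed)
REWARD = 1.0  # magnitude of the primary reward at exchange (assumed)

def value_iteration(tol=1e-8):
    """Compute state values V[s] for s = 0..CAP tokens."""
    V = [0.0] * (CAP + 1)
    while True:
        delta = 0.0
        for s in range(CAP + 1):
            if s == CAP:
                # Exchange state: collect the primary reward, reset tokens.
                new_v = REWARD + GAMMA * V[0]
            else:
                # Expected discounted value of attempting the next trial.
                new_v = GAMMA * (P_WIN * V[s + 1] + (1 - P_WIN) * V[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

values = value_iteration()
# State value rises monotonically as tokens accumulate, consistent with
# the behavioral finding that motivation tracks proximity to exchange.
assert all(values[s] < values[s + 1] for s in range(CAP))
```

In the paper's richer model, the state also includes features such as choice options, task epoch, and trials since the last primary reinforcer; this sketch collapses the state to token count alone to keep the value-iteration logic visible.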




Updated: 2024-06-13