Nature Machine Intelligence ( IF 18.8 ) Pub Date : 2024-07-18 , DOI: 10.1038/s42256-024-00863-1 James Gornet , Matt Thomson
Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. Although machine learning algorithms like simultaneous localization and mapping utilize specialized inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile and linguistic inputs. Here we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next-image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects spatial distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information.The predictive coding network generates a vectorized encoding of the environment that supports vector navigation, where individual latent space units delineate localized, overlapping neighbourhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor and linguistic inputs.
中文翻译:
使用视觉预测编码自动构建认知图
人类直接根据感官输入构建环境的内部认知地图,而无需访问明确的坐标或距离测量系统。尽管同时定位和映射等机器学习算法利用专门的推理程序来识别视觉特征并根据视觉和里程数据构建空间地图,但大脑中认知图的一般性质表明了一种统一的映射算法策略,可以推广到听觉、触觉和触觉。语言输入。在这里,我们证明预测编码提供了一种自然且通用的神经网络算法,用于使用感知数据构建空间地图。我们引入了一个框架,其中代理在虚拟环境中导航,同时使用配备自注意力的卷积神经网络进行视觉预测编码。在学习下一个图像预测任务时,代理会自动构建定量反映空间距离的环境内部表示。内部地图使代理能够仅使用视觉信息来精确定位其相对于地标的位置。预测编码网络生成支持矢量导航的环境矢量化编码,其中各个潜在空间单元描绘环境中的局部重叠邻域。从广义上讲,我们的工作引入了预测编码作为构建认知图的统一算法框架,该框架可以自然地扩展到听觉、感觉运动和语言输入的映射。