Reinforcement Learning-Based MIMO Radar Multitarget Detection Assisted by Bayesian Inference, IEEE Transactions on Aerospace and Electronic Systems


Reinforcement Learning-Based MIMO Radar Multitarget Detection Assisted by Bayesian Inference
IEEE Transactions on Aerospace and Electronic Systems (IF 5.1), Pub Date: 2024-03-29, DOI: 10.1109/taes.2024.3380581
Zicheng Wang, Wei Xie, Zhengchun Zhou, Hua Meng, Meng Yang


Reinforcement learning (RL) has been used to implement the perception-action cycle of multiple-input multiple-output (MIMO) cognitive radar, allowing the radar's beampattern to be optimized adaptively under the guidance of echo information and an appropriate reward signal. However, existing approaches rely on a greedy direction-of-arrival (DOA) estimation to select candidate angles, so decisions are based primarily on the detection results of the last pulse. If the system misses a target, recapturing it can be time-consuming, which limits detection performance. To address this issue, we develop a data-driven DOA estimation method that uses Bayesian inference to evaluate, from historical detection information, the likelihood that an angle contains a target. Furthermore, to accommodate dynamic scenarios, we propose a decay method for historical experience that lets the system adapt to environmental changes. Simulation results show that an RL-based MIMO radar with our refined DOA estimation module outperforms existing RL-based detectors, providing state-of-the-art detection performance by focusing more frequently on essential angles, even in small-scale system setups.
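
The abstract does not state the update equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a Beta-Bernoulli model per angle bin, with an exponential forgetting factor standing in for the decay of historical experience. The class name `DecayingAngleBelief`, the `decay` parameter, and the uniform prior are all hypothetical choices made for illustration.

```python
import numpy as np


class DecayingAngleBelief:
    """Per-angle Beta posterior over "a target is present" (illustrative only).

    Assumption: each pulse yields one binary detection per angle bin, and
    older pulses are discounted by a constant factor `decay`.
    """

    def __init__(self, n_angles, decay=0.9, prior_a=1.0, prior_b=1.0):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.prior_a = prior_a  # Beta prior pseudo-count for "target"
        self.prior_b = prior_b  # Beta prior pseudo-count for "no target"
        self.hits = np.zeros(n_angles)    # discounted detection counts
        self.misses = np.zeros(n_angles)  # discounted non-detection counts

    def update(self, detections):
        """Fold in one pulse of 0/1 detection results, one entry per angle."""
        d = np.asarray(detections, dtype=float)
        if d.shape != self.hits.shape:
            raise ValueError("expected one detection flag per angle bin")
        # Discount old evidence first, then add the evidence from this pulse.
        self.hits = self.decay * self.hits + d
        self.misses = self.decay * self.misses + (1.0 - d)

    def posterior_mean(self):
        """Posterior mean probability that each angle contains a target."""
        a = self.prior_a + self.hits
        b = self.prior_b + self.misses
        return a / (a + b)

    def candidate_angles(self, k):
        """Indices of the k angles with the highest posterior mean."""
        order = np.argsort(-self.posterior_mean(), kind="stable")
        return order[:k]


# Angle 2 is detected on five pulses, then missed on the sixth.
belief = DecayingAngleBelief(n_angles=4, decay=0.9)
for _ in range(5):
    belief.update([0, 0, 1, 0])
belief.update([0, 0, 0, 0])
top = belief.candidate_angles(1)
```

In this sketch a selector that looks only at the last pulse would drop angle 2 after the single miss, whereas the accumulated belief still ranks it first. Setting `decay` below 1 lets stale evidence fade, so an angle whose target has left stops being selected after enough consecutive misses.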


Updated: 2024-03-29
