Instrument-Tissue Interaction Detection Framework for Surgical Video Understanding
IEEE Transactions on Medical Imaging ( IF 8.9 ) Pub Date : 2024-03-26 , DOI: 10.1109/tmi.2024.3381209
Wenjun Lin 1 , Yan Hu 2 , Huazhu Fu 3 , Mingming Yang 4 , Chin-Boon Chng 1 , Ryo Kawasaki 5 , Cheekong Chui 1 , Jiang Liu 2
The instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems, but it poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and lacks the ability to automatically detect instruments and tissues. Second, existing works do not fully consider intra- and inter-frame relationships between instruments and tissues. In this paper, we propose to represent an instrument-tissue interaction as an $\langle$ instrument class, instrument bounding box, tissue class, tissue bounding box, action class $\rangle$ quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect this quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance. For evaluation, we build a cataract surgery video (PhacoQ) dataset and a cholecystectomy surgery video (CholecQ) dataset. Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
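As a rough illustration of the quintuple representation described in the abstract, one could model a single detection as a small data structure. This is a hypothetical sketch, not the authors' code; the field names, class labels, and example values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

# (x1, y1, x2, y2) corners of an axis-aligned bounding box, in pixels
BBox = Tuple[float, float, float, float]

@dataclass
class InteractionQuintuple:
    """One <instrument class, instrument bbox, tissue class,
    tissue bbox, action class> prediction for a video frame."""
    instrument_class: str
    instrument_bbox: BBox
    tissue_class: str
    tissue_bbox: BBox
    action_class: str

# Hypothetical example: one interaction in a cataract-surgery frame
q = InteractionQuintuple(
    instrument_class="phaco_handpiece",
    instrument_bbox=(120.0, 80.0, 220.0, 160.0),
    tissue_class="lens",
    tissue_bbox=(100.0, 60.0, 260.0, 200.0),
    action_class="aspirate",
)
print(q.action_class)
```

Representing each interaction as an explicit quintuple, rather than a single interaction-class label, is what lets the framework localize both the instrument and the tissue in addition to classifying the action.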

Updated: 2024-03-26