SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution
International Journal of Computer Vision (IF 11.6). Pub Date: 2024-07-24. DOI: 10.1007/s11263-024-02147-y
Feng Li, Runmin Cong, Jingjing Wu, Huihui Bai, Meng Wang, Yao Zhao

Recently, vision transformers have demonstrated their superiority over convolutional neural networks (ConvNets) in various tasks, including single-image super-resolution (SISR). Much of this success can be attributed to the multi-head self-attention (MHSA) mechanism, which models global connectivity effectively with few parameters. However, the quadratic complexity of MHSA incurs large computation costs and memory footprints, limiting efficient deployment on mobile devices compared with widely used lightweight ConvNets. In this work, we thoroughly examine the key differences between ConvNet- and transformer-based SR models, and present SRConvNet, which absorbs the merits of both for lightweight SISR. SRConvNet rests on two primary designs: (1) Fourier modulated attention (FMA), an MHSA-like but more computationally and parametrically efficient operator that performs regional frequency-spatial modulation and aggregation to capture both long- and short-range dependencies; (2) a dynamic mixing layer (DML) that uses mixed-scale depthwise dynamic convolution with channel splitting and shuffling to exploit multi-scale contextual information, enhancing the model's locality and adaptability. Combining FMA and DML, we build a pure transformer-style ConvNet that competes with the best lightweight SISR models in the trade-off between efficiency and accuracy. Extensive experiments demonstrate that SRConvNet achieves more efficient SR reconstruction than recent state-of-the-art lightweight SISR methods in both computation and parameter count while delivering comparable performance. Code is available at https://github.com/lifengcs/SRConvNet.
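To make the FMA idea concrete, below is a minimal sketch of a Fourier-modulated attention block in PyTorch. It is an illustrative guess based only on the abstract, not the authors' implementation: the class and parameter names (FMASketch, freq_gate) are hypothetical, and the specific gating scheme (a learnable per-channel complex filter in the frequency domain multiplied against a depthwise spatial branch) is one plausible way to realize "frequency-spatial modulation and aggregation" with sub-quadratic cost.

```python
# A minimal, illustrative sketch of Fourier modulated attention (FMA),
# assuming a PyTorch implementation. All names are hypothetical and NOT
# taken from the authors' code at github.com/lifengcs/SRConvNet.
import torch
import torch.nn as nn


class FMASketch(nn.Module):
    """Hypothetical FMA: a frequency-domain branch (global, long-range)
    gates a depthwise spatial branch (local, short-range), avoiding the
    quadratic cost of multi-head self-attention."""

    def __init__(self, channels: int):
        super().__init__()
        # Per-channel complex gate in the frequency domain; the trailing
        # dimension of size 2 holds the real/imaginary parts. Broadcasting
        # keeps the gate resolution-agnostic.
        self.freq_gate = nn.Parameter(torch.randn(channels, 1, 1, 2) * 0.02)
        # Depthwise conv as the local (spatial) branch.
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Pointwise conv to aggregate the modulated features.
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Global branch: modulate the 2D spectrum with the learnable gate.
        spec = torch.fft.rfft2(x, norm="ortho")              # (B, C, H, W//2+1), complex
        spec = spec * torch.view_as_complex(self.freq_gate)  # frequency modulation
        x_global = torch.fft.irfft2(spec, s=(h, w), norm="ortho")
        # Local branch, then attention-style gating and aggregation.
        x_local = self.spatial(x)
        return self.proj(x_local * torch.sigmoid(x_global))


if __name__ == "__main__":
    y = FMASketch(32)(torch.randn(1, 32, 48, 48))
    print(y.shape)  # torch.Size([1, 32, 48, 48])
```

Because the FFT pair costs O(HW log HW) and the gate is a broadcasted elementwise product, the block scales near-linearly with resolution, which is the efficiency argument the abstract makes against quadratic MHSA.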
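Likewise, the following sketch shows one plausible reading of the dynamic mixing layer (DML): channels are split into two groups, each processed by a depthwise dynamic convolution at a different kernel scale (here 3x3 and 5x5), then shuffled across groups and mixed pointwise. The expert-mixture form of dynamic convolution (softmax-routed combination of candidate kernels) is a common design and is assumed here; the names DynamicDWConv and DMLSketch are hypothetical.

```python
# A minimal sketch of a dynamic mixing layer (DML), assuming a PyTorch
# implementation. The routing scheme and all names are illustrative
# assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicDWConv(nn.Module):
    """Depthwise conv whose kernel is an input-conditioned softmax mixture
    of `num_experts` candidate kernels (a common form of dynamic conv)."""

    def __init__(self, channels: int, kernel_size: int, num_experts: int = 4):
        super().__init__()
        self.k = kernel_size
        self.experts = nn.Parameter(
            torch.randn(num_experts, channels, 1, kernel_size, kernel_size) * 0.02
        )
        self.router = nn.Linear(channels, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Route: global average context -> weights over kernel experts.
        attn = torch.softmax(self.router(x.mean(dim=(-2, -1))), dim=-1)   # (B, K)
        weight = torch.einsum("bk,kcxij->bcxij", attn, self.experts)      # (B, C, 1, k, k)
        # Per-sample kernels via the grouped-conv batching trick.
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            weight.reshape(b * c, 1, self.k, self.k),
            padding=self.k // 2,
            groups=b * c,
        )
        return out.reshape(b, c, h, w)


class DMLSketch(nn.Module):
    """Channel splitting -> mixed-scale depthwise dynamic convs ->
    channel shuffling -> pointwise mixing."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        self.branch3 = DynamicDWConv(half, 3)   # finer scale
        self.branch5 = DynamicDWConv(half, 5)   # coarser scale
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)                       # channel splitting
        y = torch.cat([self.branch3(x1), self.branch5(x2)], dim=1)
        b, c, h, w = y.shape
        # Channel shuffling across the two scale groups.
        y = y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
        return self.mix(y)


if __name__ == "__main__":
    print(DMLSketch(32)(torch.randn(1, 32, 48, 48)).shape)  # (1, 32, 48, 48)
```

The shuffle step lets information flow between the two kernel scales before the pointwise mix, which is one way to obtain the multi-scale contextualization the abstract attributes to DML.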



