Label-Specific Time–Frequency Energy-Based Neural Network for Instrument Recognition, IEEE Transactions on Cybernetics


Label-Specific Time–Frequency Energy-Based Neural Network for Instrument Recognition
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2024-08-19, DOI: 10.1109/tcyb.2024.3433519
Jian Zhang, Tong Wei, Min-Ling Zhang


Predominant instrument recognition plays a vital role in music information retrieval. The task involves identifying and categorizing the dominant instruments in a piece of music based on their distinctive time–frequency characteristics and harmonic distributions. Existing approaches mainly learn implicit mappings (such as deep neural networks) from time-domain or frequency-domain representations of music audio to instrument labels. However, different instruments playing together in polyphonic music produce locally superposed time–frequency representations, and most implicit models can be sensitive to such local changes in the data. This makes it difficult for implicit methods to accurately capture the unique harmonic features of each instrument. To address this challenge, and considering that the complete harmonic information of an instrument is distributed across a wide range of frequencies, we design a label-specific time–frequency feature learning approach that converts the task of building implicit classification mappings into a process of extracting and matching features specific to each instrument. The result is a new explicit learning model, the label-specific time–frequency energy-based neural network (LSTN). Unlike existing implicit models, LSTN not only extracts the local time–frequency features those models commonly use but also incorporates time-domain and frequency-domain factors in its energy function to explicitly parameterize long-term correlation and long-frequency correlation features. Using the extracted time–frequency features and the two long correlation features as instrument label-specific features, LSTN detects whether the harmonic distribution of each instrument appears in polyphonic music at both long and local time–frequency scales, which mitigates the challenges posed by locally superposed representations. We analyze the complexity and convergence of LSTN, and experiments on benchmark datasets demonstrate its superiority over other established instrument recognition algorithms.
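
The idea of locally superposed time–frequency representations can be illustrated with a small NumPy sketch. This is not the LSTN model or its energy function, which the abstract does not specify. It only mixes two synthetic harmonic tones (a stand-in for two instruments), computes a magnitude spectrogram, and averages the energy along each axis to give simple long-time and long-frequency summaries. The helper names, the sample rate, the fundamentals, and the STFT settings are all assumptions chosen for illustration.

```python
import numpy as np

def harmonic_tone(f0, sr, dur, n_harmonics=5):
    """Toy 'instrument': harmonics of f0 with 1/k amplitude decay (assumed shape)."""
    t = np.arange(int(sr * dur)) / sr
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))

def magnitude_spectrogram(x, n_fft=1024, hop=256):
    """Hann-windowed STFT magnitude with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr, n_fft = 16000, 1024  # illustrative values, not taken from the paper

# Two tones sounding together. 660 Hz is the 3rd harmonic of 220 Hz and the
# 2nd harmonic of 330 Hz, so both sources contribute energy to the same bins.
mix = harmonic_tone(220.0, sr, 1.0) + harmonic_tone(330.0, sr, 1.0)

S = magnitude_spectrogram(mix, n_fft=n_fft)
energy = S ** 2

# Simple long-range summaries (NOT the paper's correlation features):
freq_profile = energy.mean(axis=1)  # energy per frequency bin, averaged over all frames
time_profile = energy.mean(axis=0)  # energy per frame, averaged over all frequency bins

freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
peak_hz = freqs[np.argmax(freq_profile)]
print(S.shape, round(float(peak_hz), 1))
```

In a single local patch around 660 Hz the two tones cannot be told apart, whereas the full frequency profile still shows two distinct harmonic series. That is the intuition behind looking at long frequency ranges as well as local features.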


Updated: 2024-08-19
