Author Affiliations: Center for Art & Science and Presentation & Communication, Frontier Institute of Science and Technology, Xi'an Jiaotong University, Xi'an, China; MOE KLINNS Laboratory, Faculty of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China; Department of Automation, Center for Intelligent and Networked Systems, Tsinghua University, Beijing, China; Department of AI Music and Music Information Technology, Central Conservatory of Music, Beijing, China
Publication: IEEE Transactions on Audio, Speech and Language Processing
Year/Volume: 2025, Vol. 33
Pages: 613-626
Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities
Keywords: Instruments; Music; Feature extraction; Accuracy; Speech processing; Timbre; Harmonic analysis; Deep learning; Training; Power harmonic filters
Abstract: Tone quality is of pivotal importance in the auditory perception of musical performance. Depending on the performer and the instrument, tone quality evaluation is subjective and time-consuming, with inherent difficulties stemming from the absence of precise measurement methods. In this study, we develop a novel method for tone quality evaluation that uses an adversarial domain-invariant learning strategy to construct a representation invariant to changes in pitch, volume, and duration. Wide-band Mel frequency cepstral coefficients are employed for pitch-invariant feature extraction, and instance normalization for volume invariance. An adversarially trained time-delay neural network encoder is developed to enhance pitch and duration invariance via random pitch shift and temporal segmentation. Experiments conducted on our curated dataset and the Good-sound dataset show that the new method achieves significant improvements in evaluating tone quality attributed to performers and instruments, yielding increases in classification accuracy of 15.3% and 9.5%, respectively, compared to classical feature-based techniques. Remarkably, the class-wise outcomes exhibit F-score improvements of 33.6% and 9.8% on the respective datasets. Ablation studies on pitch, volume, and duration invariance further underscore the efficacy of our approach. This substantial improvement over existing methods presents a novel perspective on tone quality representation and offers a practical resource for music performance analysis.
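The abstract's volume-invariance step can be illustrated with a minimal sketch. The idea is standard instance normalization applied per utterance: each cepstral channel is normalized to zero mean and unit variance over time, so a uniform gain change (a constant offset in the log/cepstral domain) cancels out. This is an assumption-laden toy illustration, not the authors' implementation; the function name `instance_norm` and the toy feature shapes are invented for the example.

```python
import numpy as np

def instance_norm(feats, eps=1e-8):
    """Per-utterance instance normalization.

    feats: array of shape (n_channels, n_frames), e.g. MFCCs.
    Normalizes each channel to zero mean and unit variance over time.
    """
    mean = feats.mean(axis=1, keepdims=True)
    std = feats.std(axis=1, keepdims=True)
    return (feats - mean) / (std + eps)

# A global volume change becomes an additive constant in the
# log/cepstral domain, which the per-channel mean subtraction removes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(13, 200))   # toy 13-channel cepstral features
louder = feats + 3.0                 # uniform log-domain gain offset
assert np.allclose(instance_norm(feats), instance_norm(louder))
```

In the paper's pipeline this normalization would sit between MFCC extraction and the adversarially trained encoder; the toy example only verifies the invariance property itself.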