
PiVoD: Pitch, Volume and Duration Invariant Representation for Music Tone Quality Evaluation

Authors: Yixin Wang, Xiaohong Guan, Youtian Du, Chenxu Wang, Xiaobing Li, Yu Pan

Affiliations: Center for Art & Science and Presentation & Communication, Frontier Institute of Science and Technology, Xi'an Jiaotong University, Xi'an, China; MOE KLINNS Laboratory, Faculty of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China; Department of Automation, Center for Intelligent and Networked Systems, Tsinghua University, Beijing, China; Department of AI Music and Music Information Technology, Central Conservatory of Music, Beijing, China

Published in: IEEE Transactions on Audio, Speech and Language Processing

Year/Volume: 2025, Vol. 33

Pages: 613-626

Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities

Keywords: Instruments; Music; Feature extraction; Accuracy; Speech processing; Timbre; Harmonic analysis; Deep learning; Training; Power harmonic filters

Abstract: Tone quality is of pivotal importance in the auditory perception of musical performance. Because tone quality varies with the performer and the instrument, its evaluation is subjective and time-consuming, with inherent difficulties stemming from the absence of precise measurement methods. In this study, we develop a novel method for tone quality evaluation that uses an adversarial domain-invariant learning strategy to construct a representation invariant to changes in pitch, volume, and duration. Wide-band Mel-frequency cepstral coefficients are employed for pitch-invariant feature extraction, and instance normalization for volume invariance. An adversarially trained time-delay neural network encoder enhances pitch and duration invariance via random pitch shifting and temporal segmentation. Experiments conducted on our curated dataset and the Good-sound dataset show that the new method achieves significant improvements in evaluating the tone quality of performers and instruments, yielding 15.3% and 9.5% increases in classification accuracy, respectively, compared to classical feature-based techniques. Remarkably, the class-wise results exhibit F-score improvements of 33.6% and 9.8% on the respective datasets. Ablation studies on pitch, volume, and duration invariance further underscore the efficacy of our approach. This substantial improvement over existing methods presents a novel perspective on tone quality representation and offers a practical resource for music performance analysis.
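Two of the invariance steps described in the abstract — instance normalization of the feature matrix for volume invariance, and random fixed-length temporal segmentation for duration invariance — can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the function names, the (frames × coefficients) feature layout, and the segment length are assumptions for the example.

```python
import numpy as np

def instance_normalize(feats, eps=1e-8):
    # Normalize one utterance's feature matrix (frames x coefficients) to
    # zero mean and unit variance per coefficient. Loudness changes shift
    # and scale the features roughly uniformly, so per-instance statistics
    # remove much of the volume dependence.
    mu = feats.mean(axis=0, keepdims=True)
    sigma = feats.std(axis=0, keepdims=True)
    return (feats - mu) / (sigma + eps)

def random_segment(feats, seg_len, rng):
    # Crop a random fixed-length window of frames, so the encoder is
    # trained on segments rather than on the full note duration.
    n_frames = feats.shape[0]
    if n_frames <= seg_len:
        return feats
    start = rng.integers(0, n_frames - seg_len + 1)
    return feats[start:start + seg_len]

# Toy usage with a fake MFCC-like matrix: 200 frames, 40 coefficients.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(200, 40))
y = random_segment(instance_normalize(x), seg_len=100, rng=rng)
```

In a training loop these would be applied per example each epoch, so the encoder sees a different crop of every recording on every pass.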
