ISBN (Print): 9781728176055
Most deep learning-based speech enhancement methods focus on modeling the complicated relationship between noisy and clean speech without considering noise information. To cope with various complex noise scenes, we introduce a novel enhancement architecture that integrates a deep autoencoder with a neural noise embedding. In this study, a new normalization method, termed conditional layer normalization (CLN), is introduced to improve the generalization of deep learning-based speech enhancement approaches to unseen environments. The noise embedding is passed through the CLN layers to regularize the network for the speech enhancement task, so the proposed network can be adaptively adjusted according to the noise information extracted from the noisy speech input. The overall network is trained in an end-to-end manner, and the experimental results show that the proposed scheme produces satisfactory enhancement performance compared with other methods. The visualization shows that the proposed network captures noise information, which helps improve robustness to unseen environments for speech enhancement.
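The abstract does not give implementation details, but the conditioning mechanism it describes can be sketched as a layer normalization whose scale and bias are predicted from the noise embedding rather than learned as fixed parameters. A minimal PyTorch sketch, with illustrative module and dimension names that are assumptions rather than the paper's own:

import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, feature_dim, embed_dim):
        super().__init__()
        # Plain LayerNorm without its own affine parameters.
        self.norm = nn.LayerNorm(feature_dim, elementwise_affine=False)
        # Scale and bias are generated from the noise embedding instead.
        self.to_scale = nn.Linear(embed_dim, feature_dim)
        self.to_bias = nn.Linear(embed_dim, feature_dim)

    def forward(self, x, noise_embedding):
        # x: (batch, time, feature_dim); noise_embedding: (batch, embed_dim)
        scale = self.to_scale(noise_embedding).unsqueeze(1)
        bias = self.to_bias(noise_embedding).unsqueeze(1)
        return self.norm(x) * (1 + scale) + bias

# Usage: normalize enhancement features conditioned on a per-utterance
# noise vector from an auxiliary noise encoder (hypothetical here).
cln = ConditionalLayerNorm(feature_dim=256, embed_dim=128)
features = torch.randn(4, 100, 256)
noise_emb = torch.randn(4, 128)
out = cln(features, noise_emb)

The (1 + scale) form keeps the layer close to an unconditioned LayerNorm when the projected scale is near zero, a common choice in conditional normalization layers.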
Joint relational triple extraction treats entity recognition and relation extraction as a joint task to extract relational triples, and it is a critical task in information extraction and knowledge graph construction. However, most existing joint models still fall short in extracting overlapping triples. Moreover, these models ignore the trigger words of potential relations during relation detection. To address these two issues, a joint model based on Potential Relation Detection and Conditional Entity Mapping, named PRDCEM, is proposed. Specifically, the proposed model consists of three components corresponding to three subtasks: potential relation detection, candidate entity tagging, and conditional entity mapping. First, a non-autoregressive decoder containing a cross-attention mechanism is applied to detect potential relations. In this way, different potential relations are associated with the corresponding trigger words in the given sentence, and the semantic representations of the trigger words are fully utilized to encode the potential relations. Second, two distinct sequence taggers are employed to extract candidate subjects and objects. Third, an entity mapping module incorporating conditional layer normalization is designed to align the candidate subjects and objects: each candidate subject and each potential relation are combined to form a condition that is incorporated into the sentence representation, which enables the effective extraction of overlapping triples. Finally, a negative sampling strategy is employed in the entity mapping module to mitigate error propagation from the previous two components. Experimental results obtained on two widely used public datasets, against 15 baselines, demonstrate that PRDCEM can effectively extract overlapping triples and achieves improved performance.
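As a concrete illustration of the conditional entity mapping step, the sketch below (an assumption about the design, not the paper's released code) fuses a pooled candidate-subject vector with a relation embedding into a condition that modulates the token representations through conditional layer normalization before object tagging:

import torch
import torch.nn as nn

class ConditionalEntityMapper(nn.Module):
    def __init__(self, hidden_dim, num_relations):
        super().__init__()
        self.rel_embed = nn.Embedding(num_relations, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_scale = nn.Linear(2 * hidden_dim, hidden_dim)
        self.to_bias = nn.Linear(2 * hidden_dim, hidden_dim)
        self.obj_tagger = nn.Linear(hidden_dim, 2)  # object start/end logits

    def forward(self, token_states, subject_vec, relation_id):
        # token_states: (batch, seq_len, hidden_dim) sentence encoding
        # subject_vec:  (batch, hidden_dim) pooled candidate-subject vector
        # relation_id:  (batch,) index of a detected potential relation
        cond = torch.cat([subject_vec, self.rel_embed(relation_id)], dim=-1)
        scale = self.to_scale(cond).unsqueeze(1)
        bias = self.to_bias(cond).unsqueeze(1)
        conditioned = self.norm(token_states) * (1 + scale) + bias
        return torch.sigmoid(self.obj_tagger(conditioned))

Because each (subject, relation) pair yields a different condition, the same sentence can be decoded multiple times under different modulations, which is what lets overlapping triples that share tokens be extracted separately.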
Partial multi-label learning (PML) addresses problems where each instance is assigned a candidate label set and only a subset of these candidate labels is correct. The major challenge of PML is that the training procedure can easily be misguided by noisy labels. Current studies on PML reveal two significant drawbacks. First, most of them do not sufficiently explore complex label correlations, which could improve the effectiveness of label disambiguation. Second, PML models rely heavily on prior assumptions, limiting their applicability to specific scenarios. In this work, we propose a novel PML method based on the Encoder-Decoder Framework (PML-ED) to address these drawbacks. PML-ED first estimates the label probability distribution through a KNN label attention mechanism. It then adopts conditional layer normalization (CLN) to extract high-order label correlations and relaxes the prior assumptions about label noise by introducing a universal encoder-decoder framework. This approach makes PML-ED not only more efficient than state-of-the-art methods but also capable of handling data with heavy label noise across different domains. Experimental results on 28 benchmark datasets demonstrate that PML-ED, when benchmarked against nine leading-edge PML algorithms, achieves the highest average ranking across five evaluation criteria.
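The KNN label attention step can be illustrated with a small sketch; the function name and the dot-product similarity are assumptions for exposition, not details from the paper:

import torch
import torch.nn.functional as F

def knn_label_attention(query_feat, neighbor_feats, neighbor_labels):
    # query_feat:      (feat_dim,)     features of the target instance
    # neighbor_feats:  (k, feat_dim)   features of its k nearest neighbors
    # neighbor_labels: (k, num_labels) 0/1 candidate-label vectors
    scores = neighbor_feats @ query_feat        # similarity to each neighbor
    weights = F.softmax(scores, dim=0)          # attention weights over neighbors
    return weights @ neighbor_labels.float()    # soft label probability distribution

Neighbors that resemble the target instance contribute more to its initial label distribution, giving the encoder-decoder a denoised starting point before CLN models the higher-order label correlations.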
Text-to-speech (TTS) technology is commonly used to generate personalized voices for new speakers. Despite considerable progress in TTS technology, synthesizing high-quality custom voices remains problematic. Fine-tuning a TTS model is a popular approach to this issue. However, fine-tuning must be applied once for every new speaker, which results in both time-consuming model training and excessive storage of TTS model parameters. Therefore, to support a large number of new speakers, a parameter-efficient fine-tuning (PEFT) approach is needed instead of full fine-tuning, along with a way to accommodate multiple speakers with a small number of parameters. To this end, this work first incorporates a low-rank adaptation-based fine-tuning method into the variational inference with adversarial learning for end-to-end text-to-speech (VITS) model. Next, the approach is extended with conditional layer normalization for multi-speaker fine-tuning, and a residual adapter is further applied to the text encoder outputs of the VITS model to improve the intelligibility and naturalness of the personalized speech. The performance of the fine-tuned TTS models with different combinations of fine-tuning modules is evaluated on the Libri-TTS-100, VCTK, and Common Voice datasets, as well as a Korean multi-speaker dataset. Objective and subjective quality comparisons reveal that the proposed approach achieves speech quality comparable to that of a fully fine-tuned model, with around a 90% reduction in the number of model parameters.
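A minimal sketch of the low-rank adaptation (LoRA) component, assuming it wraps individual linear layers of the pretrained VITS model; the rank and scaling values below are illustrative, not the paper's settings:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # freeze pretrained weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # start as a zero update
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen base output plus a trainable low-rank residual update.
        return self.base(x) + self.up(self.down(x)) * self.scaling

Only the two small projection matrices are trained per speaker, consistent with the roughly 90% parameter reduction reported above when combined with conditional layer normalization and residual adapters.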