Search Results - Inner Mongolia University Library

AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents


International Symposium on Chinese Spoken Language Processing

Authors: Yongmao Zhang, Zhichao Wang, Peiji Yang, Hongshen Sun, Zhisheng Wang, Lei Xie. Affiliations: Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Tencent, Shenzhen, China

ISBN: (print) 9798350397970

Learning accent from crowd-sourced data is a feasible way to achieve a target speaker TTS system that can synthesize accented speech. To this end, two challenging problems must be solved. First, directly using the poor-acoustic-quality crowd-sourced data together with the target speaker data for accent transfer leads to synthetic speech with degraded quality. To mitigate this problem, we take a bottleneck feature (BN) based TTS approach, in which TTS is decomposed into a Text-to-BN (T2BN) module to learn accent and a BN-to-Mel (BN2Mel) module to learn speaker timbre, where the neural network based BN feature serves as an intermediate representation that is robust to noise interference. Second, directly training T2BN on the crowd-sourced data in the two-stage system produces accented speech of the target speaker with poor prosody, because the crowd-sourced recordings are contributed by ordinary, non-professional speakers. To tackle this problem, we extend the two-stage approach to a novel three-stage approach, where T2BN and BN2Mel are trained using the high-quality target speaker data and a new BN-to-BN (BN2BN) module is plugged in between the two modules to perform accent transfer. To train the BN2BN module, parallel unaccented and accented BN features are obtained by a proposed data augmentation procedure. Finally, the proposed three-stage approach manages to produce accented speech for the target speaker with good prosody, as the prosody pattern is inherited from the professional target speaker while accent transfer is achieved by the BN2BN module. The proposed approach, named AccentSpeech, is validated on a Mandarin TTS accent transfer task.

Keywords: Training; Neural networks; Interference; Data models; Acoustics; Recording; Timbre

TEGTOK: Augmenting Text Generation via Task-specific and Open-world Knowledge


arXiv, 2022

Authors: Tan, Chao-Hong; Gu, Jia-Chen; Tao, Chongyang; Ling, Zhen-Hua; Xu, Can; Hu, Huang; Geng, Xiubo; Jiang, Daxin. Affiliations: National Engineering Research Center for Speech and Language Information Processing, University of Science and Technology of China, Hefei, China; Microsoft, Beijing, China

Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating various kinds of open-world knowledge, such as knowledge graphs or wiki pages, into pre-trained language models (PLMs). However, their ability to access and manipulate task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TEGTOK) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks, dialogue generation and question generation, and on two datasets show that our method achieves better performance than various baseline models. Copyright © 2022, The Authors. All rights reserved.

Keywords: Knowledge graph
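
The dense-retrieval step mentioned in the TEGTOK abstract can be illustrated with a minimal sketch. It assumes that knowledge entries and the query are already embedded as vectors and that relevance is scored by inner product; the abstract does not specify the encoder or the scoring function, so `dot`, `retrieve_top_k`, and the toy embeddings are hypothetical names and values chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec, entries, k):
    """Return the texts of the k entries whose embeddings score highest.

    `entries` is a list of (text, embedding) pairs. Inner-product scoring
    is an assumption; the abstract only says "dense retrieval".
    """
    ranked = sorted(entries, key=lambda entry: dot(query_vec, entry[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy, hand-written embeddings (hypothetical) for the two knowledge sources.
task_specific = [("task entry A", [0.9, 0.1]), ("task entry B", [0.1, 0.9])]
open_world = [("wiki entry C", [0.8, 0.3]), ("wiki entry D", [-0.5, 0.2])]

query = [1.0, 0.0]
task_hits = retrieve_top_k(query, task_specific, k=1)
world_hits = retrieve_top_k(query, open_world, k=1)
```

Each source is searched separately here, mirroring the abstract's description of two knowledge sources; how the retrieved entries are injected into encoding and decoding is not modeled.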

HFabD+M: A Web-based Platform for Automated Hyperledger Fabric Deployment and Management


Global Emerging Technology Blockchain Forum: Blockchain & Beyond (iGETblockchain), IEEE

Authors: Ioannis Zikos, Andreas Sendros, George Drosatos, Pavlos S. Efraimidis. Affiliations: Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece; Institute for Language and Speech Processing, Athena Research Center, Xanthi, Greece

Hyperledger Fabric is an open-source private permissioned blockchain that supports the use of smart contracts (chaincode). It is aimed mainly at private networks of companies. To serve the different needs of each company and to be flexible in customer requirements, it consists of various adaptive components. Although this structure efficiently addresses a wide range of needs, deploying such a network for research purposes or rapid development is complex. In this paper, we present a web-based system architecture for the automated deployment of a Hyperledger Fabric network, and in addition, we describe the tools needed to manage and update such a network. Finally, as a proof-of-concept, we implement the proposed architecture to demonstrate the feasibility of our approach.

Keywords: Distributed ledger; Smart contracts; Prototypes; Systems architecture; Companies; Fabrics; Blockchains

Silver Syntax Pre-training for Cross-Domain Relation Extraction


arXiv, 2023

Authors: Bassignana, Elisa; Ginter, Filip; Pyysalo, Sampo; van der Goot, Rob; Plank, Barbara. Affiliations: Department of Computer Science, IT University of Copenhagen, Denmark; TurkuNLP, Department of Computing, University of Turku, Finland; MaiNLP, Center for Information and Language Processing, LMU Munich, Germany

Relation Extraction (RE) remains a challenging task, especially when considering realistic out-of-domain evaluations. One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain. An intermediate training step on data from related tasks has been shown to be beneficial across many NLP tasks. However, this setup still requires supplementary annotated data, which is often not available. In this paper, we investigate intermediate pre-training specifically for RE. We exploit the affinity between syntactic structure and semantic RE, and identify the syntactic relations that are closely related to RE because they lie on the shortest dependency path between two entities. We then take advantage of the high accuracy of current syntactic parsers to automatically obtain large amounts of low-cost pre-training data. By pre-training our RE model on the relevant syntactic relations, we are able to outperform the baseline in five out of six cross-domain setups, without any additional annotated data. © 2023, CC BY.

Keywords: Syntactics
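
The idea of collecting the syntactic relations on the shortest dependency path between two entities can be illustrated with a small sketch. It assumes the parse is given as (head, dependent, label) triples and treats the tree as an undirected graph searched breadth-first; the function name, the toy sentence, and its relation labels are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the relation labels on the shortest path from source to target.

    `edges` is a list of (head, dependent, label) triples. The parse is
    treated as an undirected graph; None is returned if no path exists.
    """
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, labels = queue.popleft()
        if node == target:
            return labels
        for neighbor, label in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, labels + [label]))
    return None


# Hypothetical parse of "Alice founded Acme in Paris".
toy_parse = [
    ("founded", "Alice", "nsubj"),
    ("founded", "Acme", "obj"),
    ("founded", "Paris", "obl"),
    ("Paris", "in", "case"),
]
path_labels = shortest_dependency_path(toy_parse, "Alice", "Acme")
```

In this toy parse the path between the two entity mentions passes through the verb, so the collected labels are the subject and object relations; labels like these would be the candidates for silver pre-training data.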

HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations


arXiv, 2022

Authors: Gu, Jia-Chen; Tan, Chao-Hong; Tao, Chongyang; Ling, Zhen-Hua; Hu, Huang; Geng, Xiubo; Jiang, Daxin. Affiliations: National Engineering Research Center for Speech and Language Information Processing, University of Science and Technology of China, Hefei, China; Microsoft, Beijing, China

Recently, various response generation models for two-party conversations have achieved impressive improvements, but less attention has been paid to multi-party conversations (MPCs), which are more practical and complicated. Compared with a two-party conversation, where a dialogue context is a sequence of utterances, building a response generation model for MPCs is more challenging, since there exist complicated context structures and the generated responses rely heavily on both interlocutors (i.e., speaker and addressee) and history utterances. To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs, which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph. In addition, we design six types of meta relations with node-edge-type-dependent parameters to characterize the heterogeneous interactions within the graph. Through multi-hop updating, HeterMPC can adequately utilize the structural knowledge of conversations for response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark show that HeterMPC outperforms various baseline models for response generation in MPCs. Copyright © 2022, The Authors. All rights reserved.

Keywords: Graphic methods

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation


arXiv, 2023

Authors: Wei, Kun; Li, Bei; Lv, Hang; Lu, Quan; Jiang, Ning; Xie, Lei. Affiliations: The Audio, Speech and Language Processing Group, School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; The School of Computer Science and Engineering, Northeastern University, Shenyang 110167, China; Mashang Consumer Finance Co., Ltd., Chongqing 401121, China

Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system that extends the Conformer encoder-decoder model with cross-modal conversational representation. Our approach leverages a cross-modal extractor that combines pre-trained speech and text models through a specialized encoder and a modal-level mask input. This enables the extraction of richer historical speech context without explicit error propagation. We also incorporate conditional latent variational modules to learn conversational-level attributes such as role preference and topic coherence. By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on the Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model. Copyright © 2023, The Authors. All rights reserved.

Keywords: Signal encoding

PSST: A Benchmark for Evaluation-driven Text Public-Speaking Style Transfer


arXiv, 2023

Authors: Sun, Huashan; Wu, Yixiao; Ye, Yuhao; Yang, Yizhe; Li, Yinghao; Li, Jiawei; Gao, Yang. Affiliations: School of Computer Science and Technology, Beijing Institute of Technology; Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, China

Language style is necessary for AI systems to understand and generate diverse human language accurately. However, previous text style transfer primarily focused on sentence-level data-driven approaches, limiting exploration of potential problems in large language models (LLMs) and the ability to meet complex application needs. To overcome these limitations, we introduce a novel task called Public-Speaking Style Transfer (PSST), which aims to simulate how humans transform passage-level, official texts into a public-speaking style. Grounded in the analysis of real-world data from a linguistic perspective, we decompose public-speaking style into key sub-styles to pose challenges and quantify the style modeling capability of LLMs. For such intricate text style transfer, we further propose a fine-grained evaluation framework to analyze the characteristics and identify the problems of stylized texts. Comprehensive experiments suggest that current LLMs struggle to generate public-speaking texts that align with human preferences, primarily due to excessive stylization and loss of semantic information. Copyright © 2023, The Authors. All rights reserved.

Keywords: Semantics

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation


IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie. Affiliations: Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Ximalaya Inc., China; Xizhang (Shanghai) Network Technology Co., Ltd.

Speaker anonymization aims to conceal a speaker’s identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech suffers from reduced pseudo-speaker distinctiveness, speech quality, and intelligibility for out-of-distribution speakers. To solve this issue, we propose SALT, a Speaker Anonymization system based on Latent space Transformation. Specifically, we extract latent features with a self-supervised feature extractor, randomly sample multiple speakers and their weights, and then interpolate the latent vectors to achieve speaker anonymization. Meanwhile, we explore an extrapolation method to further extend the diversity of pseudo speakers. Experiments on the Voice Privacy Challenge dataset show that our system achieves a state-of-the-art distinctiveness metric while preserving speech quality and intelligibility. Our code and demo are available on GitHub: https://***/BakerBunker/SALT

Keywords: (none listed)
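
The interpolation and extrapolation of speaker latent vectors described in the SALT abstract can be sketched as a weighted sum of latents. The abstract does not say how the weights are drawn, so this sketch assumes convex weights (non-negative, summing to 1) for interpolation, and models extrapolation by stretching those weights away from the uniform mix by a factor `scale > 1`, which keeps the sum at 1. All function names and the `scale` parameter are hypothetical.

```python
import random


def sample_convex_weights(k, rng):
    """Draw k positive weights and normalize them to sum to 1 (assumed scheme)."""
    raw = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def extrapolate_weights(weights, scale):
    """Stretch weights away from the uniform mix; scale=1.0 leaves them unchanged.

    With scale > 1 some weights may leave [0, 1], but the sum stays 1.
    """
    uniform = 1.0 / len(weights)
    return [uniform + scale * (w - uniform) for w in weights]


def mix_latents(latents, weights):
    """Weighted sum of equal-length latent vectors."""
    dim = len(latents[0])
    return [sum(w * z[d] for w, z in zip(weights, latents)) for d in range(dim)]


def make_pseudo_speaker(speaker_pool, k, rng, scale=1.0):
    """Pick k speakers at random and blend their latents into a pseudo speaker."""
    chosen = rng.sample(speaker_pool, k)
    weights = extrapolate_weights(sample_convex_weights(k, rng), scale)
    return mix_latents(chosen, weights)


rng = random.Random(0)
pool = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]  # toy latents
pseudo = make_pseudo_speaker(pool, k=3, rng=rng, scale=1.5)
```

Keeping the weights summing to 1 means that blending identical latents returns the same latent, which is a convenient sanity check for the mixing code.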

MULTI-CROSSRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction


arXiv, 2023

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose MULTI-CROSSRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. MULTI-CROSSRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022a), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset. © 2023, CC BY.

Keywords: Extraction