The goal of hate speech detection is to filter negative online content aimed at certain groups of people. Due to the easy accessibility of social media platforms it is crucial to protect everyone which requires build...
ISBN: (print) 9798350397970
Learning accent from crowd-sourced data is a feasible way to build a target-speaker TTS system that can synthesize accented speech. Two challenging problems must be solved. First, directly using the acoustically poor crowd-sourced data together with the target speaker data for accent transfer leads to synthetic speech of degraded quality. To mitigate this problem, we take a bottleneck feature (BN) based TTS approach, in which TTS is decomposed into a Text-to-BN (T2BN) module that learns accent and a BN-to-Mel (BN2Mel) module that learns speaker timbre; the neural-network-based BN feature serves as an intermediate representation that is robust to noise interference. Second, directly training T2BN on the crowd-sourced data in the two-stage system produces accented speech of the target speaker with poor prosody, because the crowd-sourced recordings are contributed by ordinary, non-professional speakers. To tackle this problem, we extend the two-stage approach to a novel three-stage approach, in which T2BN and BN2Mel are trained on the high-quality target speaker data and a new BN-to-BN (BN2BN) module is plugged in between the two modules to perform accent transfer. To train the BN2BN module, parallel unaccented and accented BN features are obtained by a proposed data augmentation procedure. The proposed three-stage approach thus produces accented speech for the target speaker with good prosody: the prosody pattern is inherited from the professional target speaker, while accent transfer is achieved by the BN2BN module. The proposed approach, named AccentSpeech, is validated on a Mandarin TTS accent transfer task.
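The data flow of the three-stage approach described above can be sketched as a simple composition of modules. This is a minimal illustration only: `make_pipeline` and the three `toy_*` functions are hypothetical stand-ins invented here, not the authors' networks, and the arithmetic inside them has no acoustic meaning. The sketch shows only the order in which text, BN features, and mel frames pass between stages.

```python
from typing import Callable, List

# A sequence of feature frames; a toy stand-in for BN or mel features.
Frames = List[List[float]]


def make_pipeline(
    t2bn: Callable[[str], Frames],
    bn2bn: Callable[[Frames], Frames],
    bn2mel: Callable[[Frames], Frames],
) -> Callable[[str], Frames]:
    """Chain the three stages: text -> BN -> accented BN -> mel."""

    def synthesize(text: str) -> Frames:
        bn = t2bn(text)            # stage 1: prosody from target speaker data
        accented_bn = bn2bn(bn)    # stage 2: accent transfer in BN space
        return bn2mel(accented_bn)  # stage 3: target speaker timbre

    return synthesize


# Hypothetical toy stages (assumptions for illustration, not real models).
def toy_t2bn(text: str) -> Frames:
    return [[float(ord(ch))] for ch in text]


def toy_bn2bn(bn: Frames) -> Frames:
    return [[value + 1.0 for value in frame] for frame in bn]


def toy_bn2mel(bn: Frames) -> Frames:
    return [[value * 0.5 for value in frame] for frame in bn]


synthesize = make_pipeline(toy_t2bn, toy_bn2bn, toy_bn2mel)
mel = synthesize("ab")  # one toy "mel" frame per input character
```

Because the accent transfer sits in its own middle stage, the first and last stages can be trained only on clean target speaker data, which is the design point the abstract emphasizes.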
Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowled...
Hyperledger Fabric is an open-source private permissioned blockchain that supports the use of smart contracts (chaincode). It is aimed mainly at private networks of companies. To serve the different needs of each company and to be flexible in customer requirements, it consists of various adaptive components. Although this structure efficiently addresses a wide range of needs, deploying such a network for research purposes or rapid development is complex. In this paper, we present a web-based system architecture for the automated deployment of a Hyperledger Fabric network, and in addition, we describe the tools needed to manage and update such a network. Finally, as a proof-of-concept, we implement the proposed architecture to demonstrate the feasibility of our approach.
Relation Extraction (RE) remains a challenging task, especially when considering realistic out-of-domain evaluations. One of the main reasons for this is the limited training size of current RE datasets: obtaining hig...
Recently, various response generation models for two-party conversations have achieved impressive improvements, but less attention has been paid to multi-party conversations (MPCs) which are more practical and complicate...
Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Due to irrelevant content, error propa...
Language style is necessary for AI systems to understand and generate diverse human language accurately. However, previous text style transfer primarily focused on sentence-level data-driven approaches, limiting explo...
Speaker anonymization aims to conceal a speaker’s identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech suffers from reduced pseudo-speaker distinctiveness, speech quality, and intelligibility for out-of-distribution speakers. To solve this issue, we propose SALT, a Speaker Anonymization system based on Latent space Transformation. Specifically, we extract latent features with a self-supervised feature extractor, randomly sample multiple speakers and their weights, and then interpolate the latent vectors to achieve speaker anonymization. We also explore extrapolation to further extend the diversity of pseudo speakers. Experiments on the Voice Privacy Challenge dataset show that our system achieves a state-of-the-art distinctiveness metric while preserving speech quality and intelligibility. Our code and demo are available on GitHub: https://***/BakerBunker/SALT
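The interpolation step described above (sample several speakers and their weights, then mix their latent vectors) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SALT implementation: the function name `sample_pseudo_speaker`, the `scale` parameter, and the particular extrapolation rule (pushing convex weights away from the uniform weight while keeping their sum at 1) are all hypothetical choices made here, and the latent vectors are plain lists rather than self-supervised features.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_pseudo_speaker(
    latents: Dict[str, List[float]],
    k: int,
    scale: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Dict[str, float], List[float]]:
    """Mix k randomly chosen speaker latents with random weights.

    scale == 1.0 gives a convex combination (interpolation).
    scale > 1.0 moves the weights outside the simplex (an assumed
    form of extrapolation); the weights still sum to 1.
    """
    rng = rng or random.Random()
    names = rng.sample(sorted(latents), k)

    # Random positive weights, normalised so they sum to 1.
    raw = [rng.random() + 1e-9 for _ in names]
    total = sum(raw)
    convex = [r / total for r in raw]

    # Stretch the weights around the uniform weight 1/k.
    uniform = 1.0 / k
    weights = [uniform + scale * (w - uniform) for w in convex]

    # Weighted sum of the selected latent vectors, one dimension at a time.
    dim = len(latents[names[0]])
    mixed = [
        sum(w * latents[name][d] for w, name in zip(weights, names))
        for d in range(dim)
    ]
    return dict(zip(names, weights)), mixed


# Toy latent pool (hypothetical values, two dimensions per speaker).
pool = {"spk_a": [0.0, 1.0], "spk_b": [1.0, 0.0], "spk_c": [0.5, 0.5]}
weights, pseudo = sample_pseudo_speaker(pool, k=2, rng=random.Random(0))
```

With `scale=1.0` the pseudo speaker always lies inside the region spanned by the sampled speakers; a larger `scale` lets it fall outside that region, which is one simple way to model the added diversity the abstract attributes to extrapolation.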
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose MULTI-CROSSRE, the broadest multi-lingual dataset for RE, including 26 languages i...