Self-supervised learning (SSL) has garnered significant attention in speech processing, excelling particularly in linguistic tasks such as speech recognition. However, improving the performance of pre-trained models across various downstream tasks, each requiring distinct types of speech information, remains a significant challenge. To address this, we propose a progressive residual extraction-based SSL method, named ProgRE. Specifically, we introduce two lightweight, specialized task modules into an encoder-style SSL backbone to enhance its ability to extract pitch variation and speaker information from speech. Furthermore, to mitigate the incompatibility between the reinforced pitch variation and speaker information on the one hand and the learning of content information on the other, we employ residual extraction: the extracted representations serve as references or conditioning signals that guide the subsequent modules to learn content-related information more effectively under the supervision of HuBERT-based masked speech prediction. In this manner, we can incrementally extract pitch variation, speaker, and content representations from the input speech. Finally, these multiple representations, each capturing different speech information, are combined using different layer weights to produce task-specific representations for various downstream tasks. Experimental results demonstrate that ProgRE achieves significant performance improvements across several tasks, including speaker identification, speech recognition, emotion recognition, speech enhancement, and voice conversion, outperforming strong SSL methods such as wav2vec 2.0, HuBERT, and WavLM.
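The final step, combining several representations with task-specific layer weights, can be illustrated with a minimal sketch. It assumes a softmax-normalized weighted sum over layer outputs, as is common in SUPERB-style downstream evaluation; the abstract does not specify ProgRE's exact weighting scheme, and the names `softmax`, `combine_layers`, and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over per-layer weight logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_reprs: List[Vector], weight_logits: List[float]) -> Vector:
    """Return the softmax-weighted sum of per-layer representation vectors.

    Assumption (not from the source): layer_reprs[i] is the frame-level output
    of layer/module i, and weight_logits[i] is a learnable scalar for one task.
    """
    if len(layer_reprs) != len(weight_logits):
        raise ValueError("need exactly one weight per layer")
    weights = softmax(weight_logits)
    dim = len(layer_reprs[0])
    combined = [0.0] * dim
    for w, rep in zip(weights, layer_reprs):
        for d in range(dim):
            combined[d] += w * rep[d]
    return combined


# Hypothetical toy example: three 2-D representations standing in for the
# pitch-variation, speaker, and content outputs, weighted differently per task.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
speaker_task = combine_layers(layers, [0.0, 2.0, 0.0])  # emphasizes layer 2
asr_task = combine_layers(layers, [0.0, 0.0, 2.0])      # emphasizes layer 3
print(speaker_task, asr_task)
```

Because the weights are normalized, each downstream task only learns how much to draw from each representation, while the representations themselves can stay frozen.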
Switching units and networks have been analyzed as extensible fabrics, mostly in terms of their scheduling *** traditional literature on switching extensibility has provided complexity theory only relating to the total numbers of inputs (or outputs) and exchange *** paper analyzes switching extensibility in terms of not only the scheduling algorithm but also the fabric *** is found that determining extensibility from the soft complexity related to the number of inputs (or outputs) of the scheduling algorithm and from the fabric extensibility, as in previous studies, without quantization is a flawed conception. A method is thus proposed to express the spatial extensibility of a switching unit or network in terms of the connections of a switching resource and *** method calculates the parameter ES (the efficiency of switching) of an m×n switching unit and obtains two functions of the switching unit to describe spatial extensibility along with the number of unilateral inputs or *** is found that the range of ES is (0,1] and that three types of switching unit and two types of crosspoint networks have ES=*** is calculated for banyan, Clos, parallel packet, fully interconnected and recirculation switching *** ES value for the banyan switching network is larger than that for the other networks, and switching networks are classified into three types that have absolute/linear/denied spatial extensibility according to the lim ES *** is demonstrated that a switching network has the largest ES value when it contains only the five types of switching unit for which ES=***, a group-switching-first self-routing banyan switching network with lower blocking probability and time delay is deduced, and the ES method is contrasted with two other methods of evaluating spatial extensibility in terms of their mathematical expressions and intuitive graphics, for the five types of switching network listed above.
Recent studies show that graph neural networks (GNNs) are vulnerable to backdoor attacks. Existing backdoor attacks against GNNs use fixed-pattern triggers and lack reasonable trigger constraints, overlooking individu...
Recently, emotional speech generation and speaker cloning have garnered significant interest in text-to-speech (TTS). With the open-sourcing of codec language TTS models trained on massive datasets with large-scale pa...
Dear editor, Distinguished by its timeliness and originality, Weibo has become increasingly critical and influential in China for online information acquisition and sharing. However, very little research has used Weibo to investigate event summarization, even though most published Weibo posts are event-driven. Moreover, we observe that the existing methods are unsuitable for process-
ISBN: (print) 9781849195478
This paper presents an efficient hardware prototype for network coding (NC). First, a packet synchronization mechanism is introduced to resolve the mismatch in packet arrival between different incoming channels. Then, a high-speed lookup-table-based circuit is designed to compute dot products over a Galois field, which form the basic calculation unit of NC operations. Taking advantage of the speed of FPGA hardware, the prototype can perform network coding operations within several hundred nanoseconds. Thus, further studies and emulations of NC can be carried out on this platform in real network scenarios.
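The lookup-table approach to the Galois-field dot product can be sketched in software. This sketch assumes GF(2^8) with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and log/antilog tables; the abstract does not state the field size, polynomial, or table layout of the FPGA prototype, and all names here (`build_tables`, `gf_mul`, `gf_dot`) are hypothetical.

```python
from typing import List, Tuple

# Assumed field: GF(2^8) with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1.
PRIM_POLY = 0x11D


def build_tables() -> Tuple[List[int], List[int]]:
    """Build exp (antilog) and log tables for GF(2^8) with generator 2."""
    exp = [0] * 512  # doubled so exp[log a + log b] needs no modulo 255
    log = [0] * 256  # log[0] is unused; zero is handled separately
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:  # reduce modulo the primitive polynomial
            x ^= PRIM_POLY
    for i in range(255, 512):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using two log lookups, one add, one exp lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_dot(coeffs: List[int], symbols: List[int]) -> int:
    """Dot product over GF(2^8); field addition is bitwise XOR."""
    acc = 0
    for c, s in zip(coeffs, symbols):
        acc ^= gf_mul(c, s)
    return acc


# Hypothetical usage: encode one byte position of three incoming packets
# with a coding vector, as a network-coding node would do per byte.
coding_vector = [0x02, 0x03, 0x01]
packet_bytes = [0x57, 0x83, 0x1A]
coded_byte = gf_dot(coding_vector, packet_bytes)
print(hex(coded_byte))
```

Replacing a bit-serial multiplier with table lookups turns each multiplication into a fixed, short sequence of memory reads and one small addition, which plausibly suits LUT-rich FPGA fabric; the prototype's actual circuit may be organized differently.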
Query-oriented relevance, information richness and novelty are important requirements in query-focused summarization, which, to a considerable extent, determine the summary quality. Previous work either rarely took in...
Query-oriented multi-document summarization (QMDS) attempts to generate a concise piece of text by extracting sentences from a target document collection, with the aim of not only conveying the key content of that corpu...
Today’s Internet architecture was designed and proposed in the 1960s and 1970s with the intention of interconnecting several computing resources across a geographically distributed user group. With the advent of substantial...