Target speaker extraction aims to extract the speech of a specific speaker, as specified by an auxiliary reference, from a multi-talker mixture. Most studies focus on the scenario where the target speech is highly overlapped with the interfering speech. However, this scenario accounts for only a small percentage of real-world conversations. In this paper, we target sparsely overlapped scenarios, in which the auxiliary reference needs to perform two tasks simultaneously: detect the activity of the target speaker and disentangle the active speech from any interfering speech. We propose an audio-visual speaker extraction model named ActiveExtract, which leverages speaking activity from audio-visual active speaker detection (ASD). The ASD directly provides the frame-level activity of the target speaker, while its intermediate feature representation is trained to discriminate speech-lip synchronization, which could be used for speaker disentanglement. Experimental results show that our model outperforms baselines across various overlapping ratios, achieving an average improvement of more than 4 dB in terms of SI-SNR.
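For readers unfamiliar with the metric quoted above, the following is a minimal sketch of the commonly used scale-invariant signal-to-noise ratio (SI-SNR). It is not taken from the ActiveExtract paper; the function name `si_snr` and the `eps` stabilizer are assumptions made for illustration, and signals are plain Python lists of samples.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean so that a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the "clean" component.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    scale = dot / t_energy
    s_target = [scale * b for b in t]
    # Whatever is left after the projection counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10 * math.log10(num / den + eps)
```

An improvement figure such as the one reported above is typically obtained as `si_snr(extracted, clean) - si_snr(mixture, clean)`, averaged over a test set.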
There are many different sorts of data that can be gathered and analyzed, including pictures, videos, texts, speeches, music, and other noises. Video content, for example, generally includes at least some types of audi...
According to the U.S. Energy Information Administration, 60 percent of the world's electricity is generated from fossil fuels, 18 percent from nuclear power, and only 21 percent from green energy resources. These ...
The current lyrics transcription approaches heavily rely on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive. How to benefit from unlabeled data and alleviate...
It is common in everyday spoken communication that we look at the turning head of a talker to listen to his/her voice. Humans see the talker to listen better, so do machines. However, previous studies on audio-visual ...
The protein surface plays a key role in many biological *** proteins participate in the life activities of cells via binding to other proteins or ligand molecules. Analyzing the protein surface shape is an important way to study protein structure and function. Based on the CX algorithm and the 2D fingerprint-based method, we propose the FCX method to identify the morphology of bulges and depressions on the protein surface. The experimental results show that the FCX algorithm performs better than the CX algorithm. Measured by Pearson correlation coefficients with solvent accessibility and B-factor, FCX values correlate more strongly with the convex and concave features than CX values do. This result shows that the FCX algorithm can describe the shape of the protein surface residues more accurately than the CX algorithm.
Person re-identification (ReID) aims to recognize the same person in multiple images from different camera *** person ReID models are time-consuming and resource-intensive; thus, cloud computing is an appropriate model training ***, the required massive personal data for training contain private information with a significant risk of data leakage in cloud environments, leading to significant communication ***. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. ***, based on federated partial averaging, MOON warmup is added to correct the local training of individual edge servers and improve the model’s effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global ***. In addition, we propose a lightweight person ReID network, named multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID ***. MB-CDNet is a multi-branch version of combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the ***. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
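To make the model-contrastive loss mentioned above more concrete, here is a small sketch that assumes the standard MOON formulation: the local representation is pulled toward the global model's representation (positive pair) and pushed away from the previous local model's representation (negative pair). How FRM applies this during warmup is not described in the abstract, so the function names, the plain-list vectors, and the temperature of 0.5 are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Model-contrastive loss for one sample (assumed MOON-style form).

    z_local:  representation from the local model being trained
    z_global: representation from the current global model (positive)
    z_prev:   representation from the previous local model (negative)
    """
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))
```

The loss shrinks as the local representation aligns with the global one, which is the corrective effect on local training that the abstract attributes to MOON warmup.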
Generative Artificial Intelligence (GAI) has recently emerged as a promising solution to address critical challenges of blockchain technology, including scalability, security, privacy, and interoperability. In this pa...
Learning to hash is a method that can deal with content-based information retrieval efficiently. Traditional learning to hash methods, however, lack the ability to map the generated hash codes to the high-level semant...
The primary objective of fog computing is to minimize the reliance of IoT devices on the cloud by leveraging the resources of the fog network. Typically, IoT devices offload computation tasks to the fog to meet different task requirements, such as task execution latency and computation cost. Selecting a fog node that meets these requirements is therefore a crucial challenge. To choose an optimal fog node, access to each node's resource availability information is essential. Existing approaches often assume state availability or depend on a subset of state information to design mechanisms tailored to different task requirements. In this paper, OptiFog, a cluster-based fog computing architecture for acquiring the state information, followed by an optimal fog node selection and task offloading mechanism, is proposed. Additionally, a continuous-time Markov chain based stochastic model for predicting the resource availability on fog nodes is proposed. This model removes the need to frequently synchronize the resource availability status of fog nodes and allows up-to-date state information to be maintained. Extensive simulation results show that OptiFog lowers task execution latency considerably and, compared with existing state-of-the-art approaches, schedules almost all tasks at the fog layer. IEEE
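As an illustration of how a continuous-time Markov chain can predict availability between synchronizations, the sketch below uses the simplest possible case: a two-state chain in which a node alternates between "available" and "busy". This is not OptiFog's actual model, whose states and rates are not given in the abstract; the function name and the rate parameters `busy_rate` and `free_rate` are hypothetical.

```python
import math

def prob_available(t, busy_rate, free_rate, available_now=True):
    """Probability that a node is available t time units from now.

    Assumed two-state CTMC (illustrative only):
      available -> busy  at rate busy_rate
      busy -> available  at rate free_rate
    """
    total = busy_rate + free_rate
    steady = free_rate / total          # long-run fraction of time available
    decay = math.exp(-total * t)        # how fast the last observation fades
    if available_now:
        return steady + (1.0 - steady) * decay
    return steady * (1.0 - decay)
```

Under this assumption, a scheduler that last observed a node as available could rank candidate nodes by `prob_available(elapsed, busy_rate, free_rate)` instead of polling each node again, which is the kind of saving the abstract describes.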