Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection ***, most audio-visual wake word spotting models are only suitable for simple single-speaker scenarios and require high computational *** development is hindered by complex multi-person scenarios and computational limitations in mobile *** this paper, a novel audio-visual model is proposed for on-device multi-person wake word ***, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker ***, the knowledge distillation method is introduced to transfer knowledge from the large model to the on-device model to control the size of our ***, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word *** results on the PKU-KWS dataset show that this approach outperforms the previous state-of-the-art methods.
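The abstract does not state which distillation objective is used. As a minimal sketch, assuming the common soft-target formulation (temperature-scaled KL divergence between teacher and student outputs), the transfer from the large model to the on-device model could look as follows. The function names, the temperature value, and the example logits are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Row-wise log-softmax of logits divided by the temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over the batch, scaled by temperature squared.

    Assumed soft-target objective; the paper's actual loss is not given in the abstract.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(temperature ** 2 * kl.mean())

# Hypothetical logits for a batch of two clips and three classes.
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
student = np.array([[1.0, 1.0, -0.5], [0.0, 0.4, 0.9]])
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise, so minimising it pulls the small model's output distribution toward the large model's.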
For a long time, people have believed that representation problems are one of the bottlenecks in the field of machine learning. Therefore, it is a long-term and exploratory work to study machine learning representatio...
Pixel-level structure segmentations have attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine ***, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual *** this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) *** LF-GPE, serving as a light field backbone, can extract both appearance and geometric structural cues *** aligns these features into a unified visual space, facilitating semantic ***, our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field *** prioritizes considering the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior *** evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic *** results demonstrate that LF-GPE can effectively learn high-quality light field features and achieve highly competitive performance in pixel-level segmentation tasks.
The classification loss functions used in deep neural network classifiers can be grouped into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vec...
1 Introduction. The satisfiability (SAT) problem has been considered the "seed" of other NP-complete *** regular partial exact (k,d)-SAT problem is an important extension of the SAT *** any (k,d)-CNF formula with a variable set V, V' is a proper subset of V, the problem involves determining whether a truth assignment set on V' exists such that only a literal in each clause is *** V' = V, it is a regular exact (k,d)-SAT ***, both experimental verifications and theoretical analyses of k-SAT problem have shown that the ratio α (clause constraint density) of the number of clauses m to the number of variables n is an important parameter affecting the satisfiability of the formula [1]. However, the regular (k,d)-SAT problem has the same clause constraint density d/k.
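The fixed density d/k follows from counting literal occurrences: assuming the usual definition of a regular (k,d)-CNF formula (every clause has k literals and every variable occurs exactly d times), m·k = n·d, hence α = m/n = d/k. The sketch below illustrates this on a small hypothetical (3,2)-CNF formula and brute-forces the V' = V case, where exactly one literal per clause must be true. The DIMACS-style integer encoding and all function names are assumptions made for illustration.

```python
from itertools import product

def exactly_one_true(clause, assignment):
    """True if exactly one literal of the clause is satisfied.

    A literal v > 0 means variable v is true; v < 0 means variable |v| is false.
    """
    return sum(assignment[abs(lit)] == (lit > 0) for lit in clause) == 1

def exact_sat(clauses, num_vars):
    """Brute-force search (exponential in num_vars) for an exact-satisfying assignment."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if all(exactly_one_true(c, assignment) for c in clauses):
            return assignment
    return None

def clause_density(clauses, num_vars):
    """Ratio of the number of clauses m to the number of variables n."""
    return len(clauses) / num_vars

# Hypothetical regular (3,2)-CNF formula: 3 literals per clause, each variable occurs twice.
clauses = [[1, 2, 3], [-1, -2, 3]]
print(clause_density(clauses, 3))  # m/n = 2/3, which equals d/k = 2/3
print(exact_sat(clauses, 3))
```

The enumeration is only practical for tiny formulas; it is meant to make the "exactly one true literal per clause" condition concrete, not to reflect how the problem is analysed in the paper.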
With the rapid development of blockchain technology in the financial sector, the security of blockchain is being put to the test due to an increase in phishing fraud. Therefore, it is essential to study more effective...
In recent years, the nuclear norm minimization (NNM) as a convex relaxation of the rank minimization has attracted great research *** assigning different weights to singular values, the weighted nuclear norm minimization (WNNM) has been utilized in many ***, most of the work on WNNM is combined with the ℓ2-data-fidelity term, which is under additive Gaussian noise *** this paper, we introduce the L1-WNNM model, which incorporates the ℓ1-data-fidelity term and the regularization from *** apply the alternating direction method of multipliers (ADMM) to solve the non-convex minimization problem in this *** exploit the low rank prior on the patch matrices extracted based on the image non-local self-similarity and apply the L1-WNNM model on patch matrices to restore the image corrupted by impulse *** results show that our method can effectively remove impulse noise.
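The abstract does not spell out the ADMM sub-steps. As a hedged illustration of how weights on singular values act on a patch matrix, the sketch below implements weighted singular value thresholding, the shrinkage step commonly associated with WNNM when the weights are in non-descending order. It is an assumed building block, not the paper's L1-WNNM solver; the function name and the example matrix are hypothetical.

```python
import numpy as np

def weighted_svt(matrix, weights):
    """Shrink each singular value of `matrix` by its own weight, clipping at zero.

    `weights` is matched to the singular values in descending order, so a
    non-descending weight vector penalises small singular values more heavily.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - np.asarray(weights, dtype=float), 0.0)
    return (u * s_shrunk) @ vt  # scale the columns of u, then recombine

# Hypothetical 2x2 "patch matrix" with singular values 3 and 1.
patch = np.diag([3.0, 1.0])
restored = weighted_svt(patch, weights=[0.5, 2.0])
print(restored)
```

In this example the dominant singular value is only lightly reduced while the small one is removed entirely, which is the low-rank behaviour that weighting is meant to encourage.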
A large mode area multi-core orbital angular momentum (OAM) transmission fiber is designed and optimized by neural network and optimization *** neural network model has been established first to predict the optical properties of multi-core OAM transmission fibers with high accuracy and speed, including mode area, nonlinear coefficient, purity, dispersion, and effective index *** the trained neural network model is combined with different particle swarm optimization (PSO) algorithms for automatic iterative optimization of multi-core structures *** to the structural advantages of multi-core fiber and the automatic optimization process, we designed a number of multi-core structures with high OAM mode purity (>95%) and ultra-large mode area (>3000 µm²), which is larger by more than an order of magnitude compared to the conventional ring-core OAM transmission fibers.
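The abstract mentions "different PSO algorithms" without specifying them. A minimal sketch of a basic global-best PSO loop is given below, assuming that the trained neural network serves as the objective evaluated for each candidate structure. Here a toy quadratic function stands in for that surrogate; the hyperparameters, bounds, and names are illustrative assumptions.

```python
import numpy as np

def pso_minimize(objective, lower, upper, num_particles=30, num_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic global-best particle swarm optimisation within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(num_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                  # per-particle best positions
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()                 # swarm-wide best position
    for _ in range(num_iters):
        r1 = rng.random((num_particles, dim))
        r2 = rng.random((num_particles, dim))
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (g - pos))
        pos = np.clip(pos + vel, lower, upper)             # keep particles inside the bounds
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[best_val.argmin()].copy()
    return g, float(best_val.min())

# Toy stand-in for the neural-network surrogate: minimum at (1, -2).
def toy_objective(x):
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

best_x, best_f = pso_minimize(toy_objective, lower=[-5, -5], upper=[5, 5])
print(best_x, best_f)
```

In a design workflow of the kind described, `toy_objective` would be replaced by a cost built from the surrogate's predicted properties (for example, penalising low mode purity or small mode area), with the particle coordinates encoding the structural parameters.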
In the context of cloud computing, the task scheduling issue has an immediate effect on service quality. Task scheduling is the process of assigning work to available resources based on requirements. The objective of ...
Diagnosing individuals with autism spectrum disorder (ASD) accurately faces great challenges in clinical practice, primarily due to the data's high heterogeneity and limited sample *** tackle this issue, the authors constructed a deep graph convolutional network (GCN) based on variable multi-graph and multimodal data (VMM-DGCN) for ASD ***, the functional connectivity matrix was constructed to extract primary ***, the authors designed a variable multi-graph construction strategy to capture the multi-scale feature representations of each subject by utilising convolutional filters with varying kernel ***, the authors brought the non-imaging information into the feature representation at each scale and constructed multiple population graphs based on multimodal data by fully considering the correlation between *** extracting the deeper features of population graphs using the deep GCN (DeepGCN), the authors fused the node features of multiple subgraphs to perform node classification tasks for typical control and ASD *** proposed algorithm was evaluated on the Autism Brain Imaging Data Exchange I (ABIDE I) dataset, achieving an accuracy of 91.62% and an area under the curve value of 95.74%. These results demonstrated its outstanding performance compared to other ASD diagnostic algorithms.
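To make the graph convolution on a population graph concrete, the sketch below shows a single GCN layer using the widely used symmetric-normalisation propagation rule, ReLU(D^(-1/2) (A + I) D^(-1/2) H W). This is an assumed textbook layer, not the VMM-DGCN or DeepGCN architecture from the paper; the graph, features, and weights are hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution layer with self-loops and symmetric normalisation."""
    a_hat = adjacency + np.eye(adjacency.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))         # degrees are >= 1 after self-loops
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)   # ReLU activation

# Hypothetical population graph: three subjects connected in a path 0 - 1 - 2.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
features = np.eye(3)          # one-hot placeholder features, one row per subject
weights = np.ones((3, 2))     # placeholder weights; these would be learned in practice
print(gcn_layer(adjacency, features, weights))
```

Each output row mixes a subject's own features with those of its graph neighbours, which is how similarity between subjects (encoded in the edges) can influence node classification.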