Audio‐visual wake word spotting is a challenging multi‐modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection ***, most audio‐visual wake word spotting models are only suitable for simple single‐speaker scenarios and require high computational *** development is hindered by complex multi‐person scenarios and computational limitations in mobile *** this paper, a novel audio‐visual model is proposed for on‐device multi‐person wake word ***, an attention‐based audio‐visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker ***, the knowledge distillation method is introduced to transfer knowledge from the large model to the on‐device model to control the size of our ***, a new audio‐visual dataset, PKU‐KWS, is collected for sentence‐level multi‐person wake word *** results on the PKU‐KWS dataset show that this approach outperforms the previous state‐of‐the‐art methods.
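The abstract does not say which distillation objective is used. As a minimal sketch only, the widely used temperature-scaled soft-label loss could look as follows; the function names, the temperature value, and the choice of a KL-divergence objective are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical helper: KL(teacher || student) on temperature-softened
    outputs, scaled by T^2 as is common for soft-label distillation."""
    p_teacher = np.clip(softmax(teacher_logits, temperature), 1e-12, 1.0)
    p_student = np.clip(softmax(student_logits, temperature), 1e-12, 1.0)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

In training, a term like this is typically added to the ordinary task loss of the small model, so that the on-device model is pulled toward the large model's output distribution.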
For a long time, people have believed that representation problems are one of the bottlenecks in the field of machine learning. Therefore, it is a long-term and exploratory work to study machine learning representatio...
Pixel-level structure segmentations have attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine ***, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual *** this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) *** LF-GPE, serving as a light field backbone, can extract both appearance and geometric structural cues *** aligns these features into a unified visual space, facilitating semantic ***, our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field *** prioritizes considering the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior *** evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic *** results demonstrate that LF-GPE can effectively learn high-quality light field features and achieve highly competitive performance in pixel-level segmentation tasks.
The classification loss functions used in deep neural network classifiers can be grouped into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vec...
1 Introduction
The satisfiability (SAT) problem has been considered the "seed" of other NP-complete *** regular partial exact (k,d)-SAT problem is an important extension of the SAT *** any (k,d)-CNF formula with a variable set V, V' is a proper subset of V, the problem involves determining whether a truth assignment set on V' exists such that only one literal in each clause is *** V'=V, it is a regular exact (k,d)-SAT ***, both experimental verifications and theoretical analyses of the k-SAT problem have shown that the ratio α (clause constraint density) of the number of clauses m to the number of variables n is an important parameter affecting the satisfiability of the formula [1]. However, the regular (k,d)-SAT problem has the same clause constraint density d/k.
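The fixed density d/k follows from a short counting argument. This assumes the usual definition of a regular (k,d)-CNF formula, in which every clause contains exactly k literals and every variable occurs exactly d times; the excerpt itself does not spell that definition out.

```latex
\[
  \underbrace{k\,m}_{\text{literal slots over all clauses}}
  \;=\;
  \underbrace{d\,n}_{\text{occurrences over all variables}}
  \quad\Longrightarrow\quad
  \alpha \;=\; \frac{m}{n} \;=\; \frac{d}{k}.
\]
```

Since every regular (k,d)-CNF formula shares this value of α, the density cannot separate satisfiable from unsatisfiable instances within the class.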
The manual annotation of perfectly aligned labels for cross-modal retrieval (CMR) is incredibly labor-intensive. As an alternative, collecting co-occurring data pairs from the Internet is remarkably cost-effective, but it inevitably introduces Partially Mismatched Pairs (PMPs), which significantly degrade retrieval performance unless they are treated specifically. Previous efforts often use pair-wise similarity to filter out the mismatched pairs, but this operation is highly sensitive to mismatched or ambiguous data and thus leads to sub-optimal performance. To alleviate these concerns, we propose an efficient approach, termed UCPM (Uncertainty-guided Cross-modal retrieval with Partially Mismatched pairs), which can significantly reduce the adverse impact of mismatched data pairs. Specifically, a novel Uncertainty Guided Division (UGD) strategy is designed to divide the corrupted training data into confidently matched (clean), easily identifiable mismatched (noisy), and hard-to-determine (hard) subsets; the derived uncertainty simultaneously guides the learning of informative pairs and reduces the negative impact of potentially mismatched pairs. Meanwhile, an Uncertainty Self-Correction (USC) mechanism is presented to identify and rectify fluctuating uncertainty during training, which further improves the stability and reliability of the estimated uncertainty. In addition, a Trusted Margin Loss (TML) is designed to enhance the discriminability between hard pairs by dynamically adjusting their soft margins to amplify the positive contributions of matched pairs while suppressing the negative impact of mismatched pairs. Extensive experiments on three widely used benchmark datasets verify the effectiveness and reliability of UCPM compared with existing SOTA approaches and show significantly improved robustness on both synthetic and real-world PMPs. The code i
With the rapid development of blockchain technology in the financial sector, the security of blockchain is being put to the test due to an increase in phishing fraud. Therefore, it is essential to study more effective...
In recent years, the nuclear norm minimization (NNM) as a convex relaxation of the rank minimization has attracted great research *** assigning different weights to singular values, the weighted nuclear norm minimization (WNNM) has been utilized in many ***, most of the work on WNNM is combined with the l2-data-fidelity term, which is under additive Gaussian noise *** this paper, we introduce the L1-WNNM model, which incorporates the l1-data-fidelity term and the regularization from *** apply the alternating direction method of multipliers (ADMM) to solve the non-convex minimization problem in this *** exploit the low rank prior on the patch matrices extracted based on the image non-local self-similarity and apply the L1-WNNM model on patch matrices to restore the image corrupted by impulse *** results show that our method can effectively remove impulse noise.
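To make "assigning different weights to singular values" concrete, here is a minimal sketch of one weighted singular value thresholding step, the building block usually associated with WNNM under an l2 data-fidelity term. It is not the L1-WNNM ADMM solver described in the abstract. The function name, the weight rule `c / (sigma + eps)`, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def weighted_svt(Y, c=1.0, eps=1e-8):
    """Hypothetical helper: shrink each singular value of Y by its own weight.

    Weights are inversely proportional to the singular values (an assumed
    rule), so large singular values, which carry most of the structure,
    are penalised less than small, noise-dominated ones.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    weights = c / (s + eps)                  # non-descending, since s is non-ascending
    s_shrunk = np.maximum(s - weights, 0.0)  # weighted soft-thresholding
    return (U * s_shrunk) @ Vt               # rebuild the low-rank estimate

# Usage: a matrix with one dominant and two weak singular values.
Y = np.diag([5.0, 0.5, 0.1])
X = weighted_svt(Y, c=1.0)
```

With these inputs the dominant singular value is reduced only slightly (5.0 to about 4.8), while the two weak ones are set to zero. In patch-based restoration a step of this kind would be applied to each matrix of grouped similar patches.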
A large mode area multi-core orbital angular momentum (OAM) transmission fiber is designed and optimized by neural network and optimization *** neural network model has been established first to predict the optical properties of multi-core OAM transmission fibers with high accuracy and speed, including mode area, nonlinear coefficient, purity, dispersion, and effective index *** the trained neural network model is combined with different particle swarm optimization (PSO) algorithms for automatic iterative optimization of multi-core structures *** to the structural advantages of multi-core fiber and the automatic optimization process, we designed a number of multi-core structures with high OAM mode purity (>95%) and ultra-large mode area (>3000 µm^2), which is larger by more than an order of magnitude compared to the conventional ring-core OAM transmission fibers.
Weakly supervised video anomaly detection (WSVAD) often relies on Multiple Instance Learning (MIL). However, selecting only the most discriminative segments for training limits the model's ability to comprehensive...