Search Results - Inner Mongolia University Library

On‐device audio‐visual multi‐person wake word spotting

CAAI Transactions on Intelligence Technology, 2023, Vol. 8, No. 4, pp. 1578-1589

Authors: Yidi Li, Guoquan Wang, Zhan Chen, Hao Tang, Hong Liu. Affiliations: Key Laboratory of Machine Perception, Peking University Shenzhen Graduate School, Shenzhen, China; College of Computer and Information, Hefei University of Technology, Hefei, China; Computer Vision Lab, ETH Zurich, Zurich, Switzerland

Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection ***, most audio-visual wake word spotting models are only suitable for simple single-speaker scenarios and require high computational *** development is hindered by complex multi-person scenarios and computational limitations in mobile *** this paper, a novel audio-visual model is proposed for on-device multi-person wake word ***, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker ***, the knowledge distillation method is introduced to transfer knowledge from the large model to the on-device model to control the size of our ***, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word *** results on the PKU-KWS dataset show that this approach outperforms the previous state-of-the-art methods.

Keywords: audio-visual fusion; human-computer interfacing; speech processing
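
The abstract names knowledge distillation as the way to shrink the large model into an on-device one but gives no formula. For reference only, the sketch below shows the common temperature-scaled soft-target loss (Hinton-style distillation) for a single example. It is a generic illustration, not the authors' loss; the function names, the temperature of 4.0 and the 0.5 weighting are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Generic KD loss (illustrative defaults, not the paper's settings):
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(label, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # Soft-target term: KL divergence between softened teacher and student outputs
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Hard-target term: ordinary cross-entropy against the ground-truth label
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

The T² factor keeps the soft-target gradients on a comparable scale as the temperature grows; when the student already matches the teacher, only the hard-label term remains.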

Fusion of data dimensionality reduction algorithms based on category representation theory

4th International Conference on Computer Vision, Application, and Algorithm, CVAA 2024

Authors: Xu, Xiaoxiang; Li, Fanzhang; Zhang, Li. Affiliations: School of Computer Science and Technology, Joint International Research Laboratory of Machine Learning and Neuromorphic Computing, Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China

ISBN (print): 9781510687615

Representation has long been regarded as one of the bottlenecks of machine learning, so the study of machine learning representation methods is long-term, exploratory work. Motivated by this, we use category theory to study the fusion representation of data dimensionality reduction. We propose the basic concept of category representation for data dimensionality reduction and provide a fusion representation framework for dimensionality reduction. We analyze algorithms such as PCA, KPCA, and LDA, identify the essential connections among them, and propose a fusion representation algorithm for data dimensionality reduction based on this framework. Finally, experiments demonstrate the feasibility of the proposed method. © 2025 SPIE.

Keywords: Dimensionality reduction
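
Of the three methods analyzed, PCA is the simplest to state: it projects data onto the direction of maximal variance (KPCA does the same in a kernel feature space, while LDA seeks directions that separate labelled classes). The sketch below computes the leading principal component of 2-D points from the closed-form eigen-decomposition of the 2 × 2 covariance matrix. It illustrates plain PCA only, not the category-theoretic fusion framework, and the function name is hypothetical.

```python
import math

def first_principal_component(points):
    """Return (variance, unit direction) of the leading principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if abs(b) > 1e-12:
        # (b, lam - a) satisfies (A - lam*I) v = 0
        vx, vy = b, lam - a
    else:
        # Diagonal covariance: the axis with the larger variance wins
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)
```

For points lying on the line y = x, the returned direction is (1/√2, 1/√2) and the returned eigenvalue is the variance along that line.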

Masked Generative Light Field Prompting for Pixel-Level Structure Segmentations

Research, 2024, Vol. 2024, No. 4, pp. 533-544

Authors: Mianzhao Wang, Fan Shi, Xu Cheng, Shengyong Chen. Affiliations: The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China; Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China

Pixel-level structure segmentations have attracted considerable attention, playing a crucial role in autonomous driving within the metaverse and enhancing comprehension in light field-based machine ***, current light field modeling methods fail to integrate appearance and geometric structural information into a coherent semantic space, thereby limiting the capability of light field transmission for visual *** this paper, we propose a general light field modeling method for pixel-level structure segmentation, comprising a generative light field prompting encoder (LF-GPE) and a prompt-based masked light field pretraining (LF-PMP) *** LF-GPE, serving as a light field backbone, can extract both appearance and geometric structural cues *** aligns these features into a unified visual space, facilitating semantic ***, our LF-PMP, during the pretraining phase, integrates a mixed light field and a multi-view light field *** prioritizes considering the geometric structural properties of the light field, enabling the light field backbone to accumulate a wealth of prior *** evaluate our pretrained LF-GPE on two downstream tasks: light field salient object detection and semantic *** results demonstrate that LF-GPE can effectively learn high-quality light field features and achieve highly competitive performance in pixel-level segmentation tasks.

Keywords: prompt; backbone; integrate
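
The abstract does not describe the masking scheme of LF-PMP. For orientation only, the sketch below shows generic random masking over the u × v grid of sub-aperture views of a light field, the kind of input corruption used in masked pretraining. The grid size, mask ratio and function name are assumptions, and the paper's mixed and multi-view masking are not reproduced.

```python
import random

def mask_views(u, v, mask_ratio, seed=0):
    """Generic random masking (not LF-PMP): hide a fraction of the
    sub-aperture views in a u x v angular grid."""
    views = [(i, j) for i in range(u) for j in range(v)]
    n_mask = round(mask_ratio * len(views))
    rng = random.Random(seed)  # seeded for reproducibility
    masked = set(rng.sample(views, n_mask))
    # The encoder would see only the visible views; a decoder reconstructs the rest
    visible = [vw for vw in views if vw not in masked]
    return sorted(masked), visible
```

With a 5 × 5 grid and a ratio of 0.6, fifteen views are hidden and ten remain visible.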

Deep Simplex Classifier for Maximizing the Margin in Both Euclidean and Angular Spaces

22nd Scandinavian Conference on Image Analysis, SCIA 2023

Authors: Cevikalp, Hakan; Saribas, Hasan. Affiliations: Machine Learning and Computer Vision Laboratory, Eskisehir Osmangazi University, Eskisehir, Turkey; Huawei Turkey R&D Center, Istanbul, Turkey

ISBN (print): 9783031314377

The classification loss functions used in deep neural network classifiers can be grouped into two categories, depending on whether they maximize the margin in Euclidean or in angular space. Methods that maximize the margin in Euclidean space use Euclidean distances between sample vectors for classification, whereas methods that maximize the margin in angular space use cosine similarity at test time. This paper introduces a novel classification loss that maximizes the margin in both Euclidean and angular spaces at the same time. As a result, Euclidean and cosine distances produce similar, consistent results that complement each other, which in turn improves accuracy. The proposed loss function forces the samples of each class to cluster around the center that represents that class. The class centers are chosen on the boundary of a hypersphere such that all pairwise distances between class centers are equal; this restriction corresponds to choosing the centers as the vertices of a regular simplex. The proposed loss function has no hyperparameter that the user must set, so the method is very easy to apply to classical classification problems. Moreover, since the class samples are compactly clustered around their corresponding means, the proposed classifier is also well suited to open set recognition problems, where test samples can come from unknown classes not seen during training. Experimental studies show that the proposed method achieves state-of-the-art accuracy on open set recognition despite its simplicity. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: computer vision
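
The claim that centers with equal pairwise distances on a hypersphere are the vertices of a regular simplex can be checked directly. The sketch below builds C such centers by subtracting the centroid from the one-hot vectors and rescaling them onto the sphere; every pair then has cosine similarity -1/(C-1). Placing the centers in R^C (rather than in a learned feature space) and the function name are assumptions for illustration.

```python
import math

def simplex_centers(num_classes, radius=1.0):
    """Hypothetical helper: vertices of a regular simplex on a hypersphere
    of the given radius, embedded in R^num_classes."""
    c = num_classes
    centers = []
    for i in range(c):
        # One-hot vector e_i minus the centroid (1/c, ..., 1/c)
        vec = [(1.0 if j == i else 0.0) - 1.0 / c for j in range(c)]
        norm = math.sqrt(sum(x * x for x in vec))
        # Rescale onto the sphere so that all centers share the same norm
        centers.append([radius * x / norm for x in vec])
    return centers
```

For four classes on the unit sphere, every pair of centers has dot product -1/3 and squared distance 8/3, so no class center is closer to one neighbour than to another.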

Exact satisfiability and phase transition analysis of the regular (k,d)-CNF formula

Frontiers of Computer Science, 2024, Vol. 18, No. 1, pp. 263-265

Authors: Guoxia NIE, Daoyun XU, Xi WANG, Zaijun ZHANG. Affiliations: College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun 558000, China; Key Laboratory of Industrial Automation and Machine Vision of Qiannan, Duyun 558000, China

1 Introduction: The satisfiability (SAT) problem has been considered the "seed" of other NP-complete *** regular partial exact (k,d)-SAT problem is an important extension of the SAT *** any (k,d)-CNF formula with a variable set V, where V' is a proper subset of V, the problem involves determining whether a truth assignment on V' exists such that exactly one literal in each clause is *** V' = V, it is a regular exact (k,d)-SAT ***, both experimental verifications and theoretical analyses of the k-SAT problem have shown that the ratio α (the clause constraint density) of the number of clauses m to the number of variables n is an important parameter affecting the satisfiability of the formula [1]. However, every regular (k,d)-CNF formula has the same clause constraint density d/k: each of its m clauses contains k literals and each of its n variables occurs d times, so km = dn and α = m/n = d/k.

Keywords: formula; constraint; exact
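
To make the definitions concrete, the sketch below checks whether a small CNF formula is regular (every clause has k literals, every occurring variable appears d times) and searches by brute force for an exact assignment, one that makes exactly one literal true in every clause. It covers only the V' = V case. The clause encoding (signed integers, DIMACS-style) and the function names are assumptions, and exhaustive search is only viable for a handful of variables.

```python
from collections import Counter
from itertools import product

def is_regular(clauses, k, d):
    """Each clause has exactly k literals and each occurring variable appears exactly d times."""
    counts = Counter(abs(lit) for clause in clauses for lit in clause)
    return all(len(cl) == k for cl in clauses) and all(n == d for n in counts.values())

def exact_sat(clauses, variables):
    """Brute force over all 2^n assignments; return one with exactly one
    true literal per clause, or None if none exists."""
    variables = sorted(variables)
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        # A literal l is true when its variable's value equals (l > 0)
        if all(sum(assign[abs(l)] == (l > 0) for l in cl) == 1 for cl in clauses):
            return assign
    return None
```

For the formula (x1 ∨ ¬x2) ∧ (x2 ∨ ¬x1) with k = d = 2, the density m/n = 2/2 equals d/k = 1, and setting both variables false is an exact assignment. By contrast, the odd cycle (x1 ∨ x2) ∧ (x2 ∨ x3) ∧ (x3 ∨ x1) is also regular with the same density but has no exact assignment, which illustrates why density alone does not determine exact satisfiability here.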

UCPM: Uncertainty-Guided Cross-Modal Retrieval with Partially Mismatched Pairs

IEEE Transactions on Image Processing, 2025, Vol. 34, pp. 3622-3634

Authors: Zha, Quanxing; Liu, Xin; Cheung, Yiu-Ming; Peng, Shu-Juan; Xu, Xing; Wang, Nannan. Affiliations: Huaqiao University, Department of Computer Science, Xiamen 361021, China; Key Laboratory of Pattern Recognition and Computer Vision, Xiamen 361021, China; Huaqiao University, Fujian Key Laboratory of Big Data Intelligence and Security, Xiamen 361021, China; Hong Kong Baptist University, Department of Computer Science, Hong Kong; Huaqiao University, Department of Artificial Intelligence, Xiamen 361021, China; Fujian Province University Key Laboratory of Computer Vision and Machine Learning (Huaqiao University), Xiamen 361021, China; University of Electronic Science and Technology of China, Center for Future Multimedia, School of Computer Science and Engineering, Chengdu 610051, China; Xidian University, State Key Laboratory of Integrated Services Networks, Xi'an 710071, China

The manual annotation of perfectly aligned labels for cross-modal retrieval (CMR) is extremely labor-intensive. Collecting co-occurring data pairs from the Internet is a far more cost-effective alternative, but it inevitably introduces Partially Mismatched Pairs (PMPs), which significantly degrade retrieval performance unless they are specifically handled. Previous efforts often use pair-wise similarity to filter out mismatched pairs, but such filtering is highly sensitive to mismatched or ambiguous data and thus leads to sub-optimal performance. To alleviate these concerns, we propose an efficient approach, termed UCPM (Uncertainty-guided Cross-modal retrieval with Partially Mismatched pairs), which significantly reduces the adverse impact of mismatched data pairs. Specifically, a novel Uncertainty Guided Division (UGD) strategy divides the corrupted training data into confidently matched (clean), easily identifiable mismatched (noisy), and hard-to-determine (hard) subsets, and the derived uncertainty simultaneously guides the learning of informative pairs while reducing the negative impact of potentially mismatched pairs. Meanwhile, an Uncertainty Self-Correction (USC) mechanism identifies and rectifies fluctuating uncertainty during training, which further improves the stability and reliability of the estimated uncertainty. In addition, a Trusted Margin Loss (TML) enhances the discriminability of the hard pairs by dynamically adjusting their soft margins, amplifying the positive contributions of matched pairs while suppressing the negative impacts of mismatched pairs. Extensive experiments on three widely used benchmark datasets verify the effectiveness and reliability of UCPM compared with existing state-of-the-art (SOTA) approaches and show that it significantly improves robustness to both synthetic and real-world PMPs. The code i

Keywords: Cross-modal retrieval; partially mismatched pairs; trusted margin loss; uncertainty guided division; uncertainty self-correction
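
The UGD step sorts training pairs into three subsets. The abstract does not say how UCPM estimates uncertainty, so the sketch below assumes each pair already has a similarity score and an uncertainty score in [0, 1] and applies two fixed thresholds. The thresholds, the decision order and the function name are all hypothetical; this is not the paper's UGD rule.

```python
def divide_pairs(similarities, uncertainties, sim_threshold=0.5, unc_threshold=0.3):
    """Hypothetical three-way split of image-text pairs into clean / noisy / hard
    index lists; both thresholds are illustrative assumptions."""
    clean, noisy, hard = [], [], []
    for idx, (s, u) in enumerate(zip(similarities, uncertainties)):
        if u >= unc_threshold:
            hard.append(idx)   # model is unsure: leave to margin-based learning
        elif s >= sim_threshold:
            clean.append(idx)  # confident and well aligned
        else:
            noisy.append(idx)  # confident and poorly aligned
    return clean, noisy, hard
```

Checking uncertainty first means a high-similarity pair is still treated as hard when the model is unsure about it, which mirrors the abstract's point that similarity alone is a fragile filter.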

Phishing detection on Ethereum via transaction subgraphs embedding

IET Blockchain, 2023, Vol. 3, No. 4, pp. 194-203

Authors: Lv, Haifeng; Ding, Yong. Affiliations: Guangxi Key Laboratory of Machine Vision and Intelligent Control, WuZhou University, Wuzhou, China; Guangxi Key Laboratory of Cryptography and Information Security, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China; Guangxi Colleges and Universities Key Laboratory of Industry Software Technology, WuZhou University, Wuzhou, China

With the rapid development of blockchain technology in the financial sector, blockchain security is being put to the test by an increase in phishing fraud, so more effective countermeasures are needed. Graph models have been shown to provide abundant information for downstream tasks. In this study, a graph-based embedding classification method is proposed for phishing detection on Ethereum that models transaction records as subgraphs. First, the transaction data of normal addresses and an equal number of confirmed phishing addresses are collected through web crawling. Multiple subgraphs are then constructed from the collected transaction records, each containing a target address and its nearby transaction network. To extract address features, a modified Graph2Vec model called imgraph2vec is designed, which takes block height, timestamp, and transaction amount into account. Finally, the Extreme Gradient Boosting (XGBoost) algorithm is used to classify addresses as phishing or normal. The experimental results show that the proposed method performs well in phishing detection, indicating that imgraph2vec extracts features from transaction networks more effectively than existing models. © 2023 The Authors. IET Blockchain published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

Keywords: Graphic methods
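
The subgraph-construction step can be sketched as a breadth-first search that keeps every address within a fixed number of hops of the target, plus the transactions among those addresses. This is an assumption about how "nearby transaction network" is defined. The hop count, the (sender, receiver) edge format and the function name are hypothetical, and the block height, timestamp and amount attributes that imgraph2vec uses are omitted.

```python
from collections import deque

def k_hop_subgraph(transactions, target, hops=2):
    """Hypothetical helper: nodes within `hops` of `target` (edges treated as
    undirected) and the induced list of (sender, receiver) transactions."""
    adj = {}
    for src, dst in transactions:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    seen = {target: 0}  # node -> hop distance from the target
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if seen[node] == hops:
            continue  # do not expand beyond the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    # Induced subgraph: keep every transaction whose endpoints were both reached
    edges = [(s, t) for s, t in transactions if s in seen and t in seen]
    return set(seen), edges
```

Each resulting subgraph, one per labelled address, would then be embedded (by imgraph2vec in the paper) and the embeddings passed to the XGBoost classifier.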

Impulse Noise Removal by L1 Weighted Nuclear Norm Minimization

Journal of Computational Mathematics, 2023, Vol. 41, No. 6, pp. 1171-1191

Authors: Jian Lu, Yuting Ye, Yiqiu Dong, Xiaoxia Liu, Yuru Zou. Affiliations: Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China; Guangdong Key Laboratory of Intelligent Information Processing; Pazhou Lab, Guangzhou 510335, China; Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark

In recent years, nuclear norm minimization (NNM), as a convex relaxation of rank minimization, has attracted great research *** assigning different weights to singular values, weighted nuclear norm minimization (WNNM) has been utilized in many ***, most of the work on WNNM is combined with the l2-data-fidelity term, which is under additive Gaussian noise *** this paper, we introduce the L1-WNNM model, which incorporates the l1-data-fidelity term and the regularization from *** apply the alternating direction method of multipliers (ADMM) to solve the non-convex minimization problem in this *** exploit the low-rank prior on the patch matrices extracted based on the image non-local self-similarity and apply the L1-WNNM model on patch matrices to restore the image corrupted by impulse *** results show that our method can effectively remove impulse noise.

Keywords: Image denoising; Weighted nuclear norm minimization; l1-data-fidelity term; Low rank analysis; Impulse noise
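
Two proximal steps of the kind that typically appear when ADMM is applied to an l1-fidelity, weighted-nuclear-norm objective are sketched below: element-wise soft-thresholding (the proximal operator of the l1 norm) and weighted shrinkage of singular values. The singular values are taken as given, because the SVD is left out to keep the example dependency-free. The weight rule w_i = c·sqrt(n)/(σ_i + ε) follows the commonly used WNNM heuristic, its constants are assumptions, and the paper's full L1-WNNM update schedule is not reproduced.

```python
import math

def soft_threshold(values, tau):
    """Proximal operator of tau * ||x||_1, applied element-wise (shrinkage toward zero)."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in values]

def wnnm_weights(singular_values, c=1.0, n_patches=1, eps=1e-8):
    """Assumed WNNM-style weights w_i = c * sqrt(n) / (sigma_i + eps):
    large singular values (dominant structure) are shrunk less."""
    return [c * math.sqrt(n_patches) / (s + eps) for s in singular_values]

def weighted_sv_shrink(singular_values, weights):
    """Shrink each singular value by its own weight and clip at zero.
    Assumes singular values sorted in decreasing order, so the weights are non-decreasing."""
    return [max(s - w, 0.0) for s, w in zip(singular_values, weights)]
```

In an ADMM loop, the first function would update the auxiliary residual variable that absorbs impulse outliers, while the second would update the low-rank patch matrix after its singular values are computed.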

Ultra-large mode area multi-core orbital angular momentum transmission fiber designed by neural network and optimization algorithms

Optoelectronics Letters, 2023, Vol. 19, No. 12, pp. 744-751

Authors: GU Zhiwei, HUANG Wei, ZHANG Ran, FAN Junjie, SONG Binbin. Affiliations: Key Laboratory of Computer Vision and Systems (Ministry of Education), School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China; Engineering Research Center of Learning-Based Intelligent System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China

A large mode area multi-core orbital angular momentum (OAM) transmission fiber is designed and optimized by neural network and optimization *** neural network model has been established first to predict the optical properties of multi-core OAM transmission fibers with high accuracy and speed, including mode area, nonlinear coefficient, purity, dispersion, and effective index *** the trained neural network model is combined with different particle swarm optimization (PSO) algorithms for automatic iterative optimization of multi-core structures *** to the structural advantages of multi-core fiber and the automatic optimization process, we designed a number of multi-core structures with high OAM mode purity (>95%) and ultra-large mode area (>3000 µm²), which is larger by more than an order of magnitude compared to the conventional ring-core OAM transmission fibers.

Keywords: network; fiber; OAM
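
The abstract couples a trained neural-network surrogate with particle swarm optimization. A minimal global-best PSO is sketched below in pure Python. The objective passed in stands in for the surrogate's predicted figure of merit (hypothetical here), and the inertia and acceleration constants (0.729, 1.49445) are common textbook defaults rather than values from the paper. The paper combines the surrogate with several PSO variants; only the basic one is shown.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimise `objective` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp positions to the feasible box (e.g. manufacturable geometry)
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Using a fast surrogate as `objective` is what makes this loop cheap: each of the thousands of evaluations is a network forward pass instead of a full electromagnetic simulation of the fiber.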

Discriminative Score Suppression for Weakly Supervised Video Anomaly Detection

2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025

Authors: Xu, Chen; Li, Chunguo; Xing, Hongjie. Affiliations: College of Mathematics and Information Science, Hebei University, Hebei Key Laboratory of Machine Learning and Computational Intelligence, Baoding 071002, China; School of Cyber Security and Computer, Hebei University, Baoding 071000, China

ISBN (print): 9798331510831

Weakly supervised video anomaly detection (WSVAD) often relies on Multiple Instance Learning (MIL). However, selecting only the most discriminative segments for training limits the model's ability to detect anomalous events comprehensively, particularly hard anomalies. To overcome this limitation, we propose the Discriminative Score Suppression (DSS) module. This module suppresses the discriminative scores of the most prominent anomalies, shifting the model's attention to less obvious but important hard anomalies. This approach guides the model to learn the critical features of hard anomalies, enabling more comprehensive detection of anomalous events. Additionally, the Anomaly Score Refinement (ASR) module constructs a dissimilarity-based classifier by storing normal patterns as prototypes and integrates it with a neural network classifier. Combining the anomaly scores from both classifiers yields more accurate detection of true hard anomalies. A score-sensitive inner-bag loss function not only adjusts penalties based on anomaly scores but also prevents the model from making erroneous selections. Our method accurately detects various anomalies, including challenging and multi-segment anomalies, while minimizing false positives for normal events. Extensive experiments show that the proposed framework outperforms state-of-the-art methods on the UCF-Crime and XD-Violence datasets. © 2025 IEEE.

Keywords: Supervised learning
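
The abstract does not spell out how DSS suppresses scores. Purely as an illustration of the idea, the sketch below uses the standard top-k MIL bag score and scales down the highest-scoring segment (a hypothetical multiplicative suppression with `factor=0.0`), so that the top-k average is then driven by less prominent segments. It is not the paper's DSS module; the function names and the suppression rule are assumptions.

```python
def topk_mean(scores, k):
    """MIL-style bag score: mean of the k highest segment anomaly scores."""
    return sum(sorted(scores, reverse=True)[:k]) / k

def suppress_most_discriminative(scores, n_suppress=1, factor=0.0):
    """Hypothetical suppression: multiply the n highest-scoring segments by `factor`
    so that top-k selection falls through to the next, harder segments."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    suppressed = list(scores)
    for i in order[:n_suppress]:
        suppressed[i] *= factor
    return suppressed
```

For segment scores [0.1, 0.95, 0.4, 0.6, 0.2] and k = 2, the bag score drops from 0.775 to 0.5 after suppression, because the 0.4 segment now enters the top-k set and receives training signal.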
