Diabetes is prevalent worldwide, and predicting its progression is crucial. Several models have been proposed to predict the disease. Those models only determine the disease label, leaving the likelihood of de...
Network pruning plays a significant role in reducing network parameters and accelerating inference. Some existing methods prune the network based on the frequency of the data, and finally obtain a sub-network with high accuracy. However, according to our experimental analysis, different frequencies of information in the data contribute differently to the accuracy of the model, and using this information directly for pruning without making a selection will lead to incorrect results. We believe that pruning should retain the convolutional kernels in the network that process important information, while those kernels that process unimportant information should be removed. In this paper, we first investigate the meaning of each frequency band in the spectrum and its contribution to the prediction accuracy of the network, and according to these results, we propose a new pruning method based on frequency response (PFR). PFR finds and removes the convolutional kernels in the network that specialize in processing unimportant information, resulting in a compact neural network model. PFR obtains significant experimental results on different datasets, for example, a 56.0% reduction in floating-point operations (FLOPs) on ResNet-50 with only 0.37% Top-1 accuracy degradation on the ImageNet dataset.
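The idea of ranking convolutional kernels by their frequency response can be illustrated with a small sketch. This is not the PFR algorithm itself, whose scoring rule the abstract does not specify. It assumes, purely for illustration, that the lowest-frequency (DC) band is the "important" one and scores each kernel by the share of its spectral energy in that band. The function names `dft2`, `low_frequency_ratio`, and `select_kernels` are hypothetical.

```python
import cmath


def dft2(kernel):
    """Naive 2D discrete Fourier transform of a small square kernel."""
    n = len(kernel)
    return [
        [
            sum(
                kernel[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                for x in range(n)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def low_frequency_ratio(kernel):
    """Fraction of the kernel's spectral energy in the DC bin.

    Assumption for this sketch only: the DC band stands in for the
    'important' frequency band. The abstract does not say which bands
    PFR treats as important.
    """
    spectrum = dft2(kernel)
    energy = [[abs(c) ** 2 for c in row] for row in spectrum]
    total = sum(map(sum, energy))
    return energy[0][0] / total if total > 0 else 0.0


def select_kernels(kernels, keep_ratio):
    """Return the sorted indices of the kernels with the highest scores."""
    scores = [low_frequency_ratio(k) for k in kernels]
    n_keep = max(1, round(len(kernels) * keep_ratio))
    ranked = sorted(range(len(kernels)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

Under this assumed criterion, a smoothing kernel (all energy at DC) would be kept, while a zero-sum edge detector would be pruned first. A different choice of important band would simply swap the scoring function.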
Text-based person search aims at locating a person described by natural language in uncropped scene images. Recent works for TBPS mainly focus on aligning multi-granularity vision and language representations, neglect...
Background Lip reading uses lip images for visual speech ***-learning-based lip reading has greatly improved performance in current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves space for further improvement in feature *** This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency ***, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames while not affecting spatial feature *** particular, we designed a local–temporal block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale *** incorporated the squeeze-and-excitation mechanism into the Global-Temporal Block, which processes a single frame as an independent unit to learn temporal variations across the entire lip region more ***, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target *** Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000 datasets, achieving word-level recognition accuracies of 90.2% and 53.56%, *** ablation experiments verified the rationality and effectiveness of its *** The proposed model effectively addresses short-term temporal dependency limitations in lip reading, and improves the temporal robustness of the model against variable-length *** advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.
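The attention pooling step mentioned above can be sketched in general terms. This is a generic softmax-weighted pooling over frames, not STDNet's actual layer. The scoring vector `w` stands in for a learned parameter and is an assumption of this sketch, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool T frame features (each of length D) into a single D-vector.

    frames: list of T feature vectors, one per video frame.
    w: hypothetical scoring vector of length D (learned in a real model).
    Returns the pooled vector and the per-frame attention weights.
    """
    # One relevance score per frame: dot product with the scoring vector.
    scores = [sum(f * v for f, v in zip(frame, w)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights
```

Because the weights are normalized over however many frames are supplied, the same function accepts sequences of any length, and frames with higher scores contribute more to the pooled feature.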
Code benchmarks such as HumanEval are widely adopted to evaluate capabilities of Large Language Models (LLMs), providing insights into their strengths and weaknesses. However, current benchmarks primarily exercise LLM...
The rapid spread of fake news significantly impacts social cognition and media credibility, making the effective detection of fake news a critical issue. This paper proposes a fake news detection method based on a com...
The integration of the contrastive learning paradigm into deep clustering has led to enhanced performance in image clustering. However, in existing research, the samples in the class of the target may be still treat...
With the development of deep learning in EEG-related tasks, the complexity of learning models has gradually increased. These complex models often result in long inference times, high energy consumption, and an increas...