Search Results - Inner Mongolia University Library

Hybrid Data-Free Knowledge Distillation

arXiv, 2024

Authors: Tang, Jialiang; Chen, Shuo; Gong, Chen. Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, China; Jiangsu Key Laboratory of Image and Video Understanding for Social Security, China; Center for Advanced Intelligence Project, RIKEN, Japan; Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China

Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train student networks by collecting massive real examples and generating synthetic examples, respectively. However, they inevitably become weak in practical scenarios due to the difficulties in gathering or emulating sufficient real-world data. To solve this problem, we propose a novel method called Hybrid Data-Free Distillation (HiDFD), which leverages only a small amount of collected data as well as generates sufficient examples for training student networks. Our HiDFD comprises two primary modules, i.e., the teacher-guided generation and student distillation. The teacher-guided generation module guides a Generative Adversarial Network (GAN) by the teacher network to produce high-quality synthetic examples from very few real-world collected examples. Specifically, we design a feature integration mechanism to prevent the GAN from overfitting and facilitate the reliable representation learning from the teacher network. Meanwhile, we drive a category frequency smoothing technique via the teacher network to balance the generative training of each category. In the student distillation module, we explore a data inflation strategy to properly utilize a blend of real and synthetic data to train the student network via a classifier-sharing-based feature alignment technique. Intensive experiments across multiple benchmarks demonstrate that our HiDFD can achieve state-of-the-art performance using 120 times less collected data than existing methods. Code is available at https://***/tangjialiang97/HiDFD. Copyright © 2024, The Authors. All rights reserved.

Keywords: Students
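
The abstract above builds on teacher-to-student knowledge distillation. As a rough illustration of that general idea only, the following is a minimal sketch of the classic temperature-softened distillation loss. It is not the HiDFD method: the generation module, the feature integration mechanism, and the classifier-sharing alignment are not reproduced here. The function names `softmax` and `kd_loss` and the default temperature are assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Hypothetical helper for illustration; assumes both logit lists
    have the same length and moderate magnitudes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# The loss is zero when the student matches the teacher and grows as they diverge.
print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores on non-target classes.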

Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective

arXiv, 2025

Authors: Zheng, Guanhua; Sang, Jitao; Xu, Changsheng. Affiliations: The University of Science and Technology of China, Hefei 230026, China; The School of Computer and Information Technology, The Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; The National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China; The University of Chinese Academy of Sciences, China

Attributions aim to identify input pixels that are relevant to the decision-making process. A popular approach involves using modified backpropagation (BP) rules to reverse decisions, which improves interpretability compared to the original gradients. However, these methods lack a solid theoretical foundation and exhibit perplexing behaviors, such as reduced sensitivity to parameter randomization, raising concerns about their reliability and highlighting the need for theoretical justification. In this work, we present a unified theoretical framework for methods like GBP, RectGrad, LRP, and DTD, demonstrating that they achieve input alignment by combining the weights of activated neurons. This alignment improves the visualization quality and reduces sensitivity to weight randomization. Our contributions include: (1) Providing a unified explanation for multiple behaviors, rather than focusing on just one. (2) Accurately predicting novel behaviors. (3) Offering insights into decision-making processes, including layer-wise information changes and the relationship between attributions and model decisions. © 2025, CC BY.

Keywords:
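
To make "modified backpropagation rules" concrete, here is a minimal sketch of the ReLU backward rule used by Guided Backpropagation (GBP), one of the methods named in the abstract above, next to the ordinary gradient rule. It illustrates only the rule itself, not the paper's theoretical framework. The function names are assumptions made for this example.

```python
def relu_backward_vanilla(forward_input, upstream_grad):
    """Ordinary ReLU gradient: pass the gradient where the forward input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, upstream_grad)]

def relu_backward_guided(forward_input, upstream_grad):
    """GBP rule: additionally drop negative upstream gradients."""
    return [g if (x > 0 and g > 0) else 0.0
            for x, g in zip(forward_input, upstream_grad)]

x = [1.0, -2.0, 3.0, 0.5]    # inputs seen by the ReLU in the forward pass
g = [0.5, 0.7, -0.2, 1.0]    # gradient arriving from the layer above

print(relu_backward_vanilla(x, g))  # [0.5, 0.0, -0.2, 1.0]
print(relu_backward_guided(x, g))   # [0.5, 0.0, 0.0, 1.0]
```

The only difference is the third entry: the guided rule discards the negative gradient even though the neuron was active in the forward pass.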

A Semantic Segmentation Method of Buildings in Remote Sensing Image Based on Improved UNet 2

2nd International Conference on Signal Image Processing and Communication, ICSIPC 2022

Authors: Li, Zhongyu; Liu, Yang; Kuang, Yin; Wang, Huajun; Liu, Cheng. Affiliations: College of Computer Science, Chengdu Normal University, Chengdu 611130, China; College of Geophysics, Chengdu University of Technology, Chengdu 610059, China; Key Laboratory of Pattern Recognition and Intelligent Information Processing of Sichuan, Chengdu University, Chengdu 610106, China; Artificial Intelligence Key Laboratory of Sichuan Province, Zigong 643000, China; Key Laboratory of Interior Layout Optimization and Security, Institutions of Higher Education of Sichuan Province, Chengdu Normal University, Sichuan, Chengdu 611130, China; College of Movie and Media, Sichuan Normal University, Chengdu 610066, China

ISBN: (print) 9781510657694

Batch normalization (BN) is the current mainstream way to alleviate the model instability and overfitting that arise as deep neural networks grow deeper. However, BN is sensitive to batch size, and model performance is poor when the batch size is small. For a relatively large model, limited video memory prevents a large batch size, which limits the model's performance. To remove this dependence on batch size, this paper replaces batch normalization (BN) with group normalization (GN) in the UNet network. Experiments are then carried out on the WHUBuilding dataset. The results show that the improved model (UNet-GN) improves the mean intersection over union (MIoU) and mean pixel accuracy (MPA) by 10.66% and 1.65%, respectively, compared with the original model (UNet-BN). © 2022 SPIE.

Keywords: Deep neural networks
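
The batch-size dependence described in the abstract above can be seen in a small sketch. This is a simplified illustration, not the paper's UNet-GN code: each sample is assumed to be a flat list of channel values (spatial size 1), and the learnable scale and shift parameters are omitted. The function names are assumptions made for this example.

```python
import math

def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample's channels within each group; no batch statistics."""
    channels = len(sample)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    out = []
    for start in range(0, channels, size):
        group = sample[start:start + size]
        mean = sum(group) / size
        var = sum((v - mean) ** 2 for v in group) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in group)
    return out

def batch_norm(batch, eps=1e-5):
    """Normalize each channel using statistics computed across the batch."""
    n = len(batch)
    channels = len(batch[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [row[c] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][c] = (column[i] - mean) / math.sqrt(var + eps)
    return out

sample = [1.0, 2.0, 3.0, 4.0]
# BN output for the same sample changes with whatever else is in the batch.
print(batch_norm([sample, [0.0, 0.0, 0.0, 0.0]])[0])
print(batch_norm([sample, [10.0, 10.0, 10.0, 10.0]])[0])
# GN output depends only on the sample itself.
print(group_norm(sample, num_groups=2))
```

Because GN never looks across the batch, its output for a sample is identical at batch size 1 and batch size 64, which is the property the abstract relies on when memory limits force small batches.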

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

arXiv, 2023

Authors: Fan, Cunhang; Zhang, Hongyu; Huang, Wei; Xue, Jun; Tao, Jianhua; Yi, Jiangyan; Lv, Zhao; Wu, Xiaopei. Affiliations: The Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China; The National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Department of Automation, Tsinghua University, Beijing 100190, China

Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. To address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve accuracies of 90.0% and 79.6% on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the proposed DGSD method not only outperforms the best reproducible baseline but also reduces the number of trainable parameters by approximately 100 times. Copyright © 2023, The Authors. All rights reserved.

Keywords: Electroencephalography

Local Neighbor Propagation on Graphs for Robust Feature Matching

SSRN, 2023

Authors: Guo, Hanlin; Xiao, Guobao; Su, Lumei; Zhou, Jiaxing; Wang, Dahan. Affiliations: Xiamen Key Laboratory of Frontier Electric Power Equipment and Intelligent Control, School of Electrical Engineering and Automation, Xiamen University of Technology, China; Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Information Science and Engineering, Xiamen University, China; College of Computer and Control Engineering, Minjiang University, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, School of Computer and Information Engineering, Xiamen University of Technology, China

Establishing reliable correspondences between two sets of feature points is a critical preprocessing step in many computer vision and pattern recognition tasks. In this paper, we propose a novel robust Local Neighbor Propagation on Graphs based feature matching (LNPG) method, to obtain good correspondences for feature matching. LNPG starts from a novel neighborhood graph construction strategy. The strategy leverages the spatial consistency constraint to generate a series of neighborhood sets, and employs the residual information to preserve the local neighborhood relationships of potential inliers (i.e., true matches). Subsequently, LNPG incorporates local neighbor propagation into the graph to enhance connections between data in different neighborhoods, by using the path-based similarity measurement and the adaptive graph partition. In addition, LNPG includes a novel consistency-filtering-based clustering algorithm for robust feature matching. This clustering algorithm introduces a reliable neighborhood consistency measure function and an effective cluster merging criterion for cluster detection and cluster merging during the clustering process, respectively. Overall, LNPG not only effectively distinguishes inliers from outliers, but also reliably classifies inliers into different transformation models between pairs of images. Experiments on publicly available datasets with different types of image transformations show the superiority of our LNPG in comparison with other state-of-the-art methods. © 2023, The Authors. All rights reserved.

Keywords: Clustering algorithms

Riemannian Self-Attention Mechanism for SPD Networks

arXiv, 2023

Authors: Wang, Rui; Wu, Xiao-Jun; Li, Hui; Kittler, Josef. Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, United Kingdom

Symmetric positive definite (SPD) matrix has been demonstrated to be an effective feature descriptor in many scientific areas, as it can encode spatiotemporal statistics of the data adequately on a curved Riemannian manifold, i.e., SPD manifold. Although there are many different ways to design network architectures for SPD matrix nonlinear learning, very few solutions explicitly mine the geometrical dependencies of features at different layers. Motivated by the great success of self-attention mechanism in capturing long-range relationships, an SPD manifold self-attention mechanism (SMSA) is proposed in this paper using some manifold-valued geometric operations, mainly the Riemannian metric, Riemannian mean, and Riemannian optimization. Then, an SMSA-based geometric learning module (SMSA-GLM) is designed for the sake of improving the discrimination of the generated deep structured representations. Extensive experimental results achieved on three benchmarking datasets show that our modification against the baseline network further alleviates the information degradation problem and leads to improved accuracy. Copyright © 2023, The Authors. All rights reserved.

Keywords: Network architecture

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

arXiv, 2024

Authors: Liu, Xuannan; Cui, Xing; Li, Peipei; Li, Zekun; Huang, Huaibo; Xia, Shuhan; Zhang, Miaoxuan; Zou, Yueying; He, Ran. Affiliations: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Computer Science, University of California, Santa Barbara, United States; State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; New Laboratory of Pattern Recognition, CASIA; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China

The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations, including modalities such as Any-to-Text, Any-to-vision, and Any-to-Any within generative systems. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://***/liuxuannan/Awesome-Multimodal-Jailbreak. Copyright © 2024, The Authors. All rights reserved.

Keywords: Generative adversarial networks

Low-Resolution Action Recognition for Tiny Actions Challenge

arXiv, 2022

Authors: Chen, Boyu; Qiao, Yu; Wang, Yali. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Shanghai AI Laboratory, Shanghai, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Tiny Actions Challenge focuses on understanding human activities in real-world surveillance. Basically, there are two main difficulties for activity recognition in this scenario. First, human activities are often recorded at a distance, and appear in a small resolution without much discriminative clue. Second, these activities are naturally distributed in a long-tailed way. It is hard to alleviate data bias for such heavy category imbalance. To tackle these problems, we propose a comprehensive recognition solution in this paper. First, we train video backbones with data balance, in order to alleviate overfitting in the challenge benchmark. Second, we design a dual-resolution distillation framework, which can effectively guide low-resolution action recognition by super-resolution knowledge. Finally, we apply model ensemble with post-processing, which can further boost performance on the long-tailed categories. Our solution ranks Top-1 on the leaderboard. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation
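
The abstract above mentions training "with data balance" on long-tailed categories but does not say how. One common way to balance such data is class-balanced sampling, sketched below. This is an assumed technique chosen for illustration, not necessarily what the authors used, and the function name and the class labels are hypothetical.

```python
import random
from collections import Counter

def balanced_sample_weights(labels):
    """Per-example sampling weights that give every class equal total probability."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]

# A toy long-tailed label list: one frequent class, one rare class.
labels = ["walk"] * 8 + ["fall"] * 2
weights = balanced_sample_weights(labels)

# Draw a training batch; rare-class clips are picked about as often as frequent ones.
random.seed(0)
batch = random.choices(labels, weights=weights, k=1000)
print(Counter(batch))
```

Each rare-class example gets a larger weight than each frequent-class example, so both classes are drawn with roughly equal probability overall.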

Handwritten Mathematical Expression Recognition with Self-Attention

2021 4th International Conference on Algorithms, Computing and Artificial Intelligence

Authors: Xueke Chi; Da-Han Wang; Yuefeng Wu; Yun Wu. Affiliations: Fujian Key Laboratory of Pattern Recognition and Image Understanding, School of Computer and Information Engineering, Xiamen University of Technology

Attention-based encoder-decoder models have achieved great success in handwritten mathematical expression recognition in recent years. However, such methods suffer from attention drift: under the RNN-based local attention mechanism, the high similarity between encoded features can cause attention confusion. To address this problem, we propose an encoder-decoder model with self-attention, which captures the global information of the feature map and fuses the local information of the CNN as complementary features. Experiments are conducted on the CROHME 2014 and CROHME 2016 competition datasets. The experimental results show that, when only using the official training dataset, the proposed method achieves recognition accuracies of 51.98% and 50.74% on the CROHME 2014 and CROHME 2016 competition datasets, respectively, significantly outperforming the other methods. The improvements demonstrate the effectiveness of the self-attention module.

Keywords: offline recognition; handwritten mathematical expression; self-attention; non-local