Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of t...
Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generati...
Attributions aim to identify input pixels that are relevant to the decision-making process. A popular approach involves using modified backpropagation (BP) rules to reverse decisions, which improves interpretability c...
To address the model instability and overfitting that arise as deep neural networks gain more layers, the current mainstream method is to use batch normalization (BN) to alleviate them...
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches prima...
Establishing reliable correspondences between two sets of feature points is a critical preprocessing step in many computer vision and pattern recognition tasks. In this paper, we propose a novel robust Local Neighbor ...
The symmetric positive definite (SPD) matrix has been demonstrated to be an effective feature descriptor in many scientific areas, as it can encode spatiotemporal statistics of the data adequately on a curved Riemannian m...
The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these m...
The Tiny Actions Challenge focuses on understanding human activities in real-world surveillance. There are two main difficulties for activity recognition in this scenario. First, human activities are often reco...
Attention-based encoder-decoder models have achieved great success in handwritten mathematical expression recognition in recent years. However, such methods suffer from attention drift: under the RNN-based local attention mechanism, high similarity between encoded features can cause attention confusion. To address this problem, we propose an encoder-decoder model with self-attention, which captures the global information of the feature map and fuses the local information from the CNN as complementary features. Experiments are conducted on the CROHME 2014 and CROHME 2016 competition datasets. The experimental results show that, using only the official training dataset, the proposed method achieves recognition accuracies of 51.98% and 50.74% on the CROHME 2014 and CROHME 2016 competition datasets, respectively, significantly outperforming the other methods. These improvements demonstrate the effectiveness of the self-attention module.
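As a rough illustration of the idea in this abstract, the following minimal sketch applies scaled dot-product self-attention over a flattened CNN feature map and combines the resulting global context with the local features. It is not the authors' architecture: the function name `self_attention_fuse`, the projection matrices, the tensor sizes, and the choice of additive fusion are all assumptions made for illustration, since the abstract does not specify how the two kinds of features are fused.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_fuse(feature_map, w_q, w_k, w_v):
    """Hypothetical helper: global self-attention fused with local features.

    feature_map: (H, W, C) array standing in for CNN output features.
    w_q, w_k, w_v: (C, C) projection matrices (random here, learned in practice).
    Returns the fused (H, W, C) features and the (H*W, H*W) attention weights.
    """
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c)          # one row per spatial position
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # query, key, value projections
    scores = q @ k.T / np.sqrt(k.shape[-1])    # every position attends to all others
    weights = softmax(scores, axis=-1)         # each row sums to 1
    global_context = weights @ v               # global information per position
    fused = x + global_context                 # assumed fusion: simple addition
    return fused.reshape(h, w, c), weights


rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 6, 8))          # toy 4x6 feature map, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
fused, weights = self_attention_fuse(fmap, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because each position's attention weights span the whole feature map, the added context is global, whereas the original features carry the local receptive field of the convolution; concatenation or a gated sum would be equally plausible fusion choices under the abstract's wording.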