Recent studies on simultaneous localization and mapping (SLAM) have tended to employ implicit neural representations, which can improve the efficiency and robustness of SLAM systems. However, these methodologies still f...
Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generati...
With the development of neural networks and the increasing popularity of autonomous driving, the calibration of the LiDAR and the camera has attracted more and more attention. This calibration task is multi-modal, wher...
DeepFakes blur the boundary between reality and forgery, resulting in the collapse of the existing credit system and causing immeasurable consequences for national security and social order. An analysis of existing face forgery techniques shows that most generation techniques rely on a random noise distribution and that global information is lost after upsampling. Therefore, we propose a deepfake detection algorithm based on an improved MobileViT, which uses the local spatial bias of the CNN and the global spatial representation of the Transformer network to learn the local features and global representation of forged faces, respectively. Coordinate attention is introduced to obtain direction-aware and position-sensitive information, allowing the model to better locate synthetic traces in fake faces and to fuse local and global representations more effectively. To improve the generalization of the model, the GELU activation function is used to address the problem of neuron death. Our model achieved 96.2% on the FF++ (C23) dataset, and 93.7%, 94.1%, 96.3%, and 87.9% on the DF, F2F, FS, and NT datasets, respectively. Compared with previous methods, our model shows robust detection and better generalization.
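The abstract names GELU as its remedy for neuron death but gives no formula. As a minimal sketch (not the authors' implementation), the standard exact GELU, x·Φ(x) with Φ the standard normal CDF, can be compared with ReLU to show why it helps: for negative inputs ReLU has a zero gradient, while GELU keeps a small non-zero one. The helper `numerical_grad` is a hypothetical name introduced here for illustration only.

```python
import math


def relu(x: float) -> float:
    """ReLU: zero output (and zero gradient) for every negative input."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def numerical_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central-difference gradient estimate (illustrative helper)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(
            f"x={x:+.1f}  "
            f"relu'={numerical_grad(relu, x):+.4f}  "
            f"gelu'={numerical_grad(gelu, x):+.4f}"
        )
```

A unit whose pre-activation stays negative receives no gradient through ReLU and can stop learning; with GELU the gradient there is small but non-zero, so the unit can still recover.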
To address the low accuracy of gait recognition in complex scenes, a novel skeleton-based gait recognition algorithm, GCGait, is proposed. Taking human posture as the input gait feature reduces the interference caused by clothing changes and other factors. To extract sufficient input features, multi-branch input is used in the early stage of the model. By introducing a multi-attention mechanism, the network can learn the semantic information of joints that are not directly connected, extract the most discriminative features from complex videos, and further improve recognition performance. To reduce the influence of cross-view variation, a fusion loss function is used in the experiments. Experimental results show that the average recognition rate of the proposed algorithm on the CASIA-B dataset is improved by 5.2%, and the average recognition accuracy on the OU-MVLP dataset is increased by 66.3%, which demonstrates the effectiveness of the proposed method.
Because mushrooms share similar features and poisonous varieties are difficult to distinguish from nonpoisonous ones, mushrooms pose a threat to human health. To address the challenge of mushroom classification and identification, this paper proposes a mushroom classification method based on residual networks. First, a network architecture with multiple residual blocks is designed and trained on an image dataset. Then, a transfer learning strategy is employed to initialize the network parameters from a pre-trained model, followed by fine-tuning to adapt to the mushroom classification task. Finally, multiple test experiments are conducted to evaluate the effectiveness of the proposed method. The experimental results demonstrate excellent performance of the proposed method in mushroom classification tasks. Compared with traditional feature extraction methods, it better captures the details and texture features of mushrooms, thereby improving classification accuracy. In conclusion, the mushroom classification method based on residual networks exhibits high accuracy and generalization capability. It has potential applications in mushroom classification, aiding the identification of poisonous mushrooms and thereby protecting human health.
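The abstract does not specify the residual block it uses. As a rough sketch of the general idea only, the block below computes out = ReLU(F(x) + x) with F(x) = W2·ReLU(W1·x). Using small dense layers in place of convolutions and omitting normalization are simplifying assumptions made here, and the names `linear` and `residual_block` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product, standing in for a convolution layer."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Return ReLU(F(x) + x), where F(x) = W2 * ReLU(W1 * x).

    The identity shortcut (+ x) requires F(x) to have the same
    length as x, so both matrices are square in this sketch.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + x_i for f, x_i in zip(fx, x)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights F(x) = 0, so the block passes ReLU(x) through.
    print(residual_block([1.0, -2.0], zeros, zeros))
```

Because the shortcut carries x forward unchanged, a block whose weights are near zero behaves almost like the identity, which is what makes deep stacks of such blocks trainable.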
Gait planning, covering both static and dynamic gaits, plays an important role in the walking of quadruped robots. In this article, a static and dynamic gait control method based on the center-of-gravity stability margin is proposed. First, the robot model and its kinematic modeling are introduced. Second, the static and dynamic gaits of the robot are planned and the foot trajectory is designed. Finally, both gaits are simulated in the V-REP simulation software, and the differences in stability and speed between the static and dynamic gaits of a 12-degree-of-freedom robot are analyzed, verifying the effectiveness of the proposed gait control method.
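The abstract does not define its stability margin. One common definition, assumed here, is the shortest distance from the ground projection of the center of gravity to the edges of the support polygon formed by the feet in contact. The sketch below implements that definition; the function name `stability_margin` and the example coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def stability_margin(cog: Point, feet: List[Point]) -> float:
    """Signed distance from the projected center of gravity to the
    nearest edge of the support polygon.

    `feet` must list the stance-foot positions as a convex polygon in
    counter-clockwise order. A positive result means the projection
    lies inside the polygon (statically stable); a negative result
    means it lies outside.
    """
    margin = math.inf
    n = len(feet)
    for i in range(n):
        (x1, y1), (x2, y2) = feet[i], feet[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        # The 2D cross product is positive when cog is left of the edge,
        # which is the interior side for a counter-clockwise polygon.
        cross = ex * (cog[1] - y1) - ey * (cog[0] - x1)
        margin = min(margin, cross / math.hypot(ex, ey))
    return margin


if __name__ == "__main__":
    # Three stance feet while the fourth leg swings (static gait).
    stance = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(stability_margin((0.25, 0.25), stance))
    print(stability_margin((1.0, 1.0), stance))
```

In a static gait the planner would keep this value positive at every instant, whereas a dynamic gait can tolerate it dropping to zero or below for short phases.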
Remote sensing object detection is an important research area in computer vision, widely applied in both military and civilian domains. However, remote sensing images pose challenges for object detection, such as large image sizes, complex backgrounds, and significant variations in target scale. To address these issues, this paper proposes a new Feature Denoising and Fusion Module (FDFM) aimed at enhancing the accuracy and robustness of object detection. This module comprises a Multi-Scale Denoising Submodule (MDS) and an Attention Optimization Submodule (AOS). The MDS uses higher-level semantic features to suppress lower-level texture noise before fusion, reducing the impact of that noise on subsequent multi-scale feature fusion. Meanwhile, the AOS seeks to enhance the precision of the self-attention computations within the MDS without increasing the parameter count. The method was evaluated on the public datasets DOTA, VisDrone, VOC, and COCO, showing improvements over baseline models.
ISBN: (digital) 9781728180281
ISBN: (print) 9781728180298
Constructing a pyramidal feature architecture is currently a very effective way to obtain feature information about objects at different scales. Although the feature pyramid handles the recognition and detection of multi-scale objects well in object detection tasks, it still has some limitations. Since the feature information at different levels often does not come from the same layer of the network, it is difficult to obtain the features of different objects at a given scale from a single-level feature map of the pyramid network. To solve this problem, we present a novel object detection architecture, named Enhanced Multi-scale Feature Fusion Pyramid Network (EMFFPNet). Our network consists of an Enhanced Multi-scale Feature Fusion Module (EMFFM) and a Predictor Optimization Module (POM). In EMFFM, features at different levels are fused into enhanced output features, which are more representative and deterministic. To enable the enhanced features to play their respective roles in the pyramid network, we assign different weights to the fused features of different levels in POM. We perform experiments on the COCO detection benchmark. The experimental results indicate that our model performs much better than the state-of-the-art model.
Siamese-based trackers are currently the dominant tracking paradigm owing to their balance between speed and performance. However, they are prone to drift and tracking failure when the environment is complex and similar objects interfere. When a Siamese-based tracker performs the correlation operation, the responses of the target object and the background appear in different channels, i.e., the feature spaces of the target object and the background have some orthogonality. However, under background clutter and interference from similar objects, this orthogonality becomes weaker, and the wrong classification contribution of the object and the background reduces the stability of the learned similarity function, leading to many misclassified pixels in the heatmaps. In this work, we propose SiamORPN to solve the above issues. It operates at two levels: an Orthogonal Region Proposal Network (ORPN) and an Adaptive Pixel-wise Aggregation (APA) module. Specifically, in ORPN, the orthogonality between the object and the background maximizes the inter-class inertia, and an orthogonal module is introduced to enhance this orthogonality. APA introduces two lightweight networks to predict the weights of all pixels in different heatmaps and the weights of all pixels in different regression offsets. Experiments on challenging benchmarks, including OTB2015, VOT2016, VOT2018, the GOT-10k test set, UAV123, LaSOT, and TrackingNet, demonstrate that the proposed SiamORPN outperforms many SOTA trackers and achieves leading performance. The inference speed on a GTX1080Ti reaches about 32 FPS, meeting real-time requirements.