ISBN: (print) 9798350349405; 9798350349399
Adversarial Patch Attacks (APAs) induce prediction errors by inserting carefully crafted regions into images. This paper presents the first defence against APAs for deep networks that perform semantic segmentation of scenes. We show that a conditional generator can be trained to produce patches on demand targeting specific classes and achieving superior performance versus conventional pixel-optimised patch attacks. We then leverage this generator along with the segmentation network as part of a generative adversarial network, which trains the model to ignore the adversarial patches produced by the generator, while simultaneously training the generator to produce updated patches to attack the fine-tuned network. We show that our process confers strong protection against adversarial patches, and that this protection generalises to traditional pixel-optimised adversarial patches.
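As a minimal illustration of what "inserting a crafted region into an image" means, the sketch below pastes a patch into a grayscale image stored as nested lists. The function name `apply_patch`, the image representation, and the constant patch values are assumptions made for illustration only; the abstract does not describe how the authors' generator produces or places its patches.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed layout)


def apply_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`."""
    height, width = len(image), len(image[0])
    p_height, p_width = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + p_height > height or left + p_width > width:
        raise ValueError("patch does not fit inside the image")
    patched = [row[:] for row in image]  # copy so the clean input is untouched
    for r in range(p_height):
        for c in range(p_width):
            patched[top + r][left + c] = patch[r][c]
    return patched


# Hypothetical 4x4 clean image and 2x2 patch; a real attack would optimise
# or generate the patch values rather than use a constant.
clean = [[0.0] * 4 for _ in range(4)]
patch = [[1.0, 1.0], [1.0, 1.0]]
attacked = apply_patch(clean, patch, top=1, left=2)
```

In an adversarial training loop of the kind the abstract outlines, a step like this would sit between the patch generator and the segmentation network, with the patched image fed to the network in place of the clean one.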
In recent years, with the development of artificial intelligence technology, intelligent robots are more and more widely used in many fields. In this paper, an intelligent patrol wheeled robot based on image recogniti...
ISBN: (print) 9798350349405; 9798350349399
In real-world few-shot image classification tasks, the lack of abundant data makes training and testing very challenging. The classification model must learn the most meaningful features using only a few sample images without context knowledge. Here, interpretability methods for deep models are helpful for increased comprehensibility and verification. However, these advantages are limited without the ability to correct the model directly. Therefore, we propose an interpretable approach for few-shot object recognition that includes optional interactive teaching to close the feedback loop. We leverage pretrained vision transformers as backbones, together with part-based inference, which particularly favors interpretability. We use a visual concept bank to translate semantic visual features between the human and the model. Even without any human interaction, our model performs competitively compared to state-of-the-art methods in few-shot image classification tasks. Beyond that, we demonstrate the benefits of our interactive interfaces. We show how they can significantly improve the robustness in fine-grained recognition tasks and help to quickly adapt the model without complex fine-tuning.
ISBN: (print) 9798350384826; 9798350384819
The development of brain-computer interfaces (BCI) has sparked significant interest in leveraging electroencephalography (EEG) data for diverse applications. One of the applications is age and gender prediction of individuals. This paper introduces a novel approach that harnesses the potential of machine learning models, specifically Support Vector Machines (SVM) and Random Forest (RF), in conjunction with paralinguistic feature extraction to achieve highly accurate age and gender prediction. This study used a dataset consisting of EEG recordings from 64 subjects in a relaxed position, with both open and closed eyes. By extracting paralinguistic features from the EEG signals, subtle variations in brain activity associated with age and gender differences are captured. Through extensive experimentation and rigorous evaluation, the SVM model demonstrated exceptional performance, achieving an accuracy of 99.61% in both age and gender estimation. These remarkable results highlight the effectiveness of the proposed approach and the potential of paralinguistic feature extraction on EEG data as robust indicators of age and gender.
ISBN: (print) 9798350350920
Human Action Recognition (HAR) is one of the most applied research directions in the field of computer vision, and is widely used in human-computer interaction, Augmented Reality (AR) technology, security monitoring, and other scenarios. However, due to the complexity of human action gestures, existing HAR methods have certain deficiencies in dealing with variable human gestures and action information, and their accuracy needs to be improved. To improve the accuracy, we propose a multi-dimensional network model based on SC-LSTM (Skip-Connection + LSTM). First, a Temporal Feature Extraction Module is designed based on SC-LSTM, and a Spatial Feature Extraction Module is designed based on CNN and Multi-Attention Mechanism, to extract potential human action features from the temporal and spatial dimensions, respectively. Then, a separate SC-LSTM classification network is utilized to process these spatio-temporal features to obtain the final HAR results. The experimental results show that, compared to other algorithms, the present model can more fully utilize the information in the temporal dimension, and thus performs better in terms of HAR accuracy.
ISBN: (print) 9798350367164; 9798350367157
The environment perception technology of patrol robots within park areas faces challenges such as low accuracy in detecting small-scale targets and high false positive and false negative rates in target detection under low-light conditions. This paper addresses these issues by proposing several optimization methods. Mainstream network models are extended and improved using the ApolloScape dataset to further enhance the accuracy of multi-target recognition within park environments. To overcome the limitations of a single sensor, a perception strategy is designed that fuses the detection results of a 3D detector and a 2D detector. The 3D detector adopts an improved Complex-YOLOv4 network, while the 2D detector uses an enhanced YOLOv5s network, primarily focusing on vehicles and pedestrians. Decision-level fusion of the two sensors effectively reduces false positives and false negatives in target detection under adverse environmental conditions. Experimental tests demonstrate that the proposed perception method achieves good real-time performance and accuracy in normal, adverse, and nighttime conditions, showing high robustness to changes in the external environment.
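To make the idea of decision-level fusion concrete, here is a small sketch that merges two lists of detections by box overlap. The matching rule, the thresholds `iou_thr` and `solo_thr`, and the assumption that 3D detections have already been projected into the same image plane are all hypothetical choices made for illustration; the abstract does not state the fusion rule the authors actually use.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(dets_2d: List[Detection], dets_3d: List[Detection],
         iou_thr: float = 0.5, solo_thr: float = 0.7) -> List[Detection]:
    """Keep detections confirmed by both detectors, plus confident solo ones."""
    fused: List[Detection] = []
    used_3d = set()
    for box2, conf2 in dets_2d:
        best_j, best_iou = -1, 0.0
        for j, (box3, _) in enumerate(dets_3d):
            if j in used_3d:
                continue
            overlap = iou(box2, box3)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thr:
            # Both detectors agree: keep the box with the higher confidence.
            used_3d.add(best_j)
            fused.append((box2, max(conf2, dets_3d[best_j][1])))
        elif conf2 >= solo_thr:
            fused.append((box2, conf2))  # unconfirmed but confident 2D detection
    for j, (box3, conf3) in enumerate(dets_3d):
        if j not in used_3d and conf3 >= solo_thr:
            fused.append((box3, conf3))  # unconfirmed but confident 3D detection
    return fused


# Hypothetical detections: one agreed object, one weak 2D-only, one strong 3D-only.
dets_2d = [((0.0, 0.0, 10.0, 10.0), 0.6), ((50.0, 50.0, 60.0, 60.0), 0.4)]
dets_3d = [((1.0, 1.0, 10.0, 10.0), 0.5), ((100.0, 100.0, 120.0, 120.0), 0.9)]
result = fuse(dets_2d, dets_3d)
```

Under this rule, agreement between detectors rescues moderate-confidence detections (fewer false negatives), while an uncorroborated low-confidence detection is dropped (fewer false positives).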
ISBN: (print) 9798350349405; 9798350349399
Adapting computer vision algorithms for inspecting civil structures brings significant societal benefits. Images captured from civil structures often exhibit distinct overlap, typically to perform 3D reconstruction. In this work, the potential of multiple overlapping views is harnessed for robust multi-view crack detection. A transformer approach, named MVCrackViT, is designed to use attention over multiple views, enabling point cloud crack segmentation from the views directly. To address quality issues such as motion blur, defocus, and low exposure commonly found in real-world data, artificial view corruption is applied to accomplish training from image data alone. With reasonable positional tolerance, a performance of approximately 90% clCloudIoU is achieved on a 3D crack dataset, the first of its kind. The powerful clCloudIoU metric is introduced to evaluate crack detection in 3D space.
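For readers unfamiliar with IoU-style scores on point clouds, the sketch below computes a plain intersection over union on per-point binary crack labels. This is not the clCloudIoU metric: the abstract does not define clCloudIoU, and the positional tolerance it mentions is not modelled here. The function name `point_iou` and the label encoding are assumptions.

```python
from typing import Sequence


def point_iou(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Plain IoU over per-point binary labels (1 = crack, 0 = background)."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must cover the same points")
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    union = sum(1 for p, r in zip(predicted, reference) if p or r)
    # If neither labelling marks any crack point, treat them as in full agreement.
    return intersection / union if union else 1.0


# Hypothetical labels for a five-point cloud.
pred_labels = [1, 1, 0, 0, 1]
true_labels = [1, 0, 0, 1, 1]
score = point_iou(pred_labels, true_labels)
```

A tolerance-aware variant would count a predicted crack point as correct when a reference crack point lies within some distance of it, rather than requiring an exact per-point match.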
Small jumping robots have great potential for application in the fields of rescue, detection and monitoring due to their small size, large load and high flexibility. In this paper, from the perspective of bionics, the...
To achieve the recognition and positioning functions of indoor mobile robots under limited computing power conditions, a method based on color recognition for robot recognition and positioning is proposed. The global ...
ISBN: (print) 9798350349405; 9798350349399
We propose a data augmentation strategy to improve Automatic Target Recognition (ATR) from Infrared (IR) imagery using vision transformers. Our method leverages external IR image repositories to select relevant samples that can boost the diversity of the training data. By doing so, we improve the model's ability to learn from the challenging regions of the training feature space. Our approach uses attention-based explanations to identify under-represented regions in the feature space of the training data. We leverage this information to search for new samples that complement the current training data by covering the sparse gaps in the feature space. We evaluate the proposed approach on a public dataset with IR imagery of multiple targets. We show that our method achieves a significant 2% improvement in ATR performance compared to a baseline model trained without augmentation. We also show that our method outperforms other data augmentation techniques that do not consider under-represented regions. These results demonstrate the effectiveness of our approach in an infrared scenario, where there is high intraclass variance and large training sets are expensive to obtain.
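The idea of picking external samples that cover sparse gaps in the feature space can be sketched with a simple stand-in criterion: rank each candidate by its distance to the nearest training feature and keep the farthest ones. Note that this swaps in a nearest-neighbour distance for the attention-based explanations the abstract describes; the function names and the toy 2D feature vectors are assumptions for illustration.

```python
import math
from typing import List, Sequence

Feature = Sequence[float]


def nearest_distance(point: Feature, pool: List[Feature]) -> float:
    """Euclidean distance from `point` to its closest neighbour in `pool`."""
    return min(math.dist(point, other) for other in pool)


def select_gap_fillers(train_feats: List[Feature],
                       candidate_feats: List[Feature], k: int) -> List[int]:
    """Return indices of the k candidates lying farthest from the training set."""
    ranked = sorted(
        range(len(candidate_feats)),
        key=lambda i: nearest_distance(candidate_feats[i], train_feats),
        reverse=True,  # largest gap first
    )
    return ranked[:k]


# Hypothetical 2D features: training data clustered near the origin.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = [(0.1, 0.1), (5.0, 5.0), (1.0, 1.0)]
chosen = select_gap_fillers(train, candidates, k=2)
```

Candidates that sit on top of existing training data add little diversity and are ranked last, while those in empty regions are selected first.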