ISBN: (print) 9789819985395; 9789819985401
Referring image segmentation (RIS) aims to segment the target object described by a natural language expression. The challenge lies in comprehending both the image and the referring expression simultaneously while establishing the alignment between the two modalities. Recently, the large-scale vision-language pre-trained model CLIP has been shown to align the two modalities well. However, the alignment in such models is based on the global image, whereas RIS requires aligning global text features with local visual features rather than global visual features. As a result, features extracted by CLIP cannot be directly applied to RIS. In this paper, we propose a novel framework called Global Selection and Local Attention Network (GLNet), which builds upon CLIP. GLNet comprises two modules: the Global Selection and Fusion Module (GSFM) and the Local Attention Module (LAM). GSFM uses text information to adaptively select and fuse low-level and middle-level visual features. LAM applies attention mechanisms to both local visual features and local text features to establish relationships between objects and text. Extensive experiments demonstrate the strong performance of the proposed method in referring image segmentation. On RefCOCO+, GLNet achieves performance gains of +2.38%, +2.78%, and +2.50% on the three splits compared to SADLR.
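The idea of relating local visual features to text features through attention can be illustrated with generic scaled dot-product cross-attention. This is only a minimal sketch under stated assumptions: the abstract does not describe the internals of LAM, so the function `cross_attention`, the toy 2-dimensional features, and the choice of visual locations as queries and words as keys and values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual_feats, word_feats):
    """Generic scaled dot-product cross-attention (not the authors' LAM).

    visual_feats: list of local visual feature vectors (queries).
    word_feats:   list of word feature vectors (keys and values).
    Returns one (weights, attended_text_feature) pair per visual location.
    """
    d = len(word_feats[0])
    out = []
    for q in visual_feats:
        # Similarity between this visual location and every word.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in word_feats
        ]
        weights = softmax(scores)
        # Weighted sum of word features for this location.
        attended = [
            sum(w * k[j] for w, k in zip(weights, word_feats))
            for j in range(d)
        ]
        out.append((weights, attended))
    return out


# Toy usage: one visual location, two words.
pairs = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the visual location attends more strongly to the first word, whose feature points in the same direction as the query.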
Effective feature extraction is a key component in image recognition for robot vision. This paper presents an improved contrastive learning-based image feature extraction and classification model, termed SimCLR-Incept...
In this paper we present a novel approach using computer vision and machine/deep learning for building digital image albums and storytelling with images from historical newspapers. In total 456 newspaper pages have b...
ISBN: (print) 9789819984619; 9789819984626
Semantic segmentation of Remote Sensing Images (RSIs) is an essential application for precision agriculture, environmental protection, and economic assessment. While UNet-based networks have made significant progress, they still face challenges in capturing long-range dependencies and preserving fine-grained details. To address these limitations and improve segmentation accuracy, we propose an effective method, namely UAM-Net (UNet with Attention-based Multi-level feature fusion), to enhance global contextual understanding and maintain fine-grained information. Specifically, UAM-Net incorporates three key modules. First, the Global Context Guidance Module (GCGM) integrates semantic information from the Pyramid Pooling Module (PPM) into each decoder stage. Second, the Triple Attention Module (TAM) addresses feature discrepancies between the encoder and decoder. Finally, the computationally efficient Linear Attention Module (LAM) fuses coarse-level feature maps with multiple decoder stages. With the cooperation of these modules, UAM-Net significantly outperforms most state-of-the-art methods on two popular benchmarks.
ISBN: (print) 9789819984619; 9789819984626
Most existing unsupervised re-identification methods use a clustering-based approach to generate pseudo-labels as supervisory signals, allowing deep neural networks to learn discriminative representations without annotations. However, drawbacks in clustering algorithms and the lack of discriminative ability early in training seriously limit performance. A severe problem arises from path dependency, wherein noisy samples rarely have a chance to escape from their assigned clusters during iterative training. To tackle this challenge, we propose a novel label refinement strategy based on stable cluster reconstruction. Our approach contains two modules: the stable cluster reconstruction (SCR) module and the similarity recalculate (SR) module. It reconstructs more stable clusters and re-evaluates the relationship between samples and clearer cluster representatives, providing complementary information for pseudo-labels at the instance level. Our proposed approach effectively improves unsupervised reID performance, achieving state-of-the-art results on four benchmark datasets. Specifically, our method achieves 46.0% and 39.1% mAP on the challenging datasets VeRi776 and MSMT17, respectively.
ISBN: (print) 9789819984343; 9789819984350
Remarkable progress has been made in real-time semantic segmentation by leveraging lightweight backbone networks and auxiliary low-level training tasks. Although several techniques have been proposed to mitigate the accuracy degradation resulting from model reduction, challenging regions often exhibit substantial uncertainty in segmentation results. To tackle this issue, we propose an effective structure named Uncertainty-aware Boundary Attention Network (UBANet). Specifically, we model the segmentation uncertainty via prediction variance during training and add it as a regularization term to the optimization objective to improve segmentation performance. Moreover, we employ uncertainty maps to investigate the role of low-level supervision in the segmentation task. We reveal that directly fusing high- and low-level features leads to the overshadowing of large-scale low-level features by the encompassing local contexts, thus hindering the synergy between the segmentation task and low-level tasks. To address this issue, we design a Low-level Guided Feature Fusion Module that avoids the direct fusion of high- and low-level features and instead employs low-level features as guidance for the fusion of multi-scale contexts. Extensive experiments demonstrate the efficiency and effectiveness of our proposed method, which achieves a state-of-the-art latency-accuracy trade-off on the Cityscapes and CamVid benchmarks.
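The use of prediction variance as an uncertainty measure and regularization term can be sketched as follows. This is an illustration under assumptions, not the UBANet objective: the abstract does not say how the variance is obtained, so the sketch assumes several predictions of the same image are available, and the names `uncertainty_map`, `regularized_loss`, and the `weight` parameter are hypothetical.

```python
from statistics import pvariance


def uncertainty_map(predictions):
    """Per-pixel variance across several predictions of the same image.

    predictions: list of T flat lists, each holding one foreground
    probability per pixel (assumed input format).
    """
    return [pvariance(pixel_probs) for pixel_probs in zip(*predictions)]


def regularized_loss(task_loss, predictions, weight=0.1):
    """Add the mean per-pixel variance to the task loss as a penalty.

    `weight` is a hypothetical hyperparameter balancing the two terms.
    """
    u = uncertainty_map(predictions)
    return task_loss + weight * sum(u) / len(u)


# Toy usage: two predictions for a two-pixel image.
preds = [[0.9, 0.5], [0.9, 0.1]]
u_map = uncertainty_map(preds)          # second pixel is the uncertain one
loss = regularized_loss(1.0, preds)
```

Pixels on which the predictions disagree receive a larger variance, so they contribute more to the penalty, which matches the abstract's observation that challenging regions show high uncertainty.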
ISBN: (print) 9789819985456; 9789819985463
Self-supervised contrastive learning is widely used to obtain powerful representation models. However, unlabeled data in the real world naturally exhibits a long-tailed distribution, making traditional instance-wise contrastive learning unfair to tail samples. Recently, some improvements have been made from the perspectives of model, loss, and data to give tail samples more emphasis during training, but most of them explicitly or implicitly assume that a sample with a large loss belongs to the tail. We argue that, due to the lack of hard negatives, tail samples usually have a small loss at the initial stage of training, which causes them to be eliminated at the beginning of training. To address this issue, we propose a simple but effective two-stage learning scheme that decouples traditional contrastive learning to discover and enhance tail samples. Specifically, we identify a sample with a small loss in Stage I but a large loss in Stage II as a tail sample. With the discovered tail samples, we generate hard negatives for them based on their neighbors, which balances the distribution of hard negatives in training and helps learn better representations. Additionally, we design weights that are inversely proportional or proportional to the loss in each stage to achieve fairer training by reweighting. Extensive experiments on multiple unlabeled long-tailed datasets demonstrate the superiority of our DCL compared with state-of-the-art methods. The code will be released soon.
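The tail-discovery rule stated above (small loss in Stage I, large loss in Stage II) can be illustrated with a minimal sketch. The abstract does not say how "small" and "large" are decided, so the quantile thresholds, the function names `quantile` and `discover_tail`, and the toy loss values are assumptions made for illustration only.

```python
def quantile(values, q):
    """Nearest-rank q-quantile (0 <= q <= 1) of a list of floats."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def discover_tail(stage1_losses, stage2_losses, q=0.5):
    """Return indices of samples flagged as tail.

    A sample is flagged when its Stage I loss is below the Stage I
    threshold and its Stage II loss is at or above the Stage II
    threshold. Both thresholds are q-quantiles (an assumed choice).
    """
    t1 = quantile(stage1_losses, q)
    t2 = quantile(stage2_losses, q)
    return [
        i
        for i, (l1, l2) in enumerate(zip(stage1_losses, stage2_losses))
        if l1 < t1 and l2 >= t2
    ]


# Toy usage: per-sample losses recorded at the end of each stage.
stage1 = [0.1, 0.9, 0.2, 0.8]
stage2 = [0.9, 0.2, 0.1, 0.8]
tail_indices = discover_tail(stage1, stage2)
```

Only the first sample is flagged here: it is the only one whose loss is low in Stage I and high in Stage II.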
The production safety management of power transmission engineering is an important guarantee for improving the infrastructure work of the power grid, and safety helmets and reflective clothing play a crucial role in p...
ISBN: (print) 9789819984688; 9789819984695
Face spoofing attacks have become an increasingly critical concern as face recognition is widely applied. However, attack materials have been made visually similar to real human faces, making spoof clues hard to detect reliably. Previous methods have shown that auxiliary information extracted from raw RGB data, such as depth maps, rPPG signals, and the HSV color space, is a promising way to highlight hidden spoofing details. In this paper, we consider extracting novel auxiliary information to expose hidden spoofing clues and remove scenario-specific factors, so as to help the neural network improve the generalization and interpretability of the model's decisions. Considering that presenting faces from spoof mediums introduces 3D geometry and texture differences, we propose a spoof-guided face decomposition network to disentangle a face image into normal, albedo, light, and shading components. In addition, we design a multi-stream fusion network, which effectively extracts features from the inherent imaging components and captures the complementarity and discrepancy between them. We evaluate the proposed method on various databases, namely CASIA-MFSD, Replay-Attack, MSU-MFSD, and OULU-NPU. The results show that our proposed method achieves competitive performance in both intra-dataset and inter-dataset evaluation protocols.
ISBN: (print) 9789819985548; 9789819985555
The application of object detection in intelligent logistics has received considerable attention. However, existing detector models face challenges such as high computational costs, slow detection speed, and difficulty in deployment on edge devices with limited computational resources. This paper proposes a novel real-time safety detector model (RMFFDet) based on YOLOv8s for forklift driving. A hardware-friendly FasterNeXt module is designed to optimize feature extraction and reduce the computational cost of the Backbone. Inspired by the work on RepGhost, a re-parameterization multiscale feature fusion Neck (RMFFNeck) is proposed. Reconstructing the Neck based on RMFFNeck improves the capture of contextual logistics background feature information while reducing the number of model parameters. Finally, Wise-IoU (WIoU) is introduced as the bounding box regression loss, combined with a dynamic non-monotonic focusing mechanism, to improve the model's overall performance. Experiments show that RMFFDet achieves a mean Average Precision (mAP) of 95.2% on the KITTI dataset and 92.8% on the self-built Forklift-3k dataset. Compared to YOLOv8s, the model parameters are reduced by 34.5%. On the Jetson Nano edge platform with a 640×640 input size, RMFFDet requires only 100.2 ms of inference time. RMFFDet offers an excellent trade-off between inference speed and detection accuracy and meets the industrial requirements of logistics scenarios.
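IoU-based bounding box regression losses such as the Wise-IoU mentioned above build on the plain IoU loss, which can be sketched as follows. The sketch covers only that base quantity; the Wise-IoU focusing mechanism itself is not reproduced, and the `(x1, y1, x2, y2)` box format and the function names `iou` and `iou_loss` are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed format).
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(pred_box, target_box)


# Toy usage: two 2x2 boxes overlapping in a 1x1 square.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A weighted variant would multiply this loss by a per-box factor; how that factor is defined is left to the original Wise-IoU work.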