ISBN: (print) 9783031723377; 9783031723384
Mining accurate Class Activation Maps (CAMs) is essential for weakly-supervised semantic segmentation (WSSS). However, CAMs activate only the most discriminative semantic regions, which can severely degrade segmentation results. Motivated by the observation that local views of an image capture more detail, we propose a Dual-view Label Re-assignment (DLR) framework that targets the problems of incomplete objects and unclear boundaries in pseudo-labels. Specifically, we first extract comprehensive and intricate features from global and local views. To take advantage of these features, we further incorporate two additional components, Local View Constraint (LVC) and Foreground-Background Contrast (FBC). LVC facilitates the complementary learning of global and local features through a feature transfer loss. FBC enhances boundaries by intensifying the distinction between foreground and background. Experiments show that DLR achieves 1.5% and 3.9% mIoU improvements over other methods on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
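For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how a mean intersection-over-union can be computed from flat label lists. The function name `mean_iou` and the choice to skip classes that appear in neither list are assumptions made for illustration; benchmark evaluations typically accumulate counts over the whole dataset before averaging, and this is not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two flat lists of integer class labels.

    Illustrative only: classes absent from both pred and target are skipped
    (an assumption), so they do not pull the mean toward zero.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 4-pixel example with two classes.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
miou = mean_iou(pred, target, num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average.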
Weakly-supervised semantic segmentation (WSSS) methods with image-level labels generally train a classification network to generate Class Activation Maps (CAMs) as the initial coarse segmentation labels. However, current WSSS methods still perform far from satisfactorily because the CAMs they adopt (1) typically focus on partial discriminative object regions and (2) usually contain useless background regions. These two problems are attributed to the sole image-level supervision and the aggregation of global information when training the classification networks. In this work, we propose a visual words learning module and a hybrid pooling approach, and incorporate them into the classification network to mitigate these problems. In the visual words learning module, we counter the first problem by forcing the classification network to learn fine-grained visual word labels so that more of each object's extent can be discovered. Specifically, the visual words are learned with a codebook, which can be updated via two proposed strategies: a learning-based strategy and a memory-bank strategy. The second drawback of CAMs is alleviated by the proposed hybrid pooling, which incorporates global average and local discriminative information to simultaneously ensure object completeness and reduce background regions. We evaluated our method on the PASCAL VOC 2012 and MS COCO 2014 datasets. Without any extra saliency prior, our method achieved 70.6% and 70.7% mIoU on the val and test sets of PASCAL VOC, respectively, and 36.2% mIoU on the val set of MS COCO, significantly surpassing state-of-the-art WSSS methods.
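The abstract does not say how hybrid pooling combines global average and local discriminative information. As a rough sketch of the general idea only, the snippet below blends global average pooling with global max pooling over a single class channel. Using max pooling as the "local discriminative" term, the mixing coefficient `weight`, and the function name `hybrid_pool` are all assumptions, not the authors' formulation.

```python
def hybrid_pool(feature_map, weight=0.5):
    """Blend global average pooling with global max pooling for one channel.

    feature_map: 2D list of floats (H x W activations for one class channel).
    weight: hypothetical mixing coefficient; 1.0 gives pure average pooling
            (favours complete objects), 0.0 gives pure max pooling (favours
            the single most discriminative location).
    """
    values = [v for row in feature_map for v in row]
    if not values:
        raise ValueError("feature_map must be non-empty")
    avg = sum(values) / len(values)   # global average term
    peak = max(values)                # stand-in for the local discriminative term
    return weight * avg + (1.0 - weight) * peak


# Toy 2x2 activation map: average is 0.4, peak is 1.0.
fmap = [[0.0, 0.2], [0.4, 1.0]]
score = hybrid_pool(fmap, weight=0.5)
```

With `weight=0.5` the toy score sits halfway between the average and the peak; varying `weight` shows how the classification score shifts between whole-map evidence and the strongest single response.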
Recent years have witnessed impressive advances in the area of weakly-supervised semantic segmentation (WSSS). However, most existing approaches are based on class activation maps (CAMs), which suffer from the under-segmentation problem (i.e., objects of interest are only partially segmented). Although a number of works have been proposed to tackle this under-segmentation problem, we argue that these solutions built on CAMs may not be optimal for the WSSS task. Instead, in this paper we propose a network based on the object-aware activation map (OAM). The proposed network, termed OAM-Net, uses four loss functions (foreground loss, background loss, average pixel loss, and consistency loss), which ensure exactness, completeness, compactness, and consistency of segmented objects via adversarial training. Compared to conventional CAM-based methods, our OAM-Net overcomes the under-segmentation drawback and significantly improves segmentation accuracy with negligible computational cost. A thorough comparison between OAM-Net and CAM-based approaches is carried out on the PASCAL VOC2012 dataset, and experimental results show that our network outperforms state-of-the-art approaches by a large margin. The code will be available soon.
ISBN: (print) 9798350344868; 9798350344851
Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has attracted significant attention. Existing methods primarily focus on generating high-quality pseudo labels from the available images and their image-level labels. However, the quality of pseudo labels degrades significantly when the available dataset is small. Thus, in this paper, we tackle this problem from a different angle by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels serve as the controlling information. Moreover, we propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. The gain is more pronounced when the amount of available data is small, demonstrating the effectiveness of our method.
ISBN: (print) 9789819985456; 9789819985463
Methods for generating reliable class activation maps (CAMs) are essential for weakly-supervised semantic segmentation. These methods usually face the challenge of incomplete and inaccurate CAMs due to intra-class inconsistency of the final features and inappropriate use of deep-level ones. To alleviate these issues, we propose the Global Consistency Enhancement Network (GCENet), which consists of a Middle-level feature Auxiliary Module (MAM), an Intra-class Consistency Enhancement Module (ICEM), and a Critical Region Suppression Module (CRSM). Specifically, MAM uses middle-level features, which carry clearer edge information and details, to enhance the output features. Then, to address incomplete class activation maps caused by the high variance of the local context of the image, ICEM is proposed to enhance the representation of features. It takes into account both intra-class global consistency and local particularity. Furthermore, CRSM is proposed to solve the problem of excessive CAMs caused by the over-activation of features. It activates the low-discriminative regions appropriately, thus improving the quality of class activation maps. In comprehensive experiments on the PASCAL VOC2012 dataset, our method outperforms all competing methods, demonstrating its effectiveness.
In this paper, we propose the co-attention dictionary network (CODNet) for weakly-supervised semantic segmentation using only image-level class labels. The CODNet model exploits extra semantic information by jointly leveraging a pair of samples with common semantics through co-attention rather than processing them independently. The inter-sample similarities of spatially distributed deep features are computed to merge reference features through non-local connections. To discover similar patterns regardless of appearance variations, we propose to extract image representations by equipping the neural networks with dictionary learning, which provides universal basis elements for different images. Based on the CODNet model, we propose a multi-reference class activation map (MR-CAM) algorithm, which generates semantic segmentation masks for a target image by jointly merging semantic cues from multiple reference images. Experimental results on the PASCAL VOC 2012 and MSCOCO benchmark data sets for weakly-supervised semantic segmentation show that the proposed algorithm performs favorably against state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
ISBN: (print) 9781665468916
Single-stage weakly-supervised semantic segmentation (WSSS) with image-level labels has become a new research hotspot in the community because of its lower cost and higher training efficiency. However, the pseudo labels in WSSS generally suffer from noise, which limits segmentation performance. In this paper, to explore the integral foreground activation, we propose the Channel Suppression (CS) module, which prevents the network from activating only the most discriminative regions and thereby improves the initial pseudo labels. To rectify incorrect predictions, we explore the Self-Attention Prediction Correction (SAPC) module, which adaptively generates category-wise prediction rectification weights. In extensive experiments, the proposed efficient single-stage framework achieves 67.6% mIoU and 39.9% mIoU on the PASCAL VOC 2012 and MS COCO 2014 datasets, respectively, significantly exceeding several recent single-stage methods.
In weakly-supervised semantic segmentation, obtaining the class activation maps for pseudo masks is crucial. Since multiple organs appear in the same medical image, it is reasonable to obtain the activation map of each organ from organ-level features instead of image-level features. The image-level features are decomposed into organ-level features, yet prior anatomical knowledge creates a spurious association between the image-level and organ-level features. To this end, we apply causal intervention to cut off the spurious association and propose a novel deconfounded multi-organ weakly-supervised semantic segmentation (DeMos) method. Based on the original class activation mapping (CAM) method, the model is retrained to learn the deconfounded features of each organ via cross-attention, and we approximate the expectation of the intervention instead of the traditional likelihood. When the model converges, we extract the activation maps by CAM. Our method not only generates high-quality pseudo masks on the CHAOS, ACDC and ProMRI datasets, but is also applicable to other CAM variants. Furthermore, with the refinement, DeMos achieves a Dice similarity coefficient of 93.26% on the task of left ventricle segmentation, outperforming state-of-the-art methods.
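The Dice similarity coefficient cited above is a standard overlap measure, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The following is a minimal sketch on flat binary lists. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not part of the DeMos evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    pred, target: flat lists of 0/1 values of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * intersection / total


# Toy example: one overlapping pixel, mask sizes 2 and 1.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
dice = dice_coefficient(pred_mask, true_mask)
```

In the toy case the masks share one pixel and contain three foreground pixels in total, so the coefficient is 2/3.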
It is generally accepted that one of the critical parts of current vision algorithms based on deep learning and convolutional neural networks is the annotation of a sufficient number of images to achieve competitive performance. This is particularly difficult for semantic segmentation tasks, since the annotation must ideally be generated at the pixel level. Weakly-supervised semantic segmentation aims at reducing this cost by employing simpler annotations that are, hence, easier, cheaper and quicker to produce. In this paper, we propose and assess a new weakly-supervised semantic segmentation approach that makes use of a novel loss function whose goal is to counteract the effects of weak annotations. To this end, this loss function comprises several terms based on partial cross-entropy losses, one of which is the Centroid Loss. This term induces a clustering of the image pixels into the object classes under consideration, with the aim of improving the training of the segmentation network by guiding the optimization. The performance of the approach is evaluated on datasets from two different industry-related case studies: one involves the detection of instances of a number of different object classes in the context of a quality control application, while the other stems from the visual inspection domain and deals with the localization of image areas whose pixels correspond to scene surface points affected by a specific sort of defect. The detection results reported for both cases show that, despite the differences between them and their particular challenges, the use of weak annotations does not prevent the approach from achieving a competitive performance level in either.
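A partial cross-entropy loss, as the term is commonly used, averages the cross-entropy only over the pixels that carry a weak annotation and ignores the rest. The sketch below illustrates that idea on flat lists. The `IGNORE` marker, the function name `partial_cross_entropy`, and the probability clamp are assumptions for illustration; the paper's Centroid Loss term is not shown because the abstract does not define it.

```python
import math

IGNORE = -1  # hypothetical marker for pixels without a weak annotation


def partial_cross_entropy(probs, labels):
    """Mean cross-entropy over annotated pixels only.

    probs: list of per-pixel class-probability lists (each sums to 1).
    labels: list of integer class ids, or IGNORE for unlabeled pixels.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    losses = [
        -math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        for p, y in zip(probs, labels)
        if y != IGNORE
    ]
    if not losses:
        return 0.0  # no annotated pixels: nothing to penalise
    return sum(losses) / len(losses)


# Three pixels, two classes; the middle pixel is unlabeled and contributes nothing.
pixel_probs = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
pixel_labels = [0, IGNORE, 1]
loss = partial_cross_entropy(pixel_probs, pixel_labels)
```

Because the unlabeled pixel is skipped, the toy loss is the mean of -log(0.5) and -log(0.8) only.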