This paper addresses the problem of training a deep neural network for satellite image segmentation so that it can be deployed over images whose statistics differ from those used for training. For example, in postdisa...
详细信息
This paper addresses the problem of training a deep neural network for satellite image segmentation so that it can be deployed over images whose statistics differ from those used for training. For example, in postdisaster damage assessment, the tight time constraints make it impractical to train a network from scratch for each image to be segmented. We propose a convolutional encoder- decoder network able to learn visual representations of increasing semantic level as its depth increases, allowing it to generalize over a wider range of satellite images. Then, we propose two additional methods to improve the network performance over each specific image to be segmented. First, we observe that updating the batch normalization layers' statistics over the target image improves the network performance without human intervention. Second, we show that refining a trained network over a few samples of the image boosts the network performance with minimal human intervention. We evaluate our architecture over three data sets of satellite images, showing the state-of-the-art performance in binary segmentation of previously unseen images and competitive performance with respect to more complex techniques in a multiclass segmentation task.
The salient object detection is receiving more and more attention from researchers. An accurate saliency map will be useful for subsequent tasks. However, in most saliency maps predicted by existing models, the object...
详细信息
The salient object detection is receiving more and more attention from researchers. An accurate saliency map will be useful for subsequent tasks. However, in most saliency maps predicted by existing models, the objects regions are very blurred and the edges of objects are irregular. The reason is that the hand-crafted features are the main basis for existing traditional methods to predict salient objects, which results in different pixels belonging to the same object often being predicted different saliency scores. Besides, the convolutional neural network (CNN)-based models predict saliency maps at patch scale, which causes the objects edges of the output to be fuzzy. In this paper, we attempt to add an edge convolution constraint to a modified U-Net to predict the saliency map of the image. The network structure we adopt can fuse the features of different layers to reduce the loss of information. Our SalNet predicts the saliency map pixel-by-pixel, rather than at the patch scale as the CNN-based models do. Moreover, in order to better guide the network mining the information of objects edges, we design a new loss function based on image convolution, which adds an L1 constraint to the edge information of saliency map and ground-truth. Finally, experimental results reveal that our SalNet is effective in salient object detection task and is also competitive when compared with 11 state-of-the-art models.
Road scene understanding is a difficult task in autonomous driving. In this letter, we propose a novel deep encoder-decoder architecture for road scene understanding in an end-to-end manner. This core trainable unders...
详细信息
Road scene understanding is a difficult task in autonomous driving. In this letter, we propose a novel deep encoder-decoder architecture for road scene understanding in an end-to-end manner. This core trainable understanding engine includes an encoder network, a decoder network with two streams, and a pixel-level fusion network with classification layer. The encoder network is composed of the front-end model of the classical convolution neural network, VGGNet. The decoder network with two streams includes multi-scale skip connection modules to reduce the down-scaling effect. Finally, a fusion network fuses the two-level information from the two streams of the decoder network for precise pixel-level classification. Additionally, the convolution layer is added to each skip connection module to increase the depth of the architecture. Our architecture achieves outstanding performance on the publicly available Cam Vid dataset and significantly outperforms previous architectures. This deep architecture is ideal for road scene understanding.
Semantic segmentation is a challenging task as each pixel should be labeled accurately in the image. To improve the performance of semantic segmentation, some Fully Convolutional Network (FCN) based semantic segmentat...
详细信息
Semantic segmentation is a challenging task as each pixel should be labeled accurately in the image. To improve the performance of semantic segmentation, some Fully Convolutional Network (FCN) based semantic segmentation methods adopt a spatial pyramid pooling structure to enrich contextual information. Others employ an encoder-decoder architecture to recover object details gradually. In this paper, we propose a semantic segmentation framework which combines the benefits of these approaches. Specifically, we propose a Mixed Spatial Pyramid Pooling (MSPP) module based on region-based average pooling and dilated convolution to obtain dense multi-level contextual priors. To further refine the details of objects more effectively, we also propose a Global-Attention Fusion (GAF) module to provide global context as guidance for low-level features. Our proposed method achieves mIoU of 84.1% on PASCAL VOC 2012 dataset and 80.4% on Cityscapes dataset without using any post-processing or additional datasets for pretrained model. (C) 2020 Elsevier B.V. All rights reserved.
The automated whole breast ultrasound (AWBUS) is a new breast imaging technique that can depict the whole breast anatomy. To facilitate the reading of AWBUS images and support the breast density estimation, an automat...
详细信息
The automated whole breast ultrasound (AWBUS) is a new breast imaging technique that can depict the whole breast anatomy. To facilitate the reading of AWBUS images and support the breast density estimation, an automatic breast anatomy segmentation method for AWBUS images is proposed in this study. The problem at hand is quite challenging as it needs to address issues of low image quality, ill-defined boundary, large anatomical variation, etc. To address these issues, a new deep learning encoder-decoder segmentation method based on a self-co-attention mechanism is developed. The self-attention mechanism is comprised of spatial and channel attention module (SC) and embedded in the ResNeXt (i.e., Res-SC) block in the encoder path. A non-local context block (NCB) is further incorporated to augment the learning of high-level contextual cues. The decoder path of the proposed method is equipped with the weighted up-sampling block (WUB) to attain class-specific better up-sampling effect. Meanwhile, the co-attention mechanism is also developed to improve the segmentation coherence among two consecutive slices. Extensive experiments are conducted with comparison to several the state-of-the-art deep learning segmentation methods. The experimental results corroborate the effectiveness of the proposed method on the difficult breast anatomy segmentation problem on AWBUS images. (C) 2020 Elsevier B.V. All rights reserved.
A human Visual System (HVS) has the ability to pay visual attention, which is one of the many functions of the HVS. Despite the many advancements being made in visual saliency prediction, there continues to be room fo...
详细信息
A human Visual System (HVS) has the ability to pay visual attention, which is one of the many functions of the HVS. Despite the many advancements being made in visual saliency prediction, there continues to be room for improvement. Deep learning has recently been used to deal with this task. This study proposes a novel deep learning model based on a Fully Convolutional Network (FCN) architecture. The proposed model is trained in an end-to-end style and designed to predict visual saliency. The entire proposed model is fully training style from scratch to extract distinguishing features. The proposed model is evaluated using several benchmark datasets, such as MIT300, MIT1003, TORONTO, and DUT-OMRON. The quantitative and qualitative experiment analyses demonstrate that the proposed model achieves superior performance for predicting visual saliency.
Although autonomous driving have become applicable to the industry, the prevalent application of key techniques to the autonomous vehicles still needs to be refined. For instance, how to fast and accurately segment ro...
详细信息
Although autonomous driving have become applicable to the industry, the prevalent application of key techniques to the autonomous vehicles still needs to be refined. For instance, how to fast and accurately segment road markings in order to assist the next pedestrian path prediction and the creation of high-definition (HD) map respectively is useful for autonomous driving to be more practical. Current road marking segmentation mainly rely on the techniques of semantic segmentation of computer vision with encoder-decoder architecture. However, as demonstrated in this paper, the upsampling layer of convolutional neural networks with encoder-decoder architecture plays a significant role in the efficiency and accuracy of the road marking segmentation. The bilinear upsampling layer is fast due to its intrinsic simple interpolation but with less accuracy; on the contrary, the upsampling layer with offsets is relatively accurate but with more computational cost. Therefore, at least, in terms of prevalent application, efficiency, and accuracy, the upsampling layer of decoder of convolution neural networks should be paid more attention to for the next research work of autonomous driving.
An autonomous concrete crack inspection system is necessary for preventing hazardous incidents arising from deteriorated concrete surfaces. In this paper, we present a concrete crack detection framework to aid the pro...
详细信息
An autonomous concrete crack inspection system is necessary for preventing hazardous incidents arising from deteriorated concrete surfaces. In this paper, we present a concrete crack detection framework to aid the process of automated inspection. The proposed approach employs a deep convolutional neural network architecture for crack segmentation, while addressing the effect of gradient vanishing problem. A feature silencing module is incorporated in the proposed framework, capable of eliminating non-discriminative feature maps from the network to improve performance. Experimental results support the benefit of incorporating feature silencing within a convolutional neural network architecture for improving the network's robustness, sensitivity, and specificity. An added benefit of the proposed architecture is its ability to accommodate for the trade-off between specificity (positive class detection accuracy) and sensitivity (negative class detection accuracy) with respect to the target application. Furthermore, the proposed framework achieves a high precision rate and processing time than the state-of-the-art crack detection architectures.
Recent approaches for real-time semantic segmentation usually employ the encoder-decoder architecture as the backbone to generate a high-quality segmentation prediction. There has been a lot of research on designing e...
详细信息
ISBN:
(纸本)9781728137988
Recent approaches for real-time semantic segmentation usually employ the encoder-decoder architecture as the backbone to generate a high-quality segmentation prediction. There has been a lot of research on designing efficient encoding methods. However, enhancing the performance of components in decoder is also crucial for pixel-level recognition. In this paper, we propose a self-learned feature reconstruction (SFR) method and an offset-dilated feature fusion (ODFF) module to improve the prediction reconstruction capability of the decoder. Concretely, SFR can effectively reconstruct the high-resolution feature maps by recombining feature space, in which the space transformation matrix implicitly contained in a convolution layer can selectively highlight features at each position by leveraging the knowledge of label space in a self-learned way. Moreover, ODFF module can effectively fuse multilevel features with multiscale contextual information by feeding the feature maps into designed parallel offset-dilated convolutions, which enhances the feature representation capability of the decoder. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of our proposed methods embedded in ESPNet.
Timely and accurate change detection on satellite images by using computer vision techniques has been attracting lots of research efforts in recent years. Existing approaches based on deep learning frameworks have ach...
详细信息
Timely and accurate change detection on satellite images by using computer vision techniques has been attracting lots of research efforts in recent years. Existing approaches based on deep learning frameworks have achieved good performance for the task of change detection on satellite images. However, under the scenario of disjoint changed areas in various shapes on land surface, existing methods still have shortcomings in detecting all changed areas correctly and representing the changed areas boundary. To deal with these problems, we design a coarse-to-fine detection framework via a boundary-aware attentive network with a hybrid loss to detect the change in high resolution satellite images. Specifically, we first perform an attention guided encoder-decoder subnet to obtain the coarse change map of the bi-temporal image pairs, and then apply residual learning to obtain the refined change map. We also propose a hybrid loss to provide the supervision from pixel, patch, and map levels. Comprehensive experiments are conducted on two benchmark datasets: LEBEDEV and SZTAKI to verify the effectiveness of the proposed method and the experimental results show that our model achieves state-of-the-art performance.
暂无评论