This paper proposes a novel framework for 3D reconstruction of buildings from a single off-nadir satellite image. Compared with traditional remote sensing methods that reconstruct from multiple images, recovering 3D information from a single image reduces the input data requirements of the reconstruction task. It addresses the problem that multiple images suitable for traditional reconstruction methods cannot be acquired in some regions where remote sensing resources are scarce. However, it is difficult to reconstruct a 3D model with a complete shape and accurate scale from a single image. The geometric constraints are insufficient because the view angle, building size, and spatial resolution differ among remote sensing images. To solve this problem, the proposed reconstruction framework consists of two convolutional neural networks: the Scale-Occupancy-Network (Scale-ONet) and the model scale optimization network (Optim-Net). From a single off-nadir satellite image, Scale-ONet generates watertight mesh models with the exact shape and rough scale of buildings, while Optim-Net reduces the scale error of these mesh models. Finally, the complete reconstructed scene is recovered by model-image matching. Thanks to the well-designed networks, the framework is robust to input images with different view angles, building sizes, and spatial resolutions. Experimental results show that good reconstruction accuracy is obtained for both the shape and the scale of the building models.
A DCNN-based crack segmentation methodology is proposed that leverages heterogeneous image fusion to alleviate image-related disturbances in intensity or range image data and to mitigate uncertainties through cross-domain (i.e., intensity and range data domains) feature correlation. Intensity and range images are captured from concrete roadways and integrated through data fusion. Three encoder-decoder networks representing different patterns of exploiting the image data (i.e., fused raw image, raw range image, filtered range image, and raw intensity image) are proposed and compared to benchmarks. Experimental results demonstrate that the proposed DCNN exploiting the fused raw image through an "extract-fuse" pattern achieves the most robust and accurate crack segmentation performance among the implemented DCNNs.
ISBN: (print) 9781728199665
Handwritten mathematical expression recognition (HMER) is an important research direction in handwriting recognition. The performance of HMER suffers from the two-dimensional structure of mathematical expressions (MEs). To address this issue, we propose a high-performance HMER model with scale augmentation and drop attention. Specifically, to handle MEs whose scale is unstable in both the horizontal and vertical directions, scale augmentation improves the performance of the model on MEs of various scales. An attention-based encoder-decoder network is used for extracting features and generating predictions. In addition, drop attention is proposed to further improve performance when the attention distribution of the decoder is not precise. Compared with previous methods, our method achieves state-of-the-art performance on two public datasets, CROHME 2014 and CROHME 2016.
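The scale augmentation idea above can be illustrated with a minimal sketch. The abstract does not give the scale range or the resampling method, so the range `[0.7, 1.4]`, the nearest-neighbour resampling, and the function name `scale_augment` are all assumptions made for illustration only.

```python
import numpy as np

def scale_augment(image, rng, low=0.7, high=1.4):
    """Resize a 2D image with independent random vertical and horizontal scales.

    The scale range and nearest-neighbour resampling are illustrative
    assumptions, not the settings used in the paper.
    """
    sy, sx = rng.uniform(low, high, size=2)
    h, w = image.shape
    new_h = max(1, int(round(h * sy)))
    new_w = max(1, int(round(w * sx)))
    # Map every output row/column back to its nearest source row/column.
    rows = np.minimum((np.arange(new_h) / sy).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / sx).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
expression = np.arange(12.0).reshape(3, 4)  # stand-in for an ME image
augmented = scale_augment(expression, rng)
```

Drawing the two factors independently is what lets the aspect ratio change, which matches the abstract's point that the scale is unstable in both directions.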
ISBN: (print) 9789881563903
Deep neural networks have been found to be easily misled by adversarial examples that are maliciously crafted by adding small perturbations. A variety of methods have been proposed to generate adversarial examples, but more effort is needed to generate them with high perceptual quality and low computational cost. In this paper, we propose an adversarial attack method that uses a conditional encoder-decoder network named Image-To-Perturbation to generate adversarial perturbations in a residual learning fashion. The Image-To-Perturbation network learns the mapping from clean images to the corresponding adversarial perturbations. Once trained, it can generate perturbations for any example efficiently. We test the proposed method on different target models using the MNIST and CIFAR-10 datasets. The experimental results show that our model is easy to train and that the generated adversarial examples are perceptually realistic and achieve a high attack success rate.
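As a rough illustration of the residual formulation, the adversarial example is the clean image plus a generated perturbation. The sketch below replaces the trained network with random numbers, and the `tanh` squashing, the budget `epsilon`, and the name `make_adversarial` are assumptions that do not come from the abstract.

```python
import numpy as np

def make_adversarial(image, raw_perturbation, epsilon=0.1):
    """Add a bounded perturbation to a clean image with pixel values in [0, 1].

    raw_perturbation stands in for the output of a perturbation network.
    The tanh squashing and the epsilon budget are illustrative assumptions.
    """
    # tanh maps the raw output into (-1, 1); epsilon scales it to the budget.
    perturbation = epsilon * np.tanh(raw_perturbation)
    # Residual step: clean image plus perturbation, kept in the valid range.
    adversarial = np.clip(image + perturbation, 0.0, 1.0)
    return adversarial, perturbation

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(28, 28))  # MNIST-sized placeholder image
raw = rng.normal(size=(28, 28))               # placeholder for network output
adv, delta = make_adversarial(clean, raw, epsilon=0.1)
```

Because the network only has to output the residual, a single forward pass yields the perturbation, which is consistent with the efficiency claim in the abstract.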
ISBN: (print) 9781509066315
Dynamic scene deblurring is a challenging problem because blur arises from various sources. Many deep learning based approaches train end-to-end deblurring networks and achieve successful performance. However, the architectures and parameters of these methods are fixed after training, so they need deeper network architectures and more parameters to adapt to different blurry images, which increases the computational complexity. In this paper, we propose a local correlation block (LCBlock), which adaptively adjusts the weights of features according to the blurry input, and we use it to construct a dynamic scene deblurring network named LCNet. Experimental results show that the proposed LCNet achieves comparable performance with shorter running time and smaller network size than state-of-the-art learning-based methods.
ISBN: (print) 9781728154466
As a logger of aircraft data, the black box is the most reliable and effective means of identifying the cause of an accident after an aircraft crash. An underwater acoustic beacon is installed in the black box to address the problem of locating it after an air accident at sea. The masking effect of ocean noise, coupled with propagation loss in the ocean, causes the signal to attenuate severely during long-distance propagation, which makes it very difficult to detect underwater signals. Inspired by the successful application of fully convolutional networks (FCN) to pixel-level image classification, an encoder-decoder network with skip connection layers, called "Unet", is proposed to enhance underwater acoustic beacon signals represented as short-time Fourier transform (STFT) images. The experimental data show that the FCN-based enhancement method achieves higher signal gain than the conventional method based on the adaptive line enhancer (ALE).
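To make the STFT image representation concrete, the following sketch computes a magnitude spectrogram with NumPy. The sampling rate, tone frequency, frame length, and hop size are arbitrary values chosen for illustration, and the function name `stft_magnitude` is hypothetical. None of these come from the abstract.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Return an STFT magnitude image of shape (frequency bins, time frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real FFT per frame; transpose so that rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 8000                               # illustrative sampling rate in Hz
t = np.arange(fs) / fs                  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)     # illustrative 1 kHz pulse tone
spec = stft_magnitude(tone)
peak_bins = spec.argmax(axis=0)         # strongest frequency bin per frame
```

A narrowband beacon tone appears in such an image as a horizontal ridge, which is the kind of structure an image-to-image network can learn to enhance against background noise.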
ISBN: (print) 9781728144580; 9781728144566
Agricultural growth is an important pathway in the development of any country. Its productivity contributes to fulfilling the basic needs of human society. Productivity must therefore be maintained to provide both quality and quantity. Reducing the use of chemicals such as pesticides and herbicides also helps provide quality and quantity. The major factor affecting quantity is the presence of weeds in the crop field, because the nutrients in the soil are absorbed by both the weeds and the crop. Manual removal of weeds from the crop is tedious, time-consuming, and costly, and spraying herbicides over the field affects the quality of the crop. Emerging agricultural technology paves the way for selective spraying and robotic weed removal, which require highly accurate classification of crop versus weed. Therefore, an encoder-decoder architecture based on VGG16 is used for pixel-wise segmentation. The architecture consists of convolutional layers with ReLU, normalization layers, and max-pooling layers.
ISBN: (print) 9781728162157
The analysis of glandular morphology is a crucial step in determining the presence and grade of cancer. The rise of computational pathology has led to the development of automated segmentation to replace time-consuming manual segmentation. Although existing encoder-decoder networks have made significant progress, the downsampling operation causes a loss of fine-grained information, which degrades boundary localization, especially in malignant cases. In this paper, we propose a maximal information complemented refinement network based on UNet. We extend the skip connection with two information complements: we aggregate spatial detail information by reusing low-level features, and we introduce semantic information through high-level feature guidance. In addition, a weighted cross-entropy loss and a generalized Dice loss are used to tackle fuzzy boundaries and class imbalance. We evaluated our model against a dozen recent deep learning models on the 2015 MICCAI Gland Segmentation challenge (GlaS) dataset. Extensive experiments show that our proposal achieves the best overall performance and greatly improves performance on malignant cases.
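The combination of a weighted cross-entropy loss and a generalized Dice loss can be sketched as follows. The generalized Dice term uses the common formulation with inverse squared class volume weights. The class weights, the mixing factor `alpha`, and the function names are hypothetical, since the abstract does not state how the two terms are weighted or combined.

```python
import numpy as np

def weighted_cross_entropy(probs, onehot, class_weights, eps=1e-7):
    """Mean cross-entropy where each pixel is weighted by its true class."""
    per_pixel = -(onehot * np.log(probs + eps) * class_weights).sum(axis=1)
    return per_pixel.mean()

def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalized Dice loss with inverse squared class volume weights."""
    w = 1.0 / (onehot.sum(axis=0) ** 2 + eps)
    intersect = (w * (probs * onehot).sum(axis=0)).sum()
    total = (w * (probs + onehot).sum(axis=0)).sum()
    return 1.0 - 2.0 * intersect / (total + eps)

def combined_loss(probs, onehot, class_weights, alpha=0.5):
    """Convex combination of both terms; alpha is an illustrative assumption."""
    wce = weighted_cross_entropy(probs, onehot, class_weights)
    gdl = generalized_dice_loss(probs, onehot)
    return alpha * wce + (1.0 - alpha) * gdl

labels = np.array([0, 0, 0, 1])            # four pixels, imbalanced classes
onehot = np.eye(2)[labels]                 # shape (pixels, classes)
class_weights = np.array([1.0, 3.0])       # hypothetical class weights
uniform = np.full((4, 2), 0.5)             # uninformative prediction
loss_perfect = combined_loss(onehot, onehot, class_weights)
loss_uniform = combined_loss(uniform, onehot, class_weights)
```

The inverse squared volume weights make the rare class count as much as the frequent one in the Dice term, which is why this loss is often paired with cross-entropy under class imbalance.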
ISBN: (print) 9789881476883
Attention over an observed image or natural sentence works by spotting or locating the region or position of interest for pattern classification. The attention parameter is treated as a latent variable that is calculated indirectly by minimizing the classification loss. With such an attention mechanism, the target information may not be correctly identified. Therefore, in addition to minimizing the classification error, we can directly attend to the region of interest by minimizing the reconstruction error with respect to supporting data. Our idea is to learn how to attend through so-called supportive attention when supporting information is available. A new attention mechanism is developed to conduct attentive learning for translation invariance, which is applied to image captioning. The derived information is helpful for generating a caption from an input image. Moreover, this paper presents an association network that not only implements word-to-image attention but also carries out image-to-image attention via self-attention, so that the relations between image and text are sufficiently represented. Experiments on the MS-COCO task show the benefit of the proposed supportive attention and self-attention for image captioning with the key-value memory network.
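For readers unfamiliar with self-attention over image regions, the sketch below uses generic scaled dot-product attention. This is a stand-in technique, not the supportive attention or association network proposed in the paper, and the feature sizes and function names are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: each query mixes values by key similarity."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    weights = softmax(scores, axis=1)   # one distribution over keys per query
    return weights @ values, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))       # 5 image regions, 8-dim features
# Self-attention: regions serve as queries, keys, and values at once.
context, weights = attend(regions, regions, regions)
```

Passing word embeddings as `queries` and region features as `keys` and `values` would give the word-to-image direction with the same function, provided both share the same feature dimension.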
Colorectal cancer is the second most common cancer globally. Its high mortality necessitates early polyp detection to mitigate the risk of the disease. However, conventional segmentation methods are susceptible to noise interference and have limited accuracy in complex environments. To address these challenges, we propose GSCCANet with an encoder-dual decoder co-design. The encoder employs a hybrid Transformer (MiT) for efficient multi-scale global feature extraction. The dual decoders collaborate via SAFM and REF-RA modules to enhance segmentation precision through global semantics and boundary refinement. In particular, SAFM enhances lesion coherence via channel-space attention fusion, while REF-RA strengthens the low-contrast edge response using high-frequency gradients and reverse attention, optimized through progressive fusion. Additionally, a combination of Focal Loss and Weighted IoU Loss mitigates the problem of undetected small polyps. Experiments on five datasets show that GSCCANet surpasses the baselines. It achieves 94.7% mDice and 90.1% mIoU on CVC-ClinicDB (regular) and 80.1% mDice and 72.5% mIoU on ETIS-LaribPolypDB (challenging). Cross-domain tests (CVC-ClinicDB → Kvasir) confirm strong adaptability, with a 0.2% mDice fluctuation. These results indicate that GSCCANet offers a high-precision and generalizable solution through global–local synergy, edge enhancement, and efficient computation.
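The reported mDice and mIoU figures are averages of per-image Dice and IoU scores. The sketch below shows the standard definitions for binary masks. The toy masks, the smoothing term `eps`, and the function name are illustrative, and the abstract does not describe the exact evaluation protocol.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and intersection over union for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps both scores defined when the two masks are empty.
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])    # toy predicted polyp mask
target = np.array([[1, 0, 0],
                   [0, 1, 1]])  # toy ground-truth mask
dice, iou = dice_and_iou(pred, target)
```

For a single image the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU, which matches the pattern in the reported numbers.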