ISBN: 9789881476883 (print)
Attention over an observed image or natural sentence operates by locating the region or position of interest for pattern classification. The attention parameter is treated as a latent variable, which is calculated indirectly by minimizing the classification loss. With such an attention mechanism, the target information may not be correctly identified. Therefore, in addition to minimizing the classification error, we can attend directly to the region of interest by minimizing the reconstruction error with respect to supporting data. Our idea is to learn how to attend through so-called supportive attention when supporting information is available. A new attention mechanism is developed to conduct attentive learning for translation invariance, which is applied to image captioning. The derived information is helpful for generating a caption from an input image. Moreover, this paper presents an association network that not only implements word-to-image attention but also carries out image-to-image attention via self-attention. The relations between image and text are sufficiently represented. Experiments on the MS-COCO task show the benefit of the proposed supportive and self attentions for image captioning with the key-value memory network.
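As a rough illustration of the attention over a key-value memory mentioned in this abstract, the sketch below uses generic scaled dot-product attention. It is not the paper's supportive attention or association network. The function names and the toy vectors are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def key_value_attention(query, keys, values):
    """Generic scaled dot-product attention (an assumed stand-in, not the paper's method).

    Scores the query against every key, normalizes the scores with softmax,
    and returns the weights together with the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Toy example: two memory slots with identical keys receive equal weight.
weights, context = key_value_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [0.0, 2.0]],
)
```

With identical keys the weights are uniform, so the context vector is simply the mean of the stored values. A query that matches one key more closely shifts the weight toward that slot.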
ISBN: 9781665442077 (print)
Hydronephrosis may lead to many potential diseases, and its diagnosis is time-consuming and laborious. To assist physicians in hydronephrosis diagnosis and treatment planning, an accurate and automatic kidney segmentation method is highly desirable in clinical practice. In recent years, deep convolutional neural networks such as Unet have played a key role in the field of image segmentation, but Unet itself cannot actively adjust the receptive field, which may result in poor attention to the characteristics of the segmented target. We propose an encoder-decoder network with weighted skip connections and the idea of hierarchical equal resolution, which allows the receptive field to be controlled manually. We evaluated our method by comparing it with various classical networks on a dataset of 1850 annotated images. The model achieves an MPA of 94.12 and an MIoU of 89.49, outperforming the other classical networks in the comparison.
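The abstract does not define MPA and MIoU. The sketch below assumes the common definitions, mean per-class pixel accuracy and mean intersection over union, both computed from a confusion matrix. The helper names and the tiny two-class mask are illustrative assumptions, not the paper's evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_pixel_accuracy(matrix):
    """Average over classes of correctly labeled pixels / true pixels of that class."""
    scores = []
    for c, row in enumerate(matrix):
        total = sum(row)
        if total > 0:  # skip classes absent from the ground truth
            scores.append(row[c] / total)
    return sum(scores) / len(scores)


def mean_iou(matrix):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes that never occur
            scores.append(tp / union)
    return sum(scores) / len(scores)


# Flattened toy masks: 0 = background, 1 = kidney (hypothetical labels).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
mpa = mean_pixel_accuracy(cm)   # 0.75 for this toy example
miou = mean_iou(cm)             # 0.6 for this toy example
```

Multiplying these values by 100 gives percentages on the same scale as the figures reported above, assuming the paper reports percentages.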
ISBN: 9783031278174; 9783031278181 (print)
Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as its two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
Depth images of objects can be easily obtained by depth cameras, but they can only provide limited shape information. Current widely used learning-based methods generate complete 3D shapes from images, but reconstructed 3D...
This paper proposes an encoding-decoding network with a pyramidal representation module, referred to as EDPNet, which is designed for efficient semantic image segmentation. On the one hand, during the encoding process of the proposed EDPNet, an enhanced Xception network, Xception+, is employed as a backbone to learn discriminative feature maps. The obtained discriminative features are then fed into the pyramidal representation module, from which context-augmented features are learned and optimized by leveraging a multi-level feature representation and aggregation process. On the other hand, during the image restoration decoding process, the encoded semantic-rich features are progressively recovered with the assistance of a simplified skip connection mechanism, which performs channel concatenation between high-level encoded features with rich semantic information and low-level features with spatial detail information. The proposed hybrid representation, which combines the encoding-decoding and pyramidal structures, has a global-aware perception and captures the fine-grained contours of various geographical objects very well with high computational efficiency. The performance of the proposed EDPNet has been compared against PSPNet, DeepLabv3, and U-Net on four benchmark datasets, namely eTRIMS, Cityscapes, PASCAL VOC2012, and CamVid. EDPNet achieved the highest accuracy, with mIoUs of 83.6% and 73.8% on the eTRIMS and PASCAL VOC2012 datasets, respectively, while its accuracy on the other two datasets was comparable to that of the PSPNet, DeepLabv3, and U-Net models. EDPNet achieved the highest efficiency among the compared models on all datasets.
Background and Objective: Computed Tomography (CT) has become an important clinical imaging modality, as well as the leading source of radiation dose from medical imaging procedures. Modern CT exams are usually preceded by two quick orthogonal localization scans, which are used for patient positioning and diagnostic scan parameter definition. These two localization scans contribute to the patient dose but are not used for diagnostic purposes. In this study, we investigate the possibility of using deep learning models to reconstruct one localization scan image from the other, thus reducing the patient dose and simplifying the clinical workflow. Methods: We propose a modified encoder-decoder network and a scaled mixture loss function specifically for the focal task. In this study, 12,487 clinical abdominal exams were retrieved from a clinical medical imaging storage system and randomly split into training, validation, and test sets in a ratio of 7:1:2. Reconstructed images were compared with the ground truth in terms of location prediction error, profile prediction error, and attenuation prediction error. Results: The average location error, profile error, and attenuation error were 1.02 ± 3.37 mm, 4.43 ± 2.02%, and 6.2 ± 2.94% for lateral prediction, and 6.46 ± 6.43 mm, 3.9 ± 2.32%, and 7.12 ± 3.54% for AP prediction, respectively. Conclusions: We conclude that although the reconstructed abdominal CT localization images may lack some details of the internal organ structures, they could be used effectively for tube current modulation calculation and patient positioning, leading to a reduction of radiation dose and scan time in clinical CT exams. (C) 2021 Elsevier B.V. All rights reserved.
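The 7:1:2 random split described in the Methods can be sketched as follows. The abstract does not say how fractional counts were rounded or whether exams were grouped by patient, so the floor rounding, the fixed seed, and the function name below are assumptions made for illustration.

```python
import random


def split_indices(n_items, ratios=(7, 1, 2), seed=0):
    """Shuffle item indices and split them into train/validation/test parts.

    Train and validation sizes are floored (an assumed rounding rule);
    the test set receives whatever remains.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = n_items * ratios[0] // total
    n_val = n_items * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 12,487 exams, as stated in the abstract.
train, val, test = split_indices(12487)
print(len(train), len(val), len(test))  # 8740 1248 2499 under the floor rule
```

If several exams belong to the same patient, splitting by patient identifier rather than by exam would avoid leakage between the sets. The abstract does not state which was done.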
Microseismic event picking is one of the key steps in seismic processing and imaging. Manual picking is widely used for microseismic events but is time-consuming. The standard short-term average/long-term average (STA/LTA) method is a traditional way to pick microseismic first arrivals, but it leads to inaccurate first-arrival picks when the signal-to-noise ratio (SNR) is low. We developed a workflow to automatically pick microseismic first arrivals by using feature pyramid networks (FPNs). To train the proposed model, we first randomly select part of the microseismic traces and manually pick the time index of the first arrivals. Next, we segment every selected trace into two parts based on the time index of the manual pick and then assign each part a label. Afterward, we train the proposed fine-tuning FPN model by using the training data and the corresponding labels. Note that we propose a loss function, named the point-aware loss, for the microseismic first-arrival picking problem. Finally, we predict the microseismic first arrivals by using the well-trained fine-tuning FPN model. The numerical examples demonstrate that our proposed model successfully identifies the microseismic first arrivals. The first arrivals predicted by our proposed model are more robust and more accurate than those obtained by using STA/LTA and the encoder-decoder network.
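For context, the STA/LTA baseline mentioned in this abstract can be sketched as below. This is a minimal textbook-style version that uses squared amplitude as the characteristic function and trailing windows. The window lengths, the threshold, and the synthetic trace are assumptions, and the code does not reproduce the paper's FPN workflow or its point-aware loss.

```python
def sta_lta_ratio(trace, n_sta, n_lta):
    """Ratio of short-term to long-term average energy, using trailing windows."""
    energy = [x * x for x in trace]
    prefix = [0.0]  # prefix[i] holds the sum of energy[0:i]
    for e in energy:
        prefix.append(prefix[-1] + e)
    ratio = [0.0] * len(trace)
    # The ratio is only defined once a full long-term window is available.
    for i in range(n_lta - 1, len(trace)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        if lta > 0.0:
            ratio[i] = sta / lta
    return ratio


def pick_first_arrival(trace, n_sta=5, n_lta=50, threshold=4.0):
    """Return the first sample index whose STA/LTA ratio exceeds the threshold."""
    for i, r in enumerate(sta_lta_ratio(trace, n_sta, n_lta)):
        if r > threshold:
            return i
    return None  # no arrival detected


# Synthetic trace: 100 samples of weak noise, then a strong arrival at index 100.
trace = [0.01 * (-1) ** i for i in range(100)] + [(-1.0) ** i for i in range(100)]
pick = pick_first_arrival(trace)
```

On this clean synthetic trace the pick lands on the onset sample. With stronger noise the ratio rises more gradually and the pick drifts or is missed, which is the low-SNR weakness the abstract describes.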
Background and Objectives: Image registration methods for deformable soft tissues use nonlinear transformations to align a pair of images precisely. In some situations, when there is a large gray-scale difference or a large deformation between the images to be registered, the deformation field tends to fold at some local voxels, which breaks the one-to-one mapping between the images and reduces the invertibility of the deformation field. To address this issue, a novel registration approach based on unsupervised learning is presented for deformable soft tissue image registration. Methods: The proposed unsupervised learning-based registration approach consists of a registration network, a velocity field integration module, and a grid sampling module. The main contributions are: (1) A novel encoder-decoder network is presented for the evaluation of the stationary velocity field. (2) A Jacobian determinant-based penalty term (Jacobian loss) is developed to reduce the number of folding voxels and to improve the invertibility of the deformation field. Results and Conclusions: The experimental results show that a new pair of images can be accurately registered using the trained registration model. In comparison with the conventional state-of-the-art method, SyN, the invertibility of the deformation field, the accuracy, and the speed are all improved. Compared with the deep learning-based method VoxelMorph, the proposed method improves the invertibility of the deformation field.
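The idea behind a Jacobian determinant-based penalty can be illustrated in 2D. The abstract does not give the exact form of the paper's Jacobian loss, so the sketch below assumes one common choice: the mean of max(0, -det J) over the grid, with the deformation written as phi(x, y) = (x + ux, y + uy) and derivatives taken by forward differences. All names are hypothetical.

```python
def jacobian_determinants(ux, uy):
    """Forward-difference Jacobian determinant of phi(x, y) = (x + ux, y + uy).

    ux and uy are 2D lists (rows index y, columns index x) holding the
    displacement components on a unit-spaced grid.
    """
    rows, cols = len(ux), len(ux[0])
    dets = []
    for i in range(rows - 1):
        row = []
        for j in range(cols - 1):
            dux_dx = ux[i][j + 1] - ux[i][j]
            dux_dy = ux[i + 1][j] - ux[i][j]
            duy_dx = uy[i][j + 1] - uy[i][j]
            duy_dy = uy[i + 1][j] - uy[i][j]
            row.append((1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx)
        dets.append(row)
    return dets


def folding_penalty(ux, uy):
    """Mean of max(0, -det J): zero when no cell folds (an assumed loss form)."""
    dets = jacobian_determinants(ux, uy)
    values = [max(0.0, -d) for row in dets for d in row]
    return sum(values) / len(values)


zeros = [[0.0] * 3 for _ in range(3)]
# Identity deformation: det J = 1 everywhere, so nothing is penalized.
identity_penalty = folding_penalty(zeros, zeros)
# ux = -2x maps x to -x, which flips the grid and gives det J = -1 everywhere.
flipped = [[-2.0 * j for j in range(3)] for _ in range(3)]
fold_penalty = folding_penalty(flipped, zeros)
```

A negative determinant marks a location where the mapping reverses orientation, which is the folding the abstract refers to. Penalizing only the negative part leaves orientation-preserving deformations unconstrained.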
Recently, image deblurring driven by encoder-decoder networks has made tremendous progress. However, these encoder-decoder-based networks still have two disadvantages: (1) due to the lack of a feedback mechanism in the decoder design, the reconstruction results of existing networks are still sub-optimal; (2) these networks introduce multiple modules, such as the self-attention mechanism, to improve performance, which also increases the computational burden. To overcome these issues, this paper proposes a novel feedback-mechanism-based encoder-decoder network (namely, FMNet) that is equipped with two key components: (1) the feedback-mechanism-based decoder and (2) the dual gated attention module. To improve reconstruction quality, the feedback-mechanism-based decoder leverages feedback information via the feedback attention module, which adaptively selects useful features in the feedback path. To decrease the computational cost, an efficient dual gated attention module performs the attention mechanism in the frequency domain twice, which improves deblurring performance while reducing the computational cost by avoiding redundant convolutions and feature channels. The superiority of FMNet in terms of both deblurring performance and computational efficiency is demonstrated via comparisons with state-of-the-art methods on multiple public datasets.
Medical image segmentation is fundamental for computer-aided diagnosis or surgery. Various attention modules have been proposed to improve segmentation results, but they have limitations for medical image segmentation, such as heavy computation and weak framework applicability. To solve these problems, we propose a new attention module named FGAM, short for Feature Guided Attention Module, which is a simple, pluggable, and effective module for medical image segmentation. The FGAM tries to exploit the feature representation ability of the encoder and decoder features. Specifically, the shallow decoder layer always contains abundant information, which is taken as a queryable feature dictionary in the FGAM. The module contains a parameter-free activator and can be removed after training in various encoder-decoder networks. The efficacy of the FGAM is demonstrated on various encoder-decoder models using five datasets, including four publicly available datasets and one in-house dataset.