ISBN:
(Print) 9781665469647
Medical image captioning is the task of generating clinically meaningful descriptions for medical images, with medical report generation being its most common application. Automatic captioning of medical images is of great interest to medical experts, since it assists in diagnosis and disease treatment and helps automate the workflow of health practitioners. Although many recent efforts aim to produce accurate descriptions, medical image captioning still yields weak or incorrect captions. To alleviate this issue, it is important to explain why a model produced a particular caption from specific features. This is the goal of Explainable Artificial Intelligence (XAI), which aims to open up the 'black box' of deep-learning-based models. In this paper, we present an explainability module for medical image captioning that provides a sound interpretation of our attention-based encoder-decoder model by exposing the correspondence between visual and semantic features. To that end, we exploit self-attention to compute the importance of each word among the semantic features, visual attention to identify the image regions corresponding to each generated word of the caption, and visualizations of the visual features extracted at each layer of the Convolutional Neural Network (CNN) encoder. Finally, we evaluate our model on the ImageCLEF medical captioning dataset.
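A rough sketch (not the authors' implementation) of the kind of visual attention used for such explanations: an additive attention module scores each spatial region of a CNN feature map against the decoder state, and the resulting weights can be reshaped into a per-word heatmap over the image. All names, dimensions, and the random projections are illustrative assumptions.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_attention(features, decoder_state, W_f, W_h, v):
    """features: (R, D) flattened CNN regions; decoder_state: (H,).
    W_f: (D, A), W_h: (H, A), v: (A,) are learned projections (random here)."""
    scores = np.tanh(features @ W_f + decoder_state @ W_h) @ v   # (R,) region scores
    alpha = softmax(scores)                                      # attention over regions
    context = alpha @ features                                   # (D,) weighted visual context
    return alpha, context

# Toy usage: a 7x7 feature grid with 512 channels, attended by a 256-d decoder state.
rng = np.random.default_rng(0)
feats, state = rng.standard_normal((49, 512)), rng.standard_normal(256)
W_f, W_h, v = rng.standard_normal((512, 128)), rng.standard_normal((256, 128)), rng.standard_normal(128)
alpha, ctx = visual_attention(feats, state, W_f, W_h, v)
heatmap = alpha.reshape(7, 7)   # per-word relevance map over image regions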
Aims: Agriculture is one of the fundamental elements of human civilization. Crops and plant leaves are susceptible to many diseases when grown for agricultural purposes. There may be less possibility of further harm ...
This study investigates the efficacy of attention-based deep learning models for generating text-based medical reports from chest X-ray images. Four distinct models were developed and evaluated: a basic encoder-decoder model (Model 1), an encoder-decoder architecture using an attention mechanism (Model 2), a model incorporating spatial feature preservation (Model 3), and a model with a bidirectional GRU in the decoder (Model 4). We trained and evaluated these models using the Indiana University Chest X-ray dataset (Open-i), employing the BLEU score as the primary performance metric. Model 1, using a greedy search decoding strategy, achieved an average BLEU score of 0.619. Incorporating an attention mechanism in Model 2 resulted in a modest improvement, reaching a BLEU score of 0.667 with beam search decoding. Model 3, preserving spatial information during feature extraction, further enhanced performance, achieving a BLEU score of 0.718. Finally, Model 4, integrating a bidirectional GRU, yielded the highest performance with a BLEU score of 0.745. Our results highlight the significant impact of attention mechanisms and spatial feature preservation in generating more accurate and detailed medical reports. These findings demonstrate the potential of deep learning models for automating medical report generation, paving the way for further research and development in this domain.
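A hedged illustration of the evaluation step described above: scoring one generated report against its reference with sentence-level BLEU using NLTK (the paper reports averages over the Open-i test set; the example sentences, weighting, and smoothing choices below are assumptions, not the authors' setup).

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the heart size and pulmonary vascularity appear within normal limits".split()
candidate = "heart size and pulmonary vascularity are within normal limits".split()

score = sentence_bleu(
    [reference], candidate,
    weights=(0.25, 0.25, 0.25, 0.25),          # cumulative BLEU-4
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")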
Attention has arguably become one of the most important concepts in the deep learning field. It is inspired by the biological systems of humans, which tend to focus on the distinctive parts when processing large amounts of information. With the development of deep neural networks, the attention mechanism has been widely used in diverse application domains. This paper aims to give an overview of the state-of-the-art attention models proposed in recent years. Toward a better general understanding of attention mechanisms, we define a unified model that is suitable for most attention structures. Each step of the attention mechanism implemented in the model is described in detail. Furthermore, we classify existing attention models according to four criteria: the softness of attention, forms of input feature, input representation, and output representation. In addition, we summarize network architectures used in conjunction with the attention mechanism and describe some of its typical applications. Finally, we discuss the interpretability that attention brings to deep learning and present its potential future trends. (c) 2021 Elsevier B.V. All rights reserved.
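A minimal sketch of the generic pipeline such unified attention models describe: (1) score query-key compatibility, (2) normalize the scores into a distribution (the "soft" case), and (3) aggregate values with those weights. Scaled dot-product scoring is used here purely as one concrete instance, not as the survey's specific formulation.

import numpy as np

def attention(query, keys, values):
    """query: (d,), keys: (n, d), values: (n, d_v)."""
    scores = keys @ query / np.sqrt(keys.shape[1])      # step 1: compatibility scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # step 2: soft (differentiable) distribution
    return weights @ values, weights                    # step 3: weighted aggregation of values

rng = np.random.default_rng(1)
ctx, w = attention(rng.standard_normal(64), rng.standard_normal((10, 64)), rng.standard_normal((10, 32)))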
In this article, we propose a surface- and deep-level constraint-based pan-sharpening network, termed SDPNet, to address the pan-sharpening problem. Focusing on the two primary goals of pan-sharpening, i.e., the preservation of spatial and spectral information, we first design two encoder-decoder networks to extract deep-level features from the two types of source images, in addition to surface-level characteristics, as an enhanced information representation. Unique feature maps characterizing the unique information in the source images can be obtained through this deep-level feature extraction. We further design a pan-sharpening network with densely connected blocks to strengthen feature propagation and reduce the number of parameters, where the unique feature maps are used to efficiently constrain the similarity between the pan-sharpened result and the ground truth, thus avoiding information distortion. Both qualitative and quantitative comparisons on reduced-resolution and full-resolution source images demonstrate the advantages of our method over state-of-the-art methods. Our code is publicly available at https://***/hanna-xu/SDPNet.
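A sketch of a densely connected convolutional block of the kind the abstract refers to, where each layer takes the concatenation of all previous feature maps as input; the channel counts, depth, and activations are illustrative assumptions and not the published SDPNet configuration.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth: int = 16, layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_channels
        for _ in range(layers):
            self.convs.append(nn.Sequential(nn.Conv2d(ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            ch += growth   # each subsequent layer sees all earlier outputs

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))   # dense connectivity
        return torch.cat(feats, dim=1)

block = DenseBlock(in_channels=8)
out = block(torch.randn(1, 8, 32, 32))   # (1, 8 + 4*16, 32, 32)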
An adversarial reinforced report-generation framework for chest x-ray images is proposed. Previous medical-report-generation models are mostly trained by minimizing the cross-entropy loss or by further optimizing common image-captioning metrics such as CIDEr, ignoring diagnostic accuracy, which should be the first consideration in this area. Inspired by the generative adversarial network, an adversarial reinforcement learning approach is proposed for report generation of chest x-ray images that considers both diagnostic accuracy and language fluency. Specifically, an accuracy discriminator (AD) and a fluency discriminator (FD) are built to serve as evaluators that score a report on these two aspects. The FD checks how likely a report is to originate from a human expert, while the AD determines how much a report covers the key chest observations. The weighted score is viewed as a "reward" used for training the report generator via reinforcement learning, which addresses the problem that gradients cannot be backpropagated to the generative model when its output is discrete. Simultaneously, these two discriminators are optimized by maximum-likelihood estimation for better assessment ability. Additionally, a multi-type medical-concept-fused encoder followed by a hierarchical decoder is adopted as the report generator. Experiments on two large radiograph datasets demonstrate that the proposed model outperforms all methods to which it is compared.
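A hedged sketch of the reward shaping described above: the fluency and accuracy discriminator scores for a sampled report are blended into a scalar reward that scales the log-likelihood of the sampled tokens (a REINFORCE-style update). The function names, weighting factor, and baseline are illustrative assumptions, not the paper's exact formulation.

import torch

def reinforce_loss(token_log_probs, fd_score, ad_score, alpha=0.5, baseline=0.0):
    """token_log_probs: (T,) log-probs of the sampled report tokens under the generator.
    fd_score / ad_score: scalar outputs of the fluency / accuracy discriminators in [0, 1]."""
    reward = alpha * fd_score + (1.0 - alpha) * ad_score      # weighted "reward"
    advantage = reward - baseline                             # optional variance-reduction baseline
    return -(advantage * token_log_probs.sum())               # gradient flows only through the log-probs

# Toy usage with random values standing in for sampled-token log-probs and discriminator scores.
loss = reinforce_loss(torch.log(torch.rand(12)), fd_score=0.8, ad_score=0.6)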
As an important type of science and technology service resource, energy consumption data play a vital role in the process of value chain integration between home appliance manufacturers and the state grid. Accurate electricity consumption prediction is essential for demand response programs in smart grid planning. The vast majority of existing prediction algorithms only exploit data belonging to a single domain, i.e., historical electricity load data. However, dependencies and correlations may exist among different domains, such as the regional weather condition and local residential/industrial energy consumption profiles. To take advantage of cross-domain resources, a hybrid energy consumption prediction framework is presented in this paper. This framework combines the long short-term memory model with an encoder-decoder unit (ED-LSTM) to perform sequence-to-sequence forecasting. Extensive experiments are conducted with several of the most commonly used algorithms over integrated cross-domain datasets. The results indicate that the proposed multistep forecasting framework outperforms most of the existing approaches.
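An illustrative sketch of an encoder-decoder LSTM (ED-LSTM) for multistep forecasting as described above: the encoder summarizes the historical window (which could include cross-domain covariates such as weather), and the decoder unrolls the future horizon autoregressively. Layer sizes, horizon, and feature counts are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class EDLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, horizon: int = 24):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, history):                       # history: (B, T, n_features)
        _, state = self.encoder(history)               # compress the input window into (h, c)
        step = history[:, -1:, :1]                     # seed with the last observed load value
        outputs = []
        for _ in range(self.horizon):                  # autoregressive multistep decoding
            out, state = self.decoder(step, state)
            step = self.head(out)                      # (B, 1, 1) next-step prediction
            outputs.append(step)
        return torch.cat(outputs, dim=1).squeeze(-1)   # (B, horizon)

model = EDLSTM(n_features=4)
forecast = model(torch.randn(8, 168, 4))               # one week of hourly history -> next 24 h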
Few studies have specifically addressed real-time semantic segmentation in rainy environments. However, demand in this area is high, and the task is challenging for lightweight networks. Therefore, this paper proposes a lightweight network designed specifically for foreground segmentation in rainy environments, named the De-raining Semantic Segmentation Network (DRSNet). By analyzing the characteristics of raindrops, the MultiScaleSE Block is designed to encode the input image: it uses multi-scale dilated convolutions to enlarge the receptive field and an SE attention mechanism to learn per-channel weights. To combine semantic information between encoder and decoder layers, an Asymmetric Skip is proposed: the higher semantic layer of the encoder is upsampled by bilinear interpolation, passed through a pointwise convolution, and then added element-wise to the lower semantic layer of the decoder. In controlled experiments, the MultiScaleSE Block and the Asymmetric Skip improve the Foreground Accuracy index to a certain degree compared with SEResNet18 and a symmetric skip, respectively. DRSNet has only 0.54M parameters and 0.20 GFLOPs of floating-point operations (FLOPs). State-of-the-art results and real-time performance are achieved on both the UESTC all-day Scenery add rain (UAS-add-rain) and the Baidu People Segmentation add rain (BPS-add-rain) benchmarks at input sizes of 192*128, 384*256 and 768*512. DRSNet is faster than all networks under 1 GFLOPs, and its Foreground Accuracy index is also the best among networks of similar size on both benchmarks.
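A simplified sketch of the two ideas named above: parallel dilated convolutions enlarge the receptive field, and a squeeze-and-excitation (SE) gate re-weights the resulting channels. The branch count, dilation rates, and reduction ratio are illustrative guesses, not the published DRSNet configuration.

import torch
import torch.nn as nn

class MultiScaleSE(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4), reduction: int = 4):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations
        )
        se_ch = branch_ch * len(dilations)
        self.se = nn.Sequential(                        # squeeze-and-excitation channel gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(se_ch, se_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(se_ch // reduction, se_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)   # multi-scale dilated features
        return y * self.se(y)                                  # channel-wise re-weighting

block = MultiScaleSE(3, 48)
out = block(torch.randn(1, 3, 128, 192))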
Radiation exposure in CT imaging leads to increased patient risk. This motivates the pursuit of reduced-dose scanning protocols, in which noise reduction processing is indispensable to warrant clinically acceptable image quality. Convolutional Neural Networks (CNNs) have received significant attention as an alternative to conventional noise reduction and are able to achieve state-of-the-art results. However, the internal signal processing in such networks is often unknown, leading to sub-optimal network architectures. The need for better signal preservation and more transparency motivates the use of Wavelet Shrinkage Networks (WSNs), in which the Encoding-Decoding (ED) path is the fixed wavelet frame known as the Overcomplete Haar Wavelet Transform (OHWT) and the noise reduction stage is data-driven. In this work, we considerably extend the WSN framework by focusing on three main improvements. First, we simplify the computation of the OHWT so that it can be easily reproduced. Second, we update the architecture of the shrinkage stage by further incorporating knowledge of conventional wavelet shrinkage methods. Finally, we extensively test its performance and generalization by comparing it with the RED and FBPConvNet CNNs. Our results show that the proposed architecture achieves performance similar to the reference methods in terms of MSSIM (0.667, 0.662 and 0.657 for DHSN2, FBPConvNet and RED, respectively) and achieves excellent quality when visualizing patches of clinically important structures. Furthermore, we demonstrate the enhanced generalization and further advantages of the signal flow by showing two additional potential applications in which the new DHSN2 is used as a regularizer: (1) iterative reconstruction and (2) ground-truth-free training of the proposed noise reduction architecture. The presented results show that the tight integration of signal processing and deep learning leads to simpler models with improved generalization.
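A minimal sketch of the classical idea the WSN builds on: decompose an image with a single-level Haar transform, soft-threshold the detail coefficients, and reconstruct. In the paper the transform is overcomplete and the shrinkage stage is learned; the decimated transform and fixed threshold below are purely illustrative assumptions.

import numpy as np

def haar_dwt2(x):
    a = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 2   # approximation
    h = (x[0::2, 0::2] - x[1::2, 0::2] + x[0::2, 1::2] - x[1::2, 1::2]) / 2   # detail (rows)
    v = (x[0::2, 0::2] + x[1::2, 0::2] - x[0::2, 1::2] - x[1::2, 1::2]) / 2   # detail (cols)
    d = (x[0::2, 0::2] - x[1::2, 0::2] - x[0::2, 1::2] + x[1::2, 1::2]) / 2   # diagonal detail
    return a, h, v, d

def haar_idwt2(a, h, v, d):
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = (a + h + v + d) / 2
    x[1::2, 0::2] = (a - h + v - d) / 2
    x[0::2, 1::2] = (a + h - v - d) / 2
    x[1::2, 1::2] = (a - h - v + d) / 2
    return x

def soft_shrink(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)   # classical soft-thresholding

rng = np.random.default_rng(2)
noisy = rng.standard_normal((64, 64))
a, h, v, d = haar_dwt2(noisy)
denoised = haar_idwt2(a, *(soft_shrink(c, 0.5) for c in (h, v, d)))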
Object detectors that rely solely on image contrast struggle to detect camouflaged objects in images because of the high similarity between camouflaged objects and their surroundings. To address this issue, in this paper we investigate the role of the part-object relationship in camouflaged object detection. Specifically, we propose a Part-Object relationship and Contrast Integrated Network (POCINet) covering both search and identification stages, where each stage adopts an appropriate scheme to engage contrast information and part-object relational knowledge for camouflaged pattern decoding. Moreover, we bridge these two stages via a Search-to-Identification Guidance (SIG) module, in which the search result, as well as the decoded semantic knowledge, jointly enhances the feature-encoding ability of the identification stage. Experimental results demonstrate the superiority of our algorithm on three datasets. Notably, our algorithm improves $F_\beta$ over the best existing method by approximately 17 points on the CPD1K dataset. The source code will be released soon.
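A small sketch of the $F_\beta$ measure cited above, computed for a binarized prediction map against a ground-truth mask; the convention $\beta^2 = 0.3$ is common in the salient/camouflaged object detection literature, and the binarization threshold here is an assumption rather than the paper's protocol.

import numpy as np

def f_beta(pred, gt, beta_sq=0.3, thresh=0.5, eps=1e-8):
    p = (pred >= thresh).astype(float)                 # binarize the predicted map
    tp = (p * gt).sum()
    precision = tp / (p.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps)

rng = np.random.default_rng(3)
gt = (rng.random((128, 128)) > 0.7).astype(float)      # toy binary ground-truth mask
pred = np.clip(gt + 0.3 * rng.standard_normal((128, 128)), 0, 1)
print(f"F_beta = {f_beta(pred, gt):.3f}")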