ISBN: 9783031226946; 9783031226953 (print)
Radiologists are required to write a descriptive report for each examination they perform, which is a time-consuming process. Deep-learning researchers are developing models to automate this process. Currently, the most researched architecture for this task is the encoder-decoder (E-D). An issue with this approach is that these models are optimised to produce output that is coherent and grammatically correct rather than clinically correct. The current study considers this and instead builds upon a more recent approach that generates reports using a multi-label classification model attached to a Template-based Report Generation (TRG) subsystem. In the current study, two TRG models that utilise either a Transformer or a CNN classifier are produced and directly compared to the most clinically accurate E-D in the literature at the time of writing. The models were trained using the MIMIC-CXR dataset, a public set of 473,057 chest X-rays and 206,563 corresponding reports. Precision, recall and F1 scores were obtained by applying a rule-based labeller to the MIMIC-CXR reports, applying those labels to the corresponding images, and then using the labeller on the generated reports. The TRG models outperformed the E-D model for clinical accuracy, with the largest difference being the recall rate (T-TRG: Precision 0.38, Recall 0.58, F1 0.45; CNN-TRG: Precision 0.34, Recall 0.69, F1 0.42; E-D: Precision 0.38, Recall 0.14, F1 0.19). Examination of the quantitative metrics for each specific abnormality, combined with the qualitative assessment, indicates that significant progress still needs to be made before clinical integration is safe.
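The label-based evaluation described in this abstract can be illustrated with a minimal sketch. It assumes each report has already been reduced to a set of abnormality labels by some labeller, and it uses micro-averaging over all reports; the abstract does not say which averaging scheme the authors used, so that choice, the function name `precision_recall_f1`, and the toy label sets are all hypothetical.

```python
def precision_recall_f1(reference_labels, generated_labels):
    """Micro-averaged precision, recall and F1 over per-report label sets.

    reference_labels: list of sets, labels extracted from the original reports.
    generated_labels: list of sets, labels extracted from the generated reports.
    """
    tp = fp = fn = 0
    for ref, gen in zip(reference_labels, generated_labels):
        tp += len(ref & gen)   # labels present in both reports
        fp += len(gen - ref)   # labels only in the generated report
        fn += len(ref - gen)   # labels missed by the generated report
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical labels, not taken from MIMIC-CXR).
reference = [{"cardiomegaly", "edema"}, {"pneumonia"}, set()]
generated = [{"cardiomegaly"}, {"pneumonia", "edema"}, {"atelectasis"}]
p, r, f = precision_recall_f1(reference, generated)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

A macro-averaged variant would compute the three scores per abnormality and then average them, which can give a mean F1 that differs from the harmonic mean of the mean precision and mean recall.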
ISBN: 9781450392983 (print)
Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions along with suitable data fields for the functions. Many trigger-action programming platforms have emerged as their popularity has grown, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing TAPs can still be challenging for end users because of the domain knowledge needed and the enormous search space of trigger and action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs at fine-grained, field-level granularity from natural language descriptions. Our approach adapts autoencoding pre-trained models to warm-start the encoder in the seq2seq model to boost generation performance. We have evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach on the TAP generation task. Our empirical evaluation shows that the overall improvement over the prior best results ranges from 9.5% to 26.5%. Our results also show that adopting a pre-trained autoencoding model boosts MRR@3 further by 2.8% to 10.8%. Further, in the field-level generation setting, RecipeGen achieves 0.591 and 0.575 in terms of MRR@3 and BLEU scores, respectively.
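For readers unfamiliar with MRR@3, the following is a minimal sketch of how mean reciprocal rank truncated at rank 3 is commonly computed. It assumes each query yields a ranked list of candidate programs and a single gold answer; the function name `mrr_at_k` and the toy candidates are hypothetical and are not taken from the RecipeGen evaluation code.

```python
def mrr_at_k(ranked_candidates, gold, k=3):
    """Mean reciprocal rank, counting only hits within the top k candidates."""
    if not gold:
        return 0.0
    total = 0.0
    for candidates, answer in zip(ranked_candidates, gold):
        for rank, candidate in enumerate(candidates[:k], start=1):
            if candidate == answer:
                total += 1.0 / rank  # reciprocal of the first matching rank
                break
    return total / len(gold)


# Toy example: hit at rank 1, hit at rank 2, and a hit at rank 4 (ignored at k=3).
predictions = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r", "s"]]
answers = ["a", "y", "s"]
score = mrr_at_k(predictions, answers, k=3)
print(score)
```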
ISBN: 9781665496209 (digital); 9781665496209 (print)
Image and video compression have received significant research attention, and their applications have expanded. Existing entropy estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model, which generates a global receptive field to tackle long-range correlation issues. A hyper encoder-decoder-based transformer block employs a multi-head spatial reduction self-attention (MHSRSA) layer to minimize the computational cost of the self-attention layer and enable rapid learning of multi-scale and high-resolution features. A Casual Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts utilizing channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the effectiveness and competitive performance of the proposed model.
ISBN: 9781665405409 (print)
Wind power forecasting has drawn increasing attention among researchers as the consumption of renewable energy grows. In this paper, we develop a deep learning approach based on an encoder-decoder structure. Our model forecasts the wind power generated by a wind turbine using its spatial location relative to other turbines and historical wind speed data. In this way, we effectively integrate spatial dependency and temporal trends to make turbine-specific predictions. The advantages of our method over existing work can be summarized as follows: 1) it directly predicts wind power from historical wind speed, without first predicting wind speed and then applying a transformation; 2) it can effectively capture long-term dependency; and 3) it is more scalable and efficient than other deep learning-based methods. We demonstrate the efficacy of our model on benchmark real-world datasets.
Deep medical image segmentation calls for features with strong discrimination and rich scales due to ambiguous background distraction and large variations in object sizes and shapes. In this paper, we propose two modules to obtain these features. First, existing encoders tend to extract similar foreground/background features at blurry boundaries due to mixed-label feature aggregation. To enhance the discrimination of these features, a Label-Aware Attention (LAA) module is presented to reconstruct them by fusing same-label local features. The fusion is guided by local attention maps based on label-aware affinity learning. Second, instead of relying on a single encoder for scale context mining, we propose a Multi-scale Feature Boosting (MFB) module that applies parallel convolution with different receptive fields for scale embedding and integrates an additional backbone for cross-encoder scale reference. Combining LAA and MFB, a new encoder-decoder based framework is presented, where MFBs act as encoder blocks to recursively extract features with rich scale context, while LAA operates in the decoder layer to enhance the label-aware discriminativeness of features. Extensive experiments on three standard medical segmentation datasets demonstrate the effectiveness of the proposed framework.
The classification and recognition of features play a vital role in production and daily life; however, the current semantic segmentation of remote sensing images is hampered by background interference and other factors, leading to issues such as fuzzy boundary segmentation. To address these challenges, we propose a novel module for encoding and reconstructing multi-dimensional feature layers. Our approach first uses bilinear interpolation to downsample the multi-dimensional feature layer in the coding stage of the U-shaped framework. Subsequently, we incorporate a fractal curve module into the encoder, which aggregates points on feature maps from different layers, effectively grouping points from diverse regions. Finally, we introduce an aggregation layer that combines the upsampling method from the UNet series, employing the multi-scale censoring of multi-dimensional feature map outputs from various layers to efficiently capture both spatial and feature information. Experimental results across diverse scenarios demonstrate that our model achieves excellent performance in aggregating point information from feature maps, significantly improving semantic segmentation.
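The bilinear downsampling step mentioned in this abstract can be sketched for a single 2D feature map as follows. This is a generic illustration, not the authors' implementation: it assumes an "align corners" coordinate mapping (the abstract does not specify one), and the function name `bilinear_resize` is hypothetical.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2D list of numbers with bilinear interpolation.

    Assumes an align-corners mapping: output corners map to input corners.
    """
    in_h, in_w = len(image), len(image[0])
    result = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for this output column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        result.append(row)
    return result


feature_map = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
downsampled = bilinear_resize(feature_map, 2, 2)
print(downsampled)
```

In a real network the same operation would be applied per channel by a framework routine; the pure-Python loop is only meant to make the interpolation weights explicit.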
This paper presents a method for detecting the location of spalling and assessing its severity level in concrete surfaces. The proposed method is based on deep learning architectures and multi-class semantic segmentation, and classifies each pixel as non-spalling, deep-spalling, or shallow-spalling. It consists of three different deep learning architectures with several encoders as backbone networks. Both qualitative and quantitative analyses show that the deep learning architecture with a certain encoder network can detect spalling with different severity levels very well. Additionally, the paper proposes a method to analyze the deep spalling areas of concrete to show their severity levels. The performance analysis shows that this approach provides very convincing results with respect to the actual affected spalling areas. The results indicate that the proposed approach achieves a higher level of performance for detecting spalling and assessing its severity.
ISBN: 9781665429238 (print)
Multiple lesion segmentation, namely the segmentation of microaneurysms, soft exudate, hard exudate, and haemorrhage, is very important to diabetic retinopathy diagnosis. However, the scales of different kinds of lesions are inconsistent. This inconsistent scale problem is unavoidable in a unified architecture design in which an identical number of downsampling operations is used for different kinds of lesions. To achieve better performance at different scales, multiscale features need to be captured and adjusted. In this paper, we simply consider features from different stages of an encoder-decoder network as multiscale features. To re-weight the importance of multiscale features dynamically, a scale-aware attention (SAA) block, which consists of a spatial path and a channel path, is introduced. In the SAA block, adjusting operations are performed scale-wise instead of channel-wise or uniformly for all scales. Extensive experiments were conducted on two publicly available datasets to verify the effect of SAA. SAA surpasses popular attention blocks and state-of-the-art results in the overall evaluation, while comparable performance is achieved in the individual evaluation.
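To make the idea of scale-wise (rather than channel-wise) re-weighting concrete, here is a deliberately simplified sketch. It is not the SAA block from the paper, whose spatial and channel paths are not detailed in the abstract: it assumes each scale is summarized by the mean of its features and that a softmax over those summaries gives one weight per scale. The function name `scale_wise_reweight` and the toy inputs are hypothetical.

```python
import math


def scale_wise_reweight(features_per_scale):
    """Assign one softmax weight per scale and rescale that scale's features.

    features_per_scale: list of lists; each inner list holds the feature
    values for one scale (any length, at least one value).
    """
    # One scalar descriptor per scale (here: the mean activation, an assumption).
    descriptors = [sum(f) / len(f) for f in features_per_scale]
    # Numerically stable softmax across scales.
    peak = max(descriptors)
    exps = [math.exp(d - peak) for d in descriptors]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Every value within a scale shares that scale's weight.
    reweighted = [[w * v for v in f] for w, f in zip(weights, features_per_scale)]
    return weights, reweighted


# Three scales with different numbers of feature values (toy numbers).
scales = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0], [0.5]]
weights, reweighted = scale_wise_reweight(scales)
print(weights)
```

The contrast with channel-wise attention is that the weight vector has one entry per scale, so all channels and positions of a given scale are scaled together.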
ISBN: 9783031100789; 9783031100772 (print)
Image tampering forensics is performed by analyzing images to locate the tampered regions. However, most image tampering detection methods lack locational accuracy and are effective only for specific types of tampering. To address these problems, this chapter proposes a method that employs an encoder-decoder network structure with combined multiple feature encoding to segment tampered regions of an image from untampered regions. Three features, obtained using constrained convolution, steganalysis rich model filtering and common convolution, are combined. During the encoding stage, ring residual units are used to extract features. The combination of multiple features and the ring residual units makes the proposed method most suitable for image tampering detection. Channel attention with a soft threshold function is used to reinforce semantic information in the decoding stage. Experiments with three image forensic datasets, NIST16, COVERAGE and CASIA, demonstrate that the proposed method exhibits strong performance in terms of the F1 score and localization of tampered regions.
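The abstract does not define its soft threshold function. A common definition, assumed here purely for illustration, is the shrinkage operator that zeroes values with magnitude below a threshold and shrinks the rest toward zero. The names `soft_threshold` and `tau` are hypothetical, and how the threshold is chosen inside the channel attention module is not covered by this sketch.

```python
def soft_threshold(x, tau):
    """Standard soft-thresholding (shrinkage): sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


# Small responses are suppressed; larger ones are kept but shrunk by tau.
responses = [1.5, -1.5, 0.25, -0.5]
filtered = [soft_threshold(v, 0.5) for v in responses]
print(filtered)
```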
ISBN: 9781665488105 (print)
As the medical aesthetic market grows rapidly in China, orthodontic treatment is becoming very common among the adolescent population. However, there are countless doctor-patient disputes due to treatment results that do not meet patients' expectations, so there is an urgent need for a method to predict treatment results. With the development of artificial intelligence technology, generative adversarial networks (GANs) have provided a new way of approaching this problem. The purpose of this paper is to accurately predict the face of a patient after orthodontic treatment using a GAN. To this end, we first designed an evaluation index to reflect the difference between the image predicted by the algorithm and the patient's real image. We then designed a network based on an encoder-decoder architecture to transform vectors in the StyleGAN latent space. Finally, we carried out experiments to verify the effectiveness of the evaluation index design and the advantages of the algorithm.