Numerous people die from lung cancer every year, making it a serious public health issue. Its symptoms often appear only at a late stage, when the disease is difficult to treat. Pulmonary nodules are commonly found when the lungs are screened with Computed Tomography (CT), and some of these nodules may be cancerous. An efficient automated pulmonary nodule segmentation system is therefore needed to isolate the nodules in the scan images. Doctors can then track the nodules that are likely to be malignant and provide early treatment if they become cancerous, thereby improving the patient's chance of survival. Attention mechanisms are often used in computer vision to improve the performance of neural networks. We propose LA-ResUNet, a pulmonary nodule segmentation model built on ResUNet with a linear attention mechanism and the Leaky ReLU activation function. LA-ResUNet segments pulmonary nodules efficiently with linear time and space complexity. Its residual blocks make it possible to build a deep network without running into the vanishing gradient problem, and they also make deep networks easier to train. Skip connections allow better gradient flow during training and better information flow between layers. Leaky ReLU addresses the dying ReLU problem, in which some neurons stop learning during training. On the LIDC-IDRI dataset (The Lung Image Database Consortium and Image Database Resource Initiative), LA-ResUNet achieved a Dice similarity coefficient (DSC) of 73.11% and an Intersection over Union (IoU) score of 60.62%.
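The abstract does not give LA-ResUNet's exact attention formula. As an illustration only, the sketch below uses the kernelized linear attention of Katharopoulos et al. (2020), with feature map elu(x) + 1, to show how linear cost arises: keys and values are summarised once into a d x d_v matrix, so no N x N matrix is ever formed. The function names are hypothetical, and a quadratic reference is included to show both give the same result.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative inputs, so gradients never vanish
    # entirely (the "dying ReLU" problem mentioned above).
    return np.where(x > 0, x, alpha * x)

def feature_map(x):
    # phi(x) = elu(x) + 1: x + 1 for x > 0, exp(x) otherwise (always positive).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """q, k: (N, d); v: (N, d_v). Cost O(N * d * d_v) instead of O(N^2)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                 # (d, d_v): summary of all keys and values
    z = phi_k.sum(axis=0)            # (d,): normaliser
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_attention(q, k, v):
    # Same computation with an explicit N x N similarity matrix, for comparison.
    sim = feature_map(q) @ feature_map(k).T
    return (sim @ v) / sim.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q, k = rng.standard_normal((64, 8)), rng.standard_normal((64, 8))
v = rng.standard_normal((64, 4))
print(np.allclose(linear_attention(q, k, v), quadratic_attention(q, k, v)))
```

For a segmentation feature map of size H x W, the tokens are the N = H * W pixel positions, which is where avoiding the N x N matrix matters.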
Fine-grained image captioning is a focal point of vision-to-language research and has attracted considerable attention for its ability to generate accurate, contextually relevant image captions. Effectively predicting and using attributes plays a crucial role in improving captioning performance. Although prior attribute-based methods have made progress, they focus either on predicting attributes related to the input image or on predicting attributes related to the linguistic context at each time step of the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, which leads to ineffective use of semantic information and lower performance. To address these issues, we introduce an Independent Attribute Predictor (IAP) that precisely predicts image-related attributes by leveraging the relationships between visual objects and attribute embeddings. We then propose an Enhanced Attribute Predictor (EAP), which first predicts linguistic context-related attributes and then uses the prior probabilities from the IAP module to rebalance image-related and linguistic context-related attributes, producing more robust attribute probabilities. These refined attributes are integrated into the language LSTM layer to ensure accurate word prediction at each time step. Integrating the IAP and EAP modules into the proposed image captioning with enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details and improves overall performance. ICEAP outperforms contemporary models, with significant average CIDEr-D improvements of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K under cross-entropy optimization, and qualitative analysis confirms its ability to generate fine-grained captions.
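The abstract does not specify how the EAP combines its context-based attribute probabilities with the IAP priors. Purely as an illustration, the sketch below assumes a fixed convex combination of two independent (sigmoid) probability vectors and then selects the top-k attributes that would be passed to the language LSTM. The names `rebalance_attributes`, `top_k_attributes`, and the `weight` parameter are hypothetical, not the paper's.

```python
import numpy as np

def rebalance_attributes(p_context, p_image, weight=0.5):
    """Mix context-based attribute probabilities with image-based priors.

    p_context, p_image: (n_attributes,) independent per-attribute probabilities.
    weight: hypothetical mixing weight in [0, 1] given to the image prior.
    """
    p_context = np.asarray(p_context, dtype=float)
    p_image = np.asarray(p_image, dtype=float)
    # A convex combination of probabilities is still a valid probability.
    return (1.0 - weight) * p_context + weight * p_image

def top_k_attributes(probs, vocab, k=3):
    # Indices of the k most probable attributes, highest first.
    idx = np.argsort(probs)[::-1][:k]
    return [vocab[i] for i in idx]

vocab = ["dog", "grass", "running"]
probs = rebalance_attributes([0.8, 0.2, 0.6], [0.4, 0.6, 0.2], weight=0.25)
print(probs, top_k_attributes(probs, vocab, k=2))
```

A learned, time-step-dependent weight would be a natural alternative to the fixed `weight`; the fixed value here only keeps the sketch small.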
Accurate building segmentation plays a crucial role in a wide range of applications such as urban planning, monitoring, and mapping. Various deep learning models have been employed for building segmentation; however, these models analyze images from a single view. To overcome the limitations of single-view models, we propose a novel multi-view U-Net deep model that incorporates multiple views of the images to improve building segmentation accuracy. We employ two pre-trained convolutional neural network architectures, MobileNetV2 and ResNet50, to extract features representing two different views of our images. By fusing these features, the proposed method captures complementary information, which improves segmentation accuracy. To further improve performance, we incorporate skip connections and up-convolutional layers to ensure fine-grained feature propagation. Experimental results on a large building dataset show a significant improvement in segmentation accuracy (91%) compared with state-of-the-art methods, highlighting the effectiveness of our multi-view fusion approach and confirming the benefits of creating different views with the concept proposed in this paper. This research has the potential to redefine the landscape of building segmentation in applications such as urban planning and mapping. We also tested the method on a large, city-scale study area (Belval, Luxembourg), which demonstrates its efficiency in segmenting satellite images covering large areas and reinforces its potential for real-world applications.
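The abstract says the two backbone feature sets are fused but not how. A minimal sketch, assuming channel-wise concatenation (a common choice, not confirmed by the source), is shown below with NumPy arrays standing in for the backbone outputs. The example shapes match the final feature maps of Keras MobileNetV2 (7 x 7 x 1280) and ResNet50 (7 x 7 x 2048) for a 224 x 224 input; `fuse_views` and `upsample_nearest` are hypothetical helper names.

```python
import numpy as np

def upsample_nearest(feat, factor):
    # feat: (H, W, C); repeat rows and columns by an integer factor.
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_views(feat_a, feat_b):
    """Concatenate two backbone feature maps along the channel axis.

    If their spatial sizes differ by an integer factor, the smaller map is
    first upsampled with nearest-neighbour repetition.
    """
    ha, hb = feat_a.shape[0], feat_b.shape[0]
    if ha > hb:
        feat_b = upsample_nearest(feat_b, ha // hb)
    elif hb > ha:
        feat_a = upsample_nearest(feat_a, hb // ha)
    if feat_a.shape[:2] != feat_b.shape[:2]:
        raise ValueError("spatial sizes must differ by an integer factor")
    return np.concatenate([feat_a, feat_b], axis=-1)

mobilenet_feat = np.zeros((7, 7, 1280))   # stand-in for MobileNetV2 output
resnet_feat = np.zeros((7, 7, 2048))      # stand-in for ResNet50 output
print(fuse_views(mobilenet_feat, resnet_feat).shape)
```

In a U-Net decoder, the fused map would then be upsampled by up-convolutions and joined with the skip connections described above.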
An attributed network is a form of data that contains rich semantic information. Many real-world scenarios, such as social media, citation, and traffic networks, can be modeled as attributed networks. Anomaly detection in attributed networks is an interesting research topic owing to its potential in various practical applications, including spam, network intrusion, and financial fraud detection. However, attributed networks exhibit many anomaly patterns, such as structural, attribute, local, and global anomalies, which makes anomaly detection in them challenging. To address these difficulties, we designed DeepGL, a novel unsupervised deep global-local view model for anomaly detection in attributed networks. DeepGL is a multi-view encoder-decoder framework that captures node attribute and network structure information from both global and local views. Specifically, it contains two encoders and four decoders. The two encoders capture network features from the local and global views, and the four decoders reconstruct the local node attribute information, local structure information, global node attribute information, and global structure information. We apply Laplacian sharpening and smoothing in the encoders and decoders to preserve the features of normal nodes while making anomalous nodes less conspicuous in the reconstructed information, which facilitates the calculation of reconstruction errors. Extensive experiments on four real-world attributed network datasets demonstrate the excellent performance of the proposed method.
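The sketch below illustrates the two ingredients named above in their generic textbook form, not DeepGL's exact layers: Laplacian smoothing (a low-pass graph filter) and sharpening (its high-pass counterpart) using the symmetric normalized Laplacian with self-loops, and a per-node anomaly score computed from attribute and structure reconstruction errors. The weighting `alpha` and all function names are assumptions; DeepGL's four reconstructions (local and global, attribute and structure) could be combined analogously.

```python
import numpy as np

def normalized_laplacian(adj):
    # L_sym = I - D^{-1/2} (A + I) D^{-1/2}, with self-loops added to A.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def laplacian_smooth(x, adj, gamma=0.5):
    # Low-pass filter: pulls each node's features toward its neighbourhood.
    return x - gamma * normalized_laplacian(adj) @ x

def laplacian_sharpen(x, adj, gamma=0.5):
    # High-pass filter: pushes each node's features away from its neighbourhood.
    return x + gamma * normalized_laplacian(adj) @ x

def anomaly_scores(x, x_rec, adj, adj_rec, alpha=0.5):
    # Weighted sum of per-node attribute and structure reconstruction errors.
    attr_err = np.linalg.norm(x - x_rec, axis=1)
    struct_err = np.linalg.norm(adj - adj_rec, axis=1)
    return alpha * attr_err + (1.0 - alpha) * struct_err

# Hypothetical 4-node path graph with random 3-dimensional node attributes.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(1).standard_normal((4, 3))
x_rec = x.copy()
x_rec[2] += 5.0                      # node 2 is poorly reconstructed
print(np.argmax(anomaly_scores(x, x_rec, adj, adj)))
```

Nodes with the largest reconstruction error are ranked as the most likely anomalies.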
The electrocardiogram (ECG) is an affordable, non-invasive, and quick method of gaining essential information about the electrical activity of the heart. Interpreting ECGs is time-consuming even for experienced cardiologists, which is why rule-based methods are currently used in clinical practice to describe ECGs automatically. However, compared with descriptions written by experts, the ECG descriptions generated by such rule-based methods have considerable limitations. Inspired by image captioning methods, we instead propose a data-driven approach to ECG description generation. We introduce a label-guided Transformer model and show that a data-driven captioning model can automatically generate relevant and readable ECG descriptions. We incorporate prior ECG labels into the model design and show that this improves the overall quality of the generated descriptions. We find that training these models on free-text annotations of ECGs, instead of the computer-generated ECG descriptions used clinically, greatly improves performance. Moreover, a human expert evaluation study of our best system shows that our data-driven approach improves upon existing rule-based methods.
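How the prior ECG labels enter the model is not stated in the abstract. One simple way to condition a captioning decoder on labels, assumed here purely for illustration, is to prepend label tokens to the target sequence so the decoder attends to them before generating the description. The special tokens, label names, and functions below are hypothetical.

```python
SPECIAL = ["<pad>", "<unk>", "<labels>", "</labels>", "<bos>", "<eos>"]

def build_vocab(label_names, words):
    # dict.fromkeys removes duplicates while keeping first-seen order.
    itos = list(dict.fromkeys(SPECIAL + sorted(label_names) + sorted(words)))
    return {tok: i for i, tok in enumerate(itos)}

def encode_with_labels(labels, description, vocab):
    """Token ids: label prefix, then the lower-cased description words."""
    unk = vocab["<unk>"]
    prefix = ["<labels>", *sorted(labels), "</labels>", "<bos>"]
    body = description.lower().split() + ["<eos>"]
    return [vocab.get(tok, unk) for tok in prefix + body]

vocab = build_vocab(["AFIB", "LBBB"], ["atrial", "fibrillation"])
itos = {i: tok for tok, i in vocab.items()}
ids = encode_with_labels(["AFIB"], "Atrial fibrillation", vocab)
print([itos[i] for i in ids])
```

An alternative design would embed the labels and add them to the encoder output instead; the prefix form is used here only because it needs no model code.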
ISBN (electronic): 9781665486743
ISBN (print): 9781665486743
As the world moves toward globalization, the demand for automatic language translators is increasing rapidly. Traditional translation systems consist of multiple steps, such as speech recognition, text-to-text machine translation, and speech generation. These systems suffer from latency caused by the multiple steps and from errors that propagate from the early steps to the later ones. Another challenge is that many spoken languages have no written form, so traditional systems that rely on speech-to-text and text-to-text translation do not work for them. In this paper, we present a recurrent neural network (RNN)-based translation system that directly generates the waveform of the target-language audio. We use sparse coding to extract and invert audio features. An attention-based multi-layer sequence-to-sequence model is trained with a novel technique on a Spanish-to-English audio dataset, and no intermediate text representation is used during training or inference. We compare the proposed approaches in terms of latency, bilingual evaluation understudy (BLEU) score, and Perceptual Evaluation of Speech Quality (PESQ) score. The resulting system provides very fast translation with good translation accuracy and audio quality.
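The abstract does not say which sparse coding algorithm is used. As a generic illustration, the sketch below extracts a sparse code for one feature frame with ISTA (iterative shrinkage-thresholding), which minimises 0.5 * ||x - D z||^2 + lam * ||z||_1 for a dictionary D, and "inverts" the code by reconstructing D z. The dictionary, `lam`, and function names are assumptions; with an orthonormal dictionary the solution reduces to soft-thresholding, which the example uses as a check.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1: shrink every entry toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(x, dictionary, lam=0.1, n_iter=200):
    """Sparse code z with x ~ dictionary @ z (extraction step)."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2   # 1 / Lipschitz constant
    z = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ z - x)     # gradient of the data term
        z = soft_threshold(z - step * grad, step * lam)
    return z

def invert(z, dictionary):
    # Inversion step: reconstruct the feature frame from its sparse code.
    return dictionary @ z

rng = np.random.default_rng(0)
q_dict, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthonormal dictionary
s = np.array([0.0, 1.5, 0.0, 0.0, -0.7, 0.0])           # sparse ground truth
z = ista(q_dict @ s, q_dict, lam=0.01)
print(np.round(z, 3))
```

In practice the dictionary would be learned from audio spectral frames and is typically overcomplete, in which case ISTA needs many iterations rather than converging in one.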
ISBN (print): 9783031226946; 9783031226953
Radiologists are required to write a descriptive report for each examination they perform, which is a time-consuming process. Deep-learning researchers are developing models to automate this process. Currently, the most researched architecture for this task is the encoder-decoder (E-D). An issue with this approach is that these models are optimised to produce output that is coherent and grammatically correct rather than clinically correct. The current study takes this into account and instead builds upon a more recent approach that generates reports using a multi-label classification model attached to a Template-based Report Generation (TRG) subsystem. Two TRG models, one using a Transformer classifier (T-TRG) and one using a CNN classifier (CNN-TRG), are produced and directly compared with the most clinically accurate E-D model in the literature at the time of writing. The models were trained using the MIMIC-CXR dataset, a public set of 473,057 chest X-rays and 206,563 corresponding reports. Precision, recall and F1 scores were obtained by applying a rule-based labeller to the MIMIC-CXR reports, applying those labels to the corresponding images, and then using the labeller on the generated reports. The TRG models outperformed the E-D model for clinical accuracy, with the largest difference being in recall (T-TRG: Precision 0.38, Recall 0.58, F1 0.45; CNN-TRG: Precision 0.34, Recall 0.69, F1 0.42; E-D: Precision 0.38, Recall 0.14, F1 0.19). Examination of the quantitative metrics for each specific abnormality, combined with a qualitative assessment, indicates that significant progress still needs to be made before clinical integration is safe.
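A minimal sketch of the template-based idea, assuming one fixed sentence per finding label and a single probability threshold: the classifier's multi-label output is thresholded, each positive label contributes its template sentence, and a default sentence is used when nothing is positive. The label names, templates, and threshold are hypothetical, not those of the study. A set-based precision/recall/F1 helper shows how generated and reference label sets can be compared, in the spirit of the evaluation described above.

```python
# Hypothetical template sentences, one per finding label.
TEMPLATES = {
    "cardiomegaly": "The heart is enlarged.",
    "pleural_effusion": "There is a pleural effusion.",
    "pneumothorax": "There is a pneumothorax.",
}
NORMAL_REPORT = "No acute cardiopulmonary abnormality."

def generate_report(probs, threshold=0.5):
    """probs: dict mapping label -> predicted probability from the classifier."""
    findings = [TEMPLATES[label] for label, p in probs.items()
                if p >= threshold and label in TEMPLATES]
    return " ".join(findings) if findings else NORMAL_REPORT

def precision_recall_f1(predicted, reference):
    # predicted, reference: sets of label names for one study.
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(generate_report({"cardiomegaly": 0.8, "pleural_effusion": 0.3, "pneumothorax": 0.6}))
```

Because every sentence comes from a fixed template, clinical correctness depends entirely on the classifier, which is the trade-off the study exploits.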
ISBN (print): 9781450392983
Trigger-action programming allows end users to write event-driven rules to automate smart devices and internet services. Users can create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions, along with suitable data fields for those functions. As trigger-action programming has grown in popularity, many platforms have emerged, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing TAPs can still be challenging for end users because of the domain knowledge required and the enormous search space of trigger and action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs at fine-grained, field-level granularity from natural language descriptions. Our approach adapts pre-trained autoencoding models to warm-start the encoder of the seq2seq model and boost generation performance. We evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach to TAP generation. Our empirical evaluation shows overall improvements of 9.5% to 26.5% over the prior best results, and adopting a pre-trained autoencoding model boosts MRR@3 by a further 2.8% to 10.8%. In the field-level generation setting, RecipeGen achieves an MRR@3 of 0.591 and a BLEU score of 0.575.
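MRR@3, the main metric reported above, is the mean over test examples of 1 / rank of the first correct prediction among the top 3 candidates (0 if none of them is correct). A small sketch of the standard definition follows; the candidate strings in the example are hypothetical placeholders, not real IFTTT functions.

```python
def mrr_at_k(ranked_predictions, gold, k=3):
    """ranked_predictions: one ranked candidate list per example; gold: answers."""
    total = 0.0
    for preds, answer in zip(ranked_predictions, gold):
        for rank, pred in enumerate(preds[:k], start=1):
            if pred == answer:
                total += 1.0 / rank   # reciprocal rank of the first hit
                break
    return total / len(gold)

preds = [["trigger_a", "trigger_b", "trigger_c"],
         ["action_x", "action_y", "action_z"],
         ["field_p", "field_q", "field_r"]]
gold = ["trigger_a", "action_z", "field_s"]
print(mrr_at_k(preds, gold, k=3))   # (1 + 1/3 + 0) / 3
```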
ISBN (electronic): 9781665496209
ISBN (print): 9781665496209
Image and video compression have received significant research attention, and their applications have expanded. Existing entropy-estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model that provides a global receptive field to address long-range correlations. A hyper encoder-decoder-based transformer block employs a multi-head spatial reduction self-attention (MHSRSA) layer to reduce the computational cost of self-attention and enable rapid learning of multi-scale, high-resolution features. A Casual Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts using channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the model's effectiveness and competitive performance.
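Spatial reduction self-attention lowers cost by shrinking the key/value sequence: for N = H x W query tokens and reduction ratio r, the attention matrix is N x (N / r^2) instead of N x N. The single-head sketch below assumes r x r average pooling for the reduction (designs such as PVT use a strided convolution instead) and random projection matrices; it is not the paper's MHSRSA layer, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduction_attention(x, h, w, wq, wk, wv, r=2):
    """x: (h*w, c) row-major tokens of an h x w map; h and w divisible by r."""
    c = x.shape[1]
    q = x @ wq                                          # (N, d) full-resolution queries
    # Reduce keys/values: average-pool the feature map over r x r windows.
    pooled = x.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))
    reduced = pooled.reshape(-1, c)                     # (N / r^2, c)
    k, v = reduced @ wk, reduced @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))       # (N, N / r^2)
    return attn @ v, attn

rng = np.random.default_rng(0)
h = w = 8
c = d = 16
x = rng.standard_normal((h * w, c))
wq, wk, wv = (rng.standard_normal((c, d)) for _ in range(3))
out, attn = spatial_reduction_attention(x, h, w, wq, wk, wv, r=2)
print(out.shape, attn.shape)
```

With r = 1 the function reduces to ordinary full self-attention, which is a quick sanity check.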
ISBN (print): 9781665405409
Wind power forecasting has drawn increasing attention among researchers as the consumption of renewable energy grows. In this paper, we develop a deep learning approach based on an encoder-decoder structure. Our model forecasts the wind power generated by a wind turbine using its spatial location relative to other turbines and historical wind speed data. In this way, we effectively integrate spatial dependency and temporal trends to make turbine-specific predictions. The advantages of our method over existing work are as follows: 1) it predicts wind power directly from historical wind speed, without first predicting wind speed and then applying a transformation; 2) it effectively captures long-term dependencies; and 3) it is more scalable and efficient than other deep learning-based methods. We demonstrate the efficacy of our model on benchmark real-world datasets.
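The abstract does not describe how training samples are built. A minimal sketch, assuming sliding windows over hourly data, is shown below: the encoder input is the recent wind-speed history of all turbines, the target is the future power of one turbine, and each turbine's position relative to the target turbine is supplied as a static spatial feature. The window lengths `t_in` and `t_out`, the array layout, and `make_windows` are hypothetical.

```python
import numpy as np

def make_windows(wind_speed, power, coords, target, t_in=24, t_out=6):
    """wind_speed, power: (T, n_turbines); coords: (n_turbines, 2) positions."""
    rel_pos = coords - coords[target]        # locations relative to the target turbine
    enc_inputs, targets = [], []
    for start in range(wind_speed.shape[0] - t_in - t_out + 1):
        # Encoder sees wind speed only; decoder predicts power directly.
        enc_inputs.append(wind_speed[start:start + t_in])
        targets.append(power[start + t_in:start + t_in + t_out, target])
    return np.stack(enc_inputs), rel_pos, np.stack(targets)

rng = np.random.default_rng(0)
speed = rng.random((100, 5))
power = rng.random((100, 5))
coords = rng.random((5, 2))
enc, rel_pos, y = make_windows(speed, power, coords, target=2)
print(enc.shape, rel_pos.shape, y.shape)
```

Predicting power directly from wind-speed windows, as here, mirrors advantage 1) above: no intermediate wind-speed forecast is needed.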