Search Results - Inner Mongolia University Library

LA-ResUNet: An Efficient Linear Attention Mechanism in ResUNet for the Semantic Segmentation of Pulmonary Nodules

IEEE ACCESS, 2024, Vol. 12, pp. 182894-182907

Authors: Sarah Prithvika, P. C.; Jani Anbarasi, L. Affiliation: Vellore Inst Technol, Sch Comp Sci & Engn, Chennai 600127, India

Lung cancer causes many deaths every year, making it a serious public health issue. Its symptoms often appear only at a late stage, when it is difficult to treat. Pulmonary nodules are commonly found during Computed Tomography (CT) screening of the lungs, and some of them may be cancerous. An efficient automated pulmonary nodule segmentation system is therefore needed to isolate the nodules in the scan images, so that doctors can track those likely to be malignant and start treatment early if they become cancerous, improving the patient's chance of survival. The attention mechanism is a technique often used in computer vision to enhance a neural network's performance. This work proposes LA-ResUNet, a pulmonary nodule segmentation model built on ResUNet with a linear attention mechanism and the Leaky ReLU activation function. LA-ResUNet segments pulmonary nodules efficiently while achieving linear time and space complexity. Residual blocks make it possible to build a deep network without running into the vanishing gradient problem, and they also simplify deep network training. Skip connections improve gradient flow during training and information flow between layers. Leaky ReLU addresses the dying ReLU problem, in which some neurons stop learning during training. On the LIDC-IDRI dataset (The Lung Image Database Consortium and Image Database Resource Initiative), LA-ResUNet achieved a Dice similarity coefficient (DSC) of 73.11% and an Intersection over Union (IoU) score of 60.62%.
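The abstract credits linear attention for LA-ResUNet's linear time and space complexity but does not give the formula. As an assumption, the sketch below uses the kernel-feature-map form of Katharopoulos et al. (2020) with phi(x) = elu(x) + 1, which may differ from the variant in the paper. Because sum_j phi(k_j) v_j^T and sum_j phi(k_j) are computed once and shared by every query, cost grows linearly with the number of positions N (N = H*W pixels for a feature map) rather than quadratically.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def elu_feature_map(x: Vector) -> List[float]:
    """phi(x) = elu(x) + 1, which keeps every feature strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(
    queries: Sequence[Vector], keys: Sequence[Vector], values: Sequence[Vector]
) -> List[List[float]]:
    """Kernelized attention: out_i = phi(q_i)^T KV / (phi(q_i)^T k_sum)."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T, shape d_k x d_v
    k_sum = [0.0] * d_k                     # sum_j phi(k_j)
    for k, v in zip(keys, values):          # one pass over N keys/values
        fk = elu_feature_map(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    outputs = []
    for q in queries:                       # one pass over N queries
        fq = elu_feature_map(q)
        denom = sum(fq[a] * k_sum[a] for a in range(d_k))
        outputs.append([sum(fq[a] * kv[a][b] for a in range(d_k)) / denom for b in range(d_v)])
    return outputs
```

Given N query, key, and value vectors it returns N output vectors without ever building an N x N attention matrix; the result equals softmax-free attention with similarity phi(q) . phi(k).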

Keywords: Lungs; Lung cancer; Image segmentation; Tumors; Solids; Glass; Feature extraction; Computed tomography; Biomedical imaging; Blood vessels; Convolutional neural network; encoder-decoder; linear attention; pulmonary nodule segmentation; ResUNet


ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor


DISPLAYS, 2024, Vol. 84

Authors: Hossen, Md. Bipul; Ye, Zhongfu; Abdussalam, Amr; Hossain, Mohammad Alamgir. Affiliation: Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China

Fine-grained image captioning is a focal point of the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and use of attributes play a crucial role in improving captioning performance. Despite progress, prior attribute-related methods either predict attributes related to the input image or predict linguistic-context-related attributes at each time step of the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, which leads to ineffective exploitation of semantic information and lower performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes of the input image by leveraging relationships between visual objects and attribute embeddings. An Enhanced Attribute Predictor (EAP) is then proposed, which first predicts linguistic-context-related attributes and then uses prior probabilities from the IAP module to rebalance image-related and linguistic-context-related attributes, producing more robust attribute probabilities. These refined attributes are integrated into the language LSTM layer to ensure accurate word prediction at each time step. Integrating the IAP and EAP modules into the proposed image captioning with enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details and improves overall performance. ICEAP outperforms contemporary models, with significant average CIDEr-D improvements of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K using cross-entropy optimization, and qualitative analysis confirms its ability to generate fine-grained captions.
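The abstract says the EAP uses prior probabilities from the IAP to rebalance image-related and linguistic-context-related attributes, but it does not give the rule. The sketch below is a hypothetical stand-in, not ICEAP's formulation: a per-attribute convex combination with an assumed weight `lam`, followed by top-k selection of the attributes that would be handed to the language LSTM.

```python
from typing import Dict, List


def rebalance(
    context_probs: Dict[str, float], prior_probs: Dict[str, float], lam: float = 0.5
) -> Dict[str, float]:
    """Hypothetical per-attribute mix: lam * context + (1 - lam) * image prior."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    attributes = set(context_probs) | set(prior_probs)
    # Missing attributes are treated as probability 0 in that source.
    return {
        a: lam * context_probs.get(a, 0.0) + (1.0 - lam) * prior_probs.get(a, 0.0)
        for a in attributes
    }


def top_k(probs: Dict[str, float], k: int) -> List[str]:
    """Highest-probability attributes first; ties broken alphabetically."""
    return sorted(probs, key=lambda a: (-probs[a], a))[:k]
```

Because both inputs are per-attribute probabilities in [0, 1], a convex combination stays in [0, 1]; setting `lam` to 1 ignores the image prior entirely.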

Keywords: Fine-grained image caption; Attention mechanism; encoder-decoder; Independent attribute predictor; Enhanced attribute predictor


Enhancing building segmentation by deep multiview classification for advancing sustainable urban development


JOURNAL OF BUILDING ENGINEERING, 2024, Vol. 83

Authors: El Hajjar, Sally; Kassem, Hassan; Abdallah, Fahed; Omrani, Hichem. Affiliations: Luxembourg Inst Socio Econ Res LISER, Urban Dev & Mobil Dept, 11 Porte Sci, L-4366 Esch Sur Alzette, Luxembourg; Lebanese Univ, Beirut, Lebanon; Univ Lorraine, Lab LCOMS, Metz, France

Accurate building segmentation plays a crucial role in a wide range of applications such as urban planning, monitoring, and mapping. Various deep learning models have been employed for building segmentation, but they analyze images from a single view. Given the limitations of single-view building segmentation models, our research aims to improve accuracy with a novel multi-view U-Net deep model that incorporates multiple views of the images. We employ two pre-trained convolutional neural network architectures, MobileNetV2 and ResNet50, to extract features representing two different views of our images. By fusing these features, the proposed method captures complementary information, which improves segmentation accuracy. To further improve performance, we incorporate skip connections and up-convolutional layers to ensure fine-grained feature propagation. Experimental results on a large building dataset show a significant improvement in segmentation accuracy (91%) compared with state-of-the-art methods, highlighting the effectiveness of our multi-view fusion approach and confirming the benefits of creating different views with the concept proposed in this paper. This research has the potential to redefine building segmentation in applications such as urban planning and mapping. We also tested the method on a large study area (at the city scale of Belval-Luxembourg), which demonstrates its efficiency in segmenting satellite images covering a large area and reinforces its potential for real-world applications.
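The abstract says the MobileNetV2 and ResNet50 features are fused but not how. One plausible scheme, assumed here and not confirmed by the paper, resizes the coarser feature map to the finer one's spatial size with nearest-neighbor upsampling and then concatenates the two along the channel axis. Feature maps are toy nested lists indexed [channel][row][col]; real backbones produce far more channels.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    """(channels, height, width) of a nested-list feature map."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every pixel factor x factor times in each channel."""
    return [
        [[value for value in row for _ in range(factor)] for row in channel for _ in range(factor)]
        for channel in fmap
    ]


def fuse_views(view_a: FeatureMap, view_b: FeatureMap) -> FeatureMap:
    """Concatenate two views along the channel axis (hypothetical fusion rule)."""
    if shape(view_a)[1:] != shape(view_b)[1:]:
        raise ValueError("views must share the same spatial size before fusion")
    return view_a + view_b
```

For example, a 2-channel 2x2 map fused with a 3-channel 1x1 map upsampled by 2 yields a 5-channel 2x2 map that a U-Net decoder could consume.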

Keywords: Deep multiview building segmentation; U-Net model; Pix2pix; Skip connection; encoder-decoder


An unsupervised deep global-local views model for anomaly detection in attributed networks


KNOWLEDGE-BASED SYSTEMS, 2024, Vol. 300

Authors: Lei, Tianyang; Ou, Mengxin; Gong, Chang; Li, Jichao; Yang, Kewei. Affiliation: Natl Univ Def Technol, Coll Syst Engn, Deya Rd 109, Changsha 410073, Peoples R China

An attributed network is a form of data that contains rich semantic information. Many real scenarios, such as social media, citation, and traffic networks, can be modeled as attributed networks. Anomaly detection in attributed networks is an interesting research topic owing to its potential in various practical applications, including spam, network intrusion, and financial fraud detection. However, attributed networks exhibit many anomaly patterns, such as structural, attribute, local, and global anomalies, which makes anomaly detection in them a challenging task. To address these difficulties, we designed DeepGL, a novel unsupervised deep global-local view model for anomaly detection in attributed networks. Our model is an encoder-decoder framework with multiple views that capture node attributes and network structure information from both global and local views. Specifically, the model contains two encoders and four decoders. The two encoders capture network features from the local and global views, and the four decoders reconstruct the local node attribute information, local structure information, global node attribute information, and global structure information. We applied Laplacian sharpening and smoothing to the encoders and decoders to preserve the features of normal nodes while making anomalous nodes less conspicuous in the reconstructed information, which facilitates the calculation of reconstruction errors. Extensive experiments on four real-world attributed network datasets demonstrate the excellent performance of the proposed method.
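The abstract says reconstruction errors drive detection but not how they are combined into a score. A common choice in reconstruction-based graph anomaly detection, assumed here, is to score each node by a weighted sum of its per-row L2 reconstruction errors. With DeepGL's four decoders, `pairs` would hold four (original, reconstructed) matrices covering local and global attributes and local and global structure; the weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # one row per node


def row_errors(original: Matrix, reconstructed: Matrix) -> List[float]:
    """Per-node L2 reconstruction error."""
    return [
        math.sqrt(sum((o - r) ** 2 for o, r in zip(orig_row, rec_row)))
        for orig_row, rec_row in zip(original, reconstructed)
    ]


def anomaly_scores(pairs: Sequence[Tuple[Matrix, Matrix]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-node errors across decoders (higher = more anomalous)."""
    if len(pairs) != len(weights):
        raise ValueError("need one weight per (original, reconstructed) pair")
    per_decoder = [row_errors(o, r) for o, r in pairs]
    num_nodes = len(per_decoder[0])
    return [sum(w * errs[i] for w, errs in zip(weights, per_decoder)) for i in range(num_nodes)]
```

Nodes are then ranked by score; if the decoders reconstruct normal nodes well and anomalous nodes poorly, as the abstract intends, anomalies rise to the top.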

Keywords: Anomaly detection; Attributed network; Graph convolutional networks; encoder-decoder


Learning to Automatically Generate Accurate ECG Captions

5th International Conference on Medical Imaging with Deep Learning (MIDL)

Authors: Bartels, Mathieu G. G.; Najdenkoska, Ivona; van de Leur, Rutger R.; Sammani, Arjan; Taha, Karim; Knigge, David M.; Doevendans, Pieter A.; Worring, Marcel; van Es, Rene. Affiliations: Univ Med Ctr Utrecht, Dept Cardiol, Utrecht, Netherlands; Univ Amsterdam, Amsterdam, Netherlands; Netherlands Heart Inst, Utrecht, Netherlands

The electrocardiogram (ECG) is an affordable, non-invasive, and quick method of obtaining essential information about the electrical activity of the heart. Interpreting ECGs is time-consuming even for experienced cardiologists, which motivates the current clinical use of rule-based methods to describe ECGs automatically. However, compared with descriptions written by experts, ECG descriptions generated by such rule-based methods show considerable limitations. Inspired by image captioning methods, we instead propose a data-driven approach to ECG description generation. We introduce a label-guided Transformer model and show that a data-driven captioning model can automatically generate relevant and readable ECG descriptions. We incorporate prior ECG labels into the model design and show that this improves the overall quality of the generated descriptions. We find that training these models on free-text annotations of ECGs (instead of the computer-generated ECG descriptions used clinically) greatly improves performance. Moreover, a human expert evaluation study of our best system shows that our data-driven approach improves upon existing rule-based methods.
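The abstract does not say how prior ECG labels enter the label-guided Transformer. Purely as an illustration, the sketch below assumes they are prepended as special prefix tokens (the `<label:...>` format is hypothetical) and shows the greedy autoregressive loop that captioning decoders commonly use at inference time. `next_token_scores` stands in for a trained decoder and is not a real API.

```python
from typing import Callable, Dict, List

# Hypothetical decoder interface: context tokens -> score for each candidate next token.
NextTokenScores = Callable[[List[str]], Dict[str, float]]


def greedy_caption(
    labels: List[str],
    next_token_scores: NextTokenScores,
    max_len: int = 20,
    bos: str = "<bos>",
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding conditioned on prior labels given as prefix tokens."""
    prefix = [f"<label:{label}>" for label in labels] + [bos]
    generated: List[str] = []
    for _ in range(max_len):
        scores = next_token_scores(prefix + generated)
        token = max(scores, key=scores.get)  # pick the single best next token
        if token == eos:
            break
        generated.append(token)
    return generated
```

Beam search is a common alternative to the greedy choice; the loop structure stays the same apart from keeping several partial captions.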

Keywords: Transformer; encoder-decoder; ECG; Signal processing; ResNet; Captioning


A Faster Approach For Direct Speech to Speech Translation

IEEE Women in Technology Conference (WINTECHCON) - Smarter Technologies for a Sustainable and Hyper-Connected World

Authors: Shankarappa, Rashmi T.; Tiwari, Sourabh. Affiliation: Samsung R&D Inst, Voice Intelligence Team, Bengaluru, India

ISBN (electronic and print): 9781665486743

As the world moves toward globalization, demand for automatic language translators is increasing rapidly. Traditional translation systems consist of multiple steps: speech recognition, text-to-text machine translation, and speech generation. These systems suffer from latency caused by the multiple steps and from errors that propagate from the early steps to the later ones. Another challenge is that many spoken languages have no written representation, so traditional systems that rely on speech-to-text and text-to-text translation do not work for them. In this paper, we present a recurrent neural network (RNN) based translation system that directly generates the waveform of the target-language audio. We use the sparse coding technique to extract and invert audio features. An attention-based multi-layer sequence-to-sequence model is trained with a novel technique on a Spanish-to-English audio dataset, and no intermediate text representation is used during training or inference. We compare the proposed approaches in terms of latency, bilingual evaluation understudy (BLEU) score, and Perceptual Evaluation of Speech Quality (PESQ) score. The resulting system provides very fast translation with good translation accuracy and audio quality.
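The abstract does not name the attention variant in its sequence-to-sequence model. Assuming Luong-style dot-product attention, each decoder step scores every encoder state against the current decoder state, normalizes the scores with a softmax, and uses the weighted sum of encoder states as the context vector:

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(
    decoder_state: Sequence[float], encoder_states: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return (context vector, attention weights) for one decoder step."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return context, weights
```

In an audio-to-audio model the encoder states would summarize source-speech feature frames (sparse-coded features in this paper), and the context vector feeds the decoder that emits target-speech features.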

Keywords: Speech; Signal Processing; Machine Learning; Translation System; encoder-decoder


Automated Radiology Report Generation Using a Transformer-Template System: Improved Clinical Accuracy and an Assessment of Clinical Safety

35th Australasian Joint Conference on Artificial Intelligence (AI)

Authors: Abela, Brandon; Abu-Khalaf, Jumana; Yang, Chi-Wei Robin; Masek, Martin; Gupta, Ashu. Affiliations: Edith Cowan Univ, Joondalup, WA 6027, Australia; Fiona Stanley Hosp, Murdoch, WA 6150, Australia

ISBN (print): 9783031226946; 9783031226953

Radiologists are required to write a descriptive report for each examination they perform, which is a time-consuming process. Deep learning researchers are developing models to automate this process. Currently, the most researched architecture for this task is the encoder-decoder (E-D). A problem with this approach is that these models are optimised to produce output that is coherent and grammatically correct rather than clinically correct. With this in mind, the current study builds on a more recent approach that generates reports using a multi-label classification model attached to a Template-based Report Generation (TRG) subsystem. Two TRG models, using either a Transformer or a CNN classifier, are produced and directly compared with the most clinically accurate E-D model in the literature at the time of writing. The models were trained on the MIMIC-CXR dataset, a public set of 473,057 chest X-rays and 206,563 corresponding reports. Precision, recall, and F1 scores were obtained by applying a rule-based labeller to the MIMIC-CXR reports, assigning those labels to the corresponding images, and then applying the labeller to the generated reports. The TRG models outperformed the E-D model in clinical accuracy, with the largest difference in recall (T-TRG: precision 0.38, recall 0.58, F1 0.45; CNN-TRG: precision 0.34, recall 0.69, F1 0.42; E-D: precision 0.38, recall 0.14, F1 0.19). Examination of the quantitative metrics for each specific abnormality, combined with a qualitative assessment, shows that significant progress is still needed before clinical integration is safe.
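A Template-based Report Generation subsystem maps the classifier's positive labels to fixed sentences. The sketch below assumes hypothetical templates, a 0.5 threshold, and a fallback normal sentence, none of which come from the paper. It also computes micro-averaged precision, recall, and F1 over per-report label sets; in the study, labels for generated reports come from a rule-based labeller that is not reproduced here, and the abstract does not state how its quoted figures are averaged.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical templates; the paper's actual wording is not given in the abstract.
TEMPLATES: Dict[str, str] = {
    "Cardiomegaly": "The heart is enlarged.",
    "Pleural Effusion": "There is a pleural effusion.",
    "Pneumothorax": "There is a pneumothorax.",
}
NORMAL_SENTENCE = "No acute cardiopulmonary abnormality."


def generate_report(probs: Dict[str, float], threshold: float = 0.5) -> str:
    """Concatenate the template sentence of every label at or above threshold."""
    positives = [label for label in TEMPLATES if probs.get(label, 0.0) >= threshold]
    if not positives:
        return NORMAL_SENTENCE
    return " ".join(TEMPLATES[label] for label in positives)


def precision_recall_f1(
    predicted: List[Set[str]], reference: List[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged scores over per-report label sets."""
    tp = sum(len(p & r) for p, r in zip(predicted, reference))
    fp = sum(len(p - r) for p, r in zip(predicted, reference))
    fn = sum(len(r - p) for p, r in zip(predicted, reference))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because every sentence comes from a fixed template, a report can only be as clinically correct as the classifier's labels, which is why the study evaluates at the label level.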

Keywords: Medical text; Medical imaging; Deep learning; Templates; encoder-decoder; CNN; Transformer


Accurate Generation of Trigger-Action Programs with Domain-Adapted Sequence-to-Sequence Learning

30th IEEE/ACM International Conference on Program Comprehension (ICPC)

Authors: Yusuf, Imam Nur Bani; Jiang, Lingxiao; Lo, David. Affiliation: Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore

ISBN (print): 9781450392983

Trigger-action programming allows end users to write event-driven rules that automate smart devices and internet services. Users create a trigger-action program (TAP) by specifying triggers and actions from a set of predefined functions, along with suitable data fields for those functions. As its popularity has grown, many trigger-action programming platforms have emerged, e.g., IFTTT, Microsoft Power Automate, and Samsung SmartThings. Despite their simplicity, composing TAPs can still be challenging for end users because of the domain knowledge required and the enormous search space of trigger and action combinations. We propose RecipeGen, a new deep learning-based approach that leverages the Transformer sequence-to-sequence (seq2seq) architecture to generate TAPs from natural language descriptions at fine-grained, field-level granularity. Our approach adapts autoencoding pre-trained models to warm-start the encoder of the seq2seq model and boost generation performance. We evaluated RecipeGen on real-world datasets from the IFTTT platform against the prior state-of-the-art approach for TAP generation. Our empirical evaluation shows an overall improvement of 9.5%-26.5% over the prior best results. Our results also show that adopting a pre-trained autoencoding model further boosts MRR@3 by 2.8%-10.8%. In the field-level generation setting, RecipeGen achieves an MRR@3 of 0.591 and a BLEU score of 0.575.
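MRR@k averages, over test queries, the reciprocal rank of the first correct prediction among the top k candidates, counting 0 when none of the top k is correct. A minimal sketch of that standard definition follows; how RecipeGen decides that a generated TAP matches the reference (e.g., at the field level) is not specified here, so plain equality is assumed.

```python
from typing import Hashable, Sequence


def mrr_at_k(
    ranked_predictions: Sequence[Sequence[Hashable]], gold: Sequence[Hashable], k: int = 3
) -> float:
    """Mean reciprocal rank of the first exact match within the top k."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("need one ranked list per gold answer")
    total = 0.0
    for preds, answer in zip(ranked_predictions, gold):
        for rank, candidate in enumerate(preds[:k], start=1):
            if candidate == answer:
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(gold)
```

For three queries whose correct answers appear at ranks 1, 2, and 4, MRR@3 is (1 + 1/2 + 0) / 3 = 0.5.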

Keywords: Trigger-Action Programming; IFTTT; Program Generation; Deep Learning; encoder-decoder


AN EFFICIENT END-TO-END IMAGE COMPRESSION TRANSFORMER

IEEE International Conference on Image Processing (ICIP)

Authors: Jeny, Afsana Ahsan; Junayed, Masum Shah; Islam, Md Baharul. Affiliations: Bahcesehir Univ, Dept Comp Engn, Istanbul, Turkey; Amer Univ Malta, Coll Data Sci & Engn, Bormla, Malta

ISBN (electronic and print): 9781665496209

Image and video compression have received significant research attention, and their applications have expanded. Existing entropy estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model that provides a global receptive field to address long-range correlation issues. A hyper encoder-decoder-based transformer block employs a multi-head spatial reduction self-attention (MHSRSA) layer to reduce the computational cost of self-attention and enable rapid learning of multi-scale, high-resolution features. A Casual Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts using channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the KODAK dataset demonstrate the effectiveness and competitive performance of the model.
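End-to-end learned codecs are typically trained on a rate-distortion objective L = R + lambda * D, where R is the estimated bit cost of the quantized latents under the entropy model and D is the reconstruction distortion. The sketch below assumes MSE distortion and a rate in bits per pixel computed from per-symbol likelihoods; the paper's exact weighting and distortion measure are not given in the abstract.

```python
import math
from typing import Sequence


def rate_bpp(likelihoods: Sequence[float], num_pixels: int) -> float:
    """Estimated rate: total -log2 likelihood of the latent symbols per image pixel."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between original and reconstructed pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(
    likelihoods: Sequence[float],
    num_pixels: int,
    x: Sequence[float],
    x_hat: Sequence[float],
    lam: float,
) -> float:
    """L = R + lam * D; a larger lam favors quality over compression."""
    return rate_bpp(likelihoods, num_pixels) + lam * mse(x, x_hat)
```

A better entropy model (for example, one with the richer global context the abstract describes) assigns higher likelihoods to the actual latents, which lowers R directly.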

Keywords: Image compression; transformer; encoder-decoder; entropy model


DEEP SPATIO-TEMPORAL WIND POWER FORECASTING

47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Li, Jiangyuan; Armandpour, Mohammadreza. Affiliation: Texas A&M Univ, Dept Stat, College Stn, TX 77843, USA

ISBN (print): 9781665405409

Wind power forecasting has drawn increasing attention from researchers as the consumption of renewable energy grows. In this paper, we develop a deep learning approach based on an encoder-decoder structure. Our model forecasts the wind power generated by a wind turbine using its spatial location relative to other turbines and historical wind speed data. In this way, we effectively integrate spatial dependency and temporal trends to make turbine-specific predictions. The advantages of our method over existing work are: (1) it predicts wind power directly from historical wind speed, without first predicting wind speed and then applying a transformation; (2) it effectively captures long-term dependency; and (3) it is more scalable and efficient than other deep learning-based methods. We demonstrate the efficacy of our model on benchmark real-world datasets.
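Predicting power directly from past wind speed, as the abstract describes, requires pairing a history window of wind speed (encoder input) with the power values that follow it (decoder target). The sketch below builds such pairs for a single turbine; the window lengths are assumptions, and the spatial features from neighboring turbines that the model also uses are left out.

```python
from typing import List, Sequence, Tuple


def make_windows(
    wind_speed: Sequence[float], power: Sequence[float], history: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each past wind-speed window with the power values right after it."""
    if len(wind_speed) != len(power):
        raise ValueError("wind speed and power series must be aligned")
    samples = []
    for start in range(len(wind_speed) - history - horizon + 1):
        encoder_input = list(wind_speed[start:start + history])
        decoder_target = list(power[start + history:start + history + horizon])
        samples.append((encoder_input, decoder_target))
    return samples
```

With a five-step series, `history=2`, and `horizon=1`, this yields three training pairs, each mapping two past wind speeds to the next power value.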

Keywords: Spatio-temporal model; encoder-decoder; wind power forecasting; temporal relation

