The contribution of deep learning to medical image diagnosis has gained extensive interest due to its excellent performance. Interest has also grown in digital pathology, since it is considered the gold standard for tumor detection and diagnosis in digital Whole Slide Images (WSIs). This paper proposes an end-to-end cone-shaped encoder-decoder framework called the Multi-scale 3-stacked-Layer coned U-Net (Ms3LcU-Net). It boosts performance through several enhancements and integrated techniques, such as blended mutual attention, dilated fusion, edge enhancement, and atrous pooling. Furthermore, morphological post-processing and test-time augmentation are used in Ms3LcU-Net to refine and smooth the generated segmentations. Experimental results on the public PAIP 2019 and DigestPath datasets, evaluated quantitatively with multiple metrics and qualitatively by visualizing the generated segmentation predictions, demonstrate the effectiveness and competitiveness of the proposed model for tumor segmentation in WSIs. The proposed framework yielded an average clipped Jaccard Index of 0.7211 on the validation set of the PAIP 2019 dataset, and an average Dice coefficient and F1-score of 0.833 and 0.897, respectively, on the DigestPath dataset. The code will be made publicly available upon acceptance of the paper at https://***/Heba-AbdeNabi/Ms3LcU-Net-.
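To make the refinement stage concrete, below is a minimal sketch of flip-based test-time augmentation followed by morphological post-processing, two of the techniques the abstract names. The `model` callable, the flip set, the 3x3 structuring element, and the 0.5 threshold are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: flip-based test-time augmentation (TTA) and simple
# morphological post-processing for a binary tumor-segmentation mask.
import numpy as np
from scipy import ndimage

def tta_predict(model, image: np.ndarray) -> np.ndarray:
    """Average tumor-probability maps over horizontal/vertical flips."""
    probs = []
    for flip_axes in [None, (0,), (1,), (0, 1)]:   # identity + three flips
        aug = np.flip(image, axis=flip_axes) if flip_axes else image
        p = model(aug)                              # assumed: HxW probability map
        probs.append(np.flip(p, axis=flip_axes) if flip_axes else p)
    return np.mean(probs, axis=0)

def postprocess(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold, then smooth the mask with morphological operations."""
    mask = prob_map > threshold
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))  # drop specks
    mask = ndimage.binary_fill_holes(mask)                          # fill holes
    return mask.astype(np.uint8)
```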
Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the hidden knowledge in these images will greatly promote the development of resource and environment monitoring, urban planning, and other related fields. Remote sensing image captioning (RSIC) obtains textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress of deep-learning-based RSIC. After defining the scope of the papers to be discussed and summarizing them, this paper provides a comprehensive review of recent advances in RSIC, covering six key aspects: the encoder-decoder framework, attention mechanisms, reinforcement learning, learning with auxiliary tasks, large visual language models, and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.
Automatic natural-language interpretation of medical images is an emerging field of Artificial Intelligence (AI). The task combines two fields of AI: computer vision and natural language processing. It is challenging because it goes beyond object detection, segmentation, and classification, requiring an understanding of the relationships between the different objects in an image and the actions these objects perform, as visual representations. Image interpretation is helpful in many tasks, such as assisting visually impaired persons, information retrieval, early childhood learning, producing human-like natural interaction with robots, and many more applications. Recently, this work has motivated researchers to apply the same approach to more complex biomedical images, ranging from generating single-sentence captions to multi-sentence paragraph descriptions. Medical image captioning can assist and speed up the diagnostic process of medical professionals, and the generated report can be used for many further tasks. This is a comprehensive review of recent years' research on medical image captioning published in international conferences and journals. Common parameters are extracted to compare the methods, their performance, strengths, and limitations, and our recommendations are discussed. Publicly available datasets and evaluation measures used for deep-learning-based captioning of medical images are also discussed.
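As a concrete instance of the evaluation measures such reviews survey, the snippet below computes BLEU for a generated caption against a reference using NLTK; the radiology-style sentences are invented purely for illustration.

```python
# Hedged example: BLEU, one of the standard caption-evaluation measures.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["no", "acute", "cardiopulmonary", "abnormality"]]   # ground truth
candidate = ["no", "acute", "cardiopulmonary", "findings"]        # model output

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
score = sentence_bleu(reference, candidate,
                      weights=(0.5, 0.5),          # BLEU-2: unigrams + bigrams
                      smoothing_function=smooth)
print(f"BLEU-2: {score:.3f}")
```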
Accurate multi-step-ahead wind speed (WS) and wind power (WP) forecasting is critical to the scheduling, planning, and maintenance of wind farms. Previous forecasting methods tend to focus on improving forecast accuracy by integrating different models and decomposing data while neglecting the forecasting ability of the basic models. In addition, traditional multi-step-ahead output strategies have limitations that constrain the forecasting capability of models. To overcome these challenges, this study proposes a novel forecasting model called ED-Wavenet-TF. It adopts two Wavenet networks as encoder and decoder, connected by a multi-head self-attention mechanism, and uses teacher forcing as the multi-step-ahead output strategy for WS and WP forecasting. In the training phase, ED-Wavenet-TF uses a portion of the actual data to correct the errors at the intermediate forecasting steps, while in the forecasting phase it runs through an inference loop to make forecasts. Two WS datasets and two WP datasets are used to validate the performance of ED-Wavenet-TF with univariate input. The results show that, compared with Wavenet, the symmetric mean absolute percentage error of ED-Wavenet-TF over four forecasting steps is lower by at least 4.8577% on average for the WS datasets and 8.9463% on average for the WP datasets. The advantages of ED-Wavenet-TF over ten comparable models are confirmed by four evaluation indicators and the Harvey, Leybourne, and Newbold statistical hypothesis test. Moreover, ED-Wavenet-TF is extended to make multi-step-ahead forecasts with multivariate inputs, whose effectiveness is demonstrated on another open WS dataset.
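The contrast between the two output regimes can be sketched as follows: during training the decoder is corrected with the actual future values at intermediate steps (teacher forcing), while at forecast time it loops over its own predictions. The GRU-based stand-in below is an assumption made for brevity; the paper's model uses Wavenet encoder/decoder blocks joined by multi-head self-attention.

```python
# Hedged sketch: teacher forcing in training vs. autoregressive inference.
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)
        self.decoder = nn.GRUCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, history, horizon, targets=None):
        _, h = self.encoder(history)        # history: (B, T, 1)
        h = h.squeeze(0)
        step_in = history[:, -1, :]         # seed with last observed value
        outputs = []
        for t in range(horizon):
            h = self.decoder(step_in, h)
            pred = self.head(h)
            outputs.append(pred)
            if targets is not None:         # training: teacher forcing
                step_in = targets[:, t, :]  # feed the actual value back in
            else:                           # forecasting: inference loop
                step_in = pred              # feed back own prediction
        return torch.stack(outputs, dim=1)  # (B, horizon, 1)

model = Seq2SeqForecaster()
hist = torch.randn(4, 24, 1)                       # 24 past steps, batch of 4
future = torch.randn(4, 4, 1)                      # 4-step-ahead targets
train_out = model(hist, horizon=4, targets=future) # teacher-forced pass
test_out = model(hist, horizon=4)                  # autoregressive pass
```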
Visual understanding has become more significant in gathering information in many real-life applications. For a human, understanding the content of a visual is a trivial task; for a machine, it is challenging. Generating captions for images and videos to better understand a situation is gaining importance, with wide application in assistive technologies, automatic video captioning, video summarization, subtitling, blind navigation, and so on. A visual understanding framework analyses the content of a video to generate a semantically accurate caption for it. Apart from the visual understanding of the situation, the gained semantics must be expressed in a natural language such as English, which requires a language model; the semantics and grammar of the generated English sentences are therefore yet another challenge. The captured description of the video should convey information not just about the objects contained in the scene, but also about how these objects relate to each other through the activity described in the scene, making the entire process a complex task for a machine. This work surveys the various methods for video captioning using deep learning methodologies, the datasets widely used for these tasks, and the various evaluation metrics used for performance comparison. The insights gained from our earlier work and an extensive literature review enable us to propose a practical, efficient deep-learning video captioning architecture that utilizes audio cues, external knowledge, and attention context to improve the captioning process. Quantum deep learning architectures can also bring about extraordinary results in object recognition tasks and convolution-based feature extraction.
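A minimal sketch of the attention-context idea mentioned above: at each decoding step, per-frame features are weighted by their relevance to the current decoder state and pooled into a single context vector. The bilinear scoring matrix `W` and the feature sizes are assumptions, not a specification of the proposed architecture.

```python
# Hedged sketch: soft attention over video-frame features.
import torch
import torch.nn.functional as F

def attention_context(frame_feats, decoder_state, W):
    """frame_feats: (T, D) per-frame features; decoder_state: (D,)."""
    scores = frame_feats @ W @ decoder_state   # (T,) relevance of each frame
    weights = F.softmax(scores, dim=0)         # normalize to a distribution
    return weights @ frame_feats               # (D,) weighted context vector

T, D = 16, 256
ctx = attention_context(torch.randn(T, D), torch.randn(D), torch.eye(D))
```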
ISBN (digital): 9781728121901
ISBN (print): 9781728121918
Remote sensing (RS) image captioning has recently been attracting the attention of the community, as it provides more semantic information than traditional tasks such as scene classification. Image captioning aims to generate a coherent and comprehensive description that summarizes the content of an image. The description can be obtained directly from the ground-truth descriptions of similar images (retrieval-based image captioning) or generated through the encoder-decoder framework. The former has the limitation of not generating new descriptions; the latter may be affected by misrecognition of scenes or semantic objects. In this paper, we address these issues by proposing a new framework that combines generation- and retrieval-based image captioning. First, a CNN-RNN framework combined with beam search generates multiple captions for a target image. Then, the best caption is selected on the basis of its lexical similarity with the reference captions of the most similar images. Experimental results on the RSICD dataset are reported and discussed.
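A small sketch of the selection step: among the beam-search candidates, keep the one with the highest lexical similarity to the reference captions retrieved for the most similar images. Word-set Jaccard overlap is an assumed similarity measure here; the abstract only specifies "lexical similarity", and the example sentences are invented.

```python
# Hedged sketch: re-ranking generated captions against retrieved references.
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_caption(candidates, reference_captions):
    """Return the generated caption closest to any retrieved reference."""
    def score(cand):
        return max(jaccard(cand, ref) for ref in reference_captions)
    return max(candidates, key=score)

candidates = ["a large airport with many planes", "many buildings near a road"]
references = ["several planes parked at an airport", "an airport with planes"]
print(select_caption(candidates, references))   # picks the airport caption
```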
Generating a natural language description of an image is a challenging but meaningful task. It combines two significant artificial intelligence fields: computer vision and natural language processing. The task is valuable for many applications, such as image search and helping visually impaired people to view the world. Most approaches adopt an encoder-decoder framework, and many subsequent methods improve on this basis. In these methods, image features are extracted by a VGG network or other networks, but the feature map loses important information during the process. In this paper, we fuse different kinds of image features extracted by two networks, VGG19 and ResNet50, and feed them into the neural network for training. We also add an attention mechanism to a basic neural encoder-decoder model for generating natural sentence descriptions: at each time step, our model attends to the image features and picks up the most meaningful parts to generate words. We test our model on the benchmark dataset IAPR TC-12; comparing with other methods, we validate that our model achieves state-of-the-art performance.
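A hedged sketch of the fusion idea: features for the same image are extracted from both backbones and combined before being fed to the captioning network. Global average pooling and plain concatenation are assumptions; the abstract does not state the exact fusion operator.

```python
# Hedged sketch: fusing VGG19 and ResNet50 features for one image.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=None).features.eval()    # convolutional backbone only
resnet = models.resnet50(weights=None)
resnet = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()  # drop pool/fc

x = torch.randn(1, 3, 224, 224)                     # dummy image batch
with torch.no_grad():
    f_vgg = vgg(x).mean(dim=(2, 3))                 # (1, 512) pooled VGG map
    f_res = resnet(x).mean(dim=(2, 3))              # (1, 2048) pooled ResNet map
fused = torch.cat([f_vgg, f_res], dim=1)            # (1, 2560) fused feature
```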