The contribution of deep learning to medical image diagnosis has gained extensive interest due to its excellent performance. Interest has also grown in digital pathology, since it is considered the gold standard for tumor detection and diagnosis in digital Whole Slide Images (WSIs). This paper proposes an end-to-end cone-shaped encoder-decoder framework called the Multi-scale 3-stacked-Layer coned U-Net (Ms3LcU-Net). It boosts performance through several enhancements and integrated techniques, such as blended mutual attention, dilated fusion, edge enhancement, and atrous pooling. Furthermore, morphological post-processing and test-time augmentation are used in Ms3LcU-Net to refine and smooth the generated segmentations. Experimental results on the public PAIP 2019 and DigestPath datasets, evaluated quantitatively with multiple metrics and qualitatively by visualizing the generated segmentation predictions, demonstrate the effectiveness and competitiveness of the proposed model for tumor segmentation in WSIs. The proposed framework yielded an average clipped Jaccard Index of 0.7211 on the validation set of the PAIP 2019 dataset, and an average Dice coefficient and F1-score of 0.833 and 0.897, respectively, on the DigestPath dataset. The code will be made publicly available upon acceptance of the paper at https://***/Heba-AbdeNabi/Ms3LcU-Net-.
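To make the refinement stage concrete, below is a minimal sketch of flip-based test-time augmentation followed by morphological post-processing, two of the techniques the abstract names. The `model` callable, the flip set, the 3x3 structuring element, and the 0.5 threshold are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: flip-based test-time augmentation (TTA) and simple
# morphological post-processing for a binary tumor-segmentation mask.
import numpy as np
from scipy import ndimage

def tta_predict(model, image: np.ndarray) -> np.ndarray:
    """Average tumor-probability maps over horizontal/vertical flips."""
    probs = []
    for flip_axes in [None, (0,), (1,), (0, 1)]:   # identity + three flips
        aug = np.flip(image, axis=flip_axes) if flip_axes else image
        p = model(aug)                              # assumed: HxW probability map
        probs.append(np.flip(p, axis=flip_axes) if flip_axes else p)
    return np.mean(probs, axis=0)

def postprocess(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold, then smooth the mask with morphological operations."""
    mask = prob_map > threshold
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))  # drop specks
    mask = ndimage.binary_fill_holes(mask)                          # fill holes
    return mask.astype(np.uint8)
```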
Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the hidden knowledge in these images will greatly promote the development of resource and environment monitoring, urban planning, and other related fields. Remote sensing image captioning (RSIC) obtains textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress of deep-learning-based RSIC. After defining the scope of the papers to be discussed and summarizing them, this paper provides a comprehensive review of recent advances in RSIC, covering six key aspects: the encoder-decoder framework, attention mechanisms, reinforcement learning, learning with auxiliary tasks, large visual language models, and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.
Automatic natural-language interpretation of medical images is an emerging field of Artificial Intelligence (AI). The task combines two fields of AI: computer vision and natural language processing. It is challenging because it goes beyond object detection, segmentation, and classification, requiring an understanding of the relationships between the different objects in an image and the actions these objects perform, as visual representations. Image interpretation is helpful in many tasks, such as assisting visually impaired persons, information retrieval, early childhood learning, producing human-like natural interaction with robots, and many more applications. Recently, this work has motivated researchers to apply the same approach to more complex biomedical images, ranging from generating single-sentence captions to multi-sentence paragraph descriptions. Medical image captioning can assist and speed up the diagnostic process of medical professionals, and the generated report can be used for many further tasks. This is a comprehensive review of recent years' research on medical image captioning published in international conferences and journals. Common parameters are extracted to compare the methods, their performance, strengths, and limitations, and our recommendations are discussed. Publicly available datasets and evaluation measures used for deep-learning-based captioning of medical images are also discussed.
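As a concrete instance of the evaluation measures such reviews survey, the snippet below computes BLEU for a generated caption against a reference using NLTK; the radiology-style sentences are invented purely for illustration.

```python
# Hedged example: BLEU, one of the standard caption-evaluation measures.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["no", "acute", "cardiopulmonary", "abnormality"]]   # ground truth
candidate = ["no", "acute", "cardiopulmonary", "findings"]        # model output

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
score = sentence_bleu(reference, candidate,
                      weights=(0.5, 0.5),          # BLEU-2: unigrams + bigrams
                      smoothing_function=smooth)
print(f"BLEU-2: {score:.3f}")
```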
Accurate multi-step-ahead wind speed (WS) and wind power (WP) forecasting is critical to the scheduling, planning, and maintenance of wind farms. Previous forecasting methods tend to focus on improving forecast accuracy by integrating different models and decomposing data while neglecting the forecasting ability of the basic models. In addition, traditional multi-step-ahead output strategies have limitations that constrain the forecasting capability of models. To overcome these challenges, this study proposes a novel forecasting model called ED-Wavenet-TF. It adopts two Wavenet networks as encoder and decoder, connected by a multi-head self-attention mechanism, and uses teacher forcing as the multi-step-ahead output strategy for WS and WP forecasting. In the training phase, ED-Wavenet-TF uses a portion of the actual data to correct the errors at the intermediate forecasting steps, while in the forecasting phase it runs through an inference loop to make forecasts. Two WS datasets and two WP datasets are used to validate the performance of ED-Wavenet-TF with univariate input. The results show that, compared with Wavenet, the symmetric mean absolute percentage error of ED-Wavenet-TF over four forecasting steps is lower by at least 4.8577% on average for the WS datasets and 8.9463% on average for the WP datasets. The advantages of ED-Wavenet-TF over ten comparable models are confirmed by four evaluation indicators and the Harvey, Leybourne, and Newbold statistical hypothesis test. Moreover, ED-Wavenet-TF is extended to make multi-step-ahead forecasts with multivariate inputs, whose effectiveness is demonstrated on another open WS dataset.
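The contrast between the two output regimes can be sketched as follows: during training the decoder is corrected with the actual future values at intermediate steps (teacher forcing), while at forecast time it loops over its own predictions. The GRU-based stand-in below is an assumption made for brevity; the paper's model uses Wavenet encoder/decoder blocks joined by multi-head self-attention.

```python
# Hedged sketch: teacher forcing in training vs. autoregressive inference.
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)
        self.decoder = nn.GRUCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, history, horizon, targets=None):
        _, h = self.encoder(history)        # history: (B, T, 1)
        h = h.squeeze(0)
        step_in = history[:, -1, :]         # seed with last observed value
        outputs = []
        for t in range(horizon):
            h = self.decoder(step_in, h)
            pred = self.head(h)
            outputs.append(pred)
            if targets is not None:         # training: teacher forcing
                step_in = targets[:, t, :]  # feed the actual value back in
            else:                           # forecasting: inference loop
                step_in = pred              # feed back own prediction
        return torch.stack(outputs, dim=1)  # (B, horizon, 1)

model = Seq2SeqForecaster()
hist = torch.randn(4, 24, 1)                       # 24 past steps, batch of 4
future = torch.randn(4, 4, 1)                      # 4-step-ahead targets
train_out = model(hist, horizon=4, targets=future) # teacher-forced pass
test_out = model(hist, horizon=4)                  # autoregressive pass
```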
Visual understanding has become more significant in gathering information in many real-life applications. For a human, understanding the content of a visual is a trivial task; for a machine, it is challenging. Generating captions for images and videos to better understand a situation is gaining importance, with wide application in assistive technologies, automatic video captioning, video summarization, subtitling, blind navigation, and so on. A visual understanding framework analyses the content of a video to generate a semantically accurate caption for it. Apart from the visual understanding of the situation, the gained semantics must be expressed in a natural language such as English, which requires a language model; the semantics and grammar of the generated English sentences are therefore yet another challenge. The captured description of the video should convey information not just about the objects contained in the scene, but also about how these objects relate to each other through the activity described in the scene, making the entire process a complex task for a machine. This work surveys the various methods for video captioning using deep learning methodologies, the datasets widely used for these tasks, and the various evaluation metrics used for performance comparison. The insights gained from our earlier work and an extensive literature review enable us to propose a practical, efficient deep-learning video captioning architecture that utilizes audio cues, external knowledge, and attention context to improve the captioning process. Quantum deep learning architectures can also bring about extraordinary results in object recognition tasks and convolution-based feature extraction.
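A minimal sketch of the attention-context idea mentioned above: at each decoding step, per-frame features are weighted by their relevance to the current decoder state and pooled into a single context vector. The bilinear scoring matrix `W` and the feature sizes are assumptions, not a specification of the proposed architecture.

```python
# Hedged sketch: soft attention over video-frame features.
import torch
import torch.nn.functional as F

def attention_context(frame_feats, decoder_state, W):
    """frame_feats: (T, D) per-frame features; decoder_state: (D,)."""
    scores = frame_feats @ W @ decoder_state   # (T,) relevance of each frame
    weights = F.softmax(scores, dim=0)         # normalize to a distribution
    return weights @ frame_feats               # (D,) weighted context vector

T, D = 16, 256
ctx = attention_context(torch.randn(T, D), torch.randn(D), torch.eye(D))
```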
ISBN (digital): 9781728121901
ISBN (print): 9781728121918
Remote sensing (RS) image captioning has recently been attracting the attention of the community, as it provides more semantic information than traditional tasks such as scene classification. Image captioning aims to generate a coherent and comprehensive description that summarizes the content of an image. The description can be obtained directly from the ground-truth descriptions of similar images (retrieval-based image captioning) or generated through the encoder-decoder framework. The former has the limitation of not generating new descriptions; the latter may be affected by misrecognition of scenes or semantic objects. In this paper, we address these issues by proposing a new framework that combines generation- and retrieval-based image captioning. First, a CNN-RNN framework combined with beam search generates multiple captions for a target image. Then, the best caption is selected on the basis of its lexical similarity with the reference captions of the most similar images. Experimental results on the RSICD dataset are reported and discussed.
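A small sketch of the selection step: among the beam-search candidates, keep the one with the highest lexical similarity to the reference captions retrieved for the most similar images. Word-set Jaccard overlap is an assumed similarity measure here; the abstract only specifies "lexical similarity", and the example sentences are invented.

```python
# Hedged sketch: re-ranking generated captions against retrieved references.
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_caption(candidates, reference_captions):
    """Return the generated caption closest to any retrieved reference."""
    def score(cand):
        return max(jaccard(cand, ref) for ref in reference_captions)
    return max(candidates, key=score)

candidates = ["a large airport with many planes", "many buildings near a road"]
references = ["several planes parked at an airport", "an airport with planes"]
print(select_caption(candidates, references))   # picks the airport caption
```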
Generating a natural language description of an image is a challenging but meaningful task. It combines two significant artificial intelligence fields: computer vision and natural language processing. The task is valuable for many applications, such as image search and helping visually impaired people to view the world. Most approaches adopt an encoder-decoder framework, and many subsequent methods improve on this basis. In these methods, image features are extracted by a VGG network or other networks, but the feature map loses important information during the process. In this paper, we fuse different kinds of image features extracted by two networks, VGG19 and ResNet50, and feed them into the neural network for training. We also add an attention mechanism to a basic neural encoder-decoder model for generating natural sentence descriptions: at each time step, our model attends to the image features and picks up the most meaningful parts to generate words. We test our model on the benchmark dataset IAPR TC-12; comparing with other methods, we validate that our model achieves state-of-the-art performance.
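A hedged sketch of the fusion idea: features for the same image are extracted from both backbones and combined before being fed to the captioning network. Global average pooling and plain concatenation are assumptions; the abstract does not state the exact fusion operator.

```python
# Hedged sketch: fusing VGG19 and ResNet50 features for one image.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=None).features.eval()    # convolutional backbone only
resnet = models.resnet50(weights=None)
resnet = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()  # drop pool/fc

x = torch.randn(1, 3, 224, 224)                     # dummy image batch
with torch.no_grad():
    f_vgg = vgg(x).mean(dim=(2, 3))                 # (1, 512) pooled VGG map
    f_res = resnet(x).mean(dim=(2, 3))              # (1, 2048) pooled ResNet map
fused = torch.cat([f_vgg, f_res], dim=1)            # (1, 2560) fused feature
```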