ISBN (print): 9798400704901
Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, consequently, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to adeptly capture both long-range and short-range feature dependencies. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. As a complement, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. Our method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy. Code available at: https://***/aminK8/Masked-LoGoNet.
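The abstract's key building block is Large Kernel Attention combined with a dual local/global encoding path. Below is a minimal, hedged sketch of a 3D LKA block in PyTorch: the decomposition into a depthwise, a dilated depthwise, and a pointwise convolution follows the general LKA idea, while the kernel sizes and channel count are illustrative assumptions rather than LoGoNet's actual configuration.

```python
# Sketch of a 3D Large Kernel Attention (LKA) block; kernel sizes and channel
# count are assumptions for illustration, not the LoGoNet configuration.
import torch
import torch.nn as nn

class LKA3D(nn.Module):
    """Large-kernel attention decomposed into three cheap convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5x5 depthwise conv captures short-range context.
        self.dw = nn.Conv3d(channels, channels, 5, padding=2, groups=channels)
        # 7x7x7 dilated depthwise conv approximates a large receptive field.
        self.dw_dilated = nn.Conv3d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        # 1x1x1 conv mixes channels.
        self.pw = nn.Conv3d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # the attention map gates the input features

if __name__ == "__main__":
    block = LKA3D(channels=32)
    vol = torch.randn(1, 32, 16, 64, 64)   # (batch, channels, D, H, W)
    print(block(vol).shape)                 # torch.Size([1, 32, 16, 64, 64])
```

The elementwise gating keeps the cost of convolutions while emulating an attention map over a large receptive field, which is the general motivation for LKA-style blocks.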
ISBN (print): 9781728198354
Video frame interpolation is an increasingly important research task with several key industrial applications in the video coding, broadcast and production sectors. Recently, transformers have been introduced to the field, resulting in substantial performance gains. However, this comes at the cost of greatly increased memory usage, training time and inference time. In this paper, a novel method integrating a transformer encoder and convolutional features is proposed. This network reduces the memory burden by close to 50% and runs up to four times faster at inference compared to existing transformer-based interpolation methods. A dual-encoder architecture is introduced which combines the strengths of convolutions in modelling local correlations with those of the transformer in modelling long-range dependencies. Quantitative evaluations are conducted on various benchmarks with complex motion to showcase the robustness of the proposed method, achieving competitive performance compared to state-of-the-art interpolation networks.
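The dual-encoder idea described above combines a convolutional branch for local correlations with a transformer encoder for long-range dependencies. The sketch below illustrates that combination in PyTorch; the layer sizes, token layout and fusion by concatenation are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of a CNN + transformer dual encoder over a stacked frame pair.
# All dimensions and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, in_ch=6, dim=64):
        super().__init__()
        # CNN branch: strided convs extract local features from two stacked frames.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer branch operates on the downsampled feature tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, frame_pair):
        feats = self.cnn(frame_pair)                  # (B, dim, H/4, W/4)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)     # (B, H*W/16, dim)
        global_feats = self.transformer(tokens)       # long-range mixing
        global_feats = global_feats.transpose(1, 2).view(b, c, h, w)
        return self.fuse(torch.cat([feats, global_feats], dim=1))

if __name__ == "__main__":
    enc = DualEncoder()
    two_frames = torch.randn(1, 6, 64, 64)  # two RGB frames stacked on channels
    print(enc(two_frames).shape)            # torch.Size([1, 64, 16, 16])
```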
ISBN (print): 9781450387323
Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text, and the performance of state-of-the-art dense retrievers can substantially deteriorate when exposed to such text. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when encountering typos and explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question answering datasets show that our proposed approach outperforms competing approaches. Additionally, we perform a thorough robustness analysis. Finally, we provide insights on how different typos affect the robustness of embeddings differently and how our method alleviates the effect of some typos but not of others.
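As a rough illustration of the recipe described here (not the authors' implementation), the sketch below injects character-level typos into queries as data augmentation and applies an InfoNCE-style contrastive loss that pulls each typoed query embedding toward its clean counterpart. The augmentation rules and the temperature value are assumptions.

```python
# Typo augmentation plus a contrastive alignment loss for a dual encoder.
# The specific typo operations and temperature are illustrative assumptions.
import random
import torch
import torch.nn.functional as F

def add_typo(query: str, p: float = 0.1) -> str:
    """Randomly delete or transpose characters to simulate user typos."""
    chars = list(query)
    i = 0
    while i < len(chars) - 1:
        if random.random() < p:
            if random.random() < 0.5:
                del chars[i]                                     # deletion typo
            else:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]  # transposition
        i += 1
    return "".join(chars)

def typo_contrastive_loss(clean_emb, typo_emb, temperature=0.05):
    """InfoNCE-style loss: each typoed query should match its own clean query
    against all other clean queries in the batch."""
    clean = F.normalize(clean_emb, dim=-1)
    typo = F.normalize(typo_emb, dim=-1)
    logits = typo @ clean.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    print(add_typo("what is dense retrieval"))
    loss = typo_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
    print(loss.item())
```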
ISBN (digital): 9783031109836
ISBN (print): 9783031109836; 9783031109829
Retrieval question answering (ReQA) is an essential mechanism for automatically satisfying users' information needs and overcoming the problem of information overload. As a promising solution for fast retrieval from large-scale candidate answers, the dual-encoder framework has been widely studied in recent years to improve the quality of its text representations. Inspired by the fact that humans usually answer questions using their background knowledge, in this work we explore how to incorporate knowledge entities into the retrieval model to build high-quality text representations, and we propose novel knowledge-aware text encoding and knowledge-aware text matching modules to facilitate the fusion between text and knowledge. Promising experimental results on various benchmarks demonstrate the potential of the proposed approach.
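A hedged sketch of what knowledge-aware encoding could look like in a dual encoder: entity embeddings linked to the text are fused with the pooled token representation before question and answer vectors are matched by dot product. The gated-sum fusion, mean pooling and dimensions are illustrative assumptions, not the paper's actual modules.

```python
# Sketch of fusing linked-entity embeddings into a dual-encoder text vector.
# The gated fusion and all sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class KnowledgeAwareEncoder(nn.Module):
    def __init__(self, vocab=30000, n_entities=5000, dim=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.ent = nn.Embedding(n_entities, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, token_ids, entity_ids):
        t = self.tok(token_ids).mean(dim=1)   # pooled text representation
        e = self.ent(entity_ids).mean(dim=1)  # pooled linked-entity representation
        g = torch.sigmoid(self.gate(torch.cat([t, e], dim=-1)))
        return g * t + (1 - g) * e            # gated fusion of text and knowledge

def score(q_vec, a_vec):
    # Dual-encoder matching: dot product between question and answer vectors.
    return (q_vec * a_vec).sum(dim=-1)

if __name__ == "__main__":
    enc = KnowledgeAwareEncoder()
    q = enc(torch.randint(0, 30000, (2, 12)), torch.randint(0, 5000, (2, 3)))
    a = enc(torch.randint(0, 30000, (2, 40)), torch.randint(0, 5000, (2, 5)))
    print(score(q, a))
```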
ISBN (digital): 9781665457279
ISBN (print): 9781665457279
Traditionally, the training phase of abstractive text summarization involves inputting two sets of integer sequences: the first representing the source text and the second representing the words in the reference summary, fed into the encoder and decoder parts of the model, respectively. However, with this method the model tends to perform poorly if the source text includes words that are irrelevant or insignificant to the key ideas. To address this issue, we propose a new keywords-based method for abstractive summarization that combines the information provided by the source text and its keywords to generate the summary. We utilize a bi-directional long short-term memory model for keyword labelling, using the overlapping words between the source text and the reference summary as ground truth. The results of our experiments on the ThaiSum dataset show that our proposed method outperforms the traditional encoder-decoder model by 0.0425 on ROUGE-1 F1, 0.0301 on ROUGE-2 F1 and 0.0140 on BERTScore F1.
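The keyword-labelling step lends itself to a short sketch: tokens that occur in both the source text and the reference summary become positive labels, and a BiLSTM tagger is trained to predict them. Whitespace tokenisation and the layer sizes below are simplifying assumptions, not the paper's exact setup.

```python
# Keyword labels from source/summary overlap, plus a BiLSTM token tagger.
# Tokenisation and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

def keyword_labels(source_tokens, summary_tokens):
    """1 if a source token also occurs in the reference summary, else 0."""
    summary_set = set(summary_tokens)
    return [1 if tok in summary_set else 0 for tok in source_tokens]

class BiLSTMKeywordTagger(nn.Module):
    def __init__(self, vocab=20000, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)   # per-token keyword logit

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h).squeeze(-1)

if __name__ == "__main__":
    print(keyword_labels("the cat sat on the mat".split(),
                         "a cat on a mat".split()))   # [0, 1, 0, 1, 0, 1]
    tagger = BiLSTMKeywordTagger()
    logits = tagger(torch.randint(0, 20000, (2, 6)))
    print(logits.shape)                               # torch.Size([2, 6])
```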
Automatic surface water body mapping using remote sensing technology is of great value for studying inland water dynamics at regional to global scales. Convolutional neural networks (CNN) have become an efficient semantic segmentation technique for the interpretation of remote sensing images. However, the receptive field of a CNN is restricted by the convolutional kernel size because the network focuses only on local features. The Swin Transformer has recently demonstrated outstanding performance in computer vision tasks, and it could be useful for processing multispectral remote sensing images. In this article, a Water Index and Swin Transformer Ensemble (WISTE) method for automatic water body extraction is proposed. First, a dual-branch encoder architecture is designed for the Swin Transformer, aggregating the global semantic information captured by multihead self-attention and the pixel neighborhood relationships captured by fully convolutional networks (FCN). Second, to prevent the Swin Transformer from ignoring multispectral information, we construct a prediction map ensemble module in which the predictions of the Swin Transformer and the Normalized Difference Water Index (NDWI) are combined by a Bayesian averaging strategy. Finally, experimental results obtained on two distinct datasets demonstrate that WISTE has advantages over other segmentation methods and achieves the best results. The method proposed in this study can be used to improve regional to continental surface water mapping and related hydrological studies.
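The ensemble step has a simple core: the Normalized Difference Water Index, NDWI = (Green - NIR) / (Green + NIR), is turned into a water-probability map and combined with the network's prediction. The sketch below uses a sigmoid mapping and equal weights as stand-ins for the paper's Bayesian averaging scheme; both choices are assumptions.

```python
# NDWI computation and a simple prediction-map ensemble; the sigmoid squashing
# and equal weights are assumptions, not the paper's Bayesian averaging.
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Standard NDWI from green and near-infrared bands, in [-1, 1]."""
    return (green - nir) / (green + nir + 1e-6)

def ensemble_water_map(network_prob: np.ndarray,
                       green: np.ndarray,
                       nir: np.ndarray,
                       w_net: float = 0.5) -> np.ndarray:
    """Average the segmentation probability with an NDWI-based probability."""
    ndwi_prob = 1.0 / (1.0 + np.exp(-10.0 * ndwi(green, nir)))  # squash to (0, 1)
    return w_net * network_prob + (1.0 - w_net) * ndwi_prob

if __name__ == "__main__":
    h, w = 4, 4
    net = np.random.rand(h, w)              # e.g. Swin/FCN prediction map
    green = np.random.rand(h, w) + 0.1
    nir = np.random.rand(h, w) + 0.1
    water_mask = ensemble_water_map(net, green, nir) > 0.5
    print(water_mask.astype(int))
```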