Automatic brain tumor segmentation technology plays a crucial role in tumor diagnosis, particularly in the precise delineation of tumor subregions. It can assist doctors in accurately assessing the type and location of brain tumors, potentially saving patients' lives. However, the highly variable size and shape of brain tumors, along with their similarity to healthy tissue, pose significant challenges in the segmentation of multi-label brain tumor subregions. This paper proposes a network model, KIDBA-Net, based on an encoder-decoder architecture, aimed at solving the issue of pixel-level classification errors in multi-label tumor subregions. The proposed Kernel Inception Depthwise Block (KIDB) employs multi-kernel depthwise convolution to extract multi-scale features in parallel, accurately capturing the feature differences between tumor types to mitigate misclassification. To ensure the network focuses more on the lesion areas and excludes the interference of irrelevant tissues, this paper adopts Bi-Cross Attention as a skip connection hub to bridge the semantic gap between layers. Additionally, the Dynamic Feature Reconstruction Block (DFRB) exploits the complementary advantages of convolution and dynamic upsampling operators, effectively aiding the model in generating high-resolution prediction maps during the decoding phase. The proposed model surpasses other state-of-the-art brain tumor segmentation methods on the BraTS2018 and BraTS2019 datasets, particularly in the segmentation accuracy of the smaller and highly overlapping tumor core (TC) and enhanced tumor (ET) regions, achieving DSC scores of 87.8%, 82.0% and 90.2%, 88.7%, respectively, with Hausdorff distances of 2.8, 2.7 mm and 2.7, 2.0 mm.
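For readers unfamiliar with the DSC metric quoted above, the following is a minimal sketch of the Dice similarity coefficient for one binary mask pair. It is not the authors' evaluation code; the function name `dice_score`, the toy masks, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled 1 in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 reference voxels, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
score = dice_score(pred, target)
print(f"DSC = {score:.3f}")
```

In a multi-label setting such as TC and ET, a score of this kind would typically be computed once per subregion after converting the label map into one binary mask per subregion.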
Fine-grained image captioning with attribute information has garnered significant attention in the realms of computer vision and natural language processing, demanding precise and contextually relevant descriptions of visual content. While previous attribute-driven image captioning models have shown improvements, challenges remain, such as the independence of attribute predictors and caption generators and the semantic gap between images and attributes. Another common issue is the inclusion of all attributes at every time step, despite most attributes being irrelevant to the word currently being generated. This can divert the model's attention toward erroneous semantic details, resulting in a performance decline. To address these issues, we propose a novel Attribute-Driven Filtering (ADF) captioning network designed to provide rich and nuanced descriptions. This model incorporates a unique Attribute Predictor Module (APM) that dynamically predicts the most pertinent attributes in accordance with the textual context, utilizing different attributes at various time steps. The novelty of this approach lies in recognizing that not all attributes hold equal relevance at each time step, and the APM filters out irrelevant attributes to generate precise and contextually relevant captions. Furthermore, this model features a fusion mechanism that integrates visual information from a conventional attention module with attribute information predicted by the APM, aiming to reduce the visual semantic gap between images and attributes. Extensive experimentation demonstrates that the ADF model outperforms advanced models, achieving impressive CIDEr-D scores of 72.0 (Flickr30K) and 123.3 (MS-COCO) through reinforcement learning optimization. It consistently surpasses baseline models across diverse evaluation metrics, highlighting its effectiveness and robustness.
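The idea of using only the attributes relevant to the current time step can be sketched as a top-k filter over relevance scores. This is an illustrative stand-in, not the APM itself: the dot-product scoring, the softmax, the value of `k`, and all names and embeddings below are assumptions.

```python
import math


def filter_attributes(context, attribute_embeddings, k=2):
    """Return the k attributes most relevant to the current decoding context.

    Relevance is a dot product followed by a softmax; both are illustrative choices.
    """
    names = list(attribute_embeddings)
    scores = [
        sum(c * a for c, a in zip(context, attribute_embeddings[name]))
        for name in names
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    norm = sum(exps)
    probs = {name: e / norm for name, e in zip(names, exps)}
    # Keep only the k highest-probability attributes for this time step.
    kept = sorted(names, key=lambda name: probs[name], reverse=True)[:k]
    return kept, probs


# Hypothetical attribute embeddings and a hypothetical decoder context at step t.
attributes = {
    "dog": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "running": [0.7, 0.0, 0.7],
    "red": [0.0, 0.0, 1.0],
}
context_t = [0.9, 0.1, 0.4]
kept, probs = filter_attributes(context_t, attributes, k=2)
print(kept)
```

Calling the filter again with a different context vector would select a different subset, which is the behaviour the abstract describes as utilizing different attributes at various time steps.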
In this paper, we introduce a two-stage deep learning-based image restoration network, named ESCNet (encoder-decoder based Shadow removal with Colorization Network), and apply it to removing shadows from a single image. Most existing single-image shadow removal methods struggle when the shadow covers multiple regions of different colors or rich image details. To tackle these problems, our key idea is to first remove the shadow(s) from an image and then repaint the shadow-removed region(s). To accomplish this, we present a deep two-stage network that cascades a shadow removal network (SRN) and a colorization network (CN). The encoder-decoder-based SRN, which fuses global and local feature information, removes the shadow(s) in the grayscale domain of the input image while recovering the image details of the shadow-removed region(s). The proposed CN then repaints the shadow-removed region(s) via re-colorization. The proposed model has been trained and evaluated on two well-known public datasets, ISTD (Image Shadow Triplets Dataset) and SRD (Shadow Removal Dataset). Experimental results show that the proposed method outperforms the compared state-of-the-art (SOTA) shadow removal approaches both quantitatively and qualitatively.
In the realm of Earth observation and remote sensing data analysis, the advancement of hyperspectral imaging (HSI) classification technology is of paramount importance. Nevertheless, the intricate nature of hyperspectral data, coupled with the scarcity of labeled data, presents significant challenges in this domain. To mitigate these issues, we introduce a self-supervised learning algorithm predicated on a spectral transformer for HSI classification under conditions of limited labeled data, with the objective of enhancing the efficacy of HSI classification. The S3L algorithm operates in two distinct phases: pretraining and fine-tuning. During the pretraining phase, the algorithm learns the spatial representation of HSI from unlabeled data, utilizing a masking mechanism and a spectral transformer, thereby augmenting the sequence dependence of spectral features. Subsequently, in the fine-tuning phase, labeled data is employed to refine the pretrained weights, thereby improving the precision of HSI classification. Within the comprehensive encoder-decoder framework, we propose a novel spectral transformer module specifically engineered to synergize spatial feature extraction with spectral domain analysis. This innovative module adeptly navigates the complex interplay among various spectral bands, capturing both global and sequential spectral dependencies. Uniquely, it incorporates a gated recurrent unit (GRU) layer within the encoder to enhance its ability to process spectral sequences. Our experimental evaluations across several public datasets reveal that our proposed method, distinguished by its spectral transformer, achieves superior classification performance, particularly in scenarios with limited labeled samples, outperforming existing state-of-the-art approaches.
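A masking-based pretraining step of the kind described above can be sketched as follows. The abstract does not specify what is masked or how the loss is computed, so this sketch assumes that a random subset of spectral bands of one pixel is hidden and that reconstruction error is measured on the hidden bands only; the function names, mask ratio, and toy spectrum are hypothetical.

```python
import random


def mask_spectrum(spectrum, mask_ratio=0.5, mask_value=0.0, seed=None):
    """Hide a random subset of spectral bands; return the corrupted spectrum and masked indices."""
    rng = random.Random(seed)
    n_masked = int(round(len(spectrum) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(spectrum)), n_masked))
    masked_set = set(masked_idx)
    corrupted = [mask_value if i in masked_set else v for i, v in enumerate(spectrum)]
    return corrupted, masked_idx


def reconstruction_loss(predicted, original, masked_idx):
    """Mean squared error over the masked bands only (an assumed pretraining objective)."""
    if not masked_idx:
        return 0.0
    return sum((predicted[i] - original[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Toy 8-band reflectance spectrum for a single pixel (made-up values).
pixel = [0.12, 0.18, 0.25, 0.31, 0.40, 0.38, 0.29, 0.22]
corrupted, masked_idx = mask_spectrum(pixel, mask_ratio=0.5, seed=0)

# Before any training, the "prediction" is just the corrupted input, so the loss is positive.
loss = reconstruction_loss(corrupted, pixel, masked_idx)
print(masked_idx, round(loss, 4))
```

In an actual pretraining loop, the corrupted spectrum would be passed through the encoder-decoder and the loss would be computed on the network output rather than on the corrupted input.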
Short-term metro ridership prediction is of great significance to the efficient and economical operation of Urban Rail Transit (URT) systems. With the popularity of Graph Convolution Networks (GCN) and Transformers, the most notable recent metro ridership forecasting methods are GCN-based and Transformer-based models. However, existing methods have two drawbacks. First, GCN-based models fail to effectively capture global spatial correlations, which are significant for accurate prediction. Second, Transformer-based models are prone to losing temporal information due to the permutation-invariant and anti-order properties of the self-attention they use to capture temporal correlations. To overcome these drawbacks, we propose a novel sequence-to-sequence metro ridership prediction model, named SDT-GRU, with stacked DT-GRU layers as both encoder and decoder. The core component of our model is DT-GRU, which integrates a Dual-branch Transformer decoder into the GRU to effectively capture global spatial correlations and temporal correlations with the Transformer decoder and the GRU, respectively. In particular, the DT-GRU module uses one branch Transformer encoder layer to capture spatial correlations within the same timestamp, and adopts another Transformer encoder layer to implicitly capture spatio-temporal correlations among previous timestamps. Then, the outputs of the two Transformer encoder layers are fed into a GRU layer to capture spatio-temporal patterns. To evaluate the effectiveness of the proposed SDT-GRU, we conduct comprehensive experiments on three real-world metro ridership datasets from Beijing, Shanghai and Hangzhou. Experimental results demonstrate that our SDT-GRU achieves better prediction performance than the state-of-the-art baselines.
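The order-insensitivity of self-attention mentioned above can be checked with a small sketch. It uses plain scaled dot-product self-attention with identity projections (Q = K = V = input) and no positional encoding, which is a simplification assumed here and not the SDT-GRU model. Reordering the input rows only reorders the output rows, so the position in the sequence carries no information by itself.

```python
import math


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections (Q = K = V = x)."""
    d = len(x[0])
    out = []
    for q in x:
        # Attention weights of this query over every row of the sequence.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Three made-up time steps with two features each, and a reordering of them.
sequence = [[1.0, 0.0], [0.0, 2.0], [1.5, 0.5]]
order = [2, 0, 1]
shuffled = [sequence[i] for i in order]

out_original = self_attention(sequence)
out_shuffled = self_attention(shuffled)

# Row i of the shuffled output should match row order[i] of the original output.
max_gap = max(
    abs(a - b)
    for i, src in enumerate(order)
    for a, b in zip(out_shuffled[i], out_original[src])
)
print(max_gap < 1e-9)
```

A recurrent unit such as a GRU processes time steps one after another, so its output does depend on their order, which is one way to read the motivation for combining the two components.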
ISBN (digital): 9781728186719
ISBN (print): 9781728186719
Deep neural network models based on self-attention (SA), which captures rich contextual information, have been widely adopted in semantic segmentation. However, the computational complexity of the standard self-attention module is high, which partly limits its use. In this work, we propose the lightweight self-attention network (LSANet) for semantic segmentation. Specifically, the Lightweight Self-Attentive Module (LSAM) captures information using a hand-designed compact feature representation and a weighted fusion of position information. In the decoder structure, an improved up-sampling module is proposed. Compared with bilinear upsampling, this method achieves better results in restoring image details. Experimental results on the PASCAL VOC 2012 and Cityscapes datasets show the effectiveness of our method, which simplifies operations and improves performance.
ISBN (print): 9783031197680; 9783031197697
We propose a novel architecture for depth estimation from a single image. The architecture itself is based on the popular encoder-decoder architecture that is frequently used as a starting point for all dense regression tasks. We build on AdaBins which estimates a global distribution of depth values for the input image and evolve the architecture in two ways. First, instead of predicting global depth distributions, we predict depth distributions of local neighborhoods at every pixel. Second, instead of predicting depth distributions only towards the end of the decoder, we involve all layers of the decoder. We call this new architecture LocalBins. Our results demonstrate a clear improvement over the state-of-the-art in all metrics on the NYU-Depth V2 dataset. Code and pretrained models will be made publicly available (https://***/sharigfarooq123/LocalBins).
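The bin-based depth prediction that this line of work builds on can be sketched for a single pixel: predicted bin widths are normalized, converted to bin centers within a depth range, and combined with per-bin probabilities into an expected depth. This is a hedged illustration in the spirit of AdaBins-style binning, not the LocalBins implementation; the function names, depth range, widths, and probabilities are assumptions.

```python
def bin_centers(widths, d_min, d_max):
    """Turn non-negative bin widths into bin centers spanning [d_min, d_max]."""
    total = sum(widths)
    centers, left = [], 0.0
    for w in widths:
        w = w / total  # normalize so the widths partition the full depth range
        centers.append(d_min + (d_max - d_min) * (left + w / 2.0))
        left += w
    return centers


def expected_depth(centers, probs):
    """Depth estimate as the probability-weighted average of the bin centers."""
    return sum(c * p for c, p in zip(centers, probs))


# Hypothetical outputs for one pixel: three bin widths and a distribution over the bins.
widths = [1.0, 1.0, 2.0]
centers = bin_centers(widths, d_min=0.0, d_max=8.0)
depth = expected_depth(centers, [0.5, 0.5, 0.0])
print(centers, depth)
```

With a global distribution, the same `centers` would be shared by every pixel of the image; predicting them per local neighborhood, as the abstract describes, would amount to calling `bin_centers` with different widths at each pixel.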
Flash droughts (FDs) pose significant challenges for accurate detection due to their short duration. Conventional drought monitoring methods have difficulty capturing this rapidly intensifying phenomenon accurately. Machine learning models, once trained on suitable data, are increasingly useful for detecting droughts. Northeastern Brazil (NEB) has been a hot spot for FD events with significant ecological damage in recent years. This research introduces a novel 2D convolutional neural network (CNN) designed to identify spatial FDs in historical simulations based on multiple environmental factors and thresholds as inputs. Our model, trained with hydro-climatic data, provides a probabilistic drought detection map across NEB in 2012 as its output. Additionally, we examine future changes in FDs using the Coupled Model Intercomparison Project Phase 6 (CMIP6) driven by outputs from Shared Socioeconomic Pathways (SSPs) under the SSP5-8.5 scenario of 2024-2050. Our results demonstrate that the proposed spatial FD-detecting model based on 2D CNN architecture and the methodology for robust learning show promise for regional comprehensive FD monitoring. Finally, considerable spatial variability of FDs across NEB was observed during 2012 and 2024-2050, which was particularly evident in the São Francisco River Basin. This research contributes to advancing our understanding of flash droughts, offering insights for informed water resource management and for bolstering resilience against their impacts.
This study introduces a novel deep learning framework for detecting leakage in water distribution systems (WDSs). The key innovation lies in a two-step process: First, the WDS is partitioned using a K-means clustering algorithm based on pressure sensitivity analysis. Then, an encoder-decoder neural network (EDNN) model is employed to extract and process the pressure and flow sensitivities. The core of the framework is the PP-LCNetV2 architecture, which keeps the model lightweight and is optimized for CPU devices. This combination ensures rapid, accurate leakage detection. Three cases are employed to evaluate the method. By applying data augmentation techniques, including demand and measurement noise, the framework demonstrates robustness across different noise levels. Compared with other methods, the results show that this method can efficiently detect over 90% of leakages across different operating conditions while recognizing leakage magnitude more accurately. This research offers a significant improvement in computational efficiency and detection accuracy over existing approaches.
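The partitioning step can be illustrated with a small K-means sketch over per-node sensitivity vectors. This is a plain Lloyd's-algorithm implementation written for illustration, not the study's code; the sensitivity values, the number of clusters, and the seeding strategy are assumptions.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns a cluster label per point and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # seed centroids from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical pressure-sensitivity vectors for six nodes (two candidate leak locations each).
sensitivities = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.95],
]
labels, centroids = kmeans(sensitivities, k=2)
print(labels)
```

Nodes that respond similarly to the same leaks end up in the same partition, so a downstream detector can reason about zones rather than individual nodes.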
Soil water content (SWC) plays a vital role in agricultural management, geotechnical engineering, hydrological modeling, and climate research. Image-based SWC recognition methods show great potential compared to traditional methods. However, their accuracy and efficiency limitations hinder wide application due to their status as a nascent approach. To address this, we design the LG-SWC-R3 model based on an attention mechanism to leverage its powerful learning capabilities. To enhance efficiency, we propose a simple yet effective encoder-decoder architecture (PVP-Transformer-ED) designed on the principle of eliminating redundant spatial information from images. This architecture involves masking a high proportion of soil images and predicting the original image from the unmasked area to aid the PVP-Transformer-ED in understanding the spatial information correlation of the soil image. Subsequently, we fine-tune the SWC recognition model on the pre-trained encoder of the PVP-Transformer-ED. Extensive experimental results demonstrate the excellent performance of our designed model (R2 = 0.950, RMSE = 1.351%, MAPE = 0.081, MAE = 1.369%), surpassing traditional models. Although this method involves processing only a small fraction of original image pixels (approximately 25%), which may impact model performance, it significantly reduces training time while maintaining model error within an acceptable range. Our study provides valuable references and insights for the popularization and application of image-based SWC recognition methods.
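The four reported metrics follow standard definitions, sketched below for reference. The toy values are made up and unrelated to the paper's data; the function name `regression_metrics` is an assumption, and MAPE is returned as a fraction rather than a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (as a fraction) for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up soil water contents (%) and predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```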