Real-time semantic segmentation is an important task in computer vision, widely used in fields such as autonomous driving and medical imaging. Existing lightweight networks usually improve inference speed at the sacrifice of segmentation accuracy, and achieving a balance between accuracy and speed remains a challenging problem for real-time semantic segmentation. In this paper, we propose an attention-based lightweight asymmetric network (ALANet) to address this problem. Specifically, in the encoder, a channel-wise attention-based depth-wise asymmetric block (CADAB) is designed to extract sufficient features with a small number of parameters. In the decoder, a spatial attention-based pyramid pooling (SAPP) module is presented to aggregate multi-scale context information using only a few convolutions and pooling operations, and a pixel-wise attention-based multi-scale feature fusion (PAMFF) module is developed to fuse features from different scales and generate pixel-wise attention for improving image restoration. ALANet has only 1.32M parameters. Experimental results on the Cityscapes and CamVid datasets show that ALANet obtains segmentation accuracy (mIoU) of 74.4% and 69.5% and inference speeds of 115.6 FPS and 113.2 FPS, respectively. These results demonstrate that ALANet achieves a good balance between accuracy and speed.
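To make the encoder idea above concrete, the following PyTorch sketch shows one plausible form of a channel-wise attention-based depth-wise asymmetric block: factorized 3x1/1x3 depth-wise convolutions keep the parameter count low, and a squeeze-and-excitation style gate supplies the channel attention. Kernel sizes, the reduction ratio, and the residual layout are assumptions; the abstract does not give CADAB's exact configuration.

```python
# Hypothetical sketch of a channel-attention depth-wise asymmetric block
# (the actual CADAB design in ALANet is not specified by the abstract).
import torch
import torch.nn as nn

class CADAB(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Asymmetric depth-wise convolutions: a 3x3 kernel factorized into 3x1 and 1x3,
        # applied per channel (groups=channels) to keep the parameter count small.
        self.dw_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), groups=channels)
        self.dw_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)          # point-wise channel mixing
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)
        # Squeeze-and-excitation style channel attention.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = self.act(self.bn(self.pw(self.dw_h(self.dw_v(x)))))
        out = out * self.attn(out)          # re-weight channels
        return out + x                      # residual connection

x = torch.randn(1, 64, 128, 256)
print(CADAB(64)(x).shape)  # torch.Size([1, 64, 128, 256])
```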
ISBN: 9781728186719 (digital); 9781728186719 (print)
Deep neural network models based on self-attention (SA) for obtaining rich contextual information have been widely adopted in semantic segmentation. However, the computational complexity of the standard self-attention module is high, which partly limits its use. In this work, we propose the lightweight self-attention network (LSANet) for semantic segmentation. Specifically, the lightweight self-attention module (LSAM) captures contextual information using a hand-designed compact feature representation and a weighted fusion of position information. In the decoder, an improved up-sampling module is proposed; compared with bilinear upsampling, it achieves better results in restoring image details. Experimental results on the PASCAL VOC 2012 and Cityscapes datasets show the effectiveness of our method, which simplifies operations and improves performance.
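The abstract does not spell out LSAM's compact feature representation, but a common way to make self-attention lightweight is to shrink the key/value map before computing attention. The PyTorch sketch below pools keys and values to an 8x8 grid, so the attention cost is O(HW·64) rather than O((HW)^2); the pooling size and projection layout are illustrative assumptions, not the paper's design.

```python
# Illustrative lightweight self-attention with pooled keys/values.
import torch
import torch.nn as nn

class LightSelfAttention(nn.Module):
    def __init__(self, channels, pooled=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.kv = nn.Conv2d(channels, channels * 2, 1)
        self.pool = nn.AdaptiveAvgPool2d(pooled)   # compact key/value map (pooled x pooled)
        self.scale = channels ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)            # (B, HW, C)
        k, v = self.kv(self.pool(x)).chunk(2, dim=1)         # (B, C, p, p) each
        k, v = k.flatten(2), v.flatten(2).transpose(1, 2)    # (B, C, S), (B, S, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)     # (B, HW, S): cost O(HW*S)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x                                       # residual connection

print(LightSelfAttention(64)(torch.randn(1, 64, 32, 32)).shape)
```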
ISBN (print): 9783031197680; 9783031197697
We propose a novel architecture for depth estimation from a single image. The architecture itself is based on the popular encoder-decoder architecture that is frequently used as a starting point for all dense regression tasks. We build on AdaBins which estimates a global distribution of depth values for the input image and evolve the architecture in two ways. First, instead of predicting global depth distributions, we predict depth distributions of local neighborhoods at every pixel. Second, instead of predicting depth distributions only towards the end of the decoder, we involve all layers of the decoder. We call this new architecture LocalBins. Our results demonstrate a clear improvement over the state-of-the-art in all metrics on the NYU-Depth V2 dataset. Code and pretrained models will be made publicly available (https://***/sharigfarooq123/LocalBins).
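A minimal per-pixel bin-regression head in the spirit of AdaBins/LocalBins is sketched below in PyTorch: every pixel predicts bin widths over a fixed depth range plus a distribution over those bins, and the depth is the probability-weighted sum of the bin centres. The bin count, depth range, and layer sizes are placeholders, not the paper's configuration.

```python
# Toy per-pixel depth-bin head: expected depth = sum of bin centres weighted by probabilities.
import torch
import torch.nn as nn

class LocalBinHead(nn.Module):
    def __init__(self, channels, n_bins=64, d_min=0.1, d_max=10.0):
        super().__init__()
        self.d_min, self.d_max = d_min, d_max
        self.bin_widths = nn.Conv2d(channels, n_bins, 1)   # per-pixel bin widths
        self.bin_logits = nn.Conv2d(channels, n_bins, 1)   # per-pixel bin probabilities

    def forward(self, feats):
        # Normalised positive widths that partition [d_min, d_max] at every pixel.
        w = torch.softmax(self.bin_widths(feats), dim=1) * (self.d_max - self.d_min)
        edges = self.d_min + torch.cumsum(w, dim=1)
        centers = edges - 0.5 * w                            # per-pixel bin centres
        probs = torch.softmax(self.bin_logits(feats), dim=1)
        depth = (probs * centers).sum(dim=1, keepdim=True)   # expected depth per pixel
        return depth

feats = torch.randn(2, 128, 60, 80)
print(LocalBinHead(128)(feats).shape)  # torch.Size([2, 1, 60, 80])
```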
Flash droughts (FDs) pose significant challenges for accurate detection due to their short duration, and conventional drought monitoring methods have difficulty capturing this rapidly intensifying phenomenon. Machine learning models trained on data are increasingly useful for drought detection. Northeastern Brazil (NEB) has been a hot spot for FD events, with significant ecological damage in recent years. This research introduces a novel 2D convolutional neural network (CNN) designed to identify spatial FDs in historical simulations, using multiple environmental factors and thresholds as inputs. Our model, trained with hydro-climatic data, outputs a probabilistic drought detection map across NEB for 2012. Additionally, we examine future changes in FDs using Coupled Model Intercomparison Project Phase 6 (CMIP6) outputs under the Shared Socioeconomic Pathway SSP5-8.5 scenario for 2024-2050. Our results demonstrate that the proposed spatial FD-detection model, based on a 2D CNN architecture and a methodology for robust learning, shows promise for comprehensive regional FD monitoring. Finally, considerable spatial variability of FDs across NEB was observed during 2012 and 2024-2050, particularly in the São Francisco River Basin. This research contributes to advancing our understanding of flash droughts, offering critical insights for informed water resource management and bolstering resilience against their impacts.
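As a toy illustration of the detection model described above, the PyTorch snippet below maps a multi-channel grid of environmental factors to a per-pixel flash-drought probability map with a small fully convolutional network. The number of input factors, layer widths, and grid size are placeholders, not the paper's architecture.

```python
# Toy fully convolutional network: environmental-factor grids in, FD probability map out.
import torch
import torch.nn as nn

n_factors = 6   # e.g. soil moisture, evapotranspiration, precipitation, temperature anomalies
model = nn.Sequential(
    nn.Conv2d(n_factors, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1), nn.Sigmoid(),        # probability of flash drought per grid cell
)
grid = torch.randn(1, n_factors, 96, 96)       # one time step over a regional domain
print(model(grid).shape)                       # torch.Size([1, 1, 96, 96])
```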
This study introduces a novel deep learning framework for detecting leakage in water distribution systems (WDSs). The key innovation lies in a two-step process: first, the WDS is partitioned using a K-means clustering algorithm based on pressure sensitivity analysis; then, an encoder-decoder neural network (EDNN) model is employed to extract and process the pressure and flow sensitivities. The core of the framework is the PP-LCNetV2 architecture, which keeps the model lightweight and is optimized for CPU devices. This combination ensures rapid, accurate leakage detection. Three cases are employed to evaluate the method. By applying data augmentation techniques, including demand and measurement noise, the framework demonstrates robustness across different noise levels. Compared with other methods, the results show that this method can efficiently detect over 90% of leakages across different operating conditions while better recognizing the magnitude of leakages. This research offers a significant improvement in computational efficiency and detection accuracy over existing approaches.
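The first step of the framework, partitioning the network by K-means on pressure sensitivities, can be sketched as follows. The sensitivity matrix here is filled with random placeholders; in practice it would come from hydraulic simulations of the WDS, and the number of clusters is an arbitrary choice.

```python
# Sketch: partition WDS nodes by K-means clustering on a pressure-sensitivity matrix.
import numpy as np
from sklearn.cluster import KMeans

n_nodes, n_sensors = 300, 12
# sensitivity[i, j]: pressure change at sensor j when demand at node i is perturbed
# (random placeholder values; real values would come from hydraulic simulation runs)
sensitivity = np.random.rand(n_nodes, n_sensors)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(sensitivity)
partition = kmeans.labels_          # cluster id per node -> candidate leakage zones
print(np.bincount(partition))       # number of nodes in each zone
```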
Soil water content (SWC) plays a vital role in agricultural management, geotechnical engineering, hydrological modeling, and climate research. Image-based SWC recognition methods show great potential compared to traditional methods; however, as a nascent approach, their limitations in accuracy and efficiency hinder wide application. To address this, we design the LG-SWC-R3 model based on an attention mechanism to leverage its powerful learning capabilities. To enhance efficiency, we propose a simple yet effective encoder-decoder architecture (PVP-Transformer-ED) designed on the principle of eliminating redundant spatial information from images: a high proportion of each soil image is masked, and the original image is predicted from the unmasked area, which helps the PVP-Transformer-ED learn the spatial correlations of soil images. Subsequently, we fine-tune the SWC recognition model on the pre-trained encoder of the PVP-Transformer-ED. Extensive experimental results demonstrate the excellent performance of our designed model (R² = 0.950, RMSE = 1.351%, MAPE = 0.081, MAE = 1.369%), surpassing traditional models. Although this method processes only a small fraction of the original image pixels (approximately 25%), which may impact model performance, it significantly reduces training time while keeping model error within an acceptable range. Our study provides valuable references and insights for the popularization and application of image-based SWC recognition methods.
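The pre-training idea described above, masking a high proportion of soil-image patches and reconstructing the image from the visible ones, can be sketched along the lines of masked autoencoding. The PyTorch snippet below uses a generic transformer encoder as a stand-in for the PVP-Transformer-ED (whose internals the abstract does not specify); the patch size, embedding width, and reuse of the encoder as a toy decoder are assumptions, and positional embeddings are omitted for brevity.

```python
# Masked-image-modelling sketch: encode ~25% of patches, reconstruct the masked ones.
import torch
import torch.nn as nn

B, patch, dim, mask_ratio = 4, 16, 256, 0.75
img = torch.randn(B, 3, 224, 224)                            # batch of soil images

# Patchify into (B, N, patch*patch*3).
p = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = p.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, 3 * patch * patch)
N = patches.shape[1]
keep = int(N * (1 - mask_ratio))

perm = torch.rand(B, N).argsort(dim=1)                       # random patch order per image
vis_idx, mask_idx = perm[:, :keep], perm[:, keep:]
gather = lambda t, i: torch.gather(t, 1, i.unsqueeze(-1).expand(-1, -1, t.shape[-1]))

embed = nn.Linear(3 * patch * patch, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
mask_token = nn.Parameter(torch.zeros(1, 1, dim))
head = nn.Linear(dim, 3 * patch * patch)                     # predict raw pixels of masked patches

latent = encoder(embed(gather(patches, vis_idx)))            # encode only the visible ~25%
masked_queries = mask_token.expand(B, N - keep, dim)
# The same encoder is reused as a toy decoder over visible latents + mask tokens.
recon = head(encoder(torch.cat([latent, masked_queries], dim=1))[:, keep:])
loss = ((recon - gather(patches, mask_idx)) ** 2).mean()     # reconstruction loss on masked patches
print(loss.item())
```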
For monaural speech enhancement, contextual information is important for accurate speech estimation. However, commonly used convolutional neural networks (CNNs) are weak at capturing temporal contexts since their building blocks process only one local neighborhood at a time. To address this problem, we draw on human auditory perception to introduce a two-stage trainable reasoning mechanism, referred to as the global-local dependency (GLD) block. GLD blocks capture long-term dependencies of time-frequency bins at both the global and local levels from the noisy spectrogram to help detect correlations among the speech part, the noise part, and the whole noisy input. Furthermore, we construct a monaural speech enhancement network called GLD-Net, which adopts an encoder-decoder architecture and consists of a speech object branch, an interference branch, and a global noisy branch. The extracted global-level and local-level speech features are efficiently reasoned over and aggregated in each of the branches. We compare the proposed GLD-Net with existing state-of-the-art methods on the WSJ0 and DEMAND datasets. The results show that GLD-Net outperforms the state-of-the-art methods in terms of PESQ and STOI.
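The GLD block is only outlined above, but one common way to combine global and local dependency modelling over time-frequency bins is to pair a non-local (self-attention) branch with a small depth-wise convolution branch, as in the PyTorch sketch below. The actual reasoning mechanism in GLD-Net may differ; this is an illustrative approximation.

```python
# Rough sketch of fusing global (non-local) and local context over T-F bins.
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Global branch: self-attention over all time-frequency bins.
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        # Local branch: depth-wise conv over a small T-F neighbourhood.
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):                               # x: (B, C, F, T) spectrogram features
        b, c, f, t = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, FT, C)
        k = self.k(x).flatten(2)                        # (B, C, FT)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, FT, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # dependencies between all bins
        glob = (attn @ v).transpose(1, 2).reshape(b, c, f, t)
        return x + glob + self.local(x)                 # fuse global and local context

print(GlobalLocalBlock(32)(torch.randn(1, 32, 32, 50)).shape)
```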
Enhancement of low-light images is a challenging task due to low brightness, low contrast, and high noise. The difficulty of collecting naturally labeled data intensifies this problem further. Many researchers have attempted to solve this problem using learning-based approaches; however, most models ignore the impact of noise in low-lit images. In this paper, an encoder-decoder architecture made up of separable convolution layers is proposed to address the issues encountered in low-light image enhancement. The architecture is trained end-to-end on a custom low-light image dataset (LID) comprising both clean and noisy images. We introduce a unique multi-context feature extraction module (MC-FEM) where the input first passes through a feature pyramid of dilated separable convolutions for hierarchical-context feature extraction, followed by separable convolutions for feature compression. The model is optimized using a novel three-part loss function that focuses on high-level contextual features, structural similarity, and patch-wise local information. We conducted several ablation studies to determine the optimal model for low-light image enhancement under noisy and noiseless conditions. We use performance metrics such as peak signal-to-noise ratio, structural similarity index measure, visual information fidelity, and average brightness to demonstrate the superiority of the proposed work against state-of-the-art algorithms. Qualitative results presented in this paper prove the strength and suitability of our model for real-time applications.
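The MC-FEM description above, a feature pyramid of dilated separable convolutions followed by separable convolutions for compression, can be sketched as follows in PyTorch. The dilation rates and channel counts are assumptions rather than the paper's configuration.

```python
# Illustrative multi-context feature extraction with dilated separable convolutions.
import torch
import torch.nn as nn

def separable(cin, cout, dilation=1):
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=dilation, dilation=dilation, groups=cin),  # depth-wise
        nn.Conv2d(cin, cout, 1),                                                  # point-wise
        nn.ReLU(inplace=True),
    )

class MCFEM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Pyramid of dilated separable convs captures context at several receptive fields.
        self.branches = nn.ModuleList([separable(channels, channels, d) for d in (1, 2, 4, 8)])
        self.compress = separable(channels * 4, channels)     # fuse and compress features

    def forward(self, x):
        return self.compress(torch.cat([b(x) for b in self.branches], dim=1))

print(MCFEM(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```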
Droughts pose significant challenges for accurate monitoring due to their complex spatiotemporal characteristics. Data-driven machine learning (ML) models have shown promise in detecting extreme events when enough well-annotated data is available. However, droughts do not have a unique and precise definition, which leads to noise in human-annotated events and presents an imperfect learning scenario for deep learning models. This article introduces a 3-D convolutional neural network (CNN) designed to address the complex task of drought detection, considering spatiotemporal dependencies and learning with noisy and inaccurate labels. Motivated by the shortcomings of traditional drought indices, we leverage supervised learning with labeled events from multiple sources, capturing the shared conceptual space among diverse definitions of drought. In addition, we employ several strategies to mitigate the negative effect of noisy labels (NLs) during training, including a novel label correction (LC) method that relies on model outputs, enhancing the robustness and performance of the detection model. Our model significantly outperforms state-of-the-art drought indices when detecting events in Europe between 2003 and 2015, achieving an AUROC of 72.28%, an AUPRC of 7.67%, and an ECE of 16.20%. When applying the proposed LC method, these performances improve by +5%, +15%, and +59%, respectively. Both the proposed model and the robust learning methodology aim to advance drought detection by providing a comprehensive solution to label noise and conceptual variability.
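A generic form of output-based label correction, in the spirit of the LC method described above, is sketched below: where the model is highly confident and disagrees with the noisy annotation, the annotation is replaced by the model's prediction. The confidence threshold and the hard-relabelling rule are illustrative choices, not the paper's exact procedure.

```python
# Sketch of output-based label correction for noisy binary drought labels.
import torch

def correct_labels(probs, noisy_labels, threshold=0.9):
    """probs: (N,) model probability of 'drought'; noisy_labels: (N,) 0/1 annotations."""
    preds = (probs > 0.5).float()
    confident = (probs > threshold) | (probs < 1 - threshold)   # model is very sure either way
    disagree = preds != noisy_labels
    corrected = noisy_labels.clone()
    corrected[confident & disagree] = preds[confident & disagree]   # trust the model here
    return corrected

probs = torch.tensor([0.97, 0.40, 0.05, 0.75])
labels = torch.tensor([0.0, 1.0, 0.0, 0.0])
print(correct_labels(probs, labels))   # tensor([1., 1., 0., 0.])
```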
The Temporal Convolutional Network (TCN) and the TCN combined with an encoder-decoder architecture (TCN-ED) are proposed to forecast runoff in this study. Both models are trained and tested using hourly data from the Jianxi basin, China. The results indicate that the forecast horizon has a great impact on forecast ability, and the concentration time of the basin is a critical threshold for the effective forecast horizon of both models. Both models perform poorly on low flow and well on medium and high flow at most forecast horizons, while their performance on peak flow depends on the forecast horizon. TCN-ED performs better than TCN in runoff forecasting, with higher accuracy, better stability, and insensitivity to fluctuations in the rainfall process. Therefore, TCN-ED is an effective deep learning solution for runoff forecasting within an appropriate forecast horizon.
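The building unit of a TCN is a dilated causal convolution; stacking blocks with exponentially growing dilation lets the model cover long rainfall histories with few layers. The PyTorch sketch below illustrates this; the kernel size, channel width, and dilation schedule are placeholders, not the configuration used for the Jianxi basin.

```python
# Minimal dilated causal convolution block, the building unit of a TCN.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation                  # left-pad so the conv stays causal
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                              # x: (B, C, T) rainfall/runoff series
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))
        return self.act(out) + x                       # residual connection

# Stack blocks with exponentially growing dilation to cover long input histories.
tcn = nn.Sequential(*[TCNBlock(16, d) for d in (1, 2, 4, 8, 16)])
print(tcn(torch.randn(1, 16, 168)).shape)              # one week of hourly inputs
```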