For many urban studies, it is necessary to obtain remote sensing images with both high spectral and high spatial resolution by fusing hyperspectral and panchromatic remote sensing images. In this article, we propose a deep learning model, an encoder-decoder with a residual network (EDRN), for remote sensing image fusion. First, we combined the hyperspectral and panchromatic remote sensing images to circumvent the independence of the hyperspectral and panchromatic image features. Second, we established an encoder-decoder network for extracting representative encoded and decoded deep features. Finally, we established residual networks between the encoder network and the decoder network to enhance the extracted deep features. We evaluated the proposed method on six groups of real-world hyperspectral and panchromatic image datasets, and the experimental results confirmed the superior performance of the proposed method versus six other methods.
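The encoder-decoder-with-residual-skips idea described above can be sketched in a few lines of numpy. This is a toy illustration only, not the authors' EDRN: the pointwise "convolution", the 2x2 pooling, and the scalar weights are all hypothetical stand-ins.

```python
import numpy as np

def encode(x, weights):
    """Toy encoder: a pointwise map + ReLU, then 2x2 mean pooling per stage."""
    feats = []
    for w in weights:
        x = np.maximum(x * w, 0)                      # stand-in for conv + ReLU
        feats.append(x)                               # kept for the residual skip
        x = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))
    return x, feats

def decode(x, feats, weights):
    """Toy decoder: 2x nearest-neighbour upsampling, then ADD the matching
    encoder feature -- the residual connection between encoder and decoder."""
    for w, skip in zip(weights, reversed(feats)):
        x = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
        x = np.maximum(x * w, 0) + skip               # residual skip connection
    return x
```

Because each decoder stage adds back the encoder feature of the same resolution, fine spatial detail lost to pooling can be recovered, which is the motivation for such skips in fusion networks.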
Objective. Automated cell nuclei segmentation is vital for the histopathological diagnosis of cancer. However, nuclei segmentation from hematoxylin and eosin (HE) stained whole slide images (WSIs) remains a challenge due to noise-induced intensity variations and uneven staining. The goal of this paper is to propose a novel deep learning model for accurately segmenting the nuclei in HE-stained WSIs. Approach. We introduce FEEDNet, a novel encoder-decoder network that uses LSTM units and feature enhancement blocks (FE-blocks). Our proposed FE-block avoids the loss of location information incurred by pooling layers by concatenating the downsampled version of the original image to preserve pixel intensities. FEEDNet uses an LSTM unit to capture multi-channel representations compactly. Additionally, for datasets that provide class information, we train a multiclass segmentation model, which generates masks corresponding to each class at the output. Using this information, we generate more accurate binary masks than those generated by conventional binary segmentation models. Main results. We have thoroughly evaluated FEEDNet on the CoNSeP, Kumar, and CPM-17 datasets. FEEDNet achieves the best value of PQ (panoptic quality) on the CoNSeP and CPM-17 datasets and the second-best value of PQ on the Kumar dataset. The 32-bit floating-point version of FEEDNet has a model size of 64.90 MB. With INT8 quantization, the model size reduces to only 16.51 MB, with a negligible loss in predictive performance on the Kumar and CPM-17 datasets and a minor loss on the CoNSeP dataset. Significance. Our proposed idea of generalized class-aware binary segmentation is shown to be accurate on a variety of datasets. FEEDNet has a smaller model size than previous nuclei segmentation networks, which makes it suitable for execution on memory-constrained edge devices. The state-of-the-art predictive performance of FEEDNet makes it the most preferred network. The source code can be obtained from https://git
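The FE-block's core trick, concatenating a downsampled copy of the input image onto the pooled feature map so raw pixel intensities survive the pooling stage, can be illustrated with a minimal numpy sketch. The shapes and the 2x2 average pooling here are assumptions for illustration, not FEEDNet's actual layers.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on an (H, W, C) array."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def fe_block(features, image):
    """Pool the features, then concatenate a downsampled copy of the original
    image along the channel axis so pixel intensities are not lost."""
    pooled = avg_pool2(features)
    img_ds = avg_pool2(image)
    return np.concatenate([pooled, img_ds], axis=-1)
```

In a real network the concatenated tensor would feed the next convolutional stage; the point is simply that location-bearing intensity information re-enters after pooling.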
Rain streaks in an image appear in different sizes and orientations, resulting in severe blurring and visual quality *** CNN-based algorithms have achieved encouraging deraining results, although there are certain limitations in the description of rain streaks and the restoration of scene structures in different *** this paper, we propose an efficient multi-scale enhancement and aggregation network (MEAN) to solve the single-image deraining *** the importance of large receptive fields and multi-scale features, we introduce a multi-scale enhanced unit (MEU) to capture long-range dependencies and exploit features at different scales to depict ***, an attentive aggregation unit (AAU) is designed to utilize the informative features in the spatial and channel dimensions, thereby aggregating effective information to eliminate redundant features for rich scenario *** improve the deraining performance of the encoder-decoder network, we utilized an AAU to filter the information in the encoder network and concatenated the useful features to the decoder network, which is conducive to predicting high-quality clean *** results on synthetic datasets and real-world samples show that the proposed method achieves significant deraining performance compared to state-of-the-art approaches.
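The multi-scale idea behind the MEU, computing responses at several receptive-field sizes and aggregating them, can be caricatured in numpy. This is a rough conceptual sketch, not the MEAN architecture: the box filter stands in for a large-receptive-field branch, and plain averaging stands in for the learned aggregation.

```python
import numpy as np

def box_blur(x, k):
    """Naive k x k box filter with edge clipping (a stand-in for a branch
    with a large receptive field)."""
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    r = k // 2
    for i in range(h):
        for j in range(w):
            out[i, j] = x[max(0, i - r):i + r + 1,
                          max(0, j - r):j + r + 1].mean()
    return out

def multi_scale_aggregate(x, scales=(1, 3, 5)):
    """Average the responses from several receptive-field sizes
    (scale 1 is the identity branch)."""
    return np.mean([box_blur(x, k) if k > 1 else x.astype(float)
                    for k in scales], axis=0)
```

A learned network would weight the branches adaptively (the role of the AAU's attention) rather than averaging them uniformly.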
Authors:
Rui Tang, Yan Li, Huikai Xu, Yi Ding
UESTC, Sch Informat & Software Engn, Chengdu, Peoples R China
CNPC Offshore Engn Co Ltd, Tianjin, Peoples R China
BHDC 2 Cementing Co, Tianjin, Peoples R China
Univ Elect Sci & Technol China, Network & Data Secur Key Lab Sichuan Prov, Chengdu 610054, Peoples R China
UESTC Guangdong, Inst Elect & Informat Engn, Dongguan 523808, Peoples R China
ISBN:
(print) 9781665423144
Currently, state-of-the-art semantic segmentation methods often incur a huge computational cost to achieve high performance; it is difficult to balance inference speed and model accuracy, and difficult to deploy such models on equipment with limited hardware resources. In this paper, we propose an asymmetric high-resolution (1024×2048 px) real-time semantic segmentation network that balances accuracy and speed: the Fast Real-time Semantic Segmentation Network (FRSSNet). We combine the existing multi-branch network and encoder-decoder structures and design a new bottleneck. Meanwhile, we use multi-scale features to obtain better segmentation results. By combining spatial details with semantic information, the accuracy can reach 69.6% on Cityscapes and the FPS can reach 200.
Automated crack detection is vital for structural maintenance in areas such as construction, roads, and bridges. Accurate crack detection allows for the timely identification and repair of cracks, reducing safety risks and extending the service life of structures. However, traditional methods struggle with fine cracks, complex backgrounds, and image noise. In recent years, although deep learning techniques have excelled in pixel-level crack segmentation, challenges like inadequate local feature processing, information loss, and class imbalance persist. To address these challenges, we propose an encoder-decoder network based on multiple selective fusion mechanisms. Initially, a star feature enhancement module is designed to resolve the issues of insufficient local feature processing and feature redundancy during the feature extraction process. Then, a multi-scale adaptive fusion module is developed to selectively capture both global and local contextual information, mitigating the information loss. Finally, to tackle class imbalance, a multi-scale monitoring and selective output module is introduced to enhance the model's focus on crack features and suppress interference from background and irrelevant information. Extensive experiments are conducted on three publicly available crack datasets: SCD, CFD, and DeepCrack. The results demonstrate that the proposed segmentation network achieves superior performance in pixel-level crack segmentation, with Dice scores of 66.2%, 54.2%, and 86.8% and mIoU values of 74.4%, 67.5%, and 87.9% on the SCD, CFD, and DeepCrack datasets, respectively. These results outperform those of existing models, such as U-Net, DeepLabv3+, and Attention UNet, particularly in handling complex backgrounds, fine cracks, and low-contrast images. Furthermore, the proposed MSF-CrackNet also significantly reduces computational complexity, with only 2.39 million parameters and 8.58 GFLOPs, making it a practical and efficient solution for real-world crack detection.
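The Dice and IoU figures quoted above are standard overlap metrics; for a single pair of binary masks they reduce to a few lines of numpy (per-image, without the smoothing terms real implementations often add to avoid division by zero; mIoU would average IoU over classes or images).

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two boolean masks: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union between two boolean masks: |A ∩ B| / |A ∪ B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```

Dice weights the intersection twice, so it is always at least as large as IoU for non-trivial overlap, which is worth remembering when comparing tables that report one or the other.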
Accurate estimation of burden surface depth plays a crucial role in constructing the temperature field and optimizing reaction control in volatile kilns. However, most image-based depth estimation techniques require high-quality input images and achieve limited accuracy, which restricts their application under actual harsh working conditions such as high temperature, heavy dust, and dense smoke. In this study, a deep learning-based monocular depth estimation model is proposed to measure the burden surface depth in the volatile kiln head zone. The proposed model integrates an encoder-decoder network with an attention module. The encoder-decoder network outputs a set of deep semantic features, while the attention module intelligently fuses multi-level features to predict a probability distribution over depth intervals for each pixel. A volatile kiln prototype is designed and constructed to generate image datasets of the kiln head zone that approximate real data collected from industrial production sites. Results demonstrate that the proposed model has a depth prediction error of RMSE = 11.008 mm for the burden surface region, outperforming state-of-the-art neural networks and the traditional depth-from-defocus method. Code and datasets are available at https://***/LLLcong/Attention-MonoDepth.
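Predicting a probability distribution over depth intervals, as the attention module above does, typically turns into a depth value by taking the expectation over the bin centers; the RMSE is then computed against ground truth. A numpy sketch of that final step (bin edges and probabilities here are made-up examples, not the paper's configuration):

```python
import numpy as np

def expected_depth(probs, bin_edges):
    """Per-pixel depth = expectation of the predicted distribution over
    depth bins. probs has shape (..., n_bins) with rows summing to 1."""
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return probs @ centers

def rmse(pred, target):
    """Root mean square error, the metric reported in the abstract."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

The expectation-over-bins formulation keeps the output differentiable with respect to the predicted probabilities, which is why it is a common choice for discretized depth regression.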
Video frame prediction represents a fundamental challenge in computer vision, necessitating precise modeling of both spatial and temporal dynamics within video sequences. This computational task holds substantial implications across diverse domains, including video compression optimization, robust object tracking systems, and advanced motion forecasting applications. In this investigation, we present a novel hybrid architecture that synthesizes the complementary strengths of Convolutional Long Short-Term Memory (ConvLSTM) networks and three-dimensional Convolutional Neural Networks (3D CNN) for enhanced frame prediction capabilities. Our methodological framework incorporates a ConvLSTM component that fundamentally augments the traditional LSTM architecture through the integration of convolutional operations, thereby facilitating sophisticated modeling of sequential dependencies. Concurrently, the 3D CNN component employs volumetric convolutional layers to extract rich spatio-temporal features from the input sequences. Rigorous empirical evaluation demonstrates the superior performance of the ConvLSTM architecture, which consistently yields reduced validation errors and elevated coefficients of determination. Specifically, the ConvLSTM model achieves a validation Mean Squared Error (MSE) of 0.0237 and an $R^2$ value of 0.6951, substantially outperforming the 3D CNN model, which exhibits a validation MSE of 0.0471 and an $R^2$ value of 0.3939. These empiri
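The two figures compared above, MSE and the coefficient of determination $R^2$, are standard regression metrics and take only a few lines of numpy to compute:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error."""
    return float(np.mean((pred - target) ** 2))

def r2_score(pred, target):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

An $R^2$ of 0.6951 means the ConvLSTM explains roughly 70% of the variance in the validation targets, versus about 39% for the 3D CNN, which is the sense in which the gap is "substantial".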
Automated polyp segmentation of colonoscopy images is crucial in clinical settings, providing indispensable data for diagnosis and surgical procedures. Deep convolutional neural networks (CNNs) have shown promise in this domain, yet existing methods often fail to adequately address interactions between multilevel features, leading to suboptimal results. To overcome these limitations, we propose the attention-guided asymmetric multiscale polyp segmentation network (AAPCNet), a novel framework designed to effectively capture comprehensive semantic information for accurate polyp segmentation. AAPCNet leverages the Res2Net-50 backbone with a split-aggregation strategy embedded in Bottle2neck blocks to extract rich multilevel features. To enhance contextual understanding, we introduce the deep aggregation and fusion module (DAFM), which employs large-sized dilated and asymmetric convolutions to capture multiscale information, addressing the challenges posed by polyps of varying sizes. Furthermore, the spatial contextual fusion module (SCFM) utilizes spatial and channelwise attention mechanisms to refine features by emphasizing polyp-specific details while suppressing irrelevant background information. The innovation of our lightweight yet effective decoder lies in its unique architecture, which integrates a residual block (RB) between two SCFM modules, enabling feature refinement, enhanced polyp details, precise localization, and accurate boundary delineation while suppressing noise. This architecture achieves superior segmentation performance and outperforms state-of-the-art CNN-based models on both in-domain and out-of-domain datasets. Comprehensive experiments demonstrate that AAPCNet consistently achieves a favorable balance between accuracy and computational efficiency. Our codes and results are publicly available at: https://***/Mkhan143/AAPCNet.
The purpose of image inpainting is to restore and fill missing areas, and how to restore delicate and plausible missing content has always been a key issue. In the past decade, remarkable achievements have been made in image inpainting based on deep learning. However, when faced with large and irregular missing areas, there are still problems such as semantic inconsistency, blurred edges, and artifacts in the inpainted images. To address these problems, this paper proposes a novel image inpainting algorithm, WFIL-NET, based on wavelet downsampling and a frequency integrated learning module. WFIL-NET adopts the generative adversarial network (GAN) structure, with an encoder-decoder network used in the generator. To retain rich information while reducing the image resolution, we propose a wavelet downsampling module in the encoder to enhance the capacity of subsequent operations to learn representative features. Moreover, the wavelet transform extracts image features at different frequency levels: low-frequency information encapsulates the primary content and structure, whereas high-frequency information captures details and texture. The proposed frequency integrated learning module employs an attention mechanism to allocate appropriate weights to high- and low-frequency information, effectively integrating them to ensure a more coherent structure and semantic consistency in the inpainted image. Experimental results on the CelebA-HQ and Places2 datasets demonstrate that the proposed method effectively fills large and irregular missing areas, significantly enhances the visual quality of inpainted images, and mitigates edge blurring and artifacts.
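The wavelet downsampling described above splits an image into one low-frequency band (content and structure) and three high-frequency bands (detail and texture). A minimal numpy sketch of one level of the 2D Haar transform shows the idea; note the normalization here is the plain-average convention, and production code would use a wavelet library rather than this hand-rolled version.

```python
import numpy as np

def haar_downsample(x):
    """One level of the 2D Haar transform on an (H, W) array with even sides:
    returns the low-frequency approximation (LL) and three detail bands."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low frequency: content / structure
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh
```

Unlike plain pooling, the four bands together are lossless (e.g. `ll + lh + hl + hh` recovers the top-left pixel of each 2x2 block), which is exactly why wavelet downsampling "retains rich information while reducing the image resolution".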
Objective: Non-invasive fetal electrocardiography has the potential to provide vital information for evaluating the health status of the fetus. However, the low signal-to-noise ratio of the fetal electrocardiogram (ECG) impedes the applicability of the method in clinical practice. Quality improvement of the fetal ECG is of great importance for providing accurate information to enable support in medical decision-making. In this paper, we propose the use of artificial intelligence for one-channel fetal ECG enhancement as a post-processing step after maternal ECG suppression. Approach: We propose a deep fully convolutional encoder-decoder framework, learning end-to-end mappings from noise-contaminated fetal ECGs to clean ones. Symmetric skip-layer connections are used between corresponding convolutional and transposed convolutional layers to help recover the signal details. Main results: Experiments on synthetic data show an average improvement of 7.5 dB in the signal-to-noise ratio (SNR) for input SNRs in the range of -15 to 15 dB. Application of the method to real signals and subsequent ECG interval analysis demonstrates a root mean square error of 9.9 ms and 14 ms for the PR and QT intervals, respectively, when compared with simultaneous scalp measurements. The proposed network can achieve substantial noise removal on both synthetic and real data. In cases of highly noise-contaminated signals, some morphological features might be unreliably reconstructed. Significance: The presented method has the advantage of preserving individual variations in pulse shape and beat-to-beat intervals. Moreover, no prior knowledge of the power spectra of the noise or the pulse locations is required.
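The SNR figures above follow the usual power-ratio definition in decibels; given a clean reference signal, a numpy sketch of the metric (the test signals are made-up examples, not the paper's data) is:

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR of `noisy` relative to the reference `clean` signal, in dB:
    10 * log10(signal power / noise power), with noise = noisy - clean."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

The reported "average improvement of 7.5 dB" is then simply `snr_db(clean, output) - snr_db(clean, input)` averaged over test records.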