ISBN: (print) 9783030916077; 9783030916084
Traditional radar echo image extrapolation for rainfall nowcasting suffers from insufficient accuracy, incomplete analysis of the data in radar echo images, and image blurring caused by stacked LSTM (Long Short-Term Memory) layers. To predict the radar echo image at a future moment more accurately and clearly, an adversarial prediction network based on a multi-scale U-shaped encoder-decoder is proposed. To address the lack of detail in predicted images, the generator adopts a U-shaped encoder-decoder structure with skip connections. To capture echo movement at different scales, multi-scale convolution kernels are introduced into the encoder-decoder units. The conventional discriminator structure is then improved, and stacked ConvLSTM (Convolutional Long Short-Term Memory) layers are proposed to classify sequences. The network is tested on SRAD (Standardized Radar Dataset) by predicting the next ten frames from ten given frames, and the prediction results of different networks are compared. The test results show that the proposed model reduces image blurring and improves prediction accuracy while retaining sufficient detail.
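The idea of multi-scale convolution kernels can be illustrated with a minimal sketch: the same frame is filtered with kernels of several sizes and the results are stacked as channels. The abstract does not give kernel sizes or weights, so the sizes (1, 3, 5) and the averaging `box_kernel` below are assumptions standing in for learned kernels, and all function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation that keeps the input height and width."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # square, odd-sized kernel assumed
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Assumed stand-in for a learned kernel: a normalized averaging filter."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_scale_features(image, sizes=(1, 3, 5)):
    """Apply one kernel per scale and stack the results as channels."""
    return [conv2d_same(image, box_kernel(s)) for s in sizes]


frame = [[1.0] * 7 for _ in range(7)]   # toy 7x7 frame, not real radar data
features = multi_scale_features(frame)  # 3 channels, each 7x7
```

Larger kernels respond to motion over a wider neighbourhood, while the 1x1 branch preserves per-pixel detail; concatenating the branches lets later layers draw on both.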
U-Net is the most cited and widely used deep learning model for biomedical image segmentation. In this paper, we propose an enhanced version of the ubiquitous U-Net architecture that improves on the original in terms of generalization capability while addressing several inherent shortcomings, such as the constrained resolution and non-resilient receptive fields of the main pathway. Our novel multi-path architecture introduces the notion of an individual receptive field pathway, which is merged with the other pathways at the bottom-most layer by concatenation followed by Layer Normalization and Spatial Dropout, which can improve generalization performance on small datasets. Overall, our experiments show that the proposed multi-path architecture outperforms other state-of-the-art approaches that build on similar ideas of pyramid structures, skip connections, and encoder-decoder pathways. A significant improvement in the Dice similarity coefficient is attained on our proprietary colony-forming unit dataset, where a score of 0.809 was achieved for the foreground class.
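For reference, the Dice similarity coefficient reported above is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The following is a minimal sketch for binary masks; the function name, the flat-list representation, and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]     # toy predicted foreground mask
target = [1, 0, 0, 0, 1, 1]   # toy ground-truth mask
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3) = 0.666...
```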
Understanding karst spring flow is important for accommodating the increasing water demand caused by population growth and for managing freshwater resources effectively. However, due to the spatial and temporal heterogeneity and the complex hydrological processes in karst systems, predicting karst spring discharge remains challenging. In this study, three deep learning-based models, namely long short-term memory (LSTM), gated recurrent unit (GRU), and simple recurrent neural network (RNN), are framed with an encoder-decoder architecture to provide multiple-step-ahead spring discharge prediction. The encoder-decoder architecture consists of an encoder that reads and encodes the input sequence into a vector and a decoder that deciphers the vector and outputs the predicted sequence. Three hybrid models, called LSTM-ED, GRU-ED, and simple RNN-ED, are compared with single-step models and with multiple-step models without the encoder-decoder architecture to investigate the role of the encoder-decoder architecture in multi-step-ahead prediction. The sensitivity of the karst spring discharge prediction to the choice of input time steps and lead time steps is evaluated, and the predicted results are compared with the observed spring discharge. The results indicate that: (1) the LSTM-ED, GRU-ED, and simple RNN-ED models obtain similar results when predicting karst spring discharge multiple time steps ahead; (2) the three hybrid multiple-step models outperform the single-step models in making consistent and accurate spring discharge predictions; (3) the multiple-step models framed with an encoder-decoder architecture obtain better spring discharge predictions than the single-step models and the multiple-step models without the encoder-decoder structure; (4) the LSTM-ED, GRU-ED, and simple RNN-ED models are sensitive to the choice of lead time and insensitive to the choice of input time step. A short lead time typically yields a more accurate spring discharge prediction.
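The roles of the input time steps and the lead time steps can be made concrete with a small sketch of how a discharge series might be sliced into training pairs for multi-step-ahead prediction. The helper name `make_windows`, its parameters, and the discharge values are assumptions for illustration only; the abstract does not describe the authors' preprocessing.

```python
def make_windows(series, n_input, n_lead):
    """Slice a 1D series into (input, target) pairs for multi-step-ahead prediction.

    n_input: number of past time steps fed to the encoder.
    n_lead:  number of future time steps the decoder must output.
    """
    samples = []
    last_start = len(series) - n_input - n_lead
    for start in range(last_start + 1):
        x = series[start:start + n_input]
        y = series[start + n_input:start + n_input + n_lead]
        samples.append((x, y))
    return samples


discharge = [3.0, 3.2, 3.1, 2.9, 2.8, 3.0, 3.4, 3.6]  # made-up values
pairs = make_windows(discharge, n_input=4, n_lead=2)
# pairs[0] == ([3.0, 3.2, 3.1, 2.9], [2.8, 3.0])
```

Varying `n_input` and `n_lead` in such a setup is one way to run the kind of sensitivity comparison the abstract describes.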
As deep learning advances, newly proposed neural networks are becoming increasingly complicated in pursuit of better performance. In this context, we propose a simple but effective neural network called MiniCrack for narrow crack detection. We also propose a lightweight version, MiniCrack-Light, for scenarios with limited computing resources. MiniCrack and MiniCrack-Light outperform the current state-of-the-art neural networks on all three challenging test data sets while using fewer parameters and achieving stronger robustness. PixelShuffle and PixelUnshuffle, originally designed for image super-resolution, are successfully applied to image segmentation, which effectively alleviates the problems caused by pooling.
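Why PixelUnshuffle can stand in for pooling is easier to see in code: it halves the spatial resolution by moving each r x r block of pixels into channels, so no values are discarded, and PixelShuffle reverses the rearrangement exactly. The pure-Python sketch below is intended to mirror the channel ordering of PyTorch's `nn.PixelUnshuffle` and `nn.PixelShuffle`; it is an illustration on nested lists, not MiniCrack's implementation.

```python
def pixel_unshuffle(x, r):
    """(C, H*r, W*r) -> (C*r*r, H, W): move each r x r spatial block into channels."""
    c, hr, wr = len(x), len(x[0]), len(x[0][0])
    h, w = hr // r, wr // r
    out = [[[0] * w for _ in range(h)] for _ in range(c * r * r)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                for y in range(h):
                    for col in range(w):
                        out[ch * r * r + i * r + j][y][col] = x[ch][y * r + i][col * r + j]
    return out


def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r): the inverse rearrangement."""
    crr, h, w = len(x), len(x[0]), len(x[0][0])
    c = crr // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                for y in range(h):
                    for col in range(w):
                        out[ch][y * r + i][col * r + j] = x[ch * r * r + i * r + j][y][col]
    return out


image = [[[row * 4 + col for col in range(4)] for row in range(4)]]  # 1 x 4 x 4
down = pixel_unshuffle(image, 2)   # 4 x 2 x 2, every input value is kept
restored = pixel_shuffle(down, 2)  # 1 x 4 x 4, identical to image
```

Max or average pooling on the same input would keep only one value per 2 x 2 block, which is the information loss the rearrangement avoids.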
Authors: Hu, ShanShan; Chen, Peng; Gu, Pengying; Wang, Bing
Affiliations:
Anhui Univ, Minist Educ Key Lab Intelligent Comp & Signal Proc, Hefei 230601, Peoples R China
Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
Civil Aviat Flight Univ China, Coll Air Traff Management, Guanghan 618307, Peoples R China
Anhui Univ, Inst Phys Sci, Hefei 230601, Peoples R China
Anhui Univ, Inst Informat Technol, Hefei 230601, Peoples R China
Anhui Univ, Sch Internet, Hefei 230601, Peoples R China
Univ Sci & Technol China, Div Life Sci & Med, Affiliated Hosp USTC 1, Cadres Ward South Dist, Hefei 230001, Peoples R China
Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243032, Peoples R China
Anhui Educ Dept, Key Lab Power Elect & Mot Control, Maanshan 243032, Peoples R China
Research on quantitative structure-activity relationships (QSAR) provides an effective approach to identifying new hits and promising lead compounds during drug discovery. Over the past decades, many studies have achieved good QSAR performance thanks to the development of machine learning. The rise of deep learning, along with massive accessible chemical databases, has further improved QSAR performance. This article proposes a novel deep-learning-based method for QSAR prediction that concatenates an end-to-end encoder-decoder model with a convolutional neural network (CNN) architecture. The encoder-decoder model is mainly used to generate fixed-size latent features that represent chemical molecules; these features are then fed into the CNN framework to train a robust and stable model and, finally, to predict active chemicals. Two models with different schemes are investigated to evaluate the validity of the proposed model on the same data sets. Experimental results show that the proposed method outperforms other state-of-the-art methods in identifying whether a chemical molecule is active.
The detection of retinal vessels is of great importance in the diagnosis and treatment of many ocular diseases. Many methods have been proposed for vessel detection. However, most of these algorithms neglect the connectivity of the vessels, which plays an important role in diagnosis. In this paper, we propose a novel method for retinal vessel detection. The proposed method comprises a dense dilated network that produces an initial detection of the vessels and a probability regularized walk algorithm that addresses the fracture issue in the initial detection. The dense dilated network integrates newly proposed dense dilated feature extraction blocks into an encoder-decoder structure to extract and accumulate features at different scales. A multi-scale Dice loss function is adopted to train the network. To improve the connectivity of the segmented vessels, the probability regularized walk algorithm connects the broken vessels. The proposed method has been applied to three public data sets: DRIVE, STARE, and CHASE_DB1. The results show that the proposed method outperforms the state-of-the-art methods in accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve.
Attention-based methods and Connectionist Temporal Classification (CTC) networks have been promising research directions for end-to-end (E2E) Automatic Speech Recognition (ASR). The joint CTC/Attention model has achieved great success by utilizing both architectures during multi-task training and joint decoding. In this article, we present a multi-stream framework based on joint CTC/Attention E2E ASR, with parallel streams represented by separate encoders that aim to capture diverse information. On top of the regular attention networks, a Hierarchical Attention Network (HAN) is introduced to steer the decoder toward the most informative encoders. A separate CTC network is assigned to each stream to enforce monotonic alignments. Two representative frameworks are proposed and discussed: the Multi-encoder Multi-Resolution (MEM-Res) framework and the Multi-encoder Multi-Array (MEM-Array) framework. In the MEM-Res framework, two heterogeneous encoders with different architectures, temporal resolutions, and separate CTC networks work in parallel to extract complementary information from the same acoustics. Experiments on Wall Street Journal (WSJ) and CHiME-4 yield relative Word Error Rate (WER) reductions of 18.0-32.1% and a best WER of 3.6% on the WSJ eval92 test set. The MEM-Array framework aims to improve far-field ASR robustness using multiple microphone arrays, each handled by a separate encoder. Compared with the best single-array results, the proposed framework achieves relative WER reductions of 3.7% and 9.7% on the AMI and DIRHA multi-array corpora, respectively, and also outperforms conventional fusion strategies.
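Two quantities in this abstract lend themselves to a short worked sketch. Joint CTC/Attention multi-task training is commonly written as a weighted sum of the two losses, and a relative WER reduction is the improvement divided by the baseline WER. The interpolation weight of 0.3, the function names, and all numeric inputs below are assumptions for illustration; the abstract does not state the weighting used or the baseline WERs.

```python
def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Multi-task objective: weighted sum of the CTC and attention losses.

    ctc_weight is an assumed hyperparameter in [0, 1].
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


loss = joint_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.3)  # 0.6 + 0.7
gain = relative_reduction(baseline_wer=10.0, new_wer=9.0)            # toy numbers
```

A relative reduction is therefore not a difference in percentage points: in the toy case, one absolute point of WER on a 10% baseline is a 10% relative reduction.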
Accurate segmentation of the uterus, uterine fibroids, and spine from MR images is crucial for high intensity focused ultrasound (HIFU) therapy but remains difficult to achieve because of 1) the large variations in shape and size among individuals, 2) the low contrast between adjacent organs and tissues, and 3) the unknown number of uterine fibroids. To tackle this problem, we propose a large kernel encoder-decoder network based on a 2D segmentation model. The large kernel can capture multi-scale context by enlarging the valid receptive field. In addition, a deep multiple atrous convolution block is employed to enlarge the receptive field further and extract denser feature maps. Our approach is compared with both conventional and other deep learning methods, and experiments conducted on a large dataset show its effectiveness.
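How large kernels and atrous (dilated) convolutions enlarge the receptive field can be checked with simple arithmetic. A kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions, and each stride-1 layer adds that span minus one to the receptive field. The sketch below assumes stride-1 layers and uses example kernel sizes and dilation rates that are not taken from the paper.

```python
def effective_kernel(k, dilation):
    """Span of a dilated kernel: k taps spaced `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples, applied in order.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # three ordinary 3x3 layers
atrous = receptive_field([(3, 1), (3, 2), (3, 4)])   # same layers, dilations 1, 2, 4
large = receptive_field([(7, 1)])                    # a single 7x7 kernel
```

With the same number of 3x3 weights, the dilated stack sees 15 positions per axis instead of 7, and a single 7x7 kernel matches the three plain layers in one step.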
Semantic segmentation of 3D point clouds is a crucial task in scene understanding and is fundamental to indoor applications such as indoor navigation, mobile robotics, and augmented reality. Recently, deep learning frameworks have been successfully applied to point clouds but are limited by the size of the data. While most existing works focus on individual sampling points, we use surface patches as a more efficient representation and propose a novel indoor scene segmentation framework called patch graph convolution network (PGCNet). This framework treats patches as input graph nodes and then aggregates neighboring node features with a dynamic graph U-Net (DGU) module, which consists of dynamic edge convolution operations inside a U-shaped encoder-decoder architecture. The DGU module dynamically updates the graph structure at each level to encode hierarchical edge features. Using PGCNet, we can segment the input scene into two types, i.e., room layout and indoor objects, and this segmentation is then used to carry out the final rich semantic labeling of various indoor scenes. With a considerable training speedup, the proposed framework achieves performance equivalent to the state of the art for segmenting a standard indoor scene dataset.
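The aggregation step of an edge convolution on a patch graph can be sketched as follows, assuming the common formulation in which each node i is connected to its k nearest neighbours j, an edge feature is built from (x_i, x_j - x_i), and the edge features are max-pooled per node. A real layer would pass each edge feature through a learned network before pooling, which is omitted here; the function names, k = 2, and the toy 2D coordinates are all assumptions, since the abstract does not specify these details.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def edge_conv_max(features, neighbors):
    """Build edge features (x_i, x_j - x_i) and max-pool them over each node's neighbours."""
    out = []
    for i, x_i in enumerate(features):
        edges = [
            list(x_i) + [b - a for a, b in zip(x_i, features[j])]
            for j in neighbors[i]
        ]
        out.append([max(col) for col in zip(*edges)])
    return out


patches = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]  # toy patch centroids
graph = knn_indices(patches, k=2)       # graph built from the current features
pooled = edge_conv_max(patches, graph)  # one 4-dimensional feature per node
```

Recomputing `knn_indices` on the output features of each level, rather than reusing the input graph, is what would make the graph dynamic in this sketch.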