Underwater images are degraded by a variety of naturally occurring factors such as haze, suspended particles, light scattering, and water type. The primary cause of the degradation is underwater light attenuation, which varies with wavelength, unlike the uniform attenuation that occurs in air. In this paper, we propose an end-to-end deep convolutional neural network architecture to restore underwater images and improve their visual perception. The encoder learns to encode the degraded image into a lower-dimensional feature map, while the decoder learns to restore the image to a degradation-free form. Symmetric skip connections between the encoder and decoder blocks propagate feature maps across the network, improving the sharpness of the restored image and preventing the loss of detail caused by the convolutions. We exhaustively evaluate the performance of our network both qualitatively and quantitatively on standard datasets, and demonstrate its effectiveness against existing underwater image restoration and enhancement techniques.
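A minimal sketch of the kind of architecture this abstract describes, not the authors' exact network: a convolutional encoder-decoder for RGB image restoration with symmetric skip connections between mirrored encoder and decoder blocks. The layer count, channel widths, and the additive form of the skips are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SkipEncoderDecoder(nn.Module):
    def __init__(self, channels=(3, 64, 128, 256)):
        super().__init__()
        self.enc = nn.ModuleList()
        self.dec = nn.ModuleList()
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            # Each encoder block halves the spatial resolution.
            self.enc.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
        for c_in, c_out in zip(reversed(channels[1:]), reversed(channels[:-1])):
            # Each decoder block doubles the resolution; the last maps back to RGB.
            self.dec.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                nn.ReLU(inplace=True) if c_out != channels[0] else nn.Identity()))

    def forward(self, x):
        skips = []
        for block in self.enc:
            skips.append(x)            # feature map entering each encoder block
            x = block(x)
        for block, skip in zip(self.dec, reversed(skips)):
            # Symmetric skip re-injects encoder detail; the final skip adds the
            # degraded input back, so the network effectively predicts a correction.
            x = block(x) + skip
        return x

restored = SkipEncoderDecoder()(torch.rand(1, 3, 256, 256))   # -> (1, 3, 256, 256)
```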
ISBN:
(print) 9781467360739; 9781467360753
In this paper, we present an effective encoder-decoder design utilizing the Flexible Cross Correlation (FCC) code for Spectral Amplitude Coding-Optical Code Division Multiple Access (SAC-OCDMA) systems. The FCC code offers a flexible cross-correlation property for any given number of users and weights, and effectively reduces the impact of Multiple-Access Interference (MAI). The proposed FCC SAC-OCDMA encoder-decoder shows superior performance, supporting 100%, 287%, and 331% more active users than MDW (K=60), MFH (K=31), and Hadamard (K=29), respectively.
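For illustration only: the exact FCC construction is defined in the paper, and the 4-user, weight-2 code matrix below is a hypothetical stand-in. The snippet shows how the in-phase cross-correlation between any two users' spectral code words is computed, which is the property a SAC-OCDMA decoder relies on to suppress MAI.

```python
import numpy as np

codes = np.array([            # rows = users, columns = spectral chips (hypothetical)
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
])

# Cross-correlation between users i and j is the number of chips they share.
xcorr = codes @ codes.T
print(xcorr)   # diagonal = code weight; off-diagonal entries = cross-correlation
```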
Authors:
Zhu, Su; Yu, Kai (Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, SpeechLab, Dept Comp Sci & Engn, Shanghai, Peoples R China)
ISBN:
(print) 9781509041176
This paper investigates the encoder-decoder framework with attention for sequence-labelling-based spoken language understanding. We introduce Bidirectional Long Short-Term Memory - Long Short-Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully exploit the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, whereas the attention mechanism cannot provide this exact alignment. To address this limitation, we propose a novel focus mechanism for the encoder-decoder framework. Experiments on the standard ATIS dataset show that BLSTM-LSTM with the focus mechanism sets a new state of the art, outperforming the standard BLSTM and the attention-based encoder-decoder. Further experiments also show that the proposed model is more robust to speech recognition errors.
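A rough sketch of the idea as stated in the abstract, not the authors' released code: a BLSTM encoder produces one hidden vector per input word, and because sequence labelling aligns input and output word by word, the "focus" context at decoding step t is simply the encoder vector at position t rather than a soft-attention average over all positions. Dimensions, vocabulary sizes, and the greedy decoding loop are placeholders.

```python
import torch
import torch.nn as nn

class FocusSLU(nn.Module):
    def __init__(self, vocab=1000, labels=128, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTMCell(2 * hidden + labels, hidden)
        self.label_embed = nn.Embedding(labels, labels)
        self.out = nn.Linear(hidden, labels)

    def forward(self, words):
        enc, _ = self.encoder(self.embed(words))        # (B, T, 2*hidden)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.out.in_features)
        c = torch.zeros_like(h)
        prev = words.new_zeros(B)                       # previous predicted label id
        outputs = []
        for t in range(T):
            focus = enc[:, t]                           # focus: the aligned encoder state
            h, c = self.decoder(torch.cat([focus, self.label_embed(prev)], dim=-1), (h, c))
            logits = self.out(h)
            prev = logits.argmax(dim=-1)                # greedy label fed to the next step
            outputs.append(logits)
        return torch.stack(outputs, dim=1)              # (B, T, labels)

tags = FocusSLU()(torch.randint(0, 1000, (2, 12)))      # -> (2, 12, 128)
```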
ISBN:
(digital) 9783030012342
ISBN:
(print) 9783030012342; 9783030012335
Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter capture sharper object boundaries by gradually recovering spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test-set performance of 89% and 82.1%, respectively, without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://***/tensorflow/models/tree/master/research/deeplab.
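A compact sketch of the two building blocks the abstract highlights, a depthwise separable (and possibly dilated) convolution and an ASPP module that probes features at several dilation rates. Channel widths and the dilation rates (6, 12, 18) are common DeepLabv3+-style choices assumed here; the official reference implementation linked above is authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SepConv(nn.Module):
    """Depthwise (per-channel, dilated) conv followed by a 1x1 pointwise conv."""
    def __init__(self, c_in, c_out, rate=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=rate, dilation=rate, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return F.relu(self.pointwise(self.depthwise(x)))

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel branches with different dilation rates."""
    def __init__(self, c_in, c_out=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, 1)] + [SepConv(c_in, c_out, r) for r in rates])
        self.pool = nn.AdaptiveAvgPool2d(1)          # image-level pooling branch
        self.pool_conv = nn.Conv2d(c_in, c_out, 1)
        self.project = nn.Conv2d(c_out * (len(rates) + 2), c_out, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        img = F.interpolate(self.pool_conv(self.pool(x)), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        return F.relu(self.project(torch.cat(feats + [img], dim=1)))

out = ASPP(512)(torch.rand(1, 512, 32, 32))   # -> (1, 256, 32, 32)
```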
The Earth is frequently changed by natural occurrences and human actions that have threatened our environment to a certain extent. Therefore, accurate and timely monitoring of transformations at the surface of the Earth is crucial for precisely facing their harmful effects and consequences. This paper performs a change detection (CD) analysis and assessment of the Dakshina Kannada region, one of the coastal districts of Karnataka, India. The spatial and temporal variations in land use and land cover (LULC) are monitored and examined from LULC maps received from the National Remote Sensing Agency, Indian Space Research Organization, India. The time-series data from the Advanced Wide Field Sensor (AWiFS) on the Resourcesat-2 satellite, provided as LULC maps (1:250k), are analyzed using a deep learning approach with an encoder-decoder architecture with dual-attention modules for the change analysis. The model provides an overall accuracy of 94.11% and a mean IoU (intersection over union) of 74.1%. The LULC maps from 2005 to 2018 (13 years) are used to determine the variations in LULC, including urban development, agricultural variations, vegetation dynamics, forest areas, barren land, littoral swamp, water bodies, current fallow, etc. The multiclass area-wise changes in percentage terms show a decline in most LULC classes, which raises a point of concern for the environmental safety of the considered area, which is highly exposed to coastal flooding due to increased urbanization.
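A hedged sketch of one of the two attention paths a "dual-attention" decoder typically combines (the channel path is shown; a position/spatial path is analogous, with attention computed over pixel locations instead of channels). The exact module used in the paper may differ; the learnable residual scale is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable scale so the attended features start as a small residual.
        self.gamma = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        flat = x.view(B, C, H * W)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)   # (B, C, C) affinities
        out = (attn @ flat).view(B, C, H, W)    # re-weight channels by their affinity
        return self.gamma * out + x

y = ChannelAttention()(torch.rand(2, 64, 32, 32))   # same shape as the input
```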
Particle-based meshfree methods provide an effective means for large-deformation simulation of slope failure. Despite various efficient meshfree algorithmic developments, computational efficiency still limits the application of meshfree methods to practical problems. This study aims to accelerate the meshfree prediction of slope failure by introducing an encoder-decoder model that is particularly enhanced by an attention mechanism. The encoder-decoder model is designed to capture the long-sequence character of meshfree slope failure analysis. The discretization flexibility of meshfree methods offers an easy match between meshfree particles and machine learning samples, so the resulting surrogate model for meshfree slope failure prediction has quite wide applicability. Meanwhile, embedding the attention mechanism into the encoder-decoder neural network not only enables a significant reduction in the number of model parameters, but also maintains the key features of the meshfree simulation and effectively alleviates the information dilution issue. It is shown that the proposed encoder-decoder model with an embedded attention mechanism gives a more favorable prediction of the meshfree slope failure simulation in comparison with the general encoder-decoder formalism.
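A minimal sketch, with assumed dimensions, of an attention-enhanced LSTM encoder-decoder used as a regression surrogate: the encoder summarizes a particle's loading/response history, and at each prediction step the decoder attends over all encoder states, which is the kind of mechanism that mitigates information dilution over long sequences. The input/output features (three response components) and hidden size are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttnSeq2Seq(nn.Module):
    def __init__(self, n_in=3, n_out=3, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_in, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(n_out + hidden, hidden)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, history, steps):
        enc, (h, c) = self.encoder(history)               # enc: (B, T, hidden)
        h, c = h[0], c[0]
        y = history.new_zeros(history.size(0), self.head.out_features)
        outs = []
        for _ in range(steps):
            # Dot-product attention over all encoder states.
            score = torch.softmax((enc * h.unsqueeze(1)).sum(-1), dim=1)   # (B, T)
            context = (score.unsqueeze(-1) * enc).sum(1)                   # (B, hidden)
            h, c = self.decoder(torch.cat([y, context], dim=-1), (h, c))
            y = self.head(h)                               # next predicted response
            outs.append(y)
        return torch.stack(outs, dim=1)                    # (B, steps, n_out)

pred = AttnSeq2Seq()(torch.rand(4, 50, 3), steps=20)       # -> (4, 20, 3)
```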
Developing deep learning models for accurate segmentation of biomedical CT images is challenging due to their complex structures, anatomical variations, noise, and the unavailability of sufficient labeled data to train the models. There are many models in the literature, but researchers are not yet satisfied with their performance in analyzing biomedical Computed Tomography (CT) images. In this article, we pioneer a deep quasi-recurrent self-attention structure that works with a dual encoder-decoder. The proposed architecture provides a parameter-reuse capability that offers consistency in learning and quick convergence of the model. Furthermore, the quasi-recurrent structure leverages the features acquired from previous time points and elevates the segmentation quality. The model also efficiently addresses long-range dependencies through a selective focus on contextual information and hierarchical representation. Moreover, the dynamic, adaptive operation and incremental, efficient information processing of the deep quasi-recurrent self-attention structure lead to improved generalization across different scales and levels of abstraction. Along with the model, we introduce a new training strategy that fits the proposed deep quasi-recurrent self-attention architecture. The model's performance is evaluated on various publicly available CT scan datasets and compared with state-of-the-art models. The results show that the proposed model outperforms them in segmentation quality and training speed. The model can assist physicians in improving the accuracy of medical diagnoses.
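A speculative sketch of the "parameter reuse plus previous-time-point features" idea only: a single multi-head attention block (shared weights) is applied slice by slice, with each slice attending both to itself and to the features carried over from the previous slice, so information flows recurrently through a CT volume. This is one interpretation of the abstract, not the authors' architecture; the token layout and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class QuasiRecurrentAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # reused every slice
        self.norm = nn.LayerNorm(dim)

    def forward(self, volume):                      # (B, S, N, D): S slices of N tokens
        prev, outs = None, []
        for s in range(volume.size(1)):
            tokens = volume[:, s]                   # (B, N, D)
            memory = tokens if prev is None else torch.cat([prev, tokens], dim=1)
            attended, _ = self.attn(tokens, memory, memory)
            prev = self.norm(tokens + attended)     # carried forward to the next slice
            outs.append(prev)
        return torch.stack(outs, dim=1)             # same shape as the input

feats = QuasiRecurrentAttention()(torch.rand(1, 8, 196, 64))
```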
In this paper, we present a new iris ROI segmentation algorithm using a deep convolutional neural network (NN) to achieve state-of-the-art segmentation performance on well-known iris image datasets. The authors' model surpasses the performance of the state-of-the-art Iris DenseNet framework by applying several strategies, including multi-scale/multi-orientation training, model training from scratch, and proper hyper-parameterisation of crucial parameters. The proposed PixISegNet consists of an autoencoder that primarily uses long and short skip connections and a stacked hourglass network between encoder and decoder. The continuous scale-up-and-down in stacked hourglass networks helps in extracting features at multiple scales and robustly segments the iris even in an occluded environment. Furthermore, cross-entropy loss and content loss optimise the proposed model. The content loss considers high-level features, thus operating at a different scale of abstraction, which complements the cross-entropy loss, which considers pixel-to-pixel classification loss. Additionally, the authors have checked the robustness of the proposed network by rotating images to certain degrees, changing the aspect ratio, blurring, and changing contrast. Experimental results on various iris characteristics demonstrate the superiority of the proposed method over the state-of-the-art iris segmentation methods considered in this study. In order to demonstrate network generalisation, they deploy a very stringent TOTA (i.e. train-once-test-all) strategy. The proposed method achieves $E_1$ scores of 0.00672, 0.00916 and 0.00117 on the UBIRIS-V2, IIT-D and CASIA V3.0 Interval datasets, respectively. Moreover, such a deep convolutional NN for segmentation, when included in an end-to-end iris recognition system with a Siamese-based matching network, will augment the performance of the Siamese network.
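A sketch of how a pixel-wise cross-entropy loss can be combined with a feature-space "content" loss, as the abstract describes. The VGG16 feature extractor, the mask-to-image conversion, and the 0.1 weighting are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torchvision

# Frozen feature extractor used only to compare high-level representations.
vgg = torchvision.models.vgg16(weights=None).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def segmentation_loss(logits, target, content_weight=0.1):
    """logits: (B, 2, H, W) iris/background scores; target: (B, H, W) class indices."""
    ce = nn.functional.cross_entropy(logits, target)
    # Content loss: compare deep features of the predicted and reference masks.
    pred_mask = logits.softmax(dim=1)[:, 1:2].repeat(1, 3, 1, 1)   # fake 3-channel image
    true_mask = target.unsqueeze(1).float().repeat(1, 3, 1, 1)
    content = nn.functional.mse_loss(vgg(pred_mask), vgg(true_mask))
    return ce + content_weight * content

loss = segmentation_loss(torch.rand(2, 2, 64, 64), torch.randint(0, 2, (2, 64, 64)))
```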
In the security field, faces captured by outdoor surveillance cameras are usually blurry, occluded, in diverse poses, and small in the image, affected by external factors such as camera pose and range, weather conditions, etc. This can be described as a problem of hard face detection in natural images. To solve this problem, we propose a deep convolutional neural network named the feature hierarchy encoder-decoder network (FHEDN). It is motivated by two observations concerning contextual semantic information and the mechanism of multi-scale face detection. The proposed network is a single-stage, scale-variant architecture composed of encoder and decoder subnetworks. Based on the assumption that the contextual semantic information around a face is auxiliary to detecting it, we introduce a residual mechanism to fuse context prior-based information into the face features, and formulate a learning chain to train each encoder-decoder pair. In addition, we discuss some important implementation details, such as the distribution of the training dataset, the scale of the feature hierarchy, and anchor box sizes, which have some impact on the detection performance of the final network. Compared with state-of-the-art algorithms, our method achieves promising performance on popular benchmarks including AFW, PASCAL FACE, FDDB, and WIDER FACE. Consequently, the proposed approach can be efficiently implemented and routinely applied to detect faces with severe occlusion and arbitrary pose variations in unconstrained scenes. Our code and results are available at https://***/zzxcoder/EvaluationFHEDN.
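A loose sketch of the residual context-fusion idea at one scale of a detection network: a feature map is enriched with a wider-context feature via a residual addition before per-anchor classification and box-regression heads. The channel width, dilation, and number of anchors per location are placeholders, not the FHEDN configuration.

```python
import torch
import torch.nn as nn

class ContextFusedHead(nn.Module):
    def __init__(self, channels=256, anchors=3):
        super().__init__()
        self.context = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)  # wider view
        self.cls = nn.Conv2d(channels, anchors * 2, 3, padding=1)   # face / background scores
        self.reg = nn.Conv2d(channels, anchors * 4, 3, padding=1)   # box offsets

    def forward(self, feat):
        fused = feat + torch.relu(self.context(feat))   # residual fusion of the context prior
        return self.cls(fused), self.reg(fused)

scores, offsets = ContextFusedHead()(torch.rand(1, 256, 40, 40))
```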
Accurate spatiotemporal flood simulations are essential for making informed decisions regarding flood release in affected regions, such as flood detention areas. Traditional spatiotemporal flood simulation approaches use partial differential equation (PDE) models (physics-based models), which require high computational time. Although many machine learning (ML) models for inundation are increasingly being used to emulate PDE models and address this issue, using conventional ML models to achieve large-scale spatiotemporal flood prediction (i.e., simulation output over tens of thousands of grid cells and time steps across a whole flood event) remains a significant challenge. Therefore, we developed a new inundation model (IM) using an encoder-decoder long short-term memory network with a Time Distributed Spatial output model (ED-LSTM-TDS) that can acquire accurate spatially distributed flood information more rapidly. In the new IM framework, each ED-LSTM-TDS is built to simultaneously generate output for multiple (K) grid cells, and multiple ED-LSTM-TDS models are built to predict at all grids of the entire PDE model. This study is the first of its kind to employ the ED-LSTM-TDS method to address spatiotemporal flood inundation simulation problems for flood detention areas. A 1994 km2 flood detention area in northeastern China was used as a case study. ED-LSTM-TDS exhibited better performance in predicting flood characteristics (e.g., water depth, velocity) than alternative methods, including an ordinary LSTM, an artificial neural network (ANN), and multiple linear regression (MLR). In addition, we investigated the trade-off between the accuracy of flood characteristic prediction and the computation time of the proposed model by considering different numbers of K grid cells in each ED-LSTM-TDS model. The final proposed inundation model could accurately predict the spatiotemporal flood characteristics within 1.5 min to acquire the same information that required approx
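A minimal sketch, with assumed tensor shapes, of the ED-LSTM-TDS idea: an LSTM encoder reads the observed boundary/inflow forcing series, an LSTM decoder unrolls over the simulation steps, and a time-distributed linear layer emits a prediction for K grid cells at every step. One such model covers K cells; several models together cover the full PDE grid. The forcing dimension, hidden size, and K are illustrative.

```python
import torch
import torch.nn as nn

class EDLSTMTDS(nn.Module):
    def __init__(self, n_forcing=2, k_cells=100, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(n_forcing, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_forcing, hidden, batch_first=True)
        self.head = nn.Linear(hidden, k_cells)   # applied at every decoder time step

    def forward(self, past_forcing, future_forcing):
        _, state = self.encoder(past_forcing)          # summarize the observed forcing
        dec, _ = self.decoder(future_forcing, state)   # unroll over the prediction steps
        return self.head(dec)                          # (B, T_future, K) e.g. water depths

depths = EDLSTMTDS()(torch.rand(4, 48, 2), torch.rand(4, 24, 2))   # -> (4, 24, 100)
```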