ISBN (print): 9781665403696
In this paper, an encoder-decoder long short-term memory (LSTM) network-based anomaly detector (denoted EDLAD) is proposed for hyperspectral images. The proposed EDLAD aims to simultaneously alleviate anomaly contamination and build a stable background component for anomaly detection. To reduce anomaly contamination, the EDLAD first uses an encoder-decoder LSTM to reconstruct the hyperspectral image. Because anomalous pixels occupy an extremely small fraction of the image and the whole image is used to train the network, the encoder-decoder LSTM tends to preserve the background and suppress anomalies during reconstruction. Dimensionality reduction is then used to further alleviate the anomaly contamination and build a stable background component. Finally, the EDLAD applies Mahalanobis distance differences to detect probable anomalies. Experiments on two benchmark hyperspectral images demonstrate the superiority of the EDLAD in anomaly detection.
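The final scoring step can be illustrated with a minimal sketch of Mahalanobis-distance anomaly scoring against background statistics. This is not the EDLAD implementation: the abstract does not define its "Mahalanobis distance differences", so the code below only computes the plain squared Mahalanobis distance, and the function names and the toy two-band pixels are assumptions made for illustration.

```python
def mean_vector(pixels):
    """Per-band mean of a list of spectra (each spectrum is a list of floats)."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]


def covariance(pixels, mu):
    """Sample covariance matrix (bands x bands) of the spectra."""
    n, d = len(pixels), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in pixels:
        diff = [p[b] - mu[b] for b in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(pixel, mu, cov):
    """Squared Mahalanobis distance of one spectrum from the background model."""
    diff = [pixel[b] - mu[b] for b in range(len(mu))]
    y = solve(cov, diff)  # y = cov^{-1} @ diff without forming the inverse
    return sum(d * v for d, v in zip(diff, y))


# Hypothetical two-band background pixels and one anomalous pixel.
background = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.0], [1.0, 1.0]]
anomaly = [10.0, 0.0]

mu = mean_vector(background)
cov = covariance(background, mu)
background_scores = [mahalanobis_sq(p, mu, cov) for p in background]
anomaly_score = mahalanobis_sq(anomaly, mu, cov)
```

In this toy setup the statistics come from the background pixels only, which mirrors the abstract's goal of scoring against a background estimate that anomalies have not contaminated; a pixel is flagged when its score exceeds a chosen threshold.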
ISBN (print): 9781665449892
Single image dehazing is a challenging problem in computer vision because it is highly ill-posed. Although recent research has made great progress, the dehazed images produced by existing models still contain residual haze and lose too much detail. To solve these problems, we propose an end-to-end Attention-based encoder-decoder Network (AEDNet) that effectively removes haze while preserving image details. AEDNet employs a novel channel shuffle attention mechanism to adaptively adjust the weight of each channel-wise feature. This attention mechanism is integrated into the residual block, which is the core feature extraction module of the encoder-decoder. Extensive experiments on synthetic datasets and real hazy images demonstrate that AEDNet outperforms previous state-of-the-art methods.
Multi-modal neural machine translation (MNMT) aims to integrate visual and textual information to translate source sentences into the target language, and it has attracted considerable attention. Existing methods have contributed much to capturing interactions between visual and textual features to improve the performance of neural machine translation (NMT). However, most of them do not consider multi-modal consistency for MNMT. In fact, the image provides global semantic consistency between different languages. We believe that adding bilingual-visual agreement to the encoder and decoder simultaneously can yield bilingual representations and is useful for NMT. In this paper, we propose to integrate visual information into both the encoder and the decoder to learn the interactions between visual and textual features; we call the model VMNMT. As the visual information provides global context, the encoder and decoder can learn the bilingual representations. In addition, we introduce a new bilingual-visual agreement decoder to learn better representations of corresponding image-sentence pairs. In the experiments, the improvement was 2.02 BLEU on the English-German16 dataset and 1.9 BLEU on the English-German17 dataset. The results show that our method can outperform baselines on several widely used datasets in terms of various metrics.
ISBN (print): 9781728188645
In the present work, a novel Convolutional LSTM encoder-decoder structure for weather forecasting in the Andean city of Quito is presented. The encoder-decoder structure also uses walk-forward validation, an adjustment of the Bayesian posterior predictive distribution, and the ADAMW optimizer to carry out the forecast. These stages are combined to obtain four error metrics per hour. The prediction is based on data acquired from a network of Automatic Weather Stations. The results show that the Convolutional encoder-decoder structure with a dropout probability of 0.05 and a model precision equal to 0.1 performs better than an LSTM model, a stacked LSTM model, or ARIMA models, reaching a maximum error of 1.03 °C. Finally, the methodology could be applied as an effective option to implement the post-processing stage for the physical model of a Weather Forecast System.
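Walk-forward validation, as mentioned above, can be sketched as follows. The abstract does not say whether the training window expands or slides, nor what window sizes are used, so the expanding window, the function name `walk_forward_splits`, and the parameter values below are assumptions for illustration only.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) for expanding-window walk-forward validation.

    Each fold trains on everything observed so far and tests on the next
    `horizon` time steps, so the model never sees future data.
    """
    start = initial_train
    while start + horizon <= n_samples:
        yield list(range(0, start)), list(range(start, start + horizon))
        start += horizon


# Hypothetical series of 10 hourly observations, first fold trained on 4 of them.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, horizon=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled cross-validation, every test index here is later in time than every training index, which is the property that makes the scheme suitable for forecasting.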
ISBN (print): 9783030863319
The encoder-decoder framework with an attention mechanism has become a mainstream solution to handwritten mathematical expression recognition (HMER) since the "watch, attend and parse (WAP)" approach was proposed in 2017, in which a convolutional neural network is used as the encoder and a gated recurrent unit with attention is used in the decoder. Inspired by the recent success of the Transformer in many applications, in this paper we adopt the Transformer's multi-head attention and stacked decoder designs to improve the decoder part of the WAP framework for HMER. Experimental results on CROHME tasks show that multi-head attention can boost the expression recognition rate (ExpRate) of WAP from 54.32%/58.05% to 56.76%/59.72%, and that the stacked decoder can further improve ExpRate to 57.72%/61.38% on the CROHME 2016/2019 test sets.
ISBN (print): 9783030861599; 9783030861582
Automated document analysis and parsing has been a focus of research for a long time. An important component of document parsing is understanding tabular regions, which involves identifying their structure and then extracting information precisely. While substantial effort has gone into table detection and information extraction from documents, table structure recognition remains a long-standing task demanding dedicated attention. Identifying the table structure enables the extraction of structured information from tabular regions, which can then be used in further applications. To this end, this research proposes a novel table structure recognition pipeline consisting of row identification and column identification modules. The column identification module uses a novel Column Detector encoder-decoder model (termed CoDec encoder-decoder), which is trained via a novel loss function to predict the column mask for a given input image. Experiments have been performed to analyze the different components of the proposed pipeline, supporting their inclusion for enhanced performance. The proposed pipeline has been evaluated on the challenging ICDAR 2013 table structure recognition dataset, where it demonstrates state-of-the-art performance.
Salient instance segmentation refers to segmenting noticeable instance objects in images. When faced with multi-scale and overlapping salient instances, existing salient instance segmentation methods have significant limitations, including inaccurate detection of large-scale instances, missed detection of small-scale instances, and wrong segmentation of overlapping instances. To solve these problems, a new multi-scale salient instance segmentation network (MSISNet) based on an encoder-decoder is proposed. First, we design a receptive field encoder (RFE), which adopts serial dilated convolution instead of parallel dilated convolution and uses some common tricks to achieve better precision and speed. RFE can alleviate the problems of inaccurate detection of large-scale instances, missed detection of small-scale instances, and especially wrong segmentation of overlapping instances. Then, a pyramid decoder (PD) for the detection branch is designed to further alleviate the problem of inaccurate detection of large-scale instances and the difficulty of locating small-scale instances. Finally, a multi-stage decoder (MSD) is designed to improve the quality of the segmentation mask. To sufficiently evaluate the generalizability of our method, experiments are conducted not only on the Salient Instance Segmentation-1K (SIS-1K) dataset, but also on the Salient Objects in Clutter (SOC) dataset. The results show that the proposed MSISNet is superior, in terms of mAP0.5, to existing salient instance segmentation methods and to some recently proposed non-salient instance segmentation methods.
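One reason serial dilated convolution can help with large instances is that stacking stride-1 dilated layers grows the receptive field additively, whereas parallel branches are each limited to a single layer's extent. The sketch below uses the standard receptive-field formula for stride-1 convolutions; the 3x3 kernel and the dilation rates (1, 2, 4) are hypothetical and are not taken from the MSISNet paper.

```python
def serial_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 dilated convolutions applied one after another.

    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def parallel_receptive_field(kernel_size, dilations):
    """Largest receptive field among single-layer parallel dilated branches."""
    return max(1 + (kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: 3x3 kernels with dilation rates 1, 2 and 4.
rates = [1, 2, 4]
serial_rf = serial_receptive_field(3, rates)
parallel_rf = parallel_receptive_field(3, rates)
print(serial_rf, parallel_rf)  # prints "15 9"
```

With the same three layers, the serial arrangement covers a 15-pixel extent along each axis, compared with 9 pixels for the widest parallel branch.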
Accurate and reliable multi-step-ahead flood forecasting is beneficial for reservoir operation and water resources management. The encoder-decoder (ED), which can tackle sequence-to-sequence problems, is suitable for multi-step-ahead flood forecasting. This study proposes a novel ED with an exogenous input (EDE) structure for multi-step-ahead flood forecasting. The exogenous input can be the outputs of process-based hydrological models. This study constructs four multi-step-ahead flood forecasting approaches: the Xinanjiang (XAJ) hydrological model, the single-output long short-term memory (LSTM) neural network with recursive strategies, the recursive ED combined with the LSTM neural network (LSTM-RED), and the LSTM-EDE model. The performance of these four models is evaluated and compared using long-term 3 h hydrologic data series from the Lushui and Jianxi basins in China. The results show that the LSTM-RED model, which integrates recursive strategies into the training process of neural networks, is more advantageous than the LSTM model. Compared with the LSTM-RED model, the proposed LSTM-EDE model can overcome the exposure bias problem, simplify the model structure, increase computational efficiency in the validation process, and improve multi-step-ahead flood forecasting accuracy.
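The recursive strategy used by the single-output baseline can be sketched as below: a one-step model is applied repeatedly and each prediction is fed back as an input for the next step. This is a generic illustration, not the study's LSTM; the function names and the toy `linear_extrapolation` model are assumptions standing in for a trained network.

```python
def recursive_forecast(one_step_model, history, horizon, window):
    """Multi-step-ahead forecast by feeding each prediction back as input.

    `one_step_model` maps the last `window` values to the next value.
    """
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(buffer)
        forecasts.append(y_hat)
        # Slide the window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [y_hat]
    return forecasts


def linear_extrapolation(window):
    """Toy one-step model (hypothetical): continue the last observed slope."""
    return 2 * window[-1] - window[-2]


# Hypothetical discharge series; forecast three steps ahead.
forecasts = recursive_forecast(linear_extrapolation, [1.0, 2.0, 3.0], horizon=3, window=2)
print(forecasts)  # prints "[4.0, 5.0, 6.0]"
```

Because later steps depend on earlier predictions rather than on observations, any one-step error propagates through the rest of the horizon. This mismatch between training on observed inputs and forecasting from predicted ones is the exposure bias problem that the abstract says the EDE structure overcomes.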
In this paper, we propose a novel method for visible and infrared image fusion by decomposing feature information, termed CUFD. It adopts two pairs of encoder-decoder networks to implement feature map extraction and decomposition, respectively. On the one hand, the shallow features of the image contain abundant information, while the deep features focus more on extracting the thermal targets. Thus, we use an encoder-decoder network to extract both shallow and deep features. Unlike existing methods, both the shallow and deep features are used for fusion and reconstruction with different emphases. On the other hand, the infrared and visible features of the same layer have both similarities and differences. Therefore, we train the other encoder-decoder network to decompose the feature maps into common and unique information based on their similarities and differences. After that, we apply different fusion rules according to the flexible requirements. This helps retain the significant feature information in the fusion results. Qualitative and quantitative experiments on the publicly available TNO and RoadScene datasets demonstrate the superiority of our CUFD over the state of the art.