Paper classification methods aim to partition paper data correctly according to the similarity of their content. However, classifying papers accurately by the content they express has long been a problem that classification algorithms must face. One existing family of paper classification methods is based on deep learning and implemented with an encoder-decoder structure. Such methods feed the words of a large number of papers into the encoder; after processing by a neural network (NN), the similarity between different papers is compared to achieve classification. However, this type of method considers only the similarity between words: a single NN pass over a large amount of word information cannot discover the regularities needed for classification, and word-level similarity differs from content-level similarity. This paper instead starts from the content: label information is extracted, and the input vector of the encoder-decoder structure is formed from both labels and words, improving the original encoder-decoder-based paper classification method. Firstly, the label information is derived from the content and can therefore reflect the content of the paper. Secondly, combining label information with word information allows the classifier to reflect the content of the paper more comprehensively. Thirdly, the label information is kept independent of the word information and processed by a separate NN, making this part of the content more consistent within the encoder-decoder structure. Finally, the outputs obtained by the different NNs for the label information and the word information are combined to realize content-based classification. The effectiveness of the proposed method is demonstrated by evaluating paper data from Web of Science.
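The core idea above, forming the encoder input from label vectors alongside word vectors, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the embedding tables, dimensions, and function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, LABELS, DIM = 1000, 20, 64            # illustrative sizes
word_emb = rng.normal(size=(VOCAB, DIM))     # word embedding table
label_emb = rng.normal(size=(LABELS, DIM))   # label embedding table

def build_encoder_input(word_ids, label_ids):
    """Concatenate label vectors with word vectors to form the
    encoder input sequence (labels first, then words)."""
    labels = label_emb[label_ids]            # (n_labels, DIM)
    words = word_emb[word_ids]               # (n_words, DIM)
    return np.concatenate([labels, words], axis=0)

x = build_encoder_input(word_ids=[3, 7], label_ids=[1, 4, 2])
print(x.shape)  # (5, 64): 3 label rows followed by 2 word rows
```

In the paper's setup, the label and word parts would then be processed by separate NNs before their outputs are combined.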
Automatic segmentation of prostate magnetic resonance (MR) images has great significance for the diagnosis and clinical management of prostate diseases. It faces enormous challenges because of the low contrast of tissue boundaries and the small effective area of the prostate in MR images. To address these problems, we propose a novel end-to-end network for prostate segmentation based on deep learning, consisting of an encoder-decoder structure with dense dilated spatial pyramid pooling (DDSPP). First, the DDSPP module extracts multi-scale convolutional features from the prostate MR images; the decoder is then used to capture a clear prostate boundary. Competitive results over the state of the art are produced on 130 MR images, with a Dice similarity coefficient (DSC) of 0.954 and a Hausdorff distance (HD) of 1.752 mm. Experimental results show that our method has high accuracy and robustness.
The prediction of time series data applied to the energy sector (prediction of renewable energy production, forecasting prosumers' consumption/generation, forecasting country-level consumption, etc.) has numerous useful applications. Nevertheless, the complexity and non-linear behaviour of such energy systems hinder the development of accurate algorithms. In this context, this paper investigates the use of a state-of-the-art deep learning architecture to perform precise 24-h-ahead load demand forecasting for the whole of France using RTE data. To this end, the authors propose an encoder-decoder architecture inspired by WaveNet, a deep generative model initially designed by Google DeepMind for raw audio waveforms. WaveNet uses dilated causal convolutions and skip connections to exploit long-term information. This kind of novel ML architecture offers several advantages over other statistical algorithms. On the one hand, the proposed deep learning model's training process can be parallelized on GPUs, an advantage in training time compared to recurrent networks. On the other hand, the residual connections prevent degradation problems (exploding and vanishing gradients). In addition, the model can learn from an input sequence to produce a forecast sequence in a one-shot manner. For comparison purposes, a comparative analysis between the best-performing state-of-the-art deep learning models and traditional statistical approaches is presented: Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Multi-Layer Perceptron (MLP), causal 1D Convolutional Neural Networks (1D-CNN), and ConvLSTM (encoder-decoder). The values of the evaluation indicators reveal that WaveNet exhibits superior performance in both forecasting accuracy and robustness.
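The dilated causal convolution that WaveNet is built on can be sketched in a few lines. This is a minimal numpy illustration of the operation, not the paper's model; the kernel, dilation, and function name are chosen for demonstration.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """1-D causal convolution with dilation: output at time t depends
    only on x[t], x[t-d], x[t-2d], ... (left-padded with zeros, so no
    future samples leak into the output)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

# Stacking layers with dilations 1, 2, 4, ... doubles the receptive
# field at each layer, which is how WaveNet covers long histories.
x = np.arange(8, dtype=float)
y = dilated_causal_conv1d(x, w=[1.0, 1.0], dilation=2)
print(y)  # y[t] = x[t] + x[t-2], i.e. [0. 1. 2. 4. 6. 8. 10. 12.]
```

Because each output depends only on past inputs, the whole sequence can be computed in parallel at training time, which is the GPU advantage mentioned above.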
To address the challenge of poor intra-category pixel consistency and inter-category pixel similarity, in this paper we propose an encoder-decoder network for image semantic segmentation using a pooling SE-ResNet attention module, called PAEDN. The attention mechanism is an effective way to obtain aggregated information. Following the principle of SE-ResNet, the attention modules are formed from a combination of average, maximum, and stochastic global pooling, which concentrate on contour, detail, and generalized information respectively in a given semantic segmentation. A Channel Pooling Attention Module (CPAM) and a Position Pooling Attention Module (PPAM) are designed and integrated into the encoder to extract discriminative features from input images, and the decoder uses the SE-ResNet attention module to fuse high-resolution feature maps with low-resolution ones. Experimental evaluations on the PASCAL and Cityscapes datasets show that the proposed encoder-decoder with pooling attention modules produces semantic labels with good pixel consistency, achieving a 15.1% improvement over FCN.
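An SE-style channel attention step with pooled squeeze descriptors can be sketched as below. This is a generic illustrative sketch under stated assumptions (average + max pooling combined by addition, a two-layer excitation with reduction ratio `r`), not the exact CPAM/PPAM design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_pooling_attention(feat, w1, w2):
    """Squeeze-and-Excitation style channel attention.
    feat: (C, H, W) feature map. Average and max global pooling are
    combined in the squeeze step; two FC layers plus a sigmoid produce
    per-channel weights that rescale the input."""
    avg = feat.mean(axis=(1, 2))            # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))              # (C,) max-pooled descriptor
    squeezed = avg + mx                     # combined channel descriptor
    hidden = np.maximum(0, w1 @ squeezed)   # FC + ReLU, shape (C//r,)
    weights = sigmoid(w2 @ hidden)          # FC + sigmoid, shape (C,)
    return feat * weights[:, None, None]    # rescale each channel

rng = np.random.default_rng(0)
C, r = 8, 2
feat = rng.normal(size=(C, 16, 16))
out = se_pooling_attention(feat,
                           rng.normal(size=(C // r, C)),   # squeeze FC
                           rng.normal(size=(C, C // r)))   # excite FC
print(out.shape)  # (8, 16, 16)
```

The paper's modules additionally use stochastic pooling and a positional counterpart (PPAM); the channel-rescaling mechanism shown here is the common core.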
ISBN (Print): 9781450370578
This late-breaking report presents a method for learning the sequential and temporal mapping between music and dance via the Sequence-to-Sequence (Seq2Seq) architecture. In this study, the Seq2Seq model comprises two parts: an encoder that processes the music inputs and a decoder that generates the output motion vectors. The model can accept music features and motion inputs from the user during human-robot interactive learning sessions, and it outputs motion patterns that teach corrective movements for following the moves of an expert dancer. Three different types of Seq2Seq models are compared in the results and applied to a simulation platform. The model will be applied in social interaction scenarios with children with autism spectrum disorder (ASD).
This paper proposes a deep convolutional neural network with a concise and effective encoder-decoder architecture for saliency prediction. Local and global contextual features make a considerable contribution to saliency prediction. To integrate and exploit these features more thoroughly, the proposed architecture deploys a dense and global context connection structure between the encoder and decoder; a multi-scale readout module then processes the information from the preceding portion of the decoder through different parallel mapping relationships to produce full-scale, accurate results. Our model ranks first on multiple metrics on two well-known saliency benchmarks and generalizes well to other datasets. We also evaluate the precision and speed of our model with different backbones: the saliency prediction performance of the VGGNet-, ResNet-, and DenseNet-based models increases in that order, while speed decreases. The experiments further show that our model outperforms other models even when its backbone is replaced with the same backbone as the compared model. We can therefore provide optional versions of our model for different performance and efficiency requirements. (c) 2021 Elsevier B.V. All rights reserved.
Haze removal is an essential requirement in autonomous vehicle applications for identifying different objects on the road. Most available techniques are based on different constraints/priors. The important parameters required for recovering the ground truth from a hazy image are the transmission map and the airlight. In this paper, we propose a learning-based encoder-decoder deep learning architecture for transmission map estimation. Based on the assumption that at least twenty percent of an outdoor image contains sky, the airlight is calculated as the average of the twenty percent brightest pixels of the image. These two parameters, the transmission map and the airlight, are then applied in the atmospheric scattering model to recover the ground truth image. In the encoder-decoder architecture, max-pooling layers are used for feature learning and dropout layers for efficient generalization. The proposed architecture was trained on several datasets, namely NYU Depth, FRIDA, and RESIDE, for better generalization to unseen data. Experimental results show that the proposed method performs better than existing state-of-the-art methods.
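The airlight estimate and the scattering-model inversion described above are simple enough to sketch directly. This is an illustrative numpy sketch of the standard model I = J·t + A·(1 − t), with a hypothetical lower bound `t_min` on the transmission; it is not the paper's trained network, which supplies the transmission map `t`.

```python
import numpy as np

def estimate_airlight(img):
    """Airlight as the mean of the brightest 20% of pixels, following
    the sky-region assumption. img: (H, W, 3) in [0, 1]."""
    flat = img.reshape(-1, img.shape[-1])
    brightness = flat.mean(axis=1)
    k = max(1, int(0.2 * len(flat)))
    idx = np.argsort(brightness)[-k:]        # brightest 20% of pixels
    return flat[idx].mean(axis=0)

def recover_scene(hazy, t, A, t_min=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1-t):
    J = (I - A) / max(t, t_min) + A."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return (hazy - A) / t + A

# Tiny synthetic check: haze a flat scene, then recover it.
J = np.full((4, 4, 3), 0.3)                  # clear scene
A = np.array([0.9, 0.9, 0.9])                # airlight
t = np.full((4, 4), 0.5)                     # transmission map
I = J * t[..., None] + A * (1 - t[..., None])
print(np.allclose(recover_scene(I, t, A), J))  # True
```

In the paper's pipeline, `t` would come from the encoder-decoder network rather than being given.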
Multistep Human Density Prediction (MHDP) is an emerging challenge in urban mobility with many applications across domains such as Smart Cities, Edge Computing, and Epidemiology Modeling. The basic goal is to estimate the density of people gathered in a set of urban Regions of Interest (ROIs) or Points of Interest (POIs) over a forecast horizon at different granularities. Accordingly, this paper aims to contribute beyond the existing literature on human density prediction by proposing an innovative time series Deep Learning (DL) model and a geospatial feature preprocessing technique. Specifically, our research aim is to develop a highly accurate MHDP model that jointly leverages the temporal and spatial components of mobility data. We first compare 29 baseline and state-of-the-art methods grouped into six categories and find that the statistical time series models and the Deep Learning encoder-decoders (ED) that we propose are highly accurate, outperforming the other models on a real and a synthetic mobility dataset. Our model achieves an average Mean Absolute Error (MAE) of 28.88 and Root Mean Squared Error (RMSE) of 87.58 with 200,000 pedestrians per day distributed in multiple regions of interest in a 30-minute time window at different granularities. In addition, the geospatial feature transformation improves the RMSE of the proposed model by a further 4% compared to state-of-the-art solutions. Hence, this work provides an efficient and at the same time generally applicable MHDP model that can benefit the planning and decision-making of many major urban mobility applications.
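The MAE and RMSE figures reported above are computed over all regions and forecast steps; a minimal sketch of these metrics on hypothetical multistep, multi-ROI data (the numbers are invented for illustration) is:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error over all regions and forecast steps."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error over all regions and forecast steps."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical densities: 3 ROIs over a 4-step forecast horizon.
y_true = np.array([[120, 130, 125, 140],
                   [ 80,  85,  90,  95],
                   [200, 210, 205, 215]], dtype=float)
y_pred = y_true + np.array([[5, -5, 5, -5],
                            [5, -5, 5, -5],
                            [5, -5, 5, -5]], dtype=float)
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # 5.0 5.0
```

RMSE penalizes large errors more than MAE, which is why the paper reports both.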
ISBN (Print): 9781713820697
End-to-end speaker diarization for an unknown number of speakers is addressed in this paper. Recently proposed end-to-end speaker diarization outperforms conventional clustering-based speaker diarization, but it has one drawback: it is less flexible in terms of the number of speakers. This paper proposes a method for encoder-decoder based attractor calculation (EDA), which first generates a flexible number of attractors from a speech embedding sequence. The generated attractors are then multiplied by the speech embedding sequence to produce the same number of speaker activities. The speech embedding sequence is extracted using the conventional self-attentive end-to-end neural speaker diarization (SA-EEND) network. In a two-speaker condition, our method achieved a 2.69% diarization error rate (DER) on simulated mixtures and an 8.07% DER on the two-speaker subset of CALLHOME, while vanilla SA-EEND attained 4.56% and 9.54%, respectively. Under conditions with an unknown number of speakers, our method attained a 15.29% DER on CALLHOME, while the x-vector-based clustering method achieved a 19.43% DER.
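The attractor-to-activity step described above (multiplying attractors by the embedding sequence) can be sketched as a frame-wise dot product followed by a sigmoid. This is a minimal numpy illustration under stated assumptions; the attractors themselves would be produced by EDA's encoder-decoder (e.g. an LSTM), which is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def speaker_activities(embeddings, attractors):
    """Dot each frame embedding with every attractor and squash with a
    sigmoid, giving one activity track per attractor (speaker).
    embeddings: (T, D) frame embeddings; attractors: (S, D)."""
    return sigmoid(embeddings @ attractors.T)  # (T, S) in (0, 1)

rng = np.random.default_rng(0)
T, D, S = 100, 32, 3           # frames, embedding dim, attractors
emb = rng.normal(size=(T, D))  # stand-in for SA-EEND embeddings
att = rng.normal(size=(S, D))  # stand-in for EDA-generated attractors
act = speaker_activities(emb, att)
print(act.shape)  # (100, 3): one activity track per attractor
```

Because the number of attractors is not fixed, the same multiplication yields activities for however many speakers EDA decides are present.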