ISBN: (print) 9783319919478; 9783319919461
Natural language generation (NLG) plays a critical role in various natural language processing (NLP) applications, and topics provide a powerful tool for understanding natural language. We propose a novel topic-based NLG model that can generate topic-coherent sentences given a single topic or a combination of topics. The model extends the recurrent encoder-decoder framework by introducing a global topic embedding matrix. Experimental results show that our encoder can transform a source sentence into a representative topic distribution, which gives a better interpretation of the source sentence, and that the model can also generate topic-coherent and diversified sentences from different topic distributions without any text-level input.
ISBN: (print) 9781538695555
Colorectal cancer is the third most common cause of cancer-related deaths, so early diagnosis of polyps by colonoscopy can lead to successful treatment. Diagnosing polyps in colonoscopy videos is challenging because polyps vary in size and shape. In this paper, we propose a polyp segmentation method based on an encoder-decoder network. Two strategies enhance the method's performance. In the training phase, we apply a novel data augmentation method for colonoscopy images. In the test phase, we make an effective prediction by combining multiple models and comparing the probabilities the network produces for each image. Evaluation on the ETIS-LariPolypDB [9] database shows that the proposed method outperforms state-of-the-art results.
ISBN: (print) 9781728140940
Semantic segmentation of high-resolution aerial images is challenged by ubiquitous fine-structure objects. The traditional encoder-decoder structure loses some detail information during down-sampling, which harms the localization of fine-structure objects. In this work, we present a multi-resolution dense encoder and stack decoder network to deal with this problem. On the one hand, the dense encoder embeds shallow detailed features into deep semantic features through a proposed information-preserving down-sampling method called CE-Pooling. On the other hand, the stack decoder gradually enhances the detailed features through iterative attention fusion. Extensive experiments on several benchmark datasets show that our method is superior to state-of-the-art approaches.
ISBN: (print) 9781538636411
Liver lesion segmentation is a difficult yet critical task in medical image analysis. Recently, deep learning based image segmentation methods have achieved promising performance. Based on the dimensionality of the models, they can be divided into three categories: 2D, 2.5D and 3D. However, 2.5D and 3D methods can have very high complexity, and 2D methods may not perform satisfactorily. To obtain competitive performance with low complexity, in this paper we propose a ***-decoder Network (FED-Net) based 2D segmentation model to tackle the challenging problem of liver lesion segmentation from CT images. Our feature fusion method is based on the attention mechanism, which fuses high-level features carrying semantic information with low-level features carrying image details. Additionally, to compensate for the information loss during upsampling, a dense upsampling convolution and a residual convolutional structure are proposed. We tested our method on the dataset of the MICCAI 2017 Liver Tumor Segmentation (LiTS) Challenge and achieved competitive results compared with other state-of-the-art methods.
Real-time streaming speech recognition is required by most applications for a good interactive experience. To naturally support online recognition, a common strategy in recently proposed end-to-end models is to add a blank label to the label set and output alignments instead of label sequences. However, generating an alignment means decoding a sequence much longer than the linguistic sequence. In addition, the alignment contains several blank labels between two output units, which hinders models from learning the dependency between adjacent units in the target sequence. In this work, we propose an innovative encoder-decoder structure, called ECTC-DOCD, for online speech recognition, which directly predicts the linguistic sequence without blank labels. Apart from the encoder and decoder structures, ECTC-DOCD contains an additional shrinking layer that drops redundant acoustic information. This layer serves as a bridge connecting the acoustic representation and linguistic modelling parts. Through experiments, we confirm that ECTC-DOCD obtains better performance than a strong CTC model in online ASR tasks. We also show that ECTC-DOCD achieves promising results on both Mandarin and English ASR datasets with first- and second-pass decoding.
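The gap between an alignment and the linguistic sequence can be illustrated with the standard CTC collapsing rule (merge repeated labels, then drop blanks). This is a minimal sketch of that general rule, not of ECTC-DOCD itself; the blank symbol `_` and the example alignment are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_collapse(alignment):
    """Map a frame-level alignment to its label sequence.

    Step 1: merge consecutive repeated labels.
    Step 2: drop blank labels.
    """
    merged = [label for label, _ in groupby(alignment)]
    return [label for label in merged if label != BLANK]


alignment = list("__cc_a__tt_")  # one label per acoustic frame
labels = ctc_collapse(alignment)
print("".join(labels), len(alignment), len(labels))
```

In this toy case the decoder emits 11 frame-level labels to produce a 3-unit sequence, and the blanks sit between neighbouring units, which is the situation the abstract describes.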
Authors: Zhu, Su; Yu, Kai
Affiliation: Shanghai Jiao Tong Univ, Brain Sci & Technol Res Ctr, Key Lab Shanghai Educ Commiss Intelligent Interac, SpeechLab, Dept Comp Sci & Engn, Shanghai, Peoples R China
ISBN: (print) 9781509041176
This paper investigates the encoder-decoder framework with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the attention mechanism cannot provide the exact alignment. To address this limitation, we propose a novel focus mechanism for the encoder-decoder framework. Experiments on the standard ATIS dataset show that BLSTM-LSTM with the focus mechanism sets a new state of the art, outperforming the standard BLSTM and the attention based encoder-decoder. Further experiments also show that the proposed model is more robust to speech recognition errors.
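The contrast between soft attention and an exact word-by-word alignment can be sketched as follows. The abstract does not spell out the focus mechanism, so this is one reading of it under a stated assumption: at output step t, all weight is placed on the encoder state at position t. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(scores, encoder_states):
    """Soft attention: a weighted average over all encoder states."""
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states))
            for d in range(dim)]


def focus_context(step, encoder_states):
    """Assumed focus rule: use only the encoder state aligned with this step."""
    return list(encoder_states[step])


states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # one vector per input word
print(attention_context([0.0, 0.0, 0.0], states))  # blends all three words
print(focus_context(1, states))                    # exactly the second word
```

With uniform scores, soft attention returns the mean of all encoder states, so the label for word 2 is predicted from a blend of every word; the focus variant uses only the aligned state.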
This research presents a novel fully convolutional neural network (CNN)-based model for probabilistic pixel-wise segmentation, titled encoder-decoder-based CNN for Road-Scene Understanding (ECRU). Scene understanding has recently become an evolving research area, and semantic segmentation is the most recent method for visual recognition. Among vision-based smart systems, driving assistance has become a much-preferred research topic. The proposed model is an encoder-decoder that performs pixel-wise class predictions. The encoder network is composed of a VGG-19 layer model, while the decoder network uses 16 upsampling and deconvolution units. The encoder has a very flexible architecture that can be altered and trained for images of any size and resolution. The decoder network upsamples and maps the encoder's low-resolution features. Because the network reuses the encoder's pooling indices for pixel-wise classification and segmentation, the number of trainable parameters is substantially reduced. The proposed model is intended to offer a simplified CNN with less overhead and higher performance. The network is trained and tested on the well-known road-scene dataset CamVid and compares favourably with similar earlier approaches such as FCN and VGG16 in terms of performance versus trainable parameters.
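Reusing pooling indices can be sketched in a few lines. The abstract does not give ECRU's exact layers, so this assumes plain non-overlapping 2x2 max pooling and an unpooling step that writes each value back to its recorded position; the function names and the toy feature map are hypothetical.

```python
def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling over a 2D list.

    Returns the pooled map and, for each window, the (row, col) of its maximum.
    """
    rows, cols = len(x) // k, len(x[0]) // k
    pooled, indices = [], []
    for r in range(rows):
        pooled_row, index_row = [], []
        for c in range(cols):
            window = [(x[r * k + i][c * k + j], (r * k + i, c * k + j))
                      for i in range(k) for j in range(k)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            index_row.append(pos)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, shape):
    """Write each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * shape[1] for _ in range(shape[0])]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [6, 2, 3, 1]]
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, (4, 4))
print(pooled)
print(restored)
```

The unpooling step has no weights of its own: it only needs the stored positions. That is consistent with the abstract's claim that reusing the indices reduces the number of trainable parameters.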
ISBN: (print) 9781450349185
Community detection, or graph clustering, is crucial to understanding the structure of complex networks and extracting relevant knowledge from networked data. Latent factor models, e.g., non-negative matrix factorization and the mixed membership block model, are among the most successful methods for community detection. These models aim to find a distributed and generally low-dimensional representation, or coding, that captures the structural regularity of the network and reflects the community membership of nodes. Existing latent factor models are mainly based on reconstructing a network from the representation of its nodes, namely a network decoder, while constraining the representation to have certain desirable properties. These methods, however, lack an encoder that transforms nodes into their representation. Consequently, they fail to give a clear explanation of the meaning of a community and suffer from undesired computational problems. In this paper, we propose a non-negative symmetric encoder-decoder approach for community detection. By explicitly integrating a decoder and an encoder into a unified loss function, the proposed approach achieves better performance than state-of-the-art latent factor models on the community detection task. Moreover, unlike existing methods that explicitly impose a sparsity constraint on the representation of nodes, the proposed approach achieves sparse node representations implicitly through its symmetric and non-negative properties, making the optimization much easier than in competing methods based on sparse matrix factorization.
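To make "a unified loss function" concrete, one plausible form is sketched below. The abstract does not state the objective, so the symbols are assumptions: A is the n x n adjacency matrix, W is an n x k non-negative matrix whose columns describe communities, and H is the k x n node representation.

```latex
\min_{W \ge 0,\; H \ge 0} \;
\underbrace{\lVert A - W H \rVert_F^2}_{\text{decoder: reconstruct the network}}
\; + \;
\underbrace{\lVert H - W^{\top} A \rVert_F^2}_{\text{encoder: map nodes to representations}}
```

In this sketch the encoder reuses the transpose of the decoder matrix, which is one way to read "symmetric"; the objective actually used in the paper may differ.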
ISBN: (print) 9781538635865
In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER). First, the two-dimensional ink trajectory of the handwritten expression is encoded by a gated recurrent unit based recurrent neural network (GRU-RNN). The decoder is also implemented as a GRU-RNN, with a coverage-based attention model. The proposed approach accomplishes symbol recognition and structural analysis simultaneously and outputs a character sequence in LaTeX format. Validated on the CROHME 2014 competition task, our approach significantly outperforms the state of the art, with an expression recognition accuracy of 52.43% using only the official training dataset. Furthermore, the alignments between the input trajectories of handwritten expressions and the output LaTeX sequences are visualized through the attention mechanism to show the effectiveness of the proposed method.
ISBN: (print) 9781510848764
End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches. We present experiments on conversational speech recognition where we use lower-level tasks, such as phoneme recognition, in a multitask training approach with an encoder-decoder model for direct character transcription. We compare multiple types of lower-level tasks and analyze the effects of the auxiliary tasks. Our results on the Switchboard corpus show that this approach improves recognition accuracy over a standard encoder-decoder model on the Eval2000 test set.