Details
ISBN:
(Print) 9783319690056; 9783319690049
Recently, image captioning, which aims to generate a textual description for an image automatically, has attracted researchers from various fields. Encouraging performance has been achieved by applying deep neural networks. Most of these works aim at generating a single caption, which may be incomprehensive, especially for complex images. This paper proposes a topic-specific multi-caption generator, which first infers topics from the image and then generates a variety of topic-specific captions, each of which depicts the image from a particular topic. We perform experiments on Flickr8k, Flickr30k and MSCOCO. The results show that the proposed model performs better than a single-caption generator when generating topic-specific captions. The proposed model effectively generates diverse captions under reasonable topics, and the captions differ from each other at the topic level.
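The two-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the topic posterior and the per-topic decoder stubs (`infer_topics`, `decoders`) are hypothetical stand-ins for the model's learned components.

```python
# Sketch: infer topics from an image, then run one topic-conditioned
# caption decoder per inferred topic.

def infer_topics(topic_scores, k=2):
    """Return the k most probable topic labels for an image."""
    ranked = sorted(topic_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:k]]

def generate_captions(topic_scores, decoders, k=2):
    """Run one topic-conditioned decoder per inferred topic."""
    return {t: decoders[t]() for t in infer_topics(topic_scores, k)}

# Toy topic posterior and per-topic decoder stubs.
scores = {"sports": 0.6, "animals": 0.3, "food": 0.1}
decoders = {
    "sports": lambda: "a man playing frisbee on a field",
    "animals": lambda: "a dog running across the grass",
    "food": lambda: "a plate of food on a table",
}
captions = generate_captions(scores, decoders, k=2)
```

With `k=2`, the two highest-scoring topics each yield one caption, so the output differs at the topic level by construction.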
Details
ISBN:
(Print) 9783319690056; 9783319690049
This paper presents a character-level encoder-decoder modeling method for question answering (QA) from large-scale knowledge bases (KB). This method improves the existing approach [9] in three aspects. First, long short-term memory (LSTM) structures are adopted to replace the convolutional neural networks (CNN) for encoding the candidate entities and predicates. Second, a new strategy of generating negative samples for model training is adopted. Third, a data augmentation strategy is applied to increase the size of the training set by generating factoid questions using another trained encoder-decoder model. Experimental results on the SimpleQuestions dataset and the Freebase5M KB demonstrate the effectiveness of the proposed method, which improves the state-of-the-art accuracy from 70.3% to 78.8% when augmenting the training set with 70,000 generated triple-question pairs.
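The negative-sampling step mentioned above can be illustrated with a simple sketch. The abstract does not detail the selection criteria, so this assumes plain random sampling over candidate (entity, predicate) pairs that differ from the gold answer; the names and KB identifiers are illustrative only.

```python
import random

def sample_negatives(positive, candidates, k, seed=0):
    """Draw k negative (entity, predicate) candidates distinct from the
    gold answer. A simplified stand-in for the paper's strategy."""
    pool = [c for c in candidates if c != positive]
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(pool, min(k, len(pool)))

negatives = sample_negatives(
    positive=("barack_obama", "people.person.place_of_birth"),
    candidates=[
        ("barack_obama", "people.person.place_of_birth"),
        ("barack_obama", "people.person.profession"),
        ("michelle_obama", "people.person.place_of_birth"),
        ("barack_obama_sr", "people.person.children"),
    ],
    k=2,
)
```

Mining negatives that share an entity or predicate with the gold pair (as the candidate list here does) tends to make training harder and more informative than uniform sampling over the whole KB.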
Details
ISBN:
(Print) 9781510848764
Recently, end-to-end speech recognition has attracted much attention. One of the popular models for end-to-end speech recognition is the attention-based encoder-decoder model, which usually generates output sequences iteratively by attending to the whole representation of the input sequence. However, delaying predictions until the whole input sequence has been received is not practical for online or low-latency speech recognition. In this paper, we present a simple but effective attention mechanism that allows the encoder-decoder model to generate outputs without attending to the entire input sequence and that can be applied to online speech recognition. At each prediction step, the attention is assumed to be a time-moving Gaussian window with variable size, which can be predicted from previous input and output information instead of content-based computation over the whole input sequence. To further improve the online performance of the model, we employ deep convolutional neural networks as the encoder. Experiments show that the Gaussian-prediction-based attention works well and, with the help of deep convolutional neural networks, the online model achieves a 19.5% phoneme error rate on the TIMIT ASR task.
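The core of the mechanism, a Gaussian window over encoder time steps, is easy to sketch. In the paper's model the window's center and width would be predicted from previous decoder state; here they are passed in directly, so this shows only how the window turns into attention weights, not how the prediction network works.

```python
import math

def gaussian_attention(center, width, length):
    """Attention weights as a normalized Gaussian window over encoder
    steps 0..length-1. `center` moves forward in time as decoding
    proceeds; `width` controls how many frames are attended to."""
    w = [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(length)]
    s = sum(w)
    return [x / s for x in w]

weights = gaussian_attention(center=4.0, width=1.5, length=10)
```

Because the weights depend only on the predicted `center` and `width`, no pass over the full input representation is needed, which is what makes the model usable online.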
Details
ISBN:
(Print) 9783319686127; 9783319686110
Using sentiment analysis methods to retrieve useful information from the documents accumulated on the Internet has become an important research subject. In this paper, we propose a semi-supervised framework which uses unlabeled data to promote the learning ability of the long short-term memory (LSTM) network. It is composed of an unsupervised attention-aware LSTM encoder-decoder and a single LSTM model used for feature extraction and classification. Experimental study on commonly used datasets demonstrates our framework's good potential for sentiment classification tasks and shows that the unsupervised learning part can improve the LSTM network's learning ability.
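The semi-supervised structure can be mirrored in a toy sketch: unlabeled text shapes the representation (here, merely the vocabulary of a bag-of-words "encoder"), and a simple classifier is then fit on the few labeled examples. The paper uses an attention-aware LSTM encoder-decoder and an LSTM classifier instead; this sketch, with its invented nearest-centroid classifier, only mirrors the two-part structure.

```python
def build_vocab(texts):
    """Vocabulary learned from both unlabeled and labeled text."""
    return sorted({w for t in texts for w in t.split()})

def encode(text, vocab):
    """Bag-of-words vector, standing in for a learned encoder."""
    words = text.split()
    return [words.count(w) for w in vocab]

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(text, vocab, centroids):
    """Nearest-centroid classification in the encoded space."""
    v = encode(text, vocab)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

unlabeled = ["great plot and great acting", "dull plot and dull acting"]
labeled = [("great movie", "pos"), ("dull movie", "neg")]

vocab = build_vocab(unlabeled + [t for t, _ in labeled])
centroids = {lab: centroid([encode(t, vocab) for t, l in labeled if l == lab])
             for lab in ("pos", "neg")}
pred = classify("great acting", vocab, centroids)
```

The point of the structure is that the representation benefits from all the data while the supervised part needs only the labeled subset.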
Details
ISBN:
(Print) 9783319700960; 9783319700953
Recently, the encoder-decoder framework for sequence-to-sequence (seq2seq) tasks has been widely used in open-domain generation-based conversation systems. One of the most difficult challenges in encoder-decoder based open-domain conversation systems is the unknown-words issue: numerous words become out-of-vocabulary words (OOVs) due to the restriction of the vocabulary size, while a conversation system always tries to avoid their appearance. This paper proposes a novel approach named Low Frequency Words Compression (LFWC) to address this problem by selectively using K-Components shared symbols for the representations of low-frequency words. Whereas the standard encoder-decoder works at the word level, our LFWC encoder-decoder works at the symbol level; we propose a Sequence Transform to convert a word-level sequence into a symbol-level sequence, and an LFWC-Predictor to decode a symbol-level sequence back into a word-level sequence. To measure the interference of OOVs in neural conversation systems, besides log-perplexity (LP), we apply two more suitable metrics, UP-LP and UP-Delta. Experiments show that decoding from compressed symbol-level sequences to word-level sequences achieves a recall@1 score of 60.9%, far above the baseline's 16.7%, under the strongest compression ratio. They also show that our approach outperforms the standard encoder-decoder model in reducing the interference of OOVs, achieving almost half the UP-Delta score in most configurations.
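A word-to-symbol Sequence Transform in the spirit of the abstract can be sketched as follows. The hash-bucket assignment of low-frequency words to K shared symbols is an assumption for illustration; the paper's actual K-Components scheme may assign symbols differently.

```python
from collections import Counter

def build_compressor(corpus, vocab_size, k):
    """Keep the vocab_size most frequent words; map every other word
    to one of k shared symbols (here via a simple character-sum hash)."""
    freq = Counter(w for sent in corpus for w in sent)
    keep = {w for w, _ in freq.most_common(vocab_size)}

    def transform(sentence):
        return [w if w in keep else f"<lfw_{sum(map(ord, w)) % k}>"
                for w in sentence]

    return transform

corpus = [["i", "like", "cats"], ["i", "like", "dogs"],
          ["i", "saw", "a", "quokka"]]
transform = build_compressor(corpus, vocab_size=3, k=4)
out = transform(["i", "like", "quokka"])
```

Sharing a small set of symbols among all rare words keeps the model's vocabulary bounded; a separate predictor (the paper's LFWC-Predictor) is then responsible for recovering the word-level sequence.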
Details
ISBN:
(Print) 9781450349062
Exploiting multimodal features has become a standard approach in many video applications, including the video captioning task. One problem with the existing work is that it models the relevance of each type of feature evenly, which neutralizes the impact of each individual modality on the word to be generated. In this paper, we propose a novel Modal Attention Network (MANet) to account for this issue. Our MANet extends the standard encoder-decoder network by adapting the attention mechanism to video modalities. As a result, MANet emphasizes the impact of each modality with respect to the word to be generated. Experimental results show that our MANet effectively utilizes multimodal features to generate better video descriptions. Notably, our MANet system ranked among the top three systems at the 2nd Video to Language Challenge in both automatic metrics and human evaluations.
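The modality-level attention idea reduces to a softmax over per-modality relevance scores followed by a weighted fusion of the modality features. In MANet the scores would come from the decoder state at each generated word; in this sketch they are fixed constants, and the modality names and feature vectors are illustrative.

```python
import math

def modal_attention(scores, features):
    """Softmax over per-modality scores, then a weighted sum of the
    per-modality feature vectors."""
    m = max(scores.values())  # subtract max for numerical stability
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exp.values())
    weights = {k: v / z for k, v in exp.items()}
    dim = len(next(iter(features.values())))
    fused = [sum(weights[k] * features[k][i] for k in features)
             for i in range(dim)]
    return weights, fused

scores = {"visual": 2.0, "audio": 0.5, "motion": 1.0}
features = {"visual": [1.0, 0.0], "audio": [0.0, 1.0], "motion": [0.5, 0.5]}
weights, fused = modal_attention(scores, features)
```

Because the weights are recomputed per generated word, a word like "singing" can lean on the audio modality while "red" leans on the visual one, which is exactly the unevenness the paper argues standard fusion lacks.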