ISBN:
(Print) 9781450349062
Exploiting multimodal features has become a standard approach in many video applications, including the video captioning task. One problem with the existing work is that it models the relevance of each type of feature evenly, which dilutes the impact of each individual modality on the word to be generated. In this paper, we propose a novel Modal Attention Network (MANet) to address this issue. Our MANet extends the standard encoder-decoder network by adapting the attention mechanism to video modalities. As a result, MANet emphasizes the impact of each modality with respect to the word to be generated. Experimental results show that our MANet effectively utilizes multimodal features to generate better video descriptions. Notably, our MANet system ranked among the top three systems at the 2nd Video to Language Challenge in both automatic metrics and human evaluations.
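As an illustrative sketch of the idea in this abstract, per-modality attention can be written as a softmax over modality relevance scores conditioned on the decoder state. The bilinear scoring form, names, and shapes below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def modal_attention(decoder_state, modality_feats, W):
    """Weight each modality's feature vector by its relevance to the
    current decoder state, then fuse them into one context vector.

    decoder_state : (d,)   current hidden state of the caption decoder
    modality_feats: (m, d) one projected feature vector per modality
                    (e.g. appearance, motion, audio)
    W             : (d, d) hypothetical bilinear scoring matrix
    """
    scores = modality_feats @ (W @ decoder_state)   # (m,) relevance per modality
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over modalities
    context = weights @ modality_feats              # (d,) fused context vector
    return context, weights
```

The weights expose which modality dominates each decoding step, which is exactly the per-word modality emphasis the abstract describes.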
ISBN:
(Print) 9783319690049
This paper presents a character-level encoder-decoder modeling method for question answering (QA) from large-scale knowledge bases (KB). This method improves the existing approach [9] in three aspects. First, long short-term memory (LSTM) structures are adopted to replace the convolutional neural networks (CNN) for encoding the candidate entities and predicates. Second, a new strategy of generating negative samples for model training is proposed. Third, a data augmentation strategy is applied to increase the size of the training set by generating factoid questions using another trained encoder-decoder model. Experimental results on the SimpleQuestions dataset and the Freebase5M KB demonstrate the effectiveness of the proposed method, which improves the state-of-the-art accuracy from 70.3% to 78.8% when augmenting the training set with 70,000 generated triple-question pairs.
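A minimal sketch of the character-level matching setup this abstract builds on: encode the question character by character into a fixed vector, then rank candidate encodings against it (training would push the gold candidate above sampled negatives). A plain RNN stands in for the paper's LSTM, and the hashed character embedding is a simplification; all names here are hypothetical.

```python
import numpy as np

def char_encode(text, emb, Wh, Wx):
    """Toy character-level recurrent encoder (a plain RNN standing in
    for the LSTM in the paper): fold characters into a fixed vector."""
    h = np.zeros(Wh.shape[0])
    for ch in text:
        x = emb[ord(ch) % emb.shape[0]]   # hashed character embedding
        h = np.tanh(Wh @ h + Wx @ x)
    return h

def rank_candidates(question_vec, candidate_vecs):
    """Score each candidate encoding against the question encoding by
    dot product and return indices sorted best-first."""
    scores = candidate_vecs @ question_vec
    return np.argsort(-scores)
```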
ISBN:
(Print) 9789811030055; 9789811030048
In this paper we propose a fully convolutional encoder-decoder framework for image residual transformation tasks. Instead of relying only on a per-pixel loss function, the proposed framework learns an end-to-end mapping with a combined objective that includes a perceptual loss, which depends on low-level features from a pre-trained network. We further constrain the mapping function to handle noise-free images by introducing an identity mapping, and analyze the interplay between the neural networks and the underlying noise distribution they seek to learn. We also show how to construct a uniform transform, which is then used to make a single deep neural network work well across different noise levels. Compared with previous approaches, ours achieves better performance. The experimental results indicate the efficiency of the proposed algorithm for image denoising tasks.
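The combined objective the abstract mentions can be sketched as a per-pixel MSE plus a feature-space ("perceptual") MSE. Here a fixed random linear filter bank stands in for the pre-trained feature extractor, and the weighting `lam` is a hypothetical hyper-parameter:

```python
import numpy as np

def feature_map(img, K):
    """Stand-in for low-level features from a pre-trained network:
    a fixed linear filter bank applied to the flattened image."""
    return K @ img.ravel()

def combined_loss(pred, target, K, lam=0.1):
    """Per-pixel MSE plus a perceptual term computed in feature space,
    mirroring the combined objective described in the abstract."""
    pixel = np.mean((pred - target) ** 2)
    percep = np.mean((feature_map(pred, K) - feature_map(target, K)) ** 2)
    return pixel + lam * percep
```

The identity-mapping constraint from the abstract corresponds to this loss being exactly zero when the input is already noise-free and the network passes it through unchanged.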
ISBN:
(Print) 9781509049035
We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including global and local attention, and our best models achieve state-of-the-art results on three standard data sets (CMU-Dict, Pronlex, and NetTalk).
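The global/local distinction the abstract explores can be sketched as follows: global attention scores the decoder state against every encoder (grapheme) state, while local attention restricts scoring to a window around a predicted alignment position. The scoring form and window scheme below are illustrative assumptions:

```python
import numpy as np

def _softmax_context(dec_h, states):
    scores = states @ dec_h
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ states

def global_attention(dec_h, enc_hs):
    """Attend over ALL encoder (grapheme) states."""
    return _softmax_context(dec_h, enc_hs)

def local_attention(dec_h, enc_hs, center, width=2):
    """Attend over a window of encoder states around an assumed
    alignment position, as in local-attention variants."""
    lo, hi = max(0, center - width), min(len(enc_hs), center + width + 1)
    return _softmax_context(dec_h, enc_hs[lo:hi])
```

With a window wide enough to cover the whole input, the local variant reduces to the global one, which makes the relationship between the two models explicit.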
ISBN:
(Print) 9781509006199
In this paper, we analyze the performance of various sequence-to-sequence neural networks on the task of grapheme-to-phoneme (G2P) conversion. G2P is a very important component in applications like text-to-speech, automatic speech recognition, etc. Because the number of graphemes a word consists of differs from the corresponding number of phonemes, the two are first aligned and then mapped. With the recent advent of sequence-to-sequence neural networks, the alignment step can be skipped, allowing us to map the input and output sequences directly. Although sequence-to-sequence neural networks have been applied to this task only very recently, some questions concerning the architecture remain to be addressed. We show in this paper that complex recurrent neural network units (such as long short-term memory cells) may not be required to achieve good performance on this task; simple recurrent neural networks (RNNs) suffice. We also show that the encoder can be a uni-directional RNN, as opposed to the usually preferred bi-directional RNN. Further, our experiments reveal that encoder-decoder models with soft alignment outperform their fixed-vector-context counterparts. The results demonstrate that with very few parameters we can achieve performance comparable to that of much more complicated architectures.
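The "simple recurrent unit" the abstract advocates is the plain Elman cell sketched below, run uni-directionally left to right. Its final state would serve as the fixed context vector, while the full state sequence is what a soft-alignment (attention) decoder consumes; parameter names are illustrative.

```python
import numpy as np

def simple_rnn_encode(seq, Wh, Wx):
    """Plain (Elman) recurrent unit -- the abstract finds such simple
    units competitive with LSTMs for G2P. Uni-directional pass that
    returns one state per input grapheme vector."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in seq:
        h = np.tanh(Wh @ h + Wx @ x)   # no gates: just recurrence + tanh
        states.append(h)
    return np.array(states)            # (T, d): feed all states to attention
```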
ISBN:
(Print) 9781509049035
We present an advanced dialog state tracking system designed for the 5th Dialog State Tracking Challenge (DSTC5). The main task of DSTC5 is to track the dialog state in a human-human dialog. For each utterance, the tracker emits a frame of slot-value pairs considering the full history of the dialog up to the current turn. Our system includes an encoder-decoder architecture with an attention mechanism to map an input word sequence to a set of semantic labels, i.e., slot-value pairs. This handles the problem of the unknown alignment between the utterances and the labels. By combining the attention-based tracker with rule-based trackers developed for English and Chinese, the F-score on the development set improved from 0.475 to 0.507 compared to the rule-only trackers. Moreover, we achieved an F-score of 0.517 by refining the combination strategy based on the topic- and slot-level performance of each tracker. In this paper, we also validate the efficacy of each technique and report the test set results submitted to the challenge.
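One way to picture the tracker combination step: each tracker emits a frame (a dict of slot-value pairs), and a per-slot preference weight decides which tracker's value wins. The union-with-weights scheme and the slot names below are illustrative assumptions, not the paper's exact strategy.

```python
def combine_frames(attn_frame, rule_frame, tracker_weight):
    """Merge slot-value predictions from a neural (attention-based)
    tracker and a rule-based tracker. tracker_weight[slot] >= 0.5
    means the neural tracker is preferred for that slot."""
    combined = dict(rule_frame)
    for slot, values in attn_frame.items():
        if tracker_weight.get(slot, 0.5) >= 0.5 or slot not in combined:
            combined[slot] = values
    return combined
```

Refining `tracker_weight` from per-slot development-set performance is the kind of combination tuning the abstract reports improving the F-score.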
ISBN:
(Print) 9783319690049
Recently, image captioning, which aims to generate a textual description for an image automatically, has attracted researchers from various fields. Promising performance has been achieved by applying deep neural networks. Most of these works aim at generating a single caption, which may be incomprehensive, especially for complex images. This paper proposes a topic-specific multi-caption generator, which first infers topics from the image and then generates a variety of topic-specific captions, each of which depicts the image from a particular topic's perspective. We perform experiments on Flickr8k, Flickr30k and MS COCO. The results show that the proposed model performs better than a single-caption generator when generating topic-specific captions. The proposed model effectively generates a diversity of captions under reasonable topics, and the captions differ from each other at the topic level.
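The topic-inference step can be sketched as scoring each topic against an image feature and keeping the top-k topics, each of which would then condition its own caption decoder. The linear scoring and softmax below are illustrative assumptions:

```python
import numpy as np

def infer_topics(img_feat, topic_mat, k=2):
    """Score each topic against the image feature, convert to a
    distribution, and keep the top-k topics; each selected topic
    would condition one topic-specific caption decoder."""
    scores = topic_mat @ img_feat                 # one score per topic
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                          # softmax over topics
    top = np.argsort(-probs)[:k]
    return [(int(t), float(probs[t])) for t in top]
```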
Many approaches that transform classification problems from non-linear to linear through feature transformation have recently been presented in the literature. These notably include sparse coding methods and deep neural networks. However, many of these approaches require the repeated application of a learning process upon the presentation of unseen data input vectors, or else involve the use of large numbers of parameters and hyper-parameters, which must be chosen through cross-validation, thus increasing running time dramatically. In this paper, we propose and experimentally investigate a new approach for the purpose of overcoming limitations of both kinds. The proposed approach makes use of a linear auto-associative network (called SCNN) with just one hidden layer. The combination of this architecture with a specific error function to be minimized enables one to learn a linear encoder computing a sparse code which turns out to be as similar as possible to the sparse coding that one obtains by re-training the neural network. Importantly, the linearity of SCNN and the choice of the error function allow one to achieve reduced running time in the learning phase. The proposed architecture is evaluated on the basis of two standard machine learning tasks. Its performance is compared with that of recently proposed non-linear auto-associative neural networks. The overall results suggest that linear encoders can be profitably used to obtain sparse data representations in the context of machine learning problems, provided that an appropriate error function is used during the learning phase. (c) 2014 Elsevier B.V. All rights reserved.
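The appeal of a linear encoder is that inference is one matrix product rather than an iterative optimization. A minimal sketch of producing a sparse code from a linear projection follows; the soft-thresholding stands in for the sparsity-inducing error term, and the exact objective in the paper differs.

```python
import numpy as np

def sparse_linear_code(x, E, lam=0.1):
    """One-shot sparse code from a LINEAR encoder: project the input,
    then soft-threshold small coefficients to zero. Larger lam yields
    a sparser code."""
    z = E @ x                                       # single linear pass
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

This is why no re-training is needed for unseen inputs: once `E` is learned, encoding any new vector costs only the projection and thresholding.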
ISBN:
(Print) 9781479986415
The memristor device has emerged as the missing fourth fundamental circuit element after resistor, inductor and capacitor. Various implementations of memristors have been reported, with the one using a TiO2 layer sandwiched between two platinum electrodes considered to be most promising. Because of its very small feature sizes and low power consumption, it is projected to replace CMOS technology in several application areas. Various memory and logic design styles using memristors have been reported. A hybrid technology that combines memristors with CMOS gates is promising, and can be fabricated on the same silicon wafer. The present paper proposes the designs of various functional blocks like multiplexers, encoders and decoders using the hybrid memristor structure, with analyses regarding their design complexities. The design methodology is general, and can be used to synthesize arbitrary functional blocks as well.
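The abstract does not give gate-level details, but the logical primitive usually associated with memristor pairs is material implication (IMPLY). As a behavioral illustration only (not the paper's hybrid memristor-CMOS circuits), IMPLY plus a cleared memristor yields NAND, from which functional blocks such as a 2:1 multiplexer follow:

```python
def imply(p, q):
    """Material implication, the natural primitive of a memristor pair:
    IMP(p, q) = (NOT p) OR q."""
    return (not p) or q

def nand(p, q):
    """NAND from two IMPLY steps and one cleared (False) memristor:
    NAND(p, q) = q IMP (p IMP 0)."""
    return imply(q, imply(p, False))

def mux2(sel, a, b):
    """2:1 multiplexer built from NAND gates, one of the functional
    blocks the paper synthesizes: out = a if sel is False else b."""
    ns = nand(sel, sel)                       # NOT sel
    return nand(nand(a, ns), nand(b, sel))    # (a AND NOT sel) OR (b AND sel)
```

A behavioral model like this only checks the Boolean function; the design complexity analyses in the paper concern the actual memristor/CMOS realization.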