ISBN: 9781479981311 (print)
This paper describes a novel end-to-end automatic speech recognition (ASR) method that takes into consideration long-range sequential context information beyond utterance boundaries. In spontaneous ASR tasks such as those for discourses and conversations, the input speech often comprises a series of utterances. Accordingly, the relationships between the utterances should be leveraged for transcribing the individual utterances. While most previous end-to-end ASR methods focus only on utterance-level ASR that handles single utterances independently, the proposed method (which we call "large-context end-to-end ASR") can explicitly utilize relationships between a current target utterance and all preceding utterances. The method is modeled by combining an attention-based encoder-decoder model, which is one of the most representative end-to-end ASR models, with hierarchical recurrent encoder-decoder models, which are effective language models for capturing long-range sequential contexts beyond the utterance boundaries. Experiments on Japanese discourse speech tasks demonstrate that the proposed method yields significant ASR performance improvements compared with the conventional utterance-level end-to-end ASR system.
ISBN: 9781728123318 (print)
Product recommender systems and customer profiling techniques have always been a priority in online retail. Recent advances in machine learning research and the wide availability of massively parallel numerical computing have enabled various approaches to, and directions for, advancing recommender systems. It is worth mentioning that in recent years many traditional "offline" retail businesses have been gearing more and more towards employing inferential and even predictive analytics, both for stock-related problems such as predictive replenishment and to enrich the customer interaction experience. One of the most important areas of recommender systems research and development is that of Deep Learning based models, which employ representation learning to model consumer behavioral patterns. The current state of the art in Deep Learning based recommender systems uses multiple approaches, ranging from already classical methods such as those based on learning product representation vectors, to recurrent analysis of customer transactional time series, and up to generative models based on adversarial training. Each of these methods has multiple advantages and inherent weaknesses, such as the inability to understand the actual user journey, or the ability to propose only a single product recommendation or top-k product recommendations without predicting the actual next-best offer. In our work we present a new and innovative architectural approach that applies a state-of-the-art hierarchical multi-module encoder-decoder architecture in order to solve several issues of current state-of-the-art recommender systems. Our approach also produces by-products such as product need-based segmentation and customer behavioral segmentation, all in an end-to-end trainable approach.
In this paper, we integrate fully neural network based conversation-context language models (CCLMs) that are suitable for handling multi-turn conversational automatic speech recognition (ASR) tasks, with multiple neural spoken language understanding (SLU) models. A main strength of CCLMs is their capacity to take long-range interactive contexts beyond utterance boundaries into consideration. However, it is hard to optimize the CCLMs so as to fully exploit the long-range interactive contexts because conversation-level training datasets are often limited. In order to mitigate this problem, our key idea is to introduce various SLU models that are developed for spoken dialogue systems into the CCLMs. In our proposed method (which we call "SLU-assisted CCLM"), hierarchical recurrent encoder-decoder based language modeling is extended so as to handle various utterance-level SLU results of preceding utterances in a continuous space. We expect that the SLU models will help the CCLMs to properly understand semantic meanings of long-range interactive contexts and to fully leverage them for estimating a next utterance. Our experiments on contact center dialogue ASR tasks demonstrate that SLU-assisted CCLMs combined with three types of SLU models can yield ASR performance improvements.