Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are modified to match the given speaker-independent models, or the speaker-independent models are transformed towards one particular speaker, based on the discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective for adapting large DNN models using only a small amount of adaptation data. In this work, we explore the combination of a direct model-space speaker adaptation technique based on speaker codes (mSA-SC) with bottleneck features, where mSA-SC is used to extract speaker-adaptive bottleneck features. We evaluate the proposed speaker-adaptive bottleneck feature extraction method on two speech recognition tasks, namely the PSC Mandarin task and the large-scale 320-hr Switchboard task. Experimental results verify that it is well suited to very large-scale tasks. For example, the Switchboard results show that it achieves a 9% relative reduction in word error rate in an unsupervised speaker adaptation scheme.
Survey questionnaires are often heterogeneous because they contain both quantitative (numeric) and qualitative (text) responses, as well as missing values. While traditional, model-based methods are commonly used by clinicians, we deploy Self-Organizing Maps (SOM) as a means to visualise the data. In a survey study aimed at understanding the self-care behaviour of 611 patients with Type-1 Diabetes, we show that SOM can be used to (1) identify co-morbidities; (2) link self-care factors that depend on each other; and (3) visualise individual patient profiles. In an evaluation with clinicians and experts in Type-1 Diabetes, the knowledge and insights extracted using SOM correspond well to clinical expectations. Furthermore, the output of SOM in the form of a U-matrix is found to offer an interesting alternative to the usual tabular form for visualising patient profiles.
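For readers unfamiliar with the U-matrix mentioned above, the following is a minimal sketch of how one can be computed from a trained SOM codebook. It is not the authors' implementation: the function name `u_matrix`, the rectangular grid, and the 4-connected neighbourhood are assumptions made for illustration, and the codebook values in the usage lines are made up.

```python
import numpy as np


def u_matrix(weights):
    """Mean Euclidean distance from each SOM unit to its grid neighbours.

    weights: array of shape (rows, cols, dim) holding the codebook vectors
    of an already trained SOM (hypothetical input; a rectangular grid with
    4-connected neighbours is assumed).
    """
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent units.
    d = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += d
    total[:-1] += d
    count[1:] += 1
    count[:-1] += 1

    # Distances between horizontally adjacent units.
    d = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += d
    total[:, :-1] += d
    count[:, 1:] += 1
    count[:, :-1] += 1

    # Guard against a 1x1 map, which has no neighbours.
    return total / np.maximum(count, 1)


# Made-up 1x3 codebook: two similar units and one distant unit.
codebook = np.array([[[0.0], [0.0], [3.0]]])
print(u_matrix(codebook))
```

High values in the resulting matrix mark units that are far from their neighbours, so ridges of high values can be read as boundaries between groups of similar profiles.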
Statistical machine translation (SMT) performance suffers when models are trained on only small amounts of parallel data. The learned models typically have both low accuracy (incorrect translations and feature scores)...
Prior research into learning translations from source and target language monolingual texts has treated the task as an unsupervised learning problem. Although many techniques take advantage of a seed bilingual lexicon...
We present the 1.0 release of our paraphrase database, PPDB. Its English portion, PPDB:Eng, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 14...
We describe improvements made over the past year to Joshua, an open-source translation system for parsing-based machine translation. The main contributions this past year are significant improvements in both speed and...
GMM-UBM-based speaker verification relies heavily on a well-trained UBM. In practice, it is often not easy to obtain a UBM that fully matches the acoustic channels in operation. To solve this problem, we propose a novel ...
ISBN: 9781479903573 (print)
How do humans attend to and pick out relevant auditory objects amongst all the other sounds in the environment? Based on neurophysiological findings, we propose two task-oriented attentional mechanisms that act as Bayesian priors on two separate levels of processing: a sensory mapping stage and an object representation stage. The sensory mapping stage is modeled as a high-dimensional mapping that captures the spectrotemporal nuances and cues of auditory objects. The object representation stage then captures the statistical distribution of the different classes of acoustic scenes. This scheme shows a relative improvement in performance of 81% compared to a baseline system.