Author:
Yan, Ke
China Acad Engn Phys, Inst Comp Applicat, New Generat Informat Technol Ctr, Mianyang 621900, Sichuan, Peoples R China
ISBN: 9781479967315 (print)
Golden acoustic models, which are built or optimized using standard pronunciation data, are widely used in automatic pronunciation quality assessment. However, this work points out that, because of the mismatch between training and test data, golden acoustic models cannot accurately measure the pronunciation quality of accented speech. To address this problem, this paper presents a novel approach that uses both standard and non-standard speech to optimize the acoustic model by minimizing the root mean square error (RMSE) between human and machine scores. We also derive an EBW-like (Extended Baum-Welch) algorithm for parameter optimization. Experimental results demonstrate the effectiveness of the approach: the cross correlation increases from 0.610 to 0.713, and the RMSE decreases from 1.930 to 1.685.
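The two reported metrics can be made concrete with a short sketch. The abstract does not define them exactly, so this assumes "cross correlation" means the Pearson correlation coefficient and that both metrics are computed over paired per-utterance human and machine scores; the function names and the scores below are hypothetical, not data from the paper. The RMSE computed here is the same kind of quantity the abstract says the training criterion minimizes.

```python
import math
from typing import Sequence


def rmse(human: Sequence[float], machine: Sequence[float]) -> float:
    """Root mean square error between paired human and machine scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("expected two non-empty score lists of equal length")
    return math.sqrt(sum((h - m) ** 2 for h, m in zip(human, machine)) / len(human))


def pearson(human: Sequence[float], machine: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired human and machine scores."""
    if len(human) != len(machine) or len(human) < 2:
        raise ValueError("expected two score lists of equal length >= 2")
    n = len(human)
    mean_h = sum(human) / n
    mean_m = sum(machine) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((h - mean_h) * (m - mean_m) for h, m in zip(human, machine))
    var_h = sum((h - mean_h) ** 2 for h in human)
    var_m = sum((m - mean_m) ** 2 for m in machine)
    if var_h == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_h * var_m)


# Hypothetical per-utterance scores (illustrative only, not from the paper).
human_scores = [3.0, 4.0, 2.0, 5.0]
machine_scores = [2.5, 4.5, 2.0, 4.0]
print(f"RMSE = {rmse(human_scores, machine_scores):.3f}")
print(f"correlation = {pearson(human_scores, machine_scores):.3f}")
```

A higher correlation and a lower RMSE both indicate closer agreement between machine and human scores, which is the direction of both reported changes.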
ISBN: 9781467300469 (print)
Automatic speech recognition (ASR) is an enabling technology for a wide range of information processing applications, including speech translation, voice search (i.e., information retrieval with speech input), and conversational understanding. In these speech-centric applications, the ASR output, a "noisy" text, is fed into downstream processing systems to accomplish the designated task, such as translation, information retrieval, or natural language understanding. In conventional applications, the ASR model is usually trained as a sub-system without considering the downstream systems, which often leads to sub-optimal end-to-end performance. In this paper, we propose a unifying end-to-end optimization framework in which the model parameters of all subsystems, including ASR, are learned by Extended Baum-Welch (EBW) algorithms that optimize criteria directly tied to the end-to-end performance measure. We demonstrate the effectiveness of the proposed approach on a speech translation task using the IWSLT spoken language translation benchmark. Our experimental results show that the proposed method significantly improves translation quality over conventional techniques based on separate, modular sub-system design. We also analyze the EBW-based optimization algorithms employed in our work and discuss their relationship with other popular optimization techniques.
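The abstract does not spell out the update equations. For illustration only, here is a sketch of the textbook EBW (growth-transform) re-estimation step for a discrete probability vector, the kind of parameter a subsystem's probability table might hold. The objective, the utility values, the constant `d`, and the function names are hypothetical; the paper's actual criteria and updates may differ.

```python
from typing import Callable, List, Sequence

Gradient = Callable[[Sequence[float]], List[float]]


def ebw_update(theta: Sequence[float], grad: Gradient, d: float) -> List[float]:
    """One EBW growth-transform step for a discrete probability vector.

    theta_i <- theta_i * (dF/dtheta_i + d) / sum_j theta_j * (dF/dtheta_j + d)

    The constant d must be large enough that every factor (dF/dtheta_i + d)
    is positive; the result then stays on the probability simplex.
    """
    g = grad(theta)
    weighted = [t * (gi + d) for t, gi in zip(theta, g)]
    if any(w < 0 for w in weighted):
        raise ValueError("d is too small: update would give negative probabilities")
    z = sum(weighted)
    if z <= 0:
        raise ValueError("degenerate update: normalizer is not positive")
    return [w / z for w in weighted]


# Hypothetical toy objective: expected utility F(theta) = sum_i u_i * theta_i,
# whose partial derivative with respect to theta_i is simply u_i.
utilities = [0.2, 0.9, 0.5]


def expected_utility(theta: Sequence[float]) -> float:
    return sum(u * t for u, t in zip(utilities, theta))


def utility_grad(theta: Sequence[float]) -> List[float]:
    return list(utilities)


theta = [1 / 3, 1 / 3, 1 / 3]
for step in range(5):
    theta = ebw_update(theta, utility_grad, d=1.0)
    print(step, [round(t, 3) for t in theta], round(expected_utility(theta), 4))
```

For this linear toy objective with non-negative utilities, each step cannot decrease F: the new value is (E[u^2] + d*E[u]) / (E[u] + d) under the current theta, which is at least E[u]. For general objectives, the classic EBW guarantee only holds once d is sufficiently large, and in practice d is often set heuristically.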