"Silent Speech Interfaces" (SSI) refers to a system which uses non-audible signals recorded during speech production to perform speech recognition and synthesis tasks. Different approaches have been proposed...
详细信息
ISBN: (print) 9781450368896
"Silent Speech Interfaces" (SSI) refers to a system which uses non-audible signals recorded during speech production to perform speech recognition and synthesis tasks. Different approaches have been proposed for the SSI systems. In this paper, we focus on an ultrasound-based SSI. The performance of ultrasound-based SSI system heavily relies on the feature extraction approach. However, most of the previous attempts are often limited to individual frame analysis, and the context information of the image sequence cannot be taken into account. Inspired by the recent success of the recurrent neural network and convolutionalauto-encoder, we explore a novel sequential feature extraction approach for SSI system. The architecture can extract spatial and temporal feature from the image sequence, which can be further deployed for the speech recognition and synthetic tasks. By quantitative comparison between different unsupervised feature extraction approaches, the new approach outperforms other methods on the 2010 SSI challenge.