Author affiliation: Univ Grenoble 3, CNRS, INPG, Inst Commun Parlee, F-38031 Grenoble, France
Publication: IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING (IEEE Trans Speech Audio Process)
Year/Volume/Issue: 2004, Vol. 12, No. 3
Pages: 265-276
Core indexing:
Subjects: audiovisual; lip parameters; low-bit-rate speech coding; LPC parameters; matrix quantization; speech processing
Abstract: A key problem for videophony, that is, telephony that processes images of the speaker's face in addition to acoustic speech, is signal compression for transmission. In such systems, audio and video are compressed separately, each by its own coder. In this paper, an audio-visual approach to this problem is considered, since we claim that the fundamental property of coherence (redundancy) between the two modalities of speech should be exploited by coding systems. We consider the framework of parametric analysis, modeling, and synthesis of talking faces, which allows efficient representation of video information. Thus, we propose to jointly encode several face parameters, namely lip shape geometric descriptors, together with sets of audio coefficients, namely standard LPC parameters. Defining an audiovisual distance between vectors of concatenated audio and video parameters makes it possible to generate audiovisual single-stage vector and matrix quantizers using the generalized Lloyd algorithm. Calculation of video and audio mean distortion measures shows a significant gain in quantization accuracy and/or resolution compared to separate video and audio quantization. An alternative sub-optimal tree-like structure for audiovisual joint coding is also tested and yields interesting results while decreasing the computational complexity of the quantization process.
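
The joint quantization idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the audiovisual distance, so a weighted squared Euclidean distance is assumed here, and the names `av_distance`, `train_codebook`, `quantize`, `n_audio`, and `w_video` are hypothetical. Each training vector is assumed to hold the audio coefficients first, followed by the lip parameters.

```python
import random

def av_distance(x, y, n_audio, w_video=1.0):
    """Assumed audiovisual distance: audio squared error plus weighted video squared error.
    The first n_audio components are audio; the rest are video (lip) parameters."""
    d_audio = sum((a - b) ** 2 for a, b in zip(x[:n_audio], y[:n_audio]))
    d_video = sum((a - b) ** 2 for a, b in zip(x[n_audio:], y[n_audio:]))
    return d_audio + w_video * d_video

def quantize(v, codebook, n_audio, w_video=1.0):
    """Return the index of the codeword nearest to v under the assumed distance."""
    return min(range(len(codebook)),
               key=lambda i: av_distance(v, codebook[i], n_audio, w_video))

def train_codebook(vectors, size, n_audio, w_video=1.0, n_iter=20, seed=0):
    """Generalized Lloyd iteration: alternate nearest-neighbor partition and centroid update."""
    rng = random.Random(seed)
    # Initialize the codebook with randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(n_iter):
        # Partition step: assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[quantize(v, codebook, n_audio, w_video)].append(v)
        # Centroid step: for a weighted squared error, the centroid is the cell mean.
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

# Toy data: 2 audio components followed by 1 video component per vector.
training = [
    [0.0, 0.0, 0.0], [0.2, 0.0, 0.1], [0.0, 0.2, 0.1], [0.1, 0.1, 0.0],
    [10.0, 10.0, 10.0], [10.2, 10.0, 10.1], [10.0, 10.2, 10.1], [10.1, 10.1, 10.0],
]
codebook = train_codebook(training, size=2, n_audio=2, w_video=0.5)
index = quantize([9.9, 10.0, 10.2], codebook, n_audio=2, w_video=0.5)
```

Because a single index is transmitted for the concatenated vector, the audio and video parameters share one codebook. The weight `w_video` is an assumed knob for trading audio distortion against video distortion.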