PMMTalk: Speech-Driven 3D Facial Animation From Complementary Pseudo Multi-Modal Features

Authors: Han, Tianshun; Gui, Shengnan; Huang, Yiqing; Li, Baihui; Liu, Lijian; Zhou, Benjia; Jiang, Ning; Lu, Quan; Zhi, Ruicong; Liang, Yanyan; Zhang, Du; Wan, Jun

Author Affiliations: Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau 999078, Peoples R China; Chinese Acad Sci CASIA, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China; Beijing Key Lab Knowledge Engn Mat Sci, Beijing 100083, Peoples R China; Mashang Consumer Finance Co Ltd, Chongqing 400000, Peoples R China

Published in: IEEE TRANSACTIONS ON MULTIMEDIA (IEEE Trans Multimedia)

Year/Volume: 2025, Vol. 27

Pages: 2570-2581

Subject Classification: 0810 [Engineering - Information & Communication Engineering]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science & Technology (degrees conferrable in Engineering or Science)]

Funding: National Key Research and Development Program of China [2021YFE0205700]; Beijing Natural Science Foundation [JQ23016]; Chinese National Natural Science Foundation Projects; Science and Technology Development Fund of Macau Project [0123/2022/A3, 0044/2024/AGJ, 0070/2020/AMJ]

Keywords: Three-dimensional displays; Facial animation; Faces; Visualization; Feature extraction; Face recognition; Solid modeling; Synchronization; Decoding; Acoustics; Speech-driven 3D facial animation; PMMTalk; 3D Chinese Audio-Visual Facial Animation (3D-CAVFA) dataset

Abstract: Speech-driven 3D facial animation has improved considerably in recent years, yet most related works utilize only the acoustic modality and neglect the influence of visual and textual cues, leading to unsatisfactory results in terms of precision and coherence. We argue that visual and textual cues are not trivial information. Therefore, we present a novel framework, namely PMMTalk, which uses complementary Pseudo Multi-Modal features to improve the accuracy of facial animation. The framework comprises three modules: the PMMTalk encoder, the cross-modal alignment module, and the PMMTalk decoder. Specifically, the PMMTalk encoder employs off-the-shelf talking head generation and speech recognition technology to extract visual and textual information from speech, respectively. Following this, the cross-modal alignment module aligns the audio-image-text features at the temporal and semantic levels. Subsequently, the PMMTalk decoder predicts lip-synced facial blendshape coefficients. In contrast to prior methods, PMMTalk requires only an additional random reference face image yet yields more accurate results. It is also artist-friendly, as it integrates seamlessly into standard animation production workflows by outputting facial blendshape coefficients. Finally, given the scarcity of 3D talking-face datasets, we introduce a large-scale 3D Chinese Audio-Visual Facial Animation (3D-CAVFA) dataset. Extensive experiments and user studies show that our approach outperforms the state of the art. Codes and datasets are available at PMMTalk.
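To make the pipeline described in the abstract concrete, below is a minimal, illustrative PyTorch-style sketch of the three-module design: stand-in encoder projections for the pseudo visual/textual features, a cross-modal alignment step, and a decoder predicting per-frame blendshape coefficients. This is not the authors' implementation; all module names, feature dimensions, the use of a single attention layer for alignment, and the blendshape count are assumptions made purely for illustration.

# Illustrative sketch only (not the paper's code): a minimal outline of the
# encoder / alignment / decoder pipeline described in the abstract.
# All dimensions and module choices below are assumptions.
import torch
import torch.nn as nn

class PMMTalkSketch(nn.Module):
    def __init__(self, audio_dim=768, visual_dim=512, text_dim=512,
                 hidden_dim=256, num_blendshapes=52):
        super().__init__()
        # "PMMTalk encoder": in the paper, off-the-shelf talking-head generation
        # and speech recognition models derive pseudo visual/textual features
        # from speech; here they are replaced by simple linear projections.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Cross-modal alignment: a single multi-head attention layer standing in
        # for the temporal/semantic alignment module.
        self.align = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # "PMMTalk decoder": maps aligned features to facial blendshape coefficients.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_blendshapes),
        )

    def forward(self, audio_feat, visual_feat, text_feat):
        # audio_feat: (B, T, audio_dim); visual_feat / text_feat are per-frame too.
        a = self.audio_proj(audio_feat)
        v = self.visual_proj(visual_feat)
        t = self.text_proj(text_feat)
        # Audio queries attend over the concatenated visual and textual cues.
        context = torch.cat([v, t], dim=1)
        aligned, _ = self.align(a, context, context)
        return self.decoder(aligned)  # (B, T, num_blendshapes)

# Usage with random tensors, purely to show the expected shapes.
model = PMMTalkSketch()
coeffs = model(torch.randn(1, 100, 768), torch.randn(1, 100, 512), torch.randn(1, 100, 512))
print(coeffs.shape)  # torch.Size([1, 100, 52])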
