We present Pahi, a unified water pipeline and toolset for visual effects production. It covers procedural blocking visualization for preproduction, simulation of various water phenomena from large-scale splashes with ...
Sign language is a natural language widely used by Deaf and hard-of-hearing (DHH) individuals. Advanced wearables have been developed to recognize sign language automatically, but they are limited by the lack of labeled data, which leads to small vocabularies and unsatisfactory performance even when laborious effort is put into data collection. Here we propose SignRing, an IMU-based system that moves beyond traditional data augmentation by using online videos to generate virtual IMU (v-IMU) data, pushing the boundary of wearable-based systems to a vocabulary size of 934 with sentences of up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and computing 3-axis acceleration from the reconstructed trajectories. With this data, we achieve a word error rate (WER) of 6.3% using a training mix of half v-IMU and half real IMU data (2339 samples each), and a WER of 14.7% using 100% v-IMU training data (6048 samples), compared with a baseline WER of 8.3% (trained on 2339 samples of real IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work spans wearable sensor development, computer vision, deep learning, and linguistics, and can provide valuable insights to researchers with similar research objectives.
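The core of the v-IMU pipeline is turning reconstructed 3D hand trajectories into synthetic accelerometer readings. Below is a minimal sketch of that conversion, assuming world-frame keypoint positions at a fixed frame rate; the finite-difference scheme and the gravity handling are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch: derive virtual 3-axis acceleration from a reconstructed 3D
# keypoint trajectory (e.g., a point on the ring finger, since SignRing
# is ring-worn). Frame rate and gravity treatment are assumptions.
import numpy as np

def virtual_acceleration(positions: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """positions: (T, 3) array of 3D coordinates in meters.
    Returns (T-2, 3) acceleration samples in m/s^2."""
    dt = 1.0 / fps
    # Central second difference: a[t] ~= (p[t+1] - 2*p[t] + p[t-1]) / dt^2
    accel = (positions[2:] - 2 * positions[1:-1] + positions[:-2]) / (dt * dt)
    # Real accelerometers measure specific force, which includes gravity;
    # a crude world-frame approximation adds g back on the vertical axis.
    accel[:, 2] += 9.81
    return accel

# Usage: a synthetic 2-second trajectory sampled at 30 fps.
t = np.linspace(0.0, 2.0, 60)
traj = np.stack([np.sin(t), np.cos(t), 0.1 * t], axis=1)
v_imu = virtual_acceleration(traj)
print(v_imu.shape)  # (58, 3)
```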
ISBN (Print): 9798400704123
Volumetric capture is an important topic in eXtended Reality (XR), as it enables the integration of realistic three-dimensional content into virtual scenarios and immersive applications. Certain systems are even capable of delivering these volumetric captures live and in real time, opening the door to interactive use cases such as immersive videoconferencing. One example of such a system is FVV Live, a Free Viewpoint Video (FVV) application capable of working in real time with low delay. Recent breakthroughs in Artificial Intelligence (AI) in general, and deep learning in particular, report great success when applied to the computer vision tasks involved in volumetric capture, helping to overcome the quality and bandwidth restrictions that these systems often face. Despite their promising results, state-of-the-art approaches still have the disadvantage of requiring large amounts of processing power and time. This project aims to advance the state of the art in volumetric capture by applying the aforementioned deep learning techniques, optimizing the models to run in real time while still delivering high quality. The technology developed will be validated by integrating it into immersive video communication systems such as FVV Live, in order to overcome their main restrictions and improve the quality delivered to the end user.
ISBN (Print): 9781450379717
The proceedings contain 76 papers. The topics discussed include: hustle by day, risk it all at night; auto-adaptivity: an optimization-based approach to spatial adaptivity for smoke simulations; underwater bubbles and coupling; sparse smoke simulations in Houdini; real-time ray-traced ambient occlusion of complex scenes using spatial hashing; predictable and targeted softening of the shadow terminator; fifty shades of yay: a multi-shot workflow from design to final; termite: DreamWorks' procedural environment rigging tool; making time for emotional intelligence in production and technology; designing effects workflows: the thinking behind tool development; and is it acid or is it fire? how to train your dragon: the hidden world.
We explored continuous changes in self-other identity by designing an interpersonal facial morphing experience where the facial images of two users are blended and then swapped over time. To explore this with diverse ...
Methods for faithfully capturing a user's holistic pose have immediate uses in AR/VR, ranging from multimodal input to expressive avatars. Although body-tracking has received the most attention, the mouth is also ...
ISBN (Print): 9798400711312
Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handling the motion space. In this work, we explore the attention mechanism of pre-trained motion diffusion models. We uncover the roles and interactions of attention elements in capturing and representing intricate human motion patterns, and carefully integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer. Manipulating features associated with selected motions allows us to confront a challenge observed in prior motion diffusion approaches, which use general directives (e.g., text, music) for editing and ultimately fail to convey subtle nuances effectively. Our work is inspired by the phrase Monkey See, Monkey Do, relating to human mimicry. Our technique enables tasks such as synthesizing out-of-distribution motions, style transfer, and spatial editing. Furthermore, diffusion inversion is seldom employed for motions; as a result, editing efforts focus on generated motions, limiting the editability of real ones. MoMo harnesses motion inversion, extending its application to both real and generated motions. Experimental results show the advantage of our approach over the current art. In particular, unlike methods tailored for specific applications through training, our approach is applied at inference time and requires no training. Webpage: https://***.
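To make the attention manipulation concrete, here is an illustrative sketch, assuming a PyTorch denoiser whose self-attention layers expose query/key/value projections; the function and tensor names are hypothetical placeholders, and this stands in for whatever the pre-trained model computes internally rather than reproducing the authors' implementation.

```python
# Sketch: cross-example attention for motion transfer. The follower's
# queries attend to the leader's keys/values, so the output tracks the
# leader's motion while follower-specific features are preserved.
import torch
import torch.nn.functional as F

def transfer_attention(q_follower, k_leader, v_leader, num_heads=8):
    """Each tensor has shape (batch, seq_len, dim); sequence lengths of
    leader and follower may differ."""
    b, t, d = q_follower.shape
    h, hd = num_heads, d // num_heads
    # Split into heads: (batch, heads, seq_len, head_dim)
    q = q_follower.view(b, t, h, hd).transpose(1, 2)
    k = k_leader.view(b, -1, h, hd).transpose(1, 2)
    v = v_leader.view(b, -1, h, hd).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v)  # standard attention
    return out.transpose(1, 2).reshape(b, t, d)
```

In practice, one would run the denoiser on both the leader and follower motions, capture the leader's keys/values at selected layers and diffusion timesteps (e.g., via forward hooks on the attention modules), and substitute them into the follower's attention calls during sampling.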