Analyzing the geometric and semantic properties of 3D point cloud data via deep learning networks is still challenging due to the irregularity and sparsity with which their geometric structures are sampled. In this study, we combine the advantages of voxels and point clouds by presenting a new voxel-based data form, called Layer-Ring data. This data type retains a fine description of the 3D data while keeping feature extraction efficient. Based on the Layer-Ring data, a new network architecture, called VoxPoint Annular Network (VAN), performs feature extraction and object category prediction. The design rests on edge extraction and a coordinate representation for each voxel on its separated layer. With this flexible design, the proposed VAN can adapt to geometric variability and scalability across layers. Finally, extensive experiments and comparisons demonstrate that our approach achieves notable results against state-of-the-art methods on standard benchmark datasets (e.g., ModelNet10, ModelNet40). The tests also show that 3D shape features can be learned efficiently and robustly.
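The abstract does not spell out how Layer-Ring data is constructed, so the following is only an illustrative sketch under assumed conventions: the point cloud is split into horizontal layers along z, each layer is divided into concentric rings, and every occupied cell stores a simple coordinate feature (here the local centroid). The function and parameter names (`layer_ring_partition`, `n_layers`, `n_rings`) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a Layer-Ring style partition of a point cloud.
# The actual VAN data construction may differ; this only illustrates the
# general idea of layer + ring binning with a coordinate feature per cell.
import numpy as np

def layer_ring_partition(points, n_layers=16, n_rings=8):
    """points: (N, 3) array of xyz coordinates, assumed roughly centered."""
    z = points[:, 2]
    r = np.linalg.norm(points[:, :2], axis=1)

    # Bucket z into layers and radius into rings, clipping boundary points.
    z_span = (z.max() - z.min()) + 1e-9
    z_idx = np.clip(((z - z.min()) / z_span * n_layers).astype(int), 0, n_layers - 1)
    r_idx = np.clip((r / (r.max() + 1e-9) * n_rings).astype(int), 0, n_rings - 1)

    # Per-cell centroid as a coarse coordinate feature (3 channels per cell).
    feats = np.zeros((n_layers, n_rings, 3), dtype=np.float32)
    counts = np.zeros((n_layers, n_rings, 1), dtype=np.float32)
    np.add.at(feats, (z_idx, r_idx), points)
    np.add.at(counts, (z_idx, r_idx), 1.0)
    return feats / np.maximum(counts, 1.0)   # (n_layers, n_rings, 3)

if __name__ == "__main__":
    cloud = np.random.randn(2048, 3).astype(np.float32)  # stand-in point cloud
    grid = layer_ring_partition(cloud)
    print(grid.shape)  # (16, 8, 3)
```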
We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables the creation of mechanically correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators. To this end, we propose a sim-to-real strategy to train a learning-based framework that takes one or multiple images as input and outputs a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, successfully closing the sim-to-real loop. A key contribution of our work is to demonstrate that evaluating regression accuracy by similarity in parameter space leads to inaccurate distances that do not match human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates in the image domain instead of the parameter space, allowing us to evaluate our estimation within the context of a similarity ranking. We show that our metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground-truth parameters.
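As a rough illustration of the image-domain evaluation idea (not the paper's actual metric), the sketch below ranks candidate drapes by a plain per-pixel distance between depth renders rather than by distance in parameter space. The function names and the choice of an L2 image distance are assumptions made only to show the ranking structure.

```python
# Illustrative only: score candidate drapes by distance in image space
# (depth renders of the draped fabric), then rank by similarity to a target.
import numpy as np

def image_space_distance(depth_a, depth_b):
    """depth_a, depth_b: (H, W) depth renders of two drape configurations."""
    return float(np.sqrt(np.mean((depth_a - depth_b) ** 2)))

def rank_by_drape_similarity(target_depth, candidate_depths):
    """Return candidate indices sorted from most to least similar drape."""
    dists = [image_space_distance(target_depth, d) for d in candidate_depths]
    return np.argsort(dists)

if __name__ == "__main__":
    target = np.random.rand(128, 128)                 # stand-in captured depth
    candidates = [np.random.rand(128, 128) for _ in range(5)]
    print(rank_by_drape_similarity(target, candidates))
```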
High-quality portrait image editing has been made easier by recent advances in GANs (e.g., StyleGAN) and GAN inversion methods that project images onto a pre-trained GAN's latent space. However, it is hard to extend existing image editing methods to videos and produce temporally coherent, natural-looking results. The challenges lie in reproducing diverse video frames and preserving natural motion after editing. In this work, we propose solutions for these challenges. First, we propose a video adaptation method that enables the generator to reconstruct the original input identity, unusual poses, and expressions in the video. Second, we propose an expression dynamics optimization that tweaks the latent codes to maintain the meaningful motion of the original video. Based on these methods, we build a StyleGAN-based high-quality portrait video editing system that can edit in-the-wild videos in a temporally coherent way at up to 4K resolution.
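The abstract does not give the exact losses of the expression dynamics optimization, so the following is a hedged sketch of the general structure only: per-frame latent codes are adjusted so that their frame-to-frame motion matches the motion of the original video's latents while staying close to the edited codes. The function name `preserve_motion` and the loss weights are hypothetical; the real system optimizes through StyleGAN with image-space terms rather than matching raw latent deltas.

```python
# Hedged sketch of a motion-preserving latent optimization (assumed form).
import torch

def preserve_motion(edited_latents, original_latents, steps=200, lr=1e-2):
    """edited_latents, original_latents: (T, D) tensors of per-frame codes."""
    w = edited_latents.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    target_motion = original_latents[1:] - original_latents[:-1]  # (T-1, D)

    for _ in range(steps):
        motion = w[1:] - w[:-1]
        # Match the original frame-to-frame motion while staying near the edit.
        loss = ((motion - target_motion) ** 2).mean() \
             + 0.1 * ((w - edited_latents) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```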
We propose BareSkinNet, a novel method that simultaneously removes makeup and lighting influences from a face image. Our method leverages a 3D morphable model and does not require a reference clean face image or a specified lighting condition. By incorporating 3D face reconstruction, we can easily obtain the 3D geometry and a coarse 3D texture. Using this information, we infer normalized 3D face texture maps (diffuse, normal, roughness, and specular) with an image-translation network. The reconstructed 3D face textures, free of undesirable information, significantly benefit subsequent processes such as re-lighting or re-makeup. In experiments, we show that BareSkinNet outperforms state-of-the-art makeup removal methods. In addition, our method is remarkably helpful for removing makeup to generate consistent, high-fidelity texture maps, which makes it extendable to many realistic face generation applications. It can also automatically build graphic assets of before-and-after face makeup images with the corresponding 3D data, assisting artists in accelerating work such as 3D makeup avatar creation.
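To illustrate the input/output structure described above (coarse 3D texture in, normalized diffuse/normal/roughness/specular maps out), here is a minimal stand-in image-translation network. The architecture, channel widths, and the class name `TextureTranslator` are assumptions for illustration, not the BareSkinNet design.

```python
# Minimal stand-in for an image-translation network that maps a coarse UV
# texture (3 channels) to a stack of normalized texture maps
# (diffuse 3 + normal 3 + roughness 1 + specular 1 = 8 channels).
import torch
import torch.nn as nn

class TextureTranslator(nn.Module):
    def __init__(self, in_ch=3, out_ch=8, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, coarse_texture):            # (B, 3, H, W)
        maps = self.net(coarse_texture)           # (B, 8, H, W)
        diffuse, normal, rough, spec = torch.split(maps, [3, 3, 1, 1], dim=1)
        return diffuse, normal, rough, spec

if __name__ == "__main__":
    x = torch.randn(1, 3, 256, 256)               # stand-in coarse UV texture
    d, n, r, s = TextureTranslator()(x)
    print(d.shape, n.shape, r.shape, s.shape)
```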