Computer vision is a very broad field in which research is advancing rapidly, yet many areas remain open. For this reason we are interested in the field of 3D reconstruction, which consists of cre...
ISBN:
(Print) 9798350353013; 9798350353006
When working with 3D facial data, improving fidelity and avoiding the uncanny valley effect is critically dependent on accurate 3D facial performance capture. Because such methods are expensive and because of the widespread availability of 2D videos, recent methods have focused on monocular 3D face tracking. However, these methods often fall short in capturing precise facial movements due to limitations in their network architecture, training, and evaluation processes. Addressing these challenges, we propose a novel face tracker, FlowFace, that introduces an innovative 2D alignment network for dense per-vertex alignment. Unlike prior work, FlowFace is trained on high-quality 3D scan annotations rather than weak supervision or synthetic data. Our 3D model fitting module jointly fits a 3D face model from one or many observations, integrating existing neutral shape priors for enhanced identity and expression disentanglement and per-vertex deformations for detailed facial feature reconstruction. Additionally, we propose a novel metric and benchmark for assessing tracking accuracy. Our method exhibits superior performance on both custom and publicly available benchmarks. We further validate the effectiveness of our tracker by generating high-quality 3D data from 2D videos, which leads to performance gains on downstream tasks.
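The joint model-fitting step described above can be illustrated with a toy sketch. Assuming a hypothetical linear face model (the basis sizes, vertex count, and regularizer below are illustrative placeholders, not FlowFace's actual components), identity and expression coefficients are solved jointly against dense per-vertex observations by regularized least squares:

```python
import numpy as np

# Hypothetical linear face model: vertices = mean + B_id @ a + B_exp @ b
rng = np.random.default_rng(0)
n_verts = 30                        # toy vertex count (x,y,z flattened: 90 dims)
d = n_verts * 3
mean = rng.normal(size=d)
B_id = rng.normal(size=(d, 5))      # identity basis (5 coefficients)
B_exp = rng.normal(size=(d, 4))     # expression basis (4 coefficients)

def fit_coeffs(observed, lam=1e-3):
    """Jointly solve for identity and expression coefficients against
    dense per-vertex observations, with a small ridge regularizer."""
    B = np.hstack([B_id, B_exp])
    A = B.T @ B + lam * np.eye(B.shape[1])
    x = np.linalg.solve(A, B.T @ (observed - mean))
    return x[:5], x[5:]

# Synthesize a face from known coefficients, then recover them.
a_true, b_true = rng.normal(size=5), rng.normal(size=4)
obs = mean + B_id @ a_true + B_exp @ b_true
a_hat, b_hat = fit_coeffs(obs)
recon = mean + B_id @ a_hat + B_exp @ b_hat
print(bool(np.abs(recon - obs).max() < 1e-2))  # reconstruction matches
```

Fitting from many observations, as the abstract describes, would stack several `observed` vectors into one least-squares system with shared identity but per-frame expression coefficients.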
Light field reconstruction, a crucial domain in computer vision, generates 3D models of real-world scenes by combining multi-view images. Multi-view stereo technology is essential in this process, leveraging images fr...
ISBN:
(Print) 9783031263125; 9783031263132
Panorama images have a large 360-degree field of view, providing rich contextual information for object detection, and are widely used in virtual reality, augmented reality, scene understanding, etc. However, existing methods for object detection on panorama images still have some problems. When 360-degree content is converted to a projection plane, the geometric distortion introduced by the projection model prevents the neural network from extracting features efficiently, and objects at the boundary of the projection image are left incomplete. To solve these problems, in this paper we propose a novel two-stage detection network, RepF-Net, which comprehensively utilizes multiple distortion-aware convolution modules to deal with geometric distortion while performing effective feature extraction, and uses a non-maximum fusion algorithm to fuse the content of detected objects in the post-processing stage. Our proposed unified distortion-aware convolution modules can handle distortions from geometric transforms and projection models, and in our network they are used to address the geometric distortion caused by equirectangular projection and stereographic projection. Our proposed non-maximum fusion algorithm fuses the content of detected objects to deal with incomplete object content separated by the projection boundary. Experimental results show that our RepF-Net outperforms previous state-of-the-art methods by 6% mAP. Based on RepF-Net, we present an implementation of a 3D object detection and scene layout reconstruction application.
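The idea behind non-maximum fusion can be sketched as a box-union variant of NMS: instead of suppressing overlapping detections, they are merged, so an object split at the projection boundary is recombined. This is a minimal sketch under assumed box conventions; the paper's actual fusion rule may differ in its overlap criterion and merge step:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def non_maximum_fusion(boxes, scores, thr=0.5):
    """Process detections in score order; fuse an overlapping pair by
    taking the union of the two boxes instead of dropping one."""
    order = np.argsort(scores)[::-1]
    fused = []
    for i in order:
        b = list(boxes[i])
        for f in fused:
            if iou(b, f) > thr:
                f[0], f[1] = min(f[0], b[0]), min(f[1], b[1])
                f[2], f[3] = max(f[2], b[2]), max(f[3], b[3])
                break
        else:
            fused.append(b)
    return fused

# Two fragments of one object (e.g. split at a projection seam) plus an
# unrelated detection: fusion yields two boxes, the first a merged union.
boxes = np.array([[0, 0, 10, 10], [8, 0, 18, 10], [40, 40, 50, 50]], float)
scores = np.array([0.9, 0.8, 0.7])
out = non_maximum_fusion(boxes, scores, thr=0.1)
print(len(out))  # 2
```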
ISBN:
(Print) 9798350353006
We propose NViST, a transformer-based model for efficient and generalizable novel-view synthesis from a single image for real-world scenes. In contrast to many methods that are trained on synthetic data, object-centred scenarios, or in a category-specific manner, NViST is trained on MVImgNet, a large-scale dataset of casually-captured real-world videos of hundreds of object categories with diverse backgrounds. NViST transforms image inputs directly into a radiance field, conditioned on camera parameters via adaptive layer normalisation. In practice, NViST exploits fine-tuned masked autoencoder (MAE) features and translates them to 3D output tokens via cross-attention, while addressing occlusions with self-attention. To move away from object-centred datasets and enable full scene synthesis, NViST adopts a 6-DOF camera pose model and only requires relative pose, dropping the need for canonicalization of the training data, which removes a substantial barrier to it being used on casually captured datasets. We show results on unseen objects and categories from MVImgNet and even generalization to casual phone captures. We conduct qualitative and quantitative evaluations on MVImgNet and ShapeNet to show that our model represents a step forward towards enabling true in-the-wild generalizable novel-view synthesis from a single image. Project webpage: https://***/nvist_webpage.
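The camera-conditioned adaptive layer normalisation mentioned above can be sketched as follows. The token width, camera embedding size, and the linear maps from camera parameters are hypothetical placeholders, not NViST's actual dimensions:

```python
import numpy as np

def adaptive_layer_norm(tokens, camera, W_scale, W_shift, eps=1e-5):
    """Normalize each token over its feature dimension, then modulate it
    with a scale and shift regressed from the camera parameters."""
    mu = tokens.mean(-1, keepdims=True)
    var = tokens.var(-1, keepdims=True)
    normed = (tokens - mu) / np.sqrt(var + eps)
    gamma = camera @ W_scale      # per-channel scale from the camera embedding
    beta = camera @ W_shift       # per-channel shift from the camera embedding
    return normed * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 64))         # 16 tokens, width 64 (toy sizes)
camera = rng.normal(size=9)                # e.g. 6-DOF pose + a few intrinsics
W_scale = rng.normal(size=(9, 64)) * 0.1   # learned in the real model
W_shift = rng.normal(size=(9, 64)) * 0.1
out = adaptive_layer_norm(tokens, camera, W_scale, W_shift)
print(out.shape)  # (16, 64)
```

Because the modulation depends only on relative camera parameters, the same mechanism works without canonicalized training data, as the abstract emphasizes.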
ISBN:
(Print) 9798350353006
In this paper, we explore the potential of Snapshot Compressive Imaging (SCI) technique for recovering the underlying 3D scene representation from a single temporal compressed image. SCI is a cost-effective method that enables the recording of high-dimensional data, such as hyperspectral or temporal information, into a single image using low-cost 2D imaging sensors. To achieve this, a series of specially designed 2D masks are usually employed, which not only reduces storage requirements but also offers potential privacy protection. Inspired by this, to take one step further, our approach builds upon the powerful 3D scene representation capabilities of neural radiance fields (NeRF). Specifically, we formulate the physical imaging process of SCI as part of the training of NeRF, allowing us to exploit its impressive performance in capturing complex scene structures. To assess the effectiveness of our method, we conduct extensive evaluations using both synthetic data and real data captured by our SCI system. Extensive experimental results demonstrate that our proposed approach surpasses the state-of-the-art methods in terms of image reconstruction and novel view image synthesis. Moreover, our method also exhibits the ability to restore high frame-rate multi-view consistent images by leveraging SCI and the rendering capabilities of NeRF. The code is available at https://***/SCINeRF.
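The SCI forward model that such training folds into the loss can be written as a masked sum, y = Σ_t M_t ⊙ x_t: each latent frame is modulated by its own binary mask and the results accumulate into one sensor image. A minimal numpy sketch (frame count, resolution, and mask statistics are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 8, 32, 32                       # 8 latent frames, toy resolution
frames = rng.random((T, H, W))            # high frame-rate scene content
masks = rng.integers(0, 2, (T, H, W))     # per-frame binary coding masks

# Snapshot measurement: each frame is modulated by its mask and the
# modulated frames are summed into a single 2D image on the sensor.
snapshot = (masks * frames).sum(axis=0)
print(snapshot.shape)  # (32, 32)
```

In a NeRF-in-the-loop setup like the one described, `frames` would be renderings from the radiance field, and the simulated `snapshot` would be compared against the captured measurement to drive training.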
ISBN:
(Print) 9798350349122; 9798350349115
Sparse-view computed tomography (CT) is an effective method for reducing radiation dose. However, images reconstructed from the insufficient data obtained by sparse-view CT suffer from severe star-shaped artifacts, so reducing the radiation dose degrades imaging quality. Although deep learning methods for sparse-view CT reconstruction, such as convolutional neural networks (CNNs), have achieved impressive success, the reconstruction results are still too smooth, i.e. they lose many details. In this work, we propose a two-stage deep learning method to reduce the artifacts of sparse-view CT images. We conducted several numerical simulation experiments to test the performance of the proposed network. The results indicate that our denoising method can significantly reduce the artifacts caused by sparse sampling and retains more detailed information than a CNN.
ISBN:
(Print) 9798400706363
The text-to-image generation model has attracted significant interest from both academic and industrial communities. These models can generate images based on given prompt descriptions. Their potent capabilities, while beneficial, also present risks. Previous efforts relied on training binary classifiers to detect generated fake images, which is inefficient, lacking in generalizability, and non-robust. In this paper, we propose a novel zero-shot detection method, called ZeroFake, to distinguish fake images from real ones by utilizing a perturbation-based DDIM inversion technique. ZeroFake is inspired by the finding that fake images are more robust than real images during the process of DDIM inversion and reconstruction. Specifically, for a given image, ZeroFake first generates noise with DDIM inversion guided by adversary prompts. Then, ZeroFake reconstructs the image from the generated noise. Subsequently, it compares the reconstructed image with the original image to determine whether it is fake or real. By exploiting the differential response of fake and real images to the adversary prompts during the inversion and reconstruction process, our model offers a more robust and efficient method to detect fake images without extensive data and training costs. Extensive results demonstrate that the proposed ZeroFake achieves great performance in fake image detection, fake artwork detection, and fake edited image detection. We further illustrate the robustness of the proposed ZeroFake by showcasing its resilience against potential adversary attacks. We hope that our solution can better assist the community in achieving a more efficient and fair AGI.
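The decision rule sketched in the abstract reduces to a distance comparison after the inversion round trip. The sketch below substitutes toy stand-ins for the diffusion-model round trip (the round-trip functions and threshold are illustrative, not ZeroFake's actual pipeline), showing only the compare-and-decide step:

```python
import numpy as np

def zero_shot_decision(image, invert_and_reconstruct, threshold):
    """Run the adversarially guided inversion + reconstruction round trip,
    then call the image fake if it changed less than real images do."""
    recon = invert_and_reconstruct(image)
    dist = np.abs(recon - image).mean()
    return "fake" if dist < threshold else "real"

# Toy stand-ins for the DDIM round trip: per the abstract's finding,
# generated images reconstruct almost exactly while real ones drift.
fake_roundtrip = lambda x: x + 0.01
real_roundtrip = lambda x: x + 0.30
img = np.zeros((8, 8))
print(zero_shot_decision(img, fake_roundtrip, 0.1))  # fake
print(zero_shot_decision(img, real_roundtrip, 0.1))  # real
```

The appeal of this formulation is that no classifier is trained: only a threshold on the reconstruction distance is needed, which is what makes the method zero-shot.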
ISBN:
(Print) 9798350318920; 9798350318937
In deformable object manipulation, we often want to interact with specific segments of an object that are only defined in non-deformed models of the object. We thus require a system that can recognize and locate these segments in sensor data of deformed real world objects. This is normally done using deformable object registration, which is problem specific and complex to tune. Recent methods utilize neural occupancy functions to improve deformable object registration by registering to an object reconstruction. Going one step further, we propose a system that in addition to reconstruction learns segmentation of the reconstructed object. As the resulting output already contains the information about the segments, we can skip the registration process. Tested on a variety of deformable objects in simulation and the real world, we demonstrate that our method learns to robustly find these segments. We also introduce a simple sampling algorithm to generate better training data for occupancy learning.
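The idea of a field that returns both occupancy and a segment label per query point can be illustrated with a toy analytic stand-in (spheres instead of a learned network; all shapes and labels below are invented for illustration):

```python
import numpy as np

def occupancy_with_segments(points, centers, radii, labels):
    """Toy stand-in for a learned field: a query point is occupied if it
    falls inside any sphere, and an occupied point takes the label of its
    nearest sphere; unoccupied points get label -1."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    occupied = (d < radii[None, :]).any(axis=1)
    seg = np.where(occupied, labels[np.argmin(d, axis=1)], -1)
    return occupied, seg

# Two "segments" of a toy object, plus one query far outside it.
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
radii = np.array([0.5, 0.5])
labels = np.array([0, 1])
points = np.array([[0.1, 0.0, 0.0], [2.0, 0.1, 0.0], [5.0, 5.0, 5.0]])
occ, seg = occupancy_with_segments(points, centers, radii, labels)
print(occ.tolist(), seg.tolist())
```

Because the segment label comes out of the same query that yields occupancy, the registration step the abstract mentions becomes unnecessary: locating a segment is just evaluating the field.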
Image anomaly detection focuses on identifying areas that deviate from normal patterns. The most commonly used networks usually follow an encoder-decoder architecture. These networks learn from normal training samples...