Despite the longstanding adage "an image is worth a thousand words," generating accurate hyper-detailed image descriptions remains unsolved. Trained on short web-scraped image-text, vision-language models of...
In recent years, Neural Radiance Fields (NeRF) have attracted attention for their excellent performance in reconstructing 3D scenes from 2D images by capturing volumetric scene representations and radiance properties. Ho...
ISBN: (Print) 9781510674158; 9781510674141
An active 3D microwave / millimeter-wave shoe scanner was previously developed at the Pacific Northwest National Laboratory (PNNL) using two linear arrays scanned over a rectilinear aperture. The radar system sweeps a frequency chirp from 10 to 40 GHz. These frequencies allow imaging through optically opaque materials such as leather, rubber, plastics, and other dielectrics. The system was designed to detect concealed items in the soles of shoes while allowing people to leave their shoes on through a security checkpoint. To shrink the footprint of the system, a new iteration of the design has been developed that scans the two linear arrays over a circular aperture. This new footprint opens the possibility of installing the scanner in the floor of a cylindrical millimeter-wave body scanner. The backprojection-based multilayer dielectric image reconstruction developed at PNNL can easily handle arbitrary spatial sampling, accommodating the new rotational shoe scanner design. Commonly, the fast Fourier transform (FFT) is used to efficiently compute the range response from the data collected by the system as a preprocessing step to the backprojection algorithm. It was found that converting to range using the discrete Fourier transform (DFT) directly has some advantages over the FFT. For example, nonlinear and non-uniform frequency sweeps can easily be compensated for during the computation of the DFT, only the range bins of interest need to be computed, and their spacing can be chosen arbitrarily. Because the range conversion step is the fastest part of the image reconstruction, there is very little speed penalty for using the DFT over the FFT, and the DFT can even speed up image reconstruction when the ranges of interest span less than the full extent computed by the FFT.
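The DFT-based range conversion described above amounts to a matched-filter sum over the measured frequencies, evaluated only at the range bins one cares about. The sketch below illustrates the idea in Python under a simple two-way propagation phase model; the function name, the phase convention, and the example sweep are illustrative assumptions, not the PNNL implementation.

```python
import numpy as np

C = 2.998e8  # speed of light in m/s

def range_response_dft(samples, freqs_hz, ranges_m):
    """Direct (non-uniform) DFT from frequency samples to a range profile.

    samples  : complex radar samples, one per transmitted frequency
    freqs_hz : the actual transmitted frequencies; need not be uniform
    ranges_m : the range bins of interest, with arbitrary spacing

    Because each output bin is an explicit sum over the true frequencies,
    a nonlinear or non-uniform sweep is compensated simply by plugging in
    the measured frequencies, and only the requested bins are computed.
    """
    # Two-way propagation: a scatterer at range r contributes phase
    # exp(-j*4*pi*f*r/c), so correlating with the conjugate focuses it.
    steering = np.exp(1j * 4.0 * np.pi * np.outer(ranges_m, freqs_hz) / C)
    return steering @ samples

# Illustration: a slightly non-uniform 10-40 GHz sweep (as in the PNNL
# system) and a point target at 0.45 m, focused onto 64 bins.
freqs = np.linspace(10e9, 40e9, 256) + np.random.uniform(-1e6, 1e6, 256)
samples = np.exp(-1j * 4.0 * np.pi * freqs * 0.45 / C)
bins = np.linspace(0.3, 0.6, 64)
profile = range_response_dft(samples, freqs, bins)
print(bins[np.abs(profile).argmax()])  # peak lands near 0.45 m
```

Since the steering matrix is built per call, restricting `ranges_m` to a narrow window is exactly where the DFT can beat an FFT that must always compute the full unambiguous range span.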
A two-stage Deep Learning strategy for solving the electromagnetic inverse scattering problem (ISP) is proposed in this paper. The two stages involve utilizing a Deep Convolutional Neural Network (DConvNet) to draw out...
Large-scale, diverse Synthetic Aperture Radar (SAR) image data plays a crucial role in Automatic Target Recognition (ATR). Directly acquiring SAR image data faces significant challenges in practical operation and acqu...
ISBN: (Print) 9798350373141; 9798350373158
Occupancy Grid Mapping is a form of Simultaneous Localisation and Mapping (SLAM) in which the world around a robot is visually represented as a grid map. This form of map can be compared to a floor plan in which features within an environment, such as walls, are labelled in place. Issues such as noise, artefacts, linear error, angular error, and incomplete rooms make this representation difficult to produce accurately. Generative Adversarial Networks (GAN) [1] have in the past proven to be successful and reliable methods for noise reduction, artefact removal [2], and partial observation completion [3]. We demonstrate a novel data creation process to mass-produce samples of erroneous and ideal occupancy grid maps. We use this data to build two GAN models based on the well-known frameworks CycleGAN [4] and CUT [5] for the task of occupancy grid cleaning. We demonstrate the generalisability of our models by predicting 'clean' maps on samples of real data from the Radish dataset [6].
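The abstract does not spell out the data creation process, but the error sources it lists suggest its shape: start from an ideal grid and inject controlled noise, artefacts, misalignment, and missing regions. The following Python sketch is a hypothetical stand-in for that process; the specific corruption choices and all parameters are assumptions.

```python
import numpy as np

def corrupt_grid(ideal, rng, noise_p=0.02, dropout_p=0.05, max_shift=2):
    """Corrupt an ideal occupancy grid (0 = free, 1 = occupied,
    0.5 = unknown) with the error sources named in the abstract.
    Every corruption choice and parameter here is an assumption."""
    noisy = ideal.copy()
    # Salt-and-pepper sensor noise: random cells flip state.
    flip = rng.random(noisy.shape) < noise_p
    noisy[flip] = 1.0 - noisy[flip]
    # Incomplete rooms: blank a random rectangular patch to "unknown".
    h, w = noisy.shape
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    noisy[y:y + h // 4, x:x + w // 4] = 0.5
    # Linear error: shift the whole map by a small random offset
    # (an angular error would add a small rotation, omitted here).
    noisy = np.roll(noisy, rng.integers(-max_shift, max_shift + 1, 2),
                    axis=(0, 1))
    # Missed returns: drop some occupied cells back to "unknown".
    drop = (rng.random(noisy.shape) < dropout_p) & (noisy == 1.0)
    noisy[drop] = 0.5
    return noisy

rng = np.random.default_rng(0)
ideal = np.zeros((128, 128))
ideal[32, 32:96] = 1.0                      # a single wall segment
pair = (corrupt_grid(ideal, rng), ideal)    # one erroneous/ideal sample
```

Generating pairs this way gives the translation models (CycleGAN, CUT) an effectively unlimited supply of erroneous/ideal examples without hand-labelling real maps.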
ISBN: (Print) 9781577358800
Face reenactment and reconstruction benefit various applications in self-media, VR, etc. Recent face reenactment methods use 2D facial landmarks to implicitly retarget facial expressions and poses from driving videos to source images, but they suffer from pose and expression preservation issues in cross-identity scenarios, i.e., when the source and the driving subjects are different. Current self-supervised face reconstruction methods also demonstrate impressive results. However, these methods do not handle large expressions well, since their training data lacks samples of large expressions and 2D facial attributes are inaccurate on such samples. To mitigate the above problems, we propose to explore the inner connection between the two tasks, i.e., using face reconstruction to provide sufficient 3D information for reenactment, and synthesizing videos paired with captured face model parameters through face reenactment to enhance the expression module of face reconstruction. In particular, we propose a novel cascade framework named JR2Net for Joint Face Reconstruction and Reenactment, which begins with the training of a coarse reconstruction network, followed by a 3D-aware face reenactment network based on the coarse reconstruction results. Finally, we train an expression tracking network on our synthesized videos, which are composed of image/face-model-parameter pairs. This expression tracking network can further enhance the coarse face reconstruction. Extensive experiments show that our JR2Net outperforms state-of-the-art methods on several face reconstruction and reenactment benchmarks.
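Read as a pipeline, the cascade has three dependent training stages with a synthesis step in between. The skeleton below restates that ordering in Python; every name is a hypothetical placeholder (the paper publishes no such API), and the training loop itself is elided.

```python
def train_stage(model, data):
    """Stand-in for a full training loop; details elided."""
    return model

def train_jr2net(videos, coarse_net, reenact_net, tracker_net, synthesize):
    # Stage 1: coarse reconstruction (image -> 3D face model parameters).
    coarse = train_stage(coarse_net, videos)
    # Stage 2: 3D-aware reenactment conditioned on the coarse results.
    reenact = train_stage(reenact_net, (videos, coarse))
    # Synthesize videos paired with the face model parameters that drove
    # them, supplying the large expressions the raw data lacks.
    paired = synthesize(reenact, coarse, videos)
    # Stage 3: expression tracker trained on image/parameter pairs; its
    # output then refines the coarse reconstruction network.
    tracker = train_stage(tracker_net, paired)
    return coarse, reenact, tracker
```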
ISBN: (Print) 9798400705250
We propose a method to decompose a single in-the-wild eye region image into albedo, shading, specular, normal, and illumination components. This inverse rendering problem is particularly challenging due to inherent ambiguities and the complex properties of the natural eye region. To address this problem, we first construct a synthetic eye region dataset with rich diversity. We then propose a synthetic-to-real adaptation framework that leverages supervision signals from synthetic data to guide the direction of self-supervised learning. We design region-aware self-supervised losses based on image formation and eye region intrinsic properties, which refine each predicted component through mutual learning and reduce the artifacts caused by ambiguities of natural eye images. In particular, we address the demanding problem of specularity removal in the eye region. We show high-quality inverse rendering results of our method and demonstrate its use for a number of applications.
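The region-aware self-supervised losses rest on an image-formation model tying the predicted components back to the input image. A minimal sketch, assuming a Lambertian-plus-specular formation model (the paper's exact equation may differ) and hypothetical region masks and weights:

```python
import numpy as np

def render_eye(albedo, shading, specular):
    """Assumed formation model: Lambertian term (albedo x shading) plus
    an additive specular layer. A sketch, not the paper's equation."""
    return np.clip(albedo * shading + specular, 0.0, 1.0)

def region_aware_loss(image, albedo, shading, specular, regions):
    """Self-supervised reconstruction loss with per-region weighting.
    `regions` maps hypothetical region names (e.g. 'sclera', 'iris',
    'skin') to (boolean mask, weight) pairs, so ambiguous areas such as
    specular highlights can be emphasised or downweighted."""
    err = np.abs(render_eye(albedo, shading, specular) - image)
    return sum(w * err[m].mean() for m, w in regions.values() if m.any())
```

Weighting the residual per region is one simple way to express "region-aware": regions where the formation model is more trustworthy can dominate the gradient.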
ISBN: (Print) 9798350384581; 9798350384574
This paper presents an image-based visual servoing scheme that can control robotic manipulators in 3D space using 2D stereo images without needing to perform stereo reconstruction. We use a stereo camera in an eye-to-hand configuration for controlling the robot to reach target positions by directly mapping image-space errors to joint-space actuation. We achieve convergence without a priori knowledge of the target object, a reference 2D image, or 3D data. By doing so, we can reach targets in unstructured environments using high-resolution RGB images instead of relying on relatively noisy depth data. We conduct several experiments on two different physical robots. The Panda 7-DOF arm grasps a static target in 3D space, grasps a pitcher handle, and picks and places a box by determining the approach angle using 2D image features, demonstrating that this algorithm can be used for grasping practical objects in 3D space using only 2D image features for feedback. Our second platform, the Atlas humanoid robot, reaches a target from an unknown starting configuration, demonstrating that this controller achieves convergence to a target even with the uncertainties introduced by walking to a new location. We believe that this algorithm is a step towards enabling intuitive interfaces that allow a user to initiate a grasp on an object by specifying a grasping point in a 2D image.
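A common way to map image-space errors directly to joint-space actuation without calibration or 3D data is an uncalibrated servoing law built on an estimated image-to-joint Jacobian. The sketch below shows one such scheme with a damped pseudo-inverse step and a rank-one Broyden update; whether the paper uses this particular estimator is an assumption.

```python
import numpy as np

def servo_step(J_hat, error, gain=0.5, damping=1e-3):
    """One servo step: map the stacked stereo image-space error to joint
    velocities through a damped least-squares pseudo-inverse of the
    estimated image-to-joint Jacobian J_hat (shape: features x joints)."""
    JtJ = J_hat.T @ J_hat + damping * np.eye(J_hat.shape[1])
    return -gain * np.linalg.solve(JtJ, J_hat.T @ error)

def broyden_update(J_hat, d_error, d_q, alpha=0.1):
    """Rank-one Broyden update refining J_hat from the observed change in
    image error after a joint move, so no camera calibration, reference
    image, or 3D model is required."""
    denom = float(d_q @ d_q)
    if denom > 1e-12:
        J_hat = J_hat + alpha * np.outer(d_error - J_hat @ d_q, d_q) / denom
    return J_hat

# Typical loop: observe features, command a move, refine the estimate.
# q_dot = servo_step(J_hat, features - target_features)
# J_hat = broyden_update(J_hat, new_error - old_error, q_dot * dt)
```

Stacking the feature errors from both stereo views into one error vector is what lets 2D measurements drive motion in 3D without explicit stereo reconstruction.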
ISBN: (Print) 9798400705250
Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from the partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotations of amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining deocclusion quality. Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the applicability of PACO to single-view 3D scene reconstruction and object recomposition. Project page: https://***/***/.
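The layer-wise strategy is not detailed in the abstract, but one natural reading is to peel objects front to back from an occlusion graph, so each object is completed only after the layers hiding it have been removed. A hypothetical ordering sketch:

```python
def layerwise_order(occludes):
    """Peel objects front to back: `occludes[a]` is the set of objects
    that `a` sits in front of. Each pass collects the objects not hidden
    by anything still remaining, then removes that layer."""
    remaining = set(occludes)
    layers = []
    while remaining:
        front = {o for o in remaining
                 if not any(o in occludes[p] for p in remaining if p != o)}
        if not front:                 # break occlusion cycles arbitrarily
            front = {next(iter(remaining))}
        layers.append(sorted(front))
        remaining -= front
    return layers

# A occludes B, B occludes C  ->  deocclude in layers [A], [B], [C]
print(layerwise_order({"A": {"B"}, "B": {"C"}, "C": set()}))
```

Batching each layer through the generator in one pass, rather than completing objects one at a time, is the kind of efficiency gain a layer-wise schedule affords.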