In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that one can eliminate the number of parameters dow...
详细信息
ISBN:
(纸本)9781665448994
In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that one can eliminate the number of parameters down to one to three. The relative pose can then be expressed using varying calibration constraints on the fundamental matrix, with entries that are polynomial in the parameters. We can then apply standard techniques based on the action matrix and Sturm sequences to construct our solvers. This enables efficient solvers for a large class of relative pose problems with radial distortion, using a common framework. We evaluate a number of these solvers for robust two-view inlier and epipolar geometry estimation, used as minimal solvers in RANSAC.
Depth estimation from a single 360 degrees panorama image is a difficult task. It is an ill-posed problem to estimate depth maps from an RGB panorama image due to the intrinsic scale ambiguity issue. To mitigate the s...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Depth estimation from a single 360 degrees panorama image is a difficult task. It is an ill-posed problem to estimate depth maps from an RGB panorama image due to the intrinsic scale ambiguity issue. To mitigate the scale inconsistency issue in the ground truth depth map, we propose a simple yet effective method to normalize the depth data based on estimated camera height. In addition, we design a multiple head planar-guided depth network, to provide more geometric constraints for depth estimation. Experimental results show that our relative depth estimation task is more accurate than the absolute depth estimation task, and our proposed model produces state-of-the-art performance on both Matterport3D and Stanford2D3D datasets.
This paper introduces the Neurodata Lab's approach presented at the 1st Challenge on Remote Physiological Signal Sensing (RePSS) organized within CVPR2020. The RePSS challenge was focused on measuring the average ...
详细信息
ISBN:
(纸本)9781728193601
This paper introduces the Neurodata Lab's approach presented at the 1st Challenge on Remote Physiological Signal Sensing (RePSS) organized within CVPR2020. The RePSS challenge was focused on measuring the average heart rate from color facial videos, which is one of the most fundamental problems in the field of computervision. Our deep learning-based approach includes 3D spatio-temporal attention convolutional neural network for photoplethysmogram extraction and 1D convolutional neural network pre-trained on synthetic data for time series analysis. It provides state-of-the-art results outperforming those of other participants on a mixture of VIPL and OBF databases: MAE=6.94 (12.3% improvement compared to the top-2 result), RMSE=10.68 (24.6% improvement), Pearson R = 0.755 (28.2% improvement).
Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has be...
详细信息
ISBN:
(纸本)9798350365474
Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional mapping between NeRF embeddings and those obtained from corresponding images and text. This mapping unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text.
Automotive systems provide a unique opportunity for mobile vision technologies to improve road safety by understanding and monitoring the driver. In this work, we propose a real-time framework for early detection of d...
详细信息
ISBN:
(纸本)9781479943098
Automotive systems provide a unique opportunity for mobile vision technologies to improve road safety by understanding and monitoring the driver. In this work, we propose a real-time framework for early detection of driver maneuvers. The implications of this study would allow for better behavior prediction, and therefore the development of more efficient advanced driver assistance and warning systems. Cues are extracted from an array of sensors observing the driver (head, hand, and foot), the environment (lane and surrounding vehicles), and the ego-vehicle state (speed, steering angle, etc.). Evaluation is performed on a real-world dataset with overtaking maneuvers, showing promising results. In order to gain better insight into the processes that characterize driver behavior, temporally discriminative cues are studied and visualized.
We present WiCV 2018 - Women in computervision Workshop to increase the visibility and inclusion of women researchers in computervision field, organized in conjunction with CVPR 2018. computervision and machine lea...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
We present WiCV 2018 - Women in computervision Workshop to increase the visibility and inclusion of women researchers in computervision field, organized in conjunction with CVPR 2018. computervision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and industry. WiCV is organized to raise visibility of female researchers, to increase the collaboration, and to provide mentorship and give opportunities to female-identifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years, summary of statistics for presenters and attendees, followed by expectations from future generations.
Face recognition technique is widely used in the real-world applications over the past decade. Different from other biometric traits such as fingerprint and iris, face is the biological nature for humans to recognise ...
详细信息
ISBN:
(纸本)9780769549903
Face recognition technique is widely used in the real-world applications over the past decade. Different from other biometric traits such as fingerprint and iris, face is the biological nature for humans to recognise a person even met just once. In this paper, we propose a novel method, which simulates the mechanism of fixations and saccades in human visual perception, to handle the face recognition from single image per person problem. Our method is robust to the local deformations of the face (i.e., expression changes and occlusions). Especially for the occlusion related problems, which have not received enough attentions compared with other challenging variations of illumination, expression and pose, our method significantly outperforms the state-of-the-art approaches despite various types of occlusions. Experimental results on the FRGC and the AR databases confirm the effectiveness of our method.
We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network MV-Residual is designed to extract the ensembled features of motion repres...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network MV-Residual is designed to extract the ensembled features of motion representations and residual information by treating the two successive frames as inputs. The prior probability of the latent representations is modeled by a hyperprior auto-encoder and trained jointly with the MV-Residual network. Specially, the spatially-displaced convolution is applied for video frame prediction, in which a motion kernel for each pixel is learned to generate predicted pixel by applying the kernel at a displaced location in the source image. Finally, novel rate allocation and post-processing strategies are used to produce the final compressed bits, considering the bits constraint of the challenge. The experimental results on validation set show that the proposed optimized framework can generate the highest MS-SSIM for P-frame compression competition.
Several recent efforts in computervision indicate a trend toward studying and understanding problems in larger scale environments, beyond single images, and focus on connections to tasks in navigation, mobile manipul...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Several recent efforts in computervision indicate a trend toward studying and understanding problems in larger scale environments, beyond single images, and focus on connections to tasks in navigation, mobile manipulation, and visual question answering. A common goal of these tasks is the capability of moving in the environment, acquiring novel views during perception and while performing a task. This capability comes easily in synthetic environments, however achieving the same effect with real images is much more laborious. We propose using the existing Active vision Dataset to form a benchmark for such problems in a real-world settings with real images. The dataset is well suited for evaluating tasks of multiview active recognition, target driven navigation, and target search, and also can be effective for studying the transfer of strategies learned in simulation to real settings.
Sign languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-pres...
详细信息
ISBN:
(纸本)9781665448994
Sign languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of motion. But how much information is lost by the skeletal representation? We perform two independent studies using two state-of-the-art pose estimation systems. We analyze the applicability of the pose estimation systems to sign language recognition by evaluating the failure cases of the recognition models. Importantly, this allows us to characterize the current limitations of skeletal pose estimation approaches in sign language recognition.
暂无评论