ISBN (print): 9781538607336
In computer vision, video-based approaches have been widely explored for the early classification and the prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet it is tractable, as proved by quantitative cognitive studies which exploit the 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm in which, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle itself has been reached (to pass it, to place it in a box, or to pour or to drink the liquid inside). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and addressing the cognitive problem of human intention prediction.
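As a rough illustration of casting intention prediction over grasping onsets as a 4-way classification problem, the sketch below trains a standard classifier on pre-extracted clip descriptors. The placeholder features, the SVM choice, and all names are assumptions for illustration only; the abstract does not specify the classifier or descriptor dimensionality.

```python
# Minimal sketch: intention prediction from grasping-onset clips as 4-way classification.
# Descriptors here are random placeholders; in practice they would be per-clip 3D
# kinematic features (motion capture) or video-based descriptors.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

INTENTIONS = ["pass", "place", "pour", "drink"]  # the four intentions in the abstract

rng = np.random.default_rng(0)
n_clips, feat_dim = 200, 128
X = rng.normal(size=(n_clips, feat_dim))          # placeholder clip descriptors
y = rng.integers(len(INTENTIONS), size=n_clips)   # placeholder intention labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```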
ISBN (print): 9781538607336
This paper reviews the first challenge on single image super-resolution (restoration of rich details in an low resolution image) with focus on proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) but learnable through low and high res train images. Each competition had similar to 100 registered participants and 20 teams competed in the final testing phase. They gauge the state-of-the-art in single image super-resolution.
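For context, the Track 1 degradation (bicubic downscaling by a fixed factor) can be approximated with off-the-shelf tools. The sketch below uses OpenCV's bicubic interpolation; note that different bicubic implementations differ slightly, so this is only an approximation of the official setup, and the file path and scale are illustrative.

```python
# Minimal sketch of the Track 1 setup: bicubic downscaling of an HR image
# by a fixed magnification factor (x2, x3, or x4 in the challenge).
import cv2

def bicubic_downscale(hr_path: str, scale: int):
    hr = cv2.imread(hr_path)                      # HR image as a BGR array
    h, w = hr.shape[:2]
    return cv2.resize(hr, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)

# Example usage (path is illustrative only):
# lr = bicubic_downscale("DIV2K/0001.png", scale=4)
# cv2.imwrite("0001_x4_LR.png", lr)
```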
ISBN (digital): 9781538661000
ISBN (print): 9781538661017
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through provided pairs of low- and high-resolution training images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. The results gauge the state-of-the-art in single image super-resolution.
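One simple way to exploit provided LR/HR pairs when the downgrading operator is unknown is to fit a linear degradation kernel by least squares. The sketch below is only a hedged illustration of "learnable through pairs" under a linear-operator assumption with a simplified grid alignment; it is not the pipeline used by the challenge or by any of its entries.

```python
# Minimal sketch: estimate an unknown linear downscaling kernel from one HR/LR
# image pair by least squares, assuming each LR pixel is a fixed linear
# combination (the kernel) of an HR patch sampled with stride = scale.
# Kernel size and grid alignment are simplifying assumptions for illustration.
import numpy as np

def estimate_kernel(hr: np.ndarray, lr: np.ndarray, scale: int, ksize: int = 7) -> np.ndarray:
    """hr, lr: 2D float arrays (grayscale); returns a ksize x ksize kernel."""
    pad = ksize // 2
    hr_p = np.pad(hr, pad, mode="edge")
    A, b = [], []
    for i in range(lr.shape[0]):
        for j in range(lr.shape[1]):
            ci, cj = i * scale + pad, j * scale + pad   # HR location mapped to LR(i, j)
            A.append(hr_p[ci - pad:ci + pad + 1, cj - pad:cj + pad + 1].ravel())
            b.append(lr[i, j])
    k, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return k.reshape(ksize, ksize)
```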
ISBN (print): 9781479943098
Active illumination based methods have a trade-off between acquisition time and the resolution of the estimated 3D shapes. Multi-shot approaches can generate dense reconstructions but require stationary scenes. In contrast, single-shot methods are applicable to dynamic objects but can only estimate sparse reconstructions and are sensitive to surface texture. In this work, we develop a single-shot approach that produces dense reconstructions of highly textured objects. The key to our approach is an image decomposition scheme that can recover the illumination and the texture images from their mixed appearance. Despite the complex appearance of the illuminated textured regions, our method can accurately compute per-pixel warps from the illumination pattern and the texture template to the observed image. The texture template is obtained by interleaving the projection sequence with an all-white pattern. Our estimated warping functions are reliable even with infrequent interleaved projection. Thus, we obtain detailed shape reconstruction and dense motion tracking of the textured surfaces. We validate the approach on synthetic and real data containing subtle non-rigid surface deformations.
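To give a concrete feel for "per-pixel warps from the texture template to the observed image", the sketch below warps a template with a dense optical-flow field. Farnebäck flow is used here purely as a stand-in; the paper's joint illumination/texture decomposition is not reproduced, and all function names are assumptions.

```python
# Minimal sketch: warp a texture template toward an observed frame with a dense
# per-pixel flow field (a stand-in for the paper's estimated warping functions).
import cv2
import numpy as np

def warp_template(template_gray: np.ndarray, observed_gray: np.ndarray) -> np.ndarray:
    """Inputs: uint8 single-channel images of the same size."""
    # Flow from observed to template: observed(y, x) ~ template(y + v, x + u)
    flow = cv2.calcOpticalFlowFarneback(
        observed_gray, template_gray, None,
        pyr_scale=0.5, levels=4, winsize=21, iterations=3,
        poly_n=5, poly_sigma=1.1, flags=0)
    h, w = template_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Resample the template so it aligns pixel-wise with the observed frame
    return cv2.remap(template_gray, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```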
ISBN (print): 9781479943098
Estimating the position of a 3-dimensional world point given its 2-dimensional projections in a set of images is a key component in numerous computer vision systems. There are several methods dealing with this problem, ranging from sub-optimal, linear least squares triangulation in two views to finding the world point that minimizes the L2 reprojection error in three views. The latter leads to the statistically optimal estimate under the assumption of Gaussian noise. In this paper we present a solution to optimal triangulation in three views. The standard approach for solving the three-view triangulation problem is to find a closed-form solution. In contrast to this, we propose a new method based on an iterative scheme. The method is rigorously tested on both synthetic and real image data with corresponding ground truth, on a mid-range desktop PC and on a Raspberry Pi, a low-end mobile platform. We are able to improve the precision achieved by the closed-form solvers and reach a speed-up of two orders of magnitude compared to the current state-of-the-art solver. In numbers, this amounts to around 300K triangulations per second on the PC and 30K triangulations per second on the Raspberry Pi.
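To make the problem concrete, the sketch below triangulates a point from three views with a linear DLT initialization followed by Gauss-Newton refinement of the L2 reprojection error. This is a generic illustration of iterative triangulation, not the authors' specific solver, and the function names are hypothetical.

```python
# Minimal sketch: three-view triangulation via linear DLT initialization
# followed by Gauss-Newton refinement of the L2 reprojection error.
import numpy as np

def triangulate_dlt(Ps, xs):
    """Ps: list of 3x4 camera matrices; xs: list of 2D points (x, y)."""
    A = []
    for P, (x, y) in zip(Ps, xs):
        A.append(x * P[2] - P[0])   # cross-product constraints of x ~ P X
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]             # dehomogenize

def refine_gauss_newton(Ps, xs, X, iters=10):
    """Minimize the summed squared reprojection error over the 3D point X."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(iters):
        r, J = [], []
        for P, (x, y) in zip(Ps, xs):
            u, v, w = P @ np.append(X, 1.0)
            r.extend([u / w - x, v / w - y])
            # Jacobian of the projection (u/w, v/w) with respect to X
            J.append((w * P[0, :3] - u * P[2, :3]) / w**2)
            J.append((w * P[1, :3] - v * P[2, :3]) / w**2)
        delta = np.linalg.lstsq(np.asarray(J), -np.asarray(r), rcond=None)[0]
        X += delta
        if np.linalg.norm(delta) < 1e-12:
            break
    return X

# Usage: X0 = triangulate_dlt(Ps, xs); X = refine_gauss_newton(Ps, xs, X0)
```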