ISBN (print): 9780769549903
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human-computer interaction method that integrates mobile computing and 3D visualization techniques, with applications in free-viewpoint visualization and 3D rendering for interactive and realistic environments. The approach focuses in particular on augmented reality and home entertainment, and it was developed and tested on mobile devices, particularly tablet computers. Finally, a mechanism for evaluating the accuracy of this interaction system is presented.
ISBN (print): 0769521584
The emerging cognitive vision paradigm is concerned with vision systems that evaluate, gather and integrate contextual knowledge for visual analysis. In reasoning about events and structures, cognitive vision systems should rely on multiple computations in order to perform robustly even in noisy domains. Action recognition in an unconstrained office environment thus provides an excellent testbed for research on cognitive computer vision. In this contribution, we present a system that consists of several computational modules for object and action recognition. It applies attention mechanisms, visual learning, and contextual as well as probabilistic reasoning to fuse individual results and verify their consistency. Database technologies are used for information storage, and an XML-based communication framework integrates all modules into a consistent architecture.
ISBN (print): 0780342364
Image-based virtual reality is emerging as a major alternative to the more traditional 3D-based VR. The main advantages of image-based VR are its photo-quality realism and its 3D illusion without any 3D information. Unfortunately, creating content for image-based VR is usually a very tedious process. This paper proposes to use a non-perspective fisheye lens to capture the spherical panorama with very few images. Unlike most camera calibration in computer vision, self-calibration of the fisheye lens poses new questions regarding the parameterization of the distortion and wrap-around effects. Because of its unique projection model and large field of view (near 180 degrees), most of the ambiguity problems in self-calibrating a traditional lens can be solved trivially. We demonstrate that with four fisheye lens images, we can seamlessly register them to create the spherical panorama, while self-calibrating its distortion and field of view.
ISBN (print): 0769521584
We investigate the use of the L-infinity cost function in geometric vision problems. This cost function measures the maximum of a set of model-fitting errors, rather than the sum of squares, or L-2 cost function, that is commonly used in least-squares fitting. We investigate its use in two problems: multiview triangulation and motion recovery from omnidirectional cameras, though the results may also apply to other related problems. It is shown that for these problems the L-infinity cost function is significantly simpler than the L-2 cost. In particular, L-infinity minimization involves finding the minimum of a cost function with a single local (and hence global) minimum on a convex parameter domain. The problem may be recast as a constrained minimization problem and solved using commonly available software. The optimal solution was reliably achieved on problems of small dimension.
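The difference between the two cost functions can be illustrated on a toy problem. The sketch below is not the triangulation or motion-recovery formulation from the abstract; it assumes a hypothetical one-dimensional fit of a single value to scalar samples, where both minimizers have closed forms (the mean for L-2, the midrange for L-infinity). All function names are illustrative.

```python
def l2_cost(x, samples):
    """Sum of squared residuals (the usual least-squares cost)."""
    return sum((x - s) ** 2 for s in samples)


def linf_cost(x, samples):
    """Largest absolute residual (the L-infinity cost)."""
    return max(abs(x - s) for s in samples)


def l2_minimizer(samples):
    """In 1D, the mean minimizes the sum of squared residuals."""
    return sum(samples) / len(samples)


def linf_minimizer(samples):
    """In 1D, the midrange minimizes the maximum absolute residual."""
    return (min(samples) + max(samples)) / 2


# Toy data (made up for illustration only).
samples = [1.0, 2.0, 2.5, 7.0]
x_l2 = l2_minimizer(samples)
x_linf = linf_minimizer(samples)

print("L-2 estimate:", x_l2, "max residual:", linf_cost(x_l2, samples))
print("L-inf estimate:", x_linf, "max residual:", linf_cost(x_linf, samples))
```

In this toy case the L-infinity estimate depends only on the two extreme samples, which is why it achieves a smaller worst-case residual than the mean while being more sensitive to a single outlying measurement.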
ISBN (print): 0769523722
We present an automotive-grade, real-time, vision-based Driver State Monitor. Upon detecting and tracking the driver's facial features, the system analyzes eye-closures and head pose to infer his/her fatigue or distraction. This information is used to warn the driver and to modulate the actions of other safety systems. The purpose of this monitor is to increase road safety by preventing drivers from falling asleep or from being overly distracted, and to improve the effectiveness of other safety systems.
ISBN (print): 9781665448994
Shadow removal is an important computer vision task aiming at the detection and successful removal of the shadow produced by an occluded light source and a photorealistic restoration of the image contents. Decades of research produced a multitude of hand-crafted restoration techniques and, more recently, solutions learned from shadowed and shadow-free training image pairs. In this work, we propose a single-image shadow removal solution via self-supervised learning by using a conditioned mask. We rely on self-supervision and jointly learn deep models to remove and add shadows to images. We derive two variants for learning from paired images and unpaired images, respectively. Our validation on the recently introduced ISTD and USR datasets demonstrates large quantitative and qualitative improvements over the state of the art for both paired and unpaired learning settings.
ISBN (print): 0818672587
Foveated vision and two-mode tracking, as inspired by the human oculomotor system, are often used in active vision systems. The purpose of this paper is to provide answers to the following basic questions, which arise from implementations. First, is it beneficial to have foveated vision, and what is the optimal size of the foveal window? Second, is there a need for two control mechanisms (smooth pursuit and saccade) for improved performance, and how can one efficiently switch between them? In order to do so, a setup is proposed in which these strategies can be evaluated in a systematic manner. It is shown that the fovea appears as a compromise between the tightness of the tracking specifications and computational constraints. Introducing a model for the latter and postulating some a priori knowledge of the target behavior, it is possible to compute the size of the fovea in an optimal way. As a by-product, 'smooth pursuit' can be defined in a natural way, and the use of a two-mode tracking scheme is justified. The second mode, i.e. 'saccadic control', aims at re-centering the target on the fovea so that the smooth pursuit controller can continue to operate. It is shown that a control strategy can indeed be defined so that this objective can be met under appropriate operating conditions.
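The switching idea described above can be sketched in a few lines. This is only an illustration under strong assumptions, not the controller from the paper: the gaze and target live on a hypothetical 1D axis, smooth pursuit is modeled as a simple proportional correction with an assumed gain, and a saccade is modeled as an instantaneous jump that re-centers the target.

```python
def track_step(gaze, target, fovea_half_width, gain=0.5):
    """Return (new_gaze, mode) for one control step on a 1D axis.

    gaze, target: positions in the same units (hypothetical 1D setup).
    fovea_half_width: half the size of the foveal window.
    gain: proportional gain of the pursuit controller (an assumed value).
    """
    error = target - gaze
    if abs(error) <= fovea_half_width:
        # Target is inside the fovea: correct a fraction of the error.
        return gaze + gain * error, "pursuit"
    # Target has left the fovea: jump so that it is re-centered.
    return target, "saccade"


def run(targets, fovea_half_width, gaze=0.0):
    """Track a sequence of target positions and record the mode per step."""
    modes = []
    for target in targets:
        gaze, mode = track_step(gaze, target, fovea_half_width)
        modes.append(mode)
    return gaze, modes


# Made-up target trajectory with one abrupt jump between steps 2 and 3.
final_gaze, modes = run([0.5, 1.0, 5.0, 5.2], fovea_half_width=1.0)
print(modes)
```

With this trajectory the abrupt jump is the only step handled by the saccade branch; a smaller `fovea_half_width` would trigger saccades more often, which mirrors the trade-off between window size and tracking tightness that the abstract discusses.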
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
We introduce the first benchmark for a new problem, recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). We demonstrate some key features of ADHA: a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and an exhaustive labelling of simultaneously emerging actions in each video. We conduct an in-depth analysis of how current effective models from action recognition and image captioning perform on adverb recognition, and the results reveal that such methods are unsatisfactory. Furthermore, we propose a novel three-stream hybrid model to tackle the HAA problem, which achieves better performance with relatively promising results.
ISBN (print): 0780342364
We present a method for computing dense visual correspondence based on general assumptions about scene geometry. Our algorithm does not rely on correlation, and uses a variable region of support. We assume that images consist of a number of connected sets of pixels with the same disparity, which we call disparity components. Using maximum likelihood arguments, at each pixel we compute a small set of plausible disparities. A pixel is assigned a disparity d based on connected components of pixels, where each pixel in a component considers d to be plausible. Our implementation chooses the largest plausible disparity component; however, global contextual constraints can also be applied. While the algorithm was originally designed for visual correspondence, it can also be used for other early vision problems such as image restoration. It runs in a few seconds on traditional benchmark images with standard parameter settings, and gives quite promising results.
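The assignment step described above (each pixel takes the disparity of the largest connected component in which that disparity is plausible) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the plausible sets are given directly rather than derived by maximum likelihood, 4-connectivity is assumed, and ties are broken in favor of the smaller disparity, which is an arbitrary choice made here.

```python
from collections import deque


def assign_disparities(plausible):
    """Assign each pixel the disparity of its largest plausible component.

    plausible[y][x] is a set of candidate disparities for that pixel
    (hypothetical input; the abstract derives these sets statistically).
    """
    h, w = len(plausible), len(plausible[0])
    best_size = [[0] * w for _ in range(h)]
    result = [[None] * w for _ in range(h)]
    disparities = sorted(set().union(*(s for row in plausible for s in row)))
    for d in disparities:
        seen = [[False] * w for _ in range(h)]
        for y0 in range(h):
            for x0 in range(w):
                if seen[y0][x0] or d not in plausible[y0][x0]:
                    continue
                # Flood-fill one 4-connected component of pixels accepting d.
                component, queue = [], deque([(y0, x0)])
                seen[y0][x0] = True
                while queue:
                    y, x = queue.popleft()
                    component.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and d in plausible[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                # Keep d for a pixel only if this component is strictly larger
                # than any seen so far (ties favor the smaller disparity).
                for y, x in component:
                    if len(component) > best_size[y][x]:
                        best_size[y][x] = len(component)
                        result[y][x] = d
    return result


# Made-up 2x4 example: the middle-left column accepts both disparities.
plausible = [
    [{1}, {1, 2}, {2}, {2}],
    [{1}, {1, 2}, {2}, {2}],
]
disparity_map = assign_disparities(plausible)
print(disparity_map)
```

In the example, the ambiguous column joins the disparity-2 component because that component covers six pixels, against four for disparity 1.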
ISBN (print): 9781665448994
Image deblurring and super-resolution (SR) are computer vision tasks aiming to restore image detail and spatial scale, respectively. Only a few recent works address the two tasks jointly, as conventional methods deal with SR or deblurring separately. To address this issue, we design a novel Pixel-Guided dual-branch attention network (PDAN) that handles both tasks jointly. We also propose a novel loss function that better focuses on large- and medium-range errors. Extensive experiments demonstrate that the proposed PDAN with the novel loss function generates remarkably clear HR images and achieves compelling results for joint image deblurring and SR tasks. In addition, our method achieved second place on track 1 of the Image Deblurring Challenge in the NTIRE 2021 Challenge.