Determining the ethnicity of an individual, as a soft biometric, can be very useful in a video-based surveillance system. Currently, the face is commonly used to determine the ethnicity of a person. Up to now, gait has been used for individual recognition and gender classification but not for ethnicity determination. This paper focuses on ethnicity determination based on the fusion of multi-view gait. The Gait Energy Image (GEI) is used to analyze the recognition power of gait for ethnicity. Feature fusion, score fusion and decision fusion from multiple views of gait are explored. For feature fusion, GEI images and camera views are put together to form a third-order tensor (x, y, view). Multilinear principal component analysis (MPCA) is used to extract features from the tensor objects, which integrate all views. For score fusion, the similarity scores measured from single views are combined with a weighted SUM rule. For decision fusion, ethnicity classification is first performed on each individual view, and the classification results are then combined with a majority vote rule to make the final determination. A database of 36 walking people (East Asian and South American) was acquired from 7 different camera views. The experimental results show that ethnicity can be determined automatically from human gait in video. The classification rate is improved by fusing multiple camera views, and a comparison among the fusion schemes shows that MPCA-based feature fusion performs best.
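The score-fusion and decision-fusion rules named in this abstract can be illustrated with a minimal sketch. The function names, the dictionary layout of the per-view scores and the example weights are assumptions made for illustration; the abstract does not say how the weights are chosen.

```python
from collections import Counter


def weighted_sum_fusion(scores_per_view, weights):
    """Fuse per-view similarity scores with a weighted SUM rule.

    scores_per_view: list of dicts {class_label: similarity score}, one per view.
    weights: one non-negative weight per view (hypothetical values; they are
    normalized here so that they sum to 1).
    Returns (best_label, fused_scores).
    """
    total = sum(weights)
    fused = {}
    for scores, w in zip(scores_per_view, weights):
        for label, s in scores.items():
            fused[label] = fused.get(label, 0.0) + (w / total) * s
    return max(fused, key=fused.get), fused


def majority_vote(decisions):
    """Return the label chosen by most views (decision fusion)."""
    return Counter(decisions).most_common(1)[0][0]


# Three hypothetical views, two classes "A" and "B".
views = [{"A": 0.6, "B": 0.4}, {"A": 0.3, "B": 0.7}, {"A": 0.8, "B": 0.2}]
label, fused = weighted_sum_fusion(views, weights=[1, 1, 2])
vote = majority_vote([max(v, key=v.get) for v in views])
```

With an odd number of views and two classes, as in the 7-view setup described above, a majority vote cannot tie.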
Image fusion is a significant problem in many fields including digital photography, computational imaging and remote sensing, to name but a few. Recently, deep learning has emerged as an important tool for image fusion. This paper presents CSCFuse, which contains three deep convolutional sparse coding (CSC) networks for three kinds of image fusion tasks (i.e., infrared and visible image fusion, multi-exposure image fusion, and multi-spectral image fusion). The CSC model and the iterative shrinkage and thresholding algorithm are generalized into dictionary convolution units. As a result, all hyper-parameters are learned from data. Our extensive experiments and comprehensive comparisons reveal the superiority of CSCFuse with regard to quantitative evaluation and visual inspection.
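For readers unfamiliar with the iterative shrinkage and thresholding algorithm (ISTA) mentioned above, the following is a minimal sketch of the classical, non-convolutional version with a fixed dictionary. It is not the CSCFuse network: the paper replaces these fixed operations with learned dictionary convolution units, and the names `ista`, `lam` and `step` are assumptions chosen for this illustration.

```python
def soft_threshold(v, t):
    """Element-wise shrinkage: sign(x) * max(|x| - t, 0)."""
    return [max(abs(x) - t, 0.0) * (1 if x > 0 else -1) for x in v]


def ista(D, x, lam, step, n_iter=200):
    """Approximately minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z.

    D: dictionary as a list of m rows with n entries each.
    x: signal of length m.
    step: gradient step size; it should not exceed 1 / L, where L is the
    largest eigenvalue of D^T D (the caller must ensure this).
    """
    m, n = len(D), len(D[0])
    z = [0.0] * n
    for _ in range(n_iter):
        # Residual D z - x and gradient D^T (D z - x) of the quadratic term.
        residual = [sum(D[i][j] * z[j] for j in range(n)) - x[i] for i in range(m)]
        grad = [sum(D[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage.
        z = soft_threshold([z[j] - step * grad[j] for j in range(n)], lam * step)
    return z


# With an identity dictionary the minimizer is simply soft_threshold(x, lam).
codes = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=0.5, step=1.0)
```

In a learned variant, quantities such as `lam` and `step` become trainable parameters, which matches the abstract's remark that all hyper-parameters are learned from data.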
Many advancements in mobile cameras aim to reach the visual quality of professional DSLR cameras. Great progress has been made in recent years in optimizing the sharp regions of an image and in creating virtual portrait effects with artificially blurred backgrounds. Bokeh is the aesthetic quality of the blur in out-of-focus areas of an image. It is popular among professional photographers, and for this reason a new goal in computational photography is to optimize the Bokeh effect. This paper introduces EBokehNet, an efficient state-of-the-art solution for Bokeh effect transformation and rendering. Our method can render Bokeh from an all-in-focus image, or transform the Bokeh of one lens into the effect of another lens without harming the sharp foreground regions in the image. Moreover, we can control the shape and strength of the effect by feeding the lens properties, i.e., type (Sony or Canon) and aperture, into the neural network as an additional input. Our method is a winning solution at the NTIRE 2023 Lens-to-Lens Bokeh Effect Transformation Challenge, and state-of-the-art on the EBB benchmark.
A pair of stereo images is said to be rectified if corresponding image points have the same y-coordinate in their respective images. In this paper we consider the rectification of two omnidirectional cameras, specifically two parabolic catadioptric cameras. Such systems consist of a parabolic mirror and an orthographically projecting lens. We show that if the image coordinates are represented as a point z in the complex plane, then the rectification is specified by coth^{-1}(z). This rectification is shown to be conformal, in that it is locally distortionless, and furthermore, it is unique up to scale and transformation. We show an experiment in which two real images have been rectified and a stereo matching performed.
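The mapping named in this abstract can be tried numerically with a short sketch. It assumes the image points have already been normalized into the coordinate frame the paper uses; that normalization is not given in the abstract and is not reproduced here. The helper names are hypothetical, and the inverse hyperbolic cotangent is computed through the identity coth^{-1}(z) = tanh^{-1}(1/z).

```python
import cmath


def arccoth(z):
    """Inverse hyperbolic cotangent of a nonzero complex number."""
    return cmath.atanh(1 / z)


def rectify(points):
    """Map normalized image points (complex numbers) to rectified coordinates."""
    return [arccoth(z) for z in points]


# Conformality check at one sample point: the complex derivative
# 1 / (1 - z^2) exists and is nonzero, and a finite-difference estimate
# gives the same value along the real and the imaginary direction.
z0 = 0.3 + 0.4j
h = 1e-6
d_real = (arccoth(z0 + h) - arccoth(z0)) / h
d_imag = (arccoth(z0 + 1j * h) - arccoth(z0)) / (1j * h)
d_exact = 1 / (1 - z0 * z0)
w0 = rectify([z0])[0]
```

A direction-independent, nonzero derivative is what makes the map angle-preserving, which is the sense of "locally distortionless" used above. The map is singular at z = 1 and z = -1.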
An algorithm for tracking a single visual target is presented in which the visual features of the target are tracked with a group tracking method. A single visual target is regarded as a group of multiple targets. The model developed allows some variation in the relative positions of the features, which corresponds to a degree of non-rigidity. The approach combines the PDA and JPDA algorithms and, by exploiting the advantages of a group motion model, enables accurate tracking of single targets.
Selecting regions in an image likely to come from a single object is important for reducing the amount of searching involved in object recognition. Such selections can be purely based on image data (data-driven), or based on the knowledge of the model object (model-driven). In this paper, we present methods for data- and model-driven selection by grouping closely-spaced parallel lines in images. Data-driven selection is achieved by selecting salient line groups that emphasize the likelihood of the groups coming from single objects. Model-driven selection is achieved by selectively generating image line groups that are likely to be the projections of the model groups, taking into account the effect of occlusions, illumination changes and imaging errors. We also present results that indicate a vast improvement in the search performance of a recognition system that is integrated with parallel-line group-based selection.
3D face reconstruction from a single 2D image is mathematically ill-posed. However, a variety of methods have been proposed to solve ill-posed problems in the area of computer vision; some of these solutions estimate latent information or apply model-based approaches. In this paper, we propose a novel method to reconstruct a 3D face from a single 2D face image based on pose estimation and a deformable model of 3D face shape. For 3D face reconstruction from a single 2D face image, the first task is to estimate the depth lost by the 2D projection of 3D faces. Applying the EM algorithm to facial landmarks in a 2D image, we propose a pose estimation algorithm to infer the pose parameters of rotation, scaling, and translation. After estimating the pose, much denser points are interpolated between the landmark points by a 3D deformable model and barycentric coordinates. As opposed to previous literature, our method can locate facial feature points automatically in a 2D facial image. Moreover, we also show that the proposed method for pose estimation can be successfully applied to 3D face reconstruction. Experiments demonstrate that our approach can produce reliable results for reconstructing photorealistic 3D faces.
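The barycentric interpolation step mentioned in this abstract can be sketched as follows. This shows only the generic technique: a 2D point inside a triangle of landmarks is expressed in barycentric coordinates, and the same weights are applied to the corresponding 3D vertices. The function names and the toy coordinates are assumptions; the paper's deformable model is not reproduced.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return u, v, 1.0 - u - v


def interpolate(weights, va, vb, vc):
    """Blend three 3D vertices with the given barycentric weights."""
    u, v, w = weights
    return tuple(u * x + v * y + w * z for x, y, z in zip(va, vb, vc))


# Hypothetical 2D landmark triangle and the matching 3D model vertices.
tri2d = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
tri3d = ((0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0))
weights = barycentric((0.25, 0.25), *tri2d)
dense_point = interpolate(weights, *tri3d)
```

Repeating this for many 2D points inside each landmark triangle yields a denser set of 3D points than the landmarks alone provide.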
In this paper, we present techniques for automated understanding of tutor-student behavior through detecting visual deictic gestures, in the context of one-to-one mathematics tutoring. To the best of the authors' knowledge, this is the first work in the area of intelligent tutoring systems that focuses on spatial localization of deictic gestural activity, i.e., where the deictic gesture is pointing on the workspace. A new dataset called SDMATH is first introduced. The motivation for detecting deictic gestures and their spatial properties is established, followed by techniques for automatic localization of deictic gestures in a workspace. The techniques employ computer vision and machine learning steps such as GBVS saliency, binary morphology and HOG-SVM classification. It is shown that the method localizes the deictic tip with an accuracy of over 85% for a cut-off distance of 12 pixels. Furthermore, a detailed discussion using examples from the proposed dataset is presented on high-level inferences about the student-tutor interactions that can be derived from the integration of spatial and temporal localization of the deictic gestural activity using the proposed techniques.
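The "accuracy for a cut-off distance" figure quoted above is most naturally read as the fraction of predicted tip locations that fall within the cut-off of the annotated location. The sketch below computes that quantity under this reading; the use of Euclidean distance, the function name and the sample coordinates are all assumptions, not details from the paper.

```python
import math


def accuracy_at_cutoff(predicted, ground_truth, cutoff=12.0):
    """Fraction of predictions within `cutoff` pixels of the ground truth.

    predicted, ground_truth: equal-length lists of (x, y) pixel coordinates.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= cutoff
    )
    return hits / len(predicted)


# Four made-up predictions; the third is 20 pixels away and counts as a miss.
pred = [(10, 10), (50, 52), (100, 120), (200, 205)]
truth = [(10, 16), (50, 50), (100, 100), (203, 201)]
acc = accuracy_at_cutoff(pred, truth)
```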
ISBN (digital): 9781728193601
ISBN (print): 9781728193618
In this paper, we provide a detailed description of our method Kattolab, submitted to the Workshop and Challenge on Learned Image Compression (CLIC) 2020. Our method mainly incorporates discretized Gaussian Mixture Likelihoods into previous state-of-the-art learned compression algorithms. In addition, we describe the acceleration strategies and bit optimization under the low-rate constraint. Experimental results demonstrate that our approach Kattolab achieves 0.9761 in terms of MS-SSIM at the rate constraint of 0.15 bpp during the validation phase.
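A discretized Gaussian mixture likelihood is commonly formulated as the probability mass that a mixture of Gaussians assigns to the unit-width bin around an integer-quantized value. The sketch below follows that common formulation, which is an assumption here because the abstract gives no formula; the function names and the example parameters are hypothetical.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discretized_gmm_likelihood(y, weights, means, scales):
    """Probability of integer y under a Gaussian mixture integrated over
    the bin [y - 0.5, y + 0.5]. `weights` are assumed to sum to 1."""
    return sum(
        w * (normal_cdf((y + 0.5 - m) / s) - normal_cdf((y - 0.5 - m) / s))
        for w, m, s in zip(weights, means, scales)
    )


def bits(y, weights, means, scales):
    """Estimated code length in bits for the symbol y."""
    return -math.log2(discretized_gmm_likelihood(y, weights, means, scales))


# A made-up two-component mixture.
params = ([0.7, 0.3], [0.0, 4.0], [1.0, 2.0])
p_zero = discretized_gmm_likelihood(0, *params)
total_mass = sum(discretized_gmm_likelihood(y, *params) for y in range(-40, 41))
```

Because the bins tile the real line, the probabilities over all integers sum to 1, so the negative log-likelihood can serve as a rate estimate for an entropy coder.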
We address the problem of 2D-3D pose estimation in difficult viewing conditions, such as low illumination, cluttered background, and large highlights and shadows that appear on the object of interest. In such challeng...