ISBN:
(Print) 9781509014378
Background subtraction is a basic problem in change detection for videos and the first step of many high-level computer vision applications. Most background subtraction methods rely on color and texture features. However, due to illumination changes across scenes and the influence of noisy pixels, those methods often produce high false-positive rates in complex environments. To address this problem, we propose an adaptive background subtraction model that uses a novel Local SVD Binary Pattern (LSBP) feature instead of relying solely on color intensity. This feature describes the underlying structure of local regions in a given image and thus improves robustness to illumination variation, noise, and shadows. We use a sample consensus model that is well suited to the LSBP feature. Experimental results on the CDnet 2012 dataset demonstrate that our background subtraction method using the LSBP feature is more effective than many state-of-the-art methods.
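As a rough illustration of the idea behind an SVD-based local binary pattern, the sketch below computes a per-pixel singular-value response and an LBP-style code over it. The normalization formula, tolerance, and neighbourhood ordering are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def svd_response(gray, patch=3):
    """Per-pixel response: the two smaller singular values of the local
    patch, normalized by the largest one (an assumed formula; flat
    regions give ~0, textured regions give larger values)."""
    h, w = gray.shape
    r = patch // 2
    padded = np.pad(gray, r, mode='edge')
    resp = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            s = np.linalg.svd(padded[y:y + patch, x:x + patch],
                              compute_uv=False)
            resp[y, x] = (s[1] + s[2]) / (s[0] + 1e-8)
    return resp

def lsbp_code(resp, y, x, tau=0.05):
    """8-bit code comparing the 8 neighbours of (y, x) in the response
    map against the centre with tolerance tau. Caller must keep (y, x)
    at least one pixel away from the image border."""
    code = 0
    center = resp[y, x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        if abs(resp[y + dy, x + dx] - center) < tau:
            code |= 1 << bit
    return code
```

On a perfectly flat region the response is zero everywhere, so every neighbour matches the centre and the code saturates at 255; texture or noise breaks this pattern, which is what makes such codes useful for background modeling.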
Automatic facial expression recognition (FER) is an important component of affect-aware technologies. Because of the lack of labeled spontaneous data, the majority of existing automated FER systems were trained on posed facial expressions; however, real-world applications must deal with (subtle) spontaneous facial expressions. This paper introduces an extension of DISFA, a previously released and well-accepted face dataset. Extended DISFA (DISFA+) has the following features: 1) it contains a large set of posed and spontaneous facial expression data for the same group of individuals; 2) it provides manually labeled, frame-based annotations of 5-level intensity for twelve FACS facial actions; and 3) it provides metadata (i.e., facial landmark points, plus each individual's self-report regarding every posed facial expression). This paper introduces and employs DISFA+ to analyze and compare the temporal patterns and dynamic characteristics of posed and spontaneous facial expressions.
This paper proposes the use of multiple low-cost visual sensors to obtain a surround view of the ego-vehicle for semantic understanding. A multi-perspective view will assist the analysis of naturalistic driving studies (NDS) by automating the data reduction of observed sequences into events. A user-centric vision-based framework is presented that runs a vehicle detector and tracker in each perspective. Multi-perspective trajectories are estimated and analyzed to extract 14 different events, including potentially dangerous behaviors such as overtakes and cut-ins. The system is tested on ten sequences of real-world data collected on U.S. highways. The results show the potential of multiple low-cost visual sensors for semantic understanding around the ego-vehicle.
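The kind of trajectory-based event logic described above can be illustrated with a toy cut-in detector. The lateral-offset representation, lane half-width, and frame thresholds below are all hypothetical choices, not values from the paper.

```python
def detect_cut_in(lateral_positions, lane_half_width=1.8, min_outside=3):
    """Toy cut-in rule: a tracked vehicle's trajectory begins with a run
    of frames outside the ego lane (|lateral offset| > lane_half_width)
    and ends inside it.

    lateral_positions: metres from the ego-lane centre, one per frame.
    Thresholds are illustrative only.
    """
    inside = [abs(p) <= lane_half_width for p in lateral_positions]
    if not inside[-1]:          # track must end inside the ego lane
        return False
    lead_out = 0                # count leading frames outside the lane
    for flag in inside:
        if flag:
            break
        lead_out += 1
    return lead_out >= min_outside
```

A real system would additionally use longitudinal distance and multi-perspective track fusion before declaring an event; this sketch only shows the shape of the per-trajectory classification step.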
We describe an FPGA-based on-board control system for autonomous orientation of an aerial robot to assist aerial manipulation tasks. The system applies yaw control to help an operator precisely position a drone near a bar-like object. This is achieved with a parallel Hough transform enhanced by a novel image-space separation method, which yields highly reliable results across varied conditions together with high performance. The feasibility of this approach is shown by applying the system to a multi-rotor aerial robot equipped with an upward-directed robotic hand on top of the airframe, developed for high-altitude manipulation tasks. To grasp a bar-like object, the orientation of the bar is estimated from image data obtained by a monocular camera mounted on the robot. This data is then analyzed by the on-board FPGA system to control the yaw angle of the aerial robot. In experiments, reliable yaw-orientation control of the aerial robot is achieved.
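A minimal software Hough transform, as a stand-in for the paper's parallel FPGA design, can recover the dominant bar orientation from a binary edge image. The accumulator resolution and voting scheme here are simplifications.

```python
import numpy as np

def bar_orientation(edge, n_theta=180):
    """Estimate the dominant line orientation (in degrees, 0..179) in a
    binary edge image via a basic rho-theta Hough accumulator.
    Quantization means the answer can be off by a degree or two."""
    ys, xs = np.nonzero(edge)
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.ceil(np.hypot(*edge.shape)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    for t, th in enumerate(thetas):
        # rho = x*cos(theta) + y*sin(theta), shifted to a non-negative index
        rho = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int) + diag
        np.add.at(acc, (rho, t), 1)       # unbuffered vote accumulation
    _, t_best = np.unravel_index(acc.argmax(), acc.shape)
    return int(t_best)
```

The yaw controller would then drive the measured orientation toward the gripper's desired angle; on the FPGA, the per-theta voting loop is what gets parallelized.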
Outdoor surveillance systems that involve far-field operations often encounter atmospheric turbulence perturbations caused by a series of randomized reflections and refractions affecting the incoming light rays. The resulting distortions make it hard to discriminate between true moving objects and turbulence-induced motion. Current algorithms are not effective at detecting true moving objects in the scene and also rely on computationally complex warping methods. In this paper, we describe a real-time embedded solution, connected to conventional cameras, that both rectifies turbulence distortions and reliably detects and tracks true moving targets. Comparisons with other methods show better turbulence rectification with fewer false and missed detections. An FPGA-DSP-based embedded realization of our algorithm achieves a nearly 15x speed-up and lower memory requirements compared with a quad-core PC implementation. The proposed system is suitable for persistent surveillance systems and optical sight devices.
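A toy illustration of the underlying intuition, not the paper's method: turbulence jitter is quasi-random around a mean image, so a temporal median reference suppresses it, while a genuinely moving object stays far from the median and survives thresholding. The threshold value is arbitrary.

```python
import numpy as np

def stabilized_foreground(frames, thresh=25):
    """Toy turbulence-robust detector for the latest frame.

    frames: list of equally sized grayscale arrays (a short temporal
    window). The per-pixel temporal median averages out turbulence
    jitter; thresholding the difference keeps persistent true movers.
    """
    stack = np.stack(frames).astype(np.float32)
    background = np.median(stack, axis=0)     # jitter-suppressed reference
    return (np.abs(stack[-1] - background) > thresh).astype(np.uint8)
```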
Person-independent and pose-invariant estimation of eye gaze is important for situation analysis and automated video annotation. We propose a fast cascade-regression-based method that first estimates the location of a dense set of markers and their visibility, then reconstructs the face shape by fitting a part-based 3D model. Next, the reconstructed 3D shape is used to compute a canonical view of the eyes for 3D gaze estimation. The model operates in a feature space that naturally encodes local ordinal properties of pixel intensities, leading to photometrically invariant gaze estimation. To evaluate the algorithm against alternative approaches, three publicly available databases were used: the Boston University Head Tracking, Multi-View Gaze, and CAVE Gaze datasets. Precision for head pose and gaze averaged 4 degrees or less for pitch, yaw, and roll. The algorithm outperformed alternative methods on these datasets.
We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first, Looking at People, addressed age estimation, while the second and third, Faces of the World, addressed accessory classification and smile-and-gender classification, respectively. We present the two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data on the apparent age of people (as opposed to their real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions.
Object proposal has been successfully applied in recent visual object detection approaches and has improved computational efficiency. The purpose of object proposal is to cover as many objects as possible with as few regions as possible. In this paper, we propose a strategy named Texture Complexity based Redundant Regions Ranking (TCR) for object proposal. Our approach first produces rich but redundant regions using a color segmentation approach, namely Selective Search. It then uses a Texture Complexity (TC) measure, based on the number of complete contours and Local Binary Pattern (LBP) entropy, to score the objectness of each region. By ranking regions by TC, as many true object regions as possible are preserved while the total number of regions is significantly reduced. Experimental results on the PASCAL VOC 2007 dataset show that the proposed TCR significantly improves the baseline approach, increasing AUC (area under the recall curve) from 0.39 to 0.48. It also outperforms the state of the art in AUC and uses fewer proposals to achieve comparable recall rates.
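The LBP-entropy half of the texture-complexity score could be sketched as follows, using a basic 8-neighbour LBP; the paper's exact LBP variant and how the two cues are combined are not specified here.

```python
import numpy as np

def lbp_entropy(gray):
    """Entropy of the LBP-code histogram over a region: flat regions
    collapse to a single code (entropy 0), textured regions spread
    across many codes (higher entropy)."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Ranking candidate regions by such a score favors texture-rich regions, which is the intuition behind discarding low-complexity (likely background) proposals first.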
In this paper, we present two large multi-modal video datasets for RGB and RGB-D gesture recognition: the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD). Both are derived from the ChaLearn Gesture Dataset (CGD), which contains more than 50,000 gestures in total for the "one-shot-learning" competition. To increase the potential of the old dataset, we designed new, well-curated datasets comprising 249 gesture labels and 47,933 gestures whose begin and end frames are manually labeled within the sequences. Using these datasets, we will open two competitions on the CodaLab platform so that researchers can test and compare their methods for "user-independent" gesture recognition. The first challenge targets gesture spotting and recognition in continuous sequences of gestures, while the second targets gesture classification from segmented data. A baseline method based on the bag-of-visual-words model is also presented.
Deep neural networks usually benefit from unsupervised pre-training, e.g., with auto-encoders. However, the classifier still needs supervised fine-tuning for good discrimination. Moreover, because of their fully connected structure, auto-encoders are usually limited to small, well-aligned images. In this paper, we incorporate supervised information to propose a novel formulation, the class-encoder, whose training objective is to reconstruct a sample from another sample with the same label. The class-encoder aims to minimize intra-class variation in the feature space and to learn discriminative manifolds at the class scale. We impose the class-encoder as a constraint on the softmax for better supervised training, and extend the reconstruction to the feature level to address the parameter-size and translation issues. The experiments show that the class-encoder helps to improve performance on classification and face recognition benchmarks. This could also be a promising direction for fast training of face recognition models.
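The pairing at the heart of the class-encoder objective, reconstructing a sample from a different same-class sample, can be sketched with a linear encoder/decoder. The rolling pairing scheme, toy data shapes, and linear model are illustrative choices, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def class_pairs(X, y):
    """Build (input, target) pairs for a class-encoder loss: each sample
    must be reconstructed from ANOTHER sample with the same label.
    Here each sample's target is the previous sample within its class
    (one simple pairing scheme among several possible)."""
    inputs, targets = [], []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        inputs.append(X[idx])
        targets.append(X[np.roll(idx, 1)])   # shift pairs within the class
    return np.vstack(inputs), np.vstack(targets)

# Toy data: two classes of three 4-D samples each (hypothetical shapes).
X = rng.normal(size=(6, 4))
y = np.array([0, 0, 0, 1, 1, 1])
inp, tgt = class_pairs(X, y)

# Forward pass of a linear encoder/decoder and the reconstruction loss
# the class-encoder would minimize: decode(encode(x)) should match the
# *paired* same-class sample, pulling intra-class features together.
W_e = rng.normal(scale=0.1, size=(2, 4))     # encoder to a 2-D code
W_d = rng.normal(scale=0.1, size=(4, 2))     # decoder back to 4-D
recon = inp @ W_e.T @ W_d.T
loss = np.mean((recon - tgt) ** 2)
```

Minimizing this loss (e.g., by gradient descent, omitted here) encourages codes of same-class samples to collapse toward each other, which is the intra-class-variation argument in the abstract.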