ISBN:
(Print) 1595930361
In this paper, we study methods for learning classifiers for the case when there is variation introduced by an underlying continuous parameter θ representing transformations like blur, pose, time, etc. First, we consider the task of learning a dictionary-based representation for such cases. Sparse representations driven by data-derived dictionaries have produced state-of-the-art results in various image restoration and classification tasks. While significant advances have been made in this direction, most techniques have focused on learning a single dictionary to represent all variations in the data. In this paper, we show that dictionary learning can be significantly improved by explicitly parameterizing the dictionaries by θ. We develop an optimization framework to learn parametric dictionaries that vary smoothly with θ. We propose two optimization approaches: (a) a least squares approach and (b) a regularized K-SVD approach. Furthermore, we analyze the variations in data induced by θ from a different yet related perspective of feature augmentation. Specifically, we extend the feature augmentation technique proposed for adaptation of discretely separable domains to continuously varying domains, and propose a Mercer kernel to account for such changes. We present experimental validation of the proposed techniques using both synthetic and real datasets. Copyright 2014 ACM.
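The idea of a dictionary that varies smoothly with θ can be illustrated by fitting each atom coordinate as a low-degree polynomial of θ via least squares. This is only a toy sketch of the least-squares flavor of the approach; the function name, polynomial parameterization, and toy atom trajectory are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def fit_parametric_atom(thetas, atoms, degree=2):
    # Fit each dictionary-atom coordinate as a degree-`degree`
    # polynomial in theta, so the atom varies smoothly with the
    # underlying continuous parameter.
    V = np.vander(thetas, degree + 1)              # (n_samples, degree+1)
    coef, *_ = np.linalg.lstsq(V, atoms, rcond=None)
    return lambda t: np.vander(np.atleast_1d(t), degree + 1) @ coef

thetas = np.linspace(0.0, 1.0, 9)
# Toy 2-D atom whose coordinates follow (t, t^2) as theta changes.
atoms = np.stack([np.array([t, t**2]) for t in thetas])
atom_at = fit_parametric_atom(thetas, atoms, degree=2)
```

Evaluating `atom_at` at a θ not in the training grid then interpolates the atom smoothly, which is the property the parametric dictionary is after.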
Abnormality detection in crowded scenes plays a very important role in automatic monitoring of surveillance feeds. Here we present a novel framework for abnormality detection in crowd videos. The key idea of the approach is that rarely or sparsely occurring events correspond to abnormal activities, while commonly occurring events correspond to normal activities. Given an input video, multiple feature matrices are computed and decomposed into their low-rank and sparse components, of which the sparse components correspond to the abnormal activities. The approach does not require any explicit modeling of crowd behavior or training. Localization of the anomalies is obtained as a by-product of the proposed approach by inverse mapping between the entries of the matrix and the pixels in the video frames. The method is very general: it can be applied to both sparsely and densely crowded scenes, and it can detect both global and local abnormalities. Experimental evaluation on two widely used datasets, as well as some dense crowd videos downloaded from the web, shows the effectiveness of the proposed approach. Comparison with several state-of-the-art crowd abnormality detection approaches shows that the proposed method compares favorably. Copyright is held by the authors.
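The low-rank/sparse split at the heart of such methods can be sketched minimally as follows. Here a single truncated SVD stands in for the full convex decomposition (e.g. robust PCA) that these approaches typically solve, and the matrix, threshold, and function names are illustrative assumptions.

```python
import numpy as np

def lowrank_sparse_split(M, rank, thresh):
    # Approximate M ~ L + S: L is the best rank-`rank` approximation
    # (truncated SVD), modeling commonly occurring structure; S keeps
    # only residual entries above `thresh` (rare, sparse events).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    R = M - L
    S = np.where(np.abs(R) > thresh, R, 0.0)
    return L, S

base = np.outer(np.linspace(1, 2, 20), np.linspace(1, 2, 30))  # "normal" pattern
M = base.copy()
M[5, 7] += 5.0                                 # one rare, localized event
L, S = lowrank_sparse_split(M, rank=1, thresh=1.0)
i, j = np.unravel_index(np.argmax(np.abs(S)), S.shape)  # locates the anomaly
```

The inverse mapping from nonzero entries of `S` back to pixels is what yields the localization described above.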
A high-level abstraction of the behavior of a moving object can be obtained by analyzing its trajectory. However, traditional trajectories or tracklets are bound by the limitations of the underlying tracking algorithm used. In this paper, we propose a novel idea of detecting anomalous objects amid other moving objects in a video based on their short history. This history is defined as a short local trajectory (SLT). The unique approach of generating SLTs from super-pixels belonging to a foreground object, incorporating both spatial and temporal information, is the key to detecting anomalies. Additionally, the proposed trajectory extraction is robust across videos with different crowd densities, occlusions, etc. Generally, the trajectories of persons/objects moving through a particular region under usual conditions have certain fixed characteristics, so we use a Hidden Markov Model (HMM) to capture the usual trajectory patterns during training. During detection, the proposed algorithm takes SLTs as observations for each super-pixel and measures their likelihood of being anomalous using the learned HMMs. Furthermore, we compute a spatial consistency measure for each SLT based on the neighboring trajectories. Thus, anomalies detected by the proposed approach are highly localized, as demonstrated by experiments conducted on two widely used anomaly datasets, namely UCSD Ped1 and UCSD Ped2. Copyright 2014 ACM.
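Scoring an observation sequence against a learned HMM, as in the detection stage described above, amounts to the forward algorithm. A minimal discrete-observation sketch follows; the parameter values and variable names are illustrative assumptions, not the paper's trained models or SLT features.

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    # Scaled forward algorithm: log-likelihood of a discrete
    # observation sequence under an HMM with initial distribution
    # pi, transition matrix A, and emission matrix B.
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])   # states tend to persist
B = np.array([[0.9, 0.1], [0.1, 0.9]])   # each state favors one symbol
usual = hmm_loglik([0, 0, 0, 0], pi, A, B)
erratic = hmm_loglik([0, 1, 0, 1], pi, A, B)  # low likelihood -> flagged
```

A sequence that keeps switching symbols is poorly explained by the persistent-state model, so its log-likelihood drops and the corresponding super-pixel would be flagged as anomalous.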
Histopathological grading of cancer is a measure of cell appearance in malignant neoplasms. Grading offers an insight into the growth of the cancer and helps in developing individual treatment plans. The Nottingham grading system [12], a well-known method for invasive breast cancer grading, primarily relies on the mitosis count in histopathological slides. Pathologists manually identify mitotic figures from a few thousand slide images for each patient to determine the grade of the cancer. Mitotic figures are hard to identify because the appearance of mitotic cells changes at different phases of mitosis. Manual cancer grading is therefore not only tedious but also prone to observer variability. We propose a fast and accurate approach for automatic mitosis detection from histopathological images using an enhanced random forest classifier with weighted random trees. The random trees are assigned a tree penalty and a forest penalty depending on their classification performance in the training phase. The weight of a tree is calculated from these penalties. The forest is trained through regeneration of the population from weighted trees. The input data is classified based on weighted voting from the random trees after several populations. Experiments show at least an 11 percent improvement in F1 score on more than 450 histopathological images at ×40 magnification. Copyright 2014 ACM.
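The final weighted-voting step can be sketched as below. The penalty-derived weights come from the paper's training procedure; here they are supplied as plain numbers, an assumption for illustration only.

```python
import numpy as np

def weighted_vote(tree_preds, weights):
    # tree_preds: (n_trees, n_samples) array of 0/1 predictions.
    # weights: per-tree weights (e.g. derived from training-time
    # tree/forest penalties). Returns the weighted majority class
    # for each sample.
    w = np.asarray(weights, dtype=float)[:, None]
    score = (np.asarray(tree_preds) * w).sum(axis=0) / w.sum()
    return (score >= 0.5).astype(int)

preds = np.array([[1, 0, 1],    # tree 1 (high weight)
                  [1, 1, 0],    # tree 2 (low weight)
                  [0, 1, 1]])   # tree 3
labels = weighted_vote(preds, [0.6, 0.1, 0.3])  # -> [1, 0, 1]
```

Note that the low-weight tree 2 is outvoted on the second sample even though two of three trees predict 1 on the first and third.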
Recent methods of bottom-up salient object detection have attempted to either (i) obtain a probability map with a 'contrast rarity' based functional, formed using low-level cues, or (ii) minimize an objective function to detect the object. Most of these methods fail for complex, natural scenes, such as the PASCAL-VOC challenge dataset, which contains images with diverse appearances, illumination conditions, multiple distracting objects and varying scene environments. We thus formulate a novel multi-criteria objective function which captures many dependencies and the scene structure for correct spatial propagation of low-level priors to perform salient object segmentation in such cases. Our proposed formulation is based on CRF modeling, where the minimization is performed using graph cut and the optimal parameters of the objective function are learned using a max-margin framework from the training set, without the use of class labels. Hence the proposed method is unsupervised, and works efficiently compared to very recent state-of-the-art methods of saliency map detection and object proposals. Results, compared using F-measure and intersection-over-union scores, show that the proposed method exhibits superior performance on the complex PASCAL-VOC 2012 object segmentation dataset as well as the traditional MSRA-B saliency dataset. Copyright 2014 ACM.
We describe a system for active stabilization of cameras mounted on highly dynamic robots. To focus on careful performance evaluation of the stabilization algorithm, we use a camera mounted on a robotic test platform that can have unknown perturbations in the horizontal plane, a commonly occurring scenario in mobile robotics. We show that the camera can be effectively stabilized using an inertial sensor and a single additional motor, without a joint position sensor. The algorithm uses an adaptive controller based on a model of the vertebrate cerebellum for velocity stabilization, with additional drift correction. We have also developed a resolution-adaptive retinal slip algorithm that is robust to motion blur. We evaluated the performance quantitatively using another high-speed robot to generate repeatable sequences of large and fast movements that a gaze stabilization system can attempt to counteract. Thanks to the high-accuracy repeatability, we can make a fair comparison of algorithms for gaze stabilization. We show that the resulting system can reduce camera image motion to about one pixel per frame on average, even when the platform is rotated at 200 degrees per second. As a practical application, we also demonstrate how the common task of face detection benefits from active gaze stabilization. Copyright 2014 ACM.
Most work on automatic writer identification relies on handwriting features defined by humans [6, 4]. These features correspond to basic units such as letters and words of text. Instead of relying on human-defined features, we consider here the determination of writing similarity using automatically determined word-level features learnt by a deep neural network. We generalize the problem of writer identification to the definition of a content-irrelevant handwriting similarity. Our method first takes whether two words were written by the same person as a discriminative label for word-level feature training. Then, based on the word-level features, we define a writing similarity between passages. This similarity shows not only the distinction between the writing styles of different people but also the development of style of the same person. Performance with several hidden layers in the neural network is evaluated. The method is applied to determine how a person's writing style changes with time, using a children's writing dataset collected annually from children in the 2nd, 3rd or 4th grade. Results are given for a whole passage (50 words) of writing over a one-year change. As a comparison, similar experiments on a small amount of data using a conventional generative model are also given. Copyright 2014 ACM.
In this paper we describe an early version of our system which synthesizes 3D visual speech, including tongue and teeth, from frontal facial image sequences. This system is developed for 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. The prime motivation for this system is to obtain a 3D VSA system from a limited amount of training data compared to that required for developing a conventional corpus-based 3D VSA system. It consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. This has the added advantages of 3D visual speech: renderability of the face in different poses and illumination conditions, and enhanced visual information of tongue and teeth. The first module for 3D shape estimation is based on the detection of facial landmarks in images. It uses a prior 3D Morphable Model (3D-MM) trained using 3D facial data. For the time being it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained using the data of a single person and tested with the same person-specific data. The estimated 3D shape sequences are provided as input to the second module along with the phonetic segmentation. For any particular 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and animating a rigid 3D tongue through keyframe interpolation. Copyright 2014 ACM.
In this paper, we present a novel technique to localize curved multi-script text contained in natural scene video, based on Fuzzy Curve Tracing (FCT) of an extracted planar surface. To be easy to read and interpret, text information is usually written on planar surfaces, for instance billboards, walls of buildings, road signs and banners. This motivated us to detect planar surfaces by fitting a planar model constructed using Random Sample Consensus (RANSAC). It is assumed that the detected planar surface contains text; it is segmented from the background using Graph Cuts through Markov Random Field (MRF) labeling of the pixels belonging to the planar surface. Within the extracted planar surface, the curved text is detected using fuzzy curve tracing, which traces and generates the curve path of the text by establishing spatial relations among the cluster centers identified through fuzzy c-means clustering of character regions. Finally, curved text is localized by identifying the character regions that the generated curve path passes through. The experimental results are evaluated for text localization using recall, precision and f-measure. Based on these metrics, the proposed technique outperforms popular existing methods. Copyright 2014 ACM.
Schizophrenia is a serious mental illness that requires timely and accurate diagnosis. Functional magnetic resonance imaging (fMRI) helps in identifying variations in the activation patterns of schizophrenia patients and healthy subjects. But manual diagnosis using fMRI is cumbersome and prone to subjective errors. This has drawn the attention of the pattern recognition and computer vision research community towards developing a reliable and efficient decision model for computer-aided diagnosis (CAD) of schizophrenia. However, the high dimensionality and limited availability of fMRI samples lead to the curse of dimensionality, which may deteriorate the performance of a decision model. In this research work, a combination of feature extraction and feature selection techniques is employed to obtain a reduced set of relevant features for differentiating schizophrenia patients from healthy subjects. A general linear model approach is used for feature extraction on pre-processed fMRI data. Further, t-test-based feature selection is employed to determine a subset of discriminative features, which are used for learning a decision model using a support vector machine. Experiments are carried out on two balanced and well age-matched datasets (acquired on 1.5 Tesla and 3 Tesla scanners) of an auditory oddball task derived from the publicly available multisite FBIRN dataset. The performance is evaluated in terms of sensitivity, specificity and classification accuracy, and compared with two well-known existing approaches. Experimental results demonstrate that the proposed model outperforms the two existing approaches in terms of sensitivity, specificity and classification accuracy. With the proposed approach, classification accuracies of 80.9% and 88.0% are achieved for the 1.5 Tesla and 3 Tesla datasets respectively. In addition, the brain regions containing the discriminative features are identified, which may be used as biomarkers for CAD of schizophrenia using fMRI. Copyright 2014 ACM.
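The t-test-based selection step can be sketched in a few lines of NumPy using the Welch two-sample t statistic. The synthetic data and names below are illustrative assumptions; the paper operates on GLM-extracted fMRI features and feeds the selected subset to an SVM downstream.

```python
import numpy as np

def ttest_select(X, y, k):
    # Welch two-sample t statistic per feature between groups 0 and 1;
    # return the indices of the k features with the largest |t|.
    a, b = X[y == 0], X[y == 1]
    t = (a.mean(0) - b.mean(0)) / np.sqrt(
        a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
    return np.argsort(-np.abs(t))[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))       # 40 subjects, 100 GLM-style features
y = np.repeat([0, 1], 20)            # 20 healthy, 20 patients
X[y == 1, 3] += 2.0                  # feature 3 carries the group effect
selected = ttest_select(X, y, k=5)   # feature 3 should rank among the top
```

Only the `selected` columns would then be passed to the classifier, which is how the method sidesteps the curse of dimensionality noted above.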