ISBN: (print) 9781424475421
In this paper we propose a novel framework that fuses multiple features to improve action recognition in videos. Fusing multiple features is important because a representation based on a single feature is often not enough to capture imaging variations (viewpoint, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two kinds of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (cuboids and 2-D SIFT), and ii) higher-order statistical models of interest points, which aim to capture global information about the actor. We construct the video representation from local space-time features and global features and integrate these representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets show that the proposed approach is effective. An additional experiment shows that using both local and global features provides a richer representation of human action than a single feature type.
To improve classifier performance in semantic image annotation, we propose a novel method that uses the learning vector quantization (LVQ) technique to optimize the low-level feature data extracted from a given image. Representative vectors selected with LVQ, rather than all feature data, are used to train a support vector machine (SVM) classifier. We compare SVM-based semantic image annotation with and without this feature data optimization. Experimental results show that the proposed method performs better than the method without LVQ.
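The idea of selecting representative vectors with LVQ can be illustrated with a minimal sketch. The abstract does not say which LVQ variant or initialization the authors use, so the code below assumes plain LVQ1 with prototypes seeded from the first samples of each class; the function names, learning rate, and epoch count are all hypothetical. The resulting prototypes would then replace the full feature set as SVM training data, a step that is not shown here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(prototypes, x):
    """Index of the prototype whose vector is closest to x."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(prototypes[i][0], x))


def train_lvq1(samples, labels, per_class=1, lr=0.1, epochs=20, seed=0):
    """Learn labelled prototypes with the LVQ1 rule (assumed variant)."""
    rng = random.Random(seed)
    # Seed the prototypes with the first `per_class` samples of each class.
    prototypes = []
    for c in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == c]
        for s in members[:per_class]:
            prototypes.append((list(s), c))
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w, c = prototypes[nearest_prototype(prototypes, x)]
            # Pull the winner towards x if the labels match, push it away otherwise.
            sign = 1.0 if c == y else -1.0
            for d in range(len(w)):
                w[d] += sign * rate * (x[d] - w[d])
    return prototypes


def classify(prototypes, x):
    """Label of the nearest prototype."""
    return prototypes[nearest_prototype(prototypes, x)][1]
```

In this sketch the number of prototypes per class controls how strongly the training set is compressed before it reaches the classifier.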
In automatic image annotation, low-level visual features are often extracted from the original image and mapped to high-level semantic information. In this paper, we propose a novel method that integrates kernel independent component analysis (KICA) and a support vector machine (SVM) to analyze the semantic information of natural images. KICA, which contains a nonlinear kernel mapping component, is adopted to extract low-level features from the original image data. These feature vectors are then mapped to high-level semantic words using the SVM, so that images are annotated with labels from a given semantic label set. Comparative studies of KICA against traditional color histogram and discrete cosine transform features have been carried out. The experimental results show that the proposed method can extract the components of images as key features, and that mapping these features into semantic categories achieves higher accuracy.
ISBN: (print) 9781424475421
Unlike most previous manifold-based data classification algorithms, which assume that all data points lie on a single manifold, we expect that data from different classes may reside on different manifolds of possibly different dimensions. Better classification accuracy should therefore be achieved by modeling the data with multiple manifolds, each corresponding to a class. To this end, a general framework for data classification on multiple manifolds is presented. The manifolds are first learned for each class separately, and a stochastic optimization algorithm is then employed to find the near-optimal dimensionality of each manifold from the classification viewpoint. Classification is then performed with a newly defined classifier based on minimum reconstruction error. Our method can easily be extended to other manifold learning methods and search strategies. Experiments on both synthetic data and databases of facial expression images show the effectiveness of the proposed multiple-manifold approach.
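A classifier based on minimum reconstruction error can be sketched as follows. This is not the authors' method: the abstract does not name the manifold learning algorithm, so the sketch swaps in a linear subspace (PCA) per class as the simplest possible stand-in, and it takes the per-class dimensions as given instead of searching for them with stochastic optimization. All function names are hypothetical.

```python
import numpy as np


def fit_class_subspaces(X, y, dims):
    """Fit one PCA subspace per class; `dims` maps class label -> dimension."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    models = {}
    for label in dims:
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Rows of vt are the principal directions, strongest first.
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[label] = (mean, vt[: dims[label]])
    return models


def reconstruction_error(model, x):
    """Distance between x and its projection onto the class subspace."""
    mean, basis = model
    centered = np.asarray(x, dtype=float) - mean
    projected = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - projected))


def classify(models, x):
    """Assign x to the class whose subspace reconstructs it best."""
    return min(models, key=lambda label: reconstruction_error(models[label], x))
```

Because each class carries its own dimension in `dims`, a one-dimensional class and a two-dimensional class can coexist, which mirrors the abstract's point that class manifolds may differ in dimensionality.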
In the classification of multi-source remote sensing images, it is usually difficult to obtain high classification accuracy. In previous work, a modeling technique for remote sensing image classification based on the minimum description length (MDL) principle with a mixture model was analyzed theoretically. In this work, experimental studies are performed to investigate the modeling technique. Intensive experiments and analysis show that the developed modeling technique can build a robust classification system that avoids over-fitting the classifier to the training data and lets the learning process trade off between bias and variance. Moreover, the designed mixture model represents real multi-source remote sensing images more efficiently than a single model.
Based on the theory of spatial localization in the human auditory system, we propose a method for spatial localization of multiple sound sources using a spherical robot head. Spatial sound vectors recorded by a spatially configured microphone array are used to estimate histograms of spatial arrival time difference vectors by solving simultaneous equations in different frequency bands. An echo avoidance model based on the precedence effect is used to reduce the interference from environmental reverberation, which strongly interferes with the phase vectors, especially in small indoor environments. To integrate the spatial cues of different microphone pairs, we propose a method that maps the correlation between different microphone pairs to a 3D map corresponding to the azimuth and elevation of the sound source directions. Experiments indicate that the system provides the distribution of sound sources in azimuth and elevation, even for concurrent sources in reverberant environments.
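The notion of an arrival time difference between two microphones can be illustrated with a deliberately simple sketch. The abstract describes a frequency-band method that solves simultaneous equations; the code below does not reproduce it and instead uses a plain time-domain cross-correlation over a single microphone pair. The function name and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(reference):
            m = n + lag
            # Only sum over samples where both signals are defined.
            if 0 <= m < len(delayed):
                score += value * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the time difference in seconds. Mapping such differences from several microphone pairs to azimuth and elevation depends on the array geometry, which the abstract does not specify.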
An image registration algorithm for digital subtraction angiography (DSA) is proposed based on 3D space-time detection. The DSA image sequence is treated as a 3D space-time sequence, in which the movement of image points is detected for control point selection and image registration. If control points are located in the blood vessels, their gray values change markedly across the 3D space-time sequence. 3D space-time characteristics are therefore used to select control points according to their location. Experimental results show that the proposed scheme performs well in DSA image registration.
In X-ray CT, the beam hardening (BH) effect, which is caused by the polychromatic X-ray beam and energy-dependent attenuation coefficients, introduces cupping and streak artifacts. Most correction methods can only deal with beam hardening artifacts for single-material or dual-material objects and fail for multi-material objects, since the complexity and instability of the correction increase with the number of materials. In this paper, we propose a multi-material BH correction method. A bivariate Legendre polynomial is adopted to correct BH based on a bi-parameter imaging physical model, and the Helgason-Ludwig consistency condition (H-L consistency condition) is introduced to optimally determine the bi-parameters of all materials. Simulation experiments show that the proposed method greatly suppresses the artifacts and that the corrected values closely approach the ideal ones.
In order to solve the problem of image degradation caused by dust environments, an image degradation model considering multiple scattering factors caused by dust was first established using the first-order multiple sc...
This paper investigates the effect of fiducial configuration on target registration error (TRE) and tests the accuracy of a theoretical model of TRE prediction in image-guided cranio-maxillofacial surgery. A skull specimen is prepared with 20 titanium microscrews placed at defined locations and scanned with a 64-slice spiral computed tomography unit. These markers are divided into a registration fiducial group and a target fiducial group. An optical tracking system is used to perform skull-to-image registration. After each registration, the TRE is calculated by the navigation system. Registration is performed 50 times for each configuration, and the average is taken as the TRE of that configuration. The TRE prediction is also calculated for each configuration. The TRE ranges from 0.58 mm to 3.88 mm. Relatively small TRE values may be achieved by placing the majority of fiducials on the maxillary alveolus close to the target and a small number on the contralateral cranium. The measured TRE values are always larger than the corresponding TRE predictions, but the two are highly correlated. The configuration of fiducials is an important factor in minimizing TRE, and the TRE prediction is a good guide for fiducial marker placement.
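The abstract does not state which theoretical TRE model it tests. As an illustration only, the sketch below assumes the widely used approximation of Fitzpatrick, West and Maurer, TRE²(r) ≈ (FLE² / N) · (1 + (1/3) Σₖ dₖ² / fₖ²), where N is the number of fiducials, dₖ is the distance of the target from principal axis k of the fiducial configuration, and fₖ is the RMS distance of the fiducials from that axis. The function name and the fiducial localization error value `fle_rms` are hypothetical inputs, not data from the paper.

```python
import numpy as np


def predict_tre(fiducials, target, fle_rms):
    """Predicted RMS TRE at `target` under the assumed approximation."""
    fiducials = np.asarray(fiducials, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(fiducials)
    if n < 3:
        raise ValueError("at least three fiducials are required")
    centroid = fiducials.mean(axis=0)
    centered = fiducials - centroid
    offset = target - centroid
    # Principal axes of the fiducial configuration (rows of `axes`).
    _, _, axes = np.linalg.svd(centered, full_matrices=False)
    ratio_sum = 0.0
    for axis in axes:
        # Squared distance from a point p to the axis: |p|^2 - (p . axis)^2.
        f_sq = np.mean(np.sum(centered ** 2, axis=1) - (centered @ axis) ** 2)
        if f_sq < 1e-12:
            raise ValueError("fiducials are collinear; the model is undefined")
        d_sq = offset @ offset - (offset @ axis) ** 2
        ratio_sum += d_sq / f_sq
    return float(np.sqrt(fle_rms ** 2 / n * (1.0 + ratio_sum / 3.0)))
```

Under this assumed model the predicted error is smallest at the centroid of the fiducials and grows as the target moves away from their principal axes, which is consistent with the abstract's advice to place most fiducials close to the target.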