ISBN: 9781509064540 (print)
Introducing features that better represent the visual information of speakers during speech production is still an open issue that strongly affects the quality of lip-reading and Audio-Visual Speech Recognition (AVSR) tasks. In this paper, three types of visual features, both image-based and model-based, are investigated in a professional lip-reading task. The raw gray-level information of the lip Region of Interest (ROI), the geometric representation of lip shape, and the Deep Bottleneck Features (DBNFs) extracted from a 6-layer Deep Auto-encoder Neural Network (DANN) are the three feature sets compared for lip reading. Two recognition systems, the conventional GMM-HMM and the state-of-the-art DNN-HMM hybrid, are used to perform isolated and connected digit recognition tasks. The results indicate that the high-level information extracted from deep layers for the lip ROI can represent the visual modality with the advantage of carrying a large amount of information in a low-dimensional feature vector. Moreover, on the test data, the DBNFs showed an average relative improvement of 15.4% over the shape features, and the shape features showed an average relative improvement of 20.4% over the ROI features.
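The bottleneck-feature idea can be illustrated with a much smaller network. The sketch below assumes NumPy. It trains a single-bottleneck autoencoder by plain gradient descent, which is not the paper's 6-layer DANN. The hidden activations then serve as a low-dimensional feature vector for each flattened lip ROI. The layer width, learning rate, epoch count and the random stand-in data are all illustrative assumptions.

```python
import numpy as np


def train_bottleneck_autoencoder(X, k, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer autoencoder whose k-unit hidden layer is the bottleneck.

    X: (n_samples, d) array, e.g. flattened gray-level lip ROIs scaled to [0, 1].
    Returns the encoder/decoder parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, k))
    b_enc = np.zeros(k)
    W_dec = rng.normal(0.0, 0.1, size=(k, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W_enc + b_enc)      # bottleneck activations
        X_hat = H @ W_dec + b_dec           # linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean-squared reconstruction error (backpropagation).
        g_out = 2.0 * err / err.size
        g_W_dec = H.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - H ** 2)   # back through tanh
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    return (W_enc, b_enc, W_dec, b_dec), losses


def bottleneck_features(X, params):
    """Map each input row to its k-dimensional bottleneck feature vector."""
    W_enc, b_enc, _, _ = params
    return np.tanh(X @ W_enc + b_enc)


# Usage: random stand-in for 200 flattened 8x8 ROIs, compressed to 8 features each.
rng = np.random.default_rng(1)
X = rng.random((200, 64))
params, losses = train_bottleneck_autoencoder(X, k=8)
feats = bottleneck_features(X, params)   # shape (200, 8)
```

In a real system the network would be deeper and trained on actual ROI images. Only the encoder half is kept at recognition time, and its output replaces the raw pixels as input to the GMM-HMM or DNN-HMM.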
ISBN: 9781509064540 (print)
Camera tracking is an important issue in many computer vision and robotics applications, such as augmented reality and Simultaneous Localization and Mapping (SLAM). In this paper, a feature-based technique for monocular camera tracking is proposed. The approach tracks a set of sparse features across successive video frames. Initially, the camera views a chessboard with a known cell size for a few frames so that an initial map of the environment can be constructed. Thereafter, the camera pose for each new incoming frame is estimated in a framework that works only with a set of visible natural landmarks. The 6-DOF camera pose parameters are estimated using a particle filter, and the depth of newly detected landmarks is recovered with a linear triangulation method. The proposed method was applied to real-world videos, and the average camera positioning error was less than 3 cm, which indicates the effectiveness and accuracy of the method.
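The linear triangulation step can be sketched with the standard direct linear transform (DLT). The abstract does not give the exact formulation, so this choice is an assumption. Given the 3x4 projection matrices of two frames and a landmark's pixel coordinates in both, the 3D point is the null vector of a 4x4 homogeneous system. The example assumes NumPy, noise-free correspondences, and hypothetical camera intrinsics.

```python
import numpy as np


def triangulate_linear(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one landmark seen in two frames.

    P1, P2: 3x4 projection matrices K [R | t] of the two camera poses.
    x1, x2: (u, v) pixel coordinates of the landmark in each frame.
    Returns the landmark's 3D position as a length-3 array.
    """
    u1, v1 = x1
    u2, v2 = x2
    # u = (P[0] . X) / (P[2] . X) gives the linear constraint (u P[2] - P[0]) . X = 0,
    # and likewise for v; two views give four equations A X = 0.
    A = np.vstack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]


def project(P, X):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x_h = P @ np.append(X, 1.0)
    return x_h[:2] / x_h[2]


# Usage with hypothetical intrinsics and a 10 cm sideways camera motion.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
landmark = np.array([0.2, -0.1, 2.0])
estimate = triangulate_linear(P1, P2, project(P1, landmark), project(P2, landmark))
```

With real, noisy feature tracks the SVD gives the algebraic least-squares solution. In a tracker, the two projection matrices would come from the particle filter's pose estimates.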
ISBN: 9781509064540 (print)
The rapid development of digital technology has made the transmission of electronic data faster, easier and very low-cost. Watermarking is one of the methods proposed for data protection: information is embedded in the image without reducing image quality, but the watermark may be vulnerable to various attacks, and there are different ways of dealing with them. This article aims to make the watermarked image robust against the crop attack in the spatial domain, using the Least Significant Bit (LSB) of the image. First, the algorithm loads the host image, generates and solves a Sudoku, and stores it in image form. Then, applying the XOR function to the created Sudoku and to the bits of the host image produces a robust watermarked image. If the watermarked image is subjected to a crop attack and partially destroyed, the Sudoku is recovered by running the Sudoku-solving algorithm again, and the cropped parts of the image are then recovered using the recovered Sudoku and the XOR function.
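The abstract does not specify how Sudoku digits are mapped to image bits. The sketch below is therefore hypothetical. It assumes NumPy and uses the parity of each digit as a key bit, tiled over the image and XORed into the LSB plane. It shows the two properties the method relies on. First, XOR with the same key is its own inverse and changes each pixel by at most 1. Second, the Sudoku row, column and box rules let an erased part of the grid be rebuilt by a solver. Here the erased part is a whole row, whose cells are each fixed by their column.

```python
import numpy as np


def is_allowed(grid, r, c, d):
    """True if digit d can be placed at (r, c) without breaking row, column or box rules."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))


def solve_sudoku(grid):
    """Fill the zero cells of a 9x9 list-of-lists grid in place by backtracking."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if is_allowed(grid, r, c, d):
                        grid[r][c] = d
                        if solve_sudoku(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True


def sudoku_key_bits(grid, shape):
    """Hypothetical key: parity of each Sudoku digit, tiled to the image shape."""
    bits = np.array(grid, dtype=np.uint8) % 2
    reps = (-(-shape[0] // 9), -(-shape[1] // 9))   # ceiling division
    return np.tile(bits, reps)[: shape[0], : shape[1]]


def xor_lsb(image, key_bits):
    """XOR the least significant bit plane of a uint8 image with 0/1 key bits."""
    return image ^ key_bits


# A valid solved grid, then a simulated "crop" that erases its last row.
full = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
cropped = [row[:] for row in full]
cropped[8] = [0] * 9
solve_sudoku(cropped)   # each erased cell is determined by its column

host = (np.arange(20 * 30) % 256).astype(np.uint8).reshape(20, 30)
key = sudoku_key_bits(full, host.shape)
marked = xor_lsb(host, key)
restored = xor_lsb(marked, key)   # XOR is its own inverse
```

A general crop can leave a Sudoku with several valid completions. How the paper guarantees a unique recovery is not stated in the abstract.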
ISBN: 9781509064540 (print)
Sentiment analysis (SA) is a subfield of natural language processing and data mining concerned with extracting useful information from users' comments on the Web. Although researchers have been studying different problems in SA for more than a decade, most studies concentrate on English, and languages such as Persian have not received the attention they deserve. The scarcity of resources for evaluating sentiment analysis methods is the main limiting factor for Persian. This paper addresses this scarcity by introducing two new resources: SPerSent, a sentence-level dataset for sentiment analysis in Persian, and CNRC, a new Persian lexicon. SPerSent contains 150,000 sentences, each associated with two labels: a binary label indicating the polarity of the sentence, and a five-star rating. These labels are obtained automatically using a lexicon-based method. Specifically, three lexicons are used independently to label each sentence. Then, majority voting and averaging are used to aggregate the results for the polarity and five-star labels, respectively. Finally, a well-known machine learning method, Naive Bayes, is used to evaluate SPerSent.
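The aggregation step can be sketched as follows. Each of the three lexicons gives a sentence a +1/-1 polarity and a 1 to 5 star rating. Majority voting combines the polarities, and averaging combines the ratings. Rounding the average to a whole star, and the function name, are assumptions for illustration only.

```python
from statistics import mean


def aggregate_labels(polarities, ratings):
    """Combine the labels that several lexicons assign to one sentence.

    polarities: +1 / -1 polarity labels, one per lexicon.
    ratings: five-star ratings (1-5), one per lexicon.
    Returns (majority-vote polarity, average rating rounded to the nearest star).
    """
    positive_votes = sum(1 for p in polarities if p > 0)
    # With an odd number of lexicons (three here) a binary vote cannot tie.
    polarity = 1 if positive_votes > len(polarities) / 2 else -1
    # Rounding the mean to a whole star is an assumption, not stated in the abstract.
    rating = round(mean(ratings))
    return polarity, rating


print(aggregate_labels([1, 1, -1], [4, 5, 4]))   # (1, 4)
```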
ISBN: 9781509064540 (print)
Specific characteristics of the hemodynamic response measured by functional near-infrared spectroscopy (fNIRS) may reflect the level of cortical activity during mental arithmetic tasks. In this paper, we use hemodynamic response signals from the prefrontal cortex, acquired by a 4-channel fNIRS system, to identify the difficulty level of an arithmetic task. To this end, twelve temporal features and several classification methods are used. In addition, the most discriminating features are identified using principal component analysis (PCA). Experimental results show that the highest accuracy, 92.2%, is achieved by a linear Support Vector Machine (SVM) classifier. They also show that the skewness and total area of the signal from the 3 cm channel on the left prefrontal lobe are the most discriminating features.
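The two most discriminating features named above can be computed from a single channel as follows. This sketch assumes NumPy. It interprets "total area" as the signed area under the signal, computed with the trapezoidal rule at sampling rate `fs`. That interpretation, the flat-signal convention, and the function names are assumptions. The paper's remaining ten temporal features are not shown.

```python
import numpy as np


def skewness(signal):
    """Sample skewness (third standardized moment) of a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    if std == 0:
        return 0.0   # convention for a flat signal (assumption)
    return float(np.mean(centered ** 3) / std ** 3)


def total_area(signal, fs):
    """Signed area under the signal via the trapezoidal rule; fs is the sampling rate in Hz."""
    x = np.asarray(signal, dtype=float)
    return float(np.sum(x[1:] + x[:-1]) / (2.0 * fs))


def temporal_features(signal, fs):
    """Feature vector [skewness, total area] for one fNIRS channel."""
    return np.array([skewness(signal), total_area(signal, fs)])
```

Vectors like this, one per channel and trial, would then be fed to a classifier such as a linear SVM.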
Kernel principal component analysis (PCA) generalizes linear PCA to high-dimensional feature spaces, related to input space by some nonlinear map. One can efficiently compute principal components via an eigen-decompos...
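The truncated description refers to computing kernel principal components by an eigen-decomposition. In the standard formulation, this is the eigen-decomposition of the centred n x n kernel (Gram) matrix of the training samples. The sketch below assumes NumPy and returns the projections of the training points. With a linear kernel it reproduces ordinary PCA scores up to sign, which is a useful sanity check. The RBF kernel and its `gamma` value are illustrative choices.

```python
import numpy as np


def kernel_pca(X, kernel, n_components):
    """Scores of the training samples X (n, d) on the leading kernel principal components."""
    n = X.shape[0]
    K = np.array([[kernel(a, b) for b in X] for a in X])
    # Centre the data in feature space: K_c = K - 1_n K - K 1_n + 1_n K 1_n.
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # With coefficients alpha_k = v_k / sqrt(lambda_k), the projection of sample i is
    # (K_c alpha_k)_i = sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two sample vectors."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))
```

The cost is dominated by the n x n eigen-decomposition, which does not depend on the (possibly infinite) dimension of the feature space.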
ISBN: 9781509064540 (print)
The complementary role of computer-assisted models using machine learning methods in medical imaging has been a center of attention in recent years. Shape analysis of brain structures can be used to evaluate their abnormalities and deformations, particularly in patients suffering from neurological diseases such as epilepsy, Alzheimer's disease and Parkinson's disease. We propose an automatic diagnosis and lateralization algorithm using Signed Poisson Mapping (SPoM), which was recently proposed as a new framework for shape analysis of three-dimensional (3D) structures. In contrast to previous studies, we use three-class classification to show the robustness of our algorithm in differentiating between normal, left temporal lobe epilepsy (LTLE), and right temporal lobe epilepsy (RTLE) subjects. We also use a support vector machine (SVM) classifier with a radial basis function (RBF) kernel for lateralization, i.e., differentiating between RTLE and LTLE patients. The classification accuracy is 94% for the three-class classifier and 95% for the lateralization task, both superior to the results reported in the related literature.
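The RBF kernel used by the lateralization SVM measures similarity between two feature vectors as exp(-gamma * ||x - y||^2). With the bandwidth parameterization, gamma = 1 / (2 sigma^2). The vectorized sketch below assumes NumPy and computes the full kernel matrix between two sets of samples. The value of `gamma` is a hyperparameter the abstract does not report; the one used below is arbitrary.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for samples A (n, d) and B (m, d)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

An SVM trained on this kernel matrix separates the classes with a boundary that is nonlinear in the original shape-feature space.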
ISBN: 9781538610381 (print)
Biometrics has been widely used in recent decades for security purposes and to increase people's confidence in new information systems. This paper presents a new method for analyzing and encoding dorsal hand vein patterns for biometric recognition. Two multiresolution approaches, the Discrete Wavelet Transform and the Riesz Wavelet Transform, are first applied to extract directional image features. The resulting coefficients are then encoded with an ordinal procedure, namely the Local Line Binary Pattern.
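The Local Line Binary Pattern (LLBP) can be sketched for a single pixel of any 2D array, such as a map of wavelet coefficients. The sketch follows the commonly used LLBP formulation and assumes NumPy. A horizontal and a vertical line of odd length N are centred on the pixel. Each neighbour contributes a bit s(h_n - h_c), and the weights double with distance from the centre on each side. The two codes are combined into a magnitude. The default N = 5 is only an illustrative choice. The paper's exact line length and its handling of borders are not given in the abstract.

```python
import numpy as np


def llbp_at(image, row, col, N=5):
    """Local Line Binary Pattern codes at one pixel.

    N is the odd line length; the pixel must lie at least N // 2 pixels away from
    every border. Returns (LLBP_h, LLBP_v, magnitude).
    """
    c = N // 2   # zero-based index of the centre pixel on each line
    horizontal = image[row, col - c : col + c + 1].astype(float)
    vertical = image[row - c : row + c + 1, col].astype(float)

    def line_code(line):
        centre = line[c]
        code = 0
        for n in range(N):
            if n == c:
                continue
            bit = 1 if line[n] >= centre else 0   # s(x) = 1 for x >= 0
            # Weights restart at 2^0 next to the centre and double moving outward.
            code += bit * 2 ** (abs(n - c) - 1)
        return code

    h = line_code(horizontal)
    v = line_code(vertical)
    return h, v, float(np.hypot(h, v))
```

Applying this at every valid pixel yields code images whose histograms, or the codes themselves, can serve as the vein template.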
ISBN: 9781509064540 (print)
To diagnose male infertility, a semen analysis is conducted in which sperm morphology, i.e., the size and shape of the sperm, is one of the factors evaluated. Since manual assessment of sperm morphology is time-consuming and subjective, automatic classification methods are being developed. Automatic classification of sperm heads is a complicated task because of within-class differences and between-class similarities. To classify sperm automatically, appropriate features must be extracted from their microscopic images. In this research, a set of previously proposed features is extracted and examined in an automatic framework to evaluate its capacity to discriminate sperm into four shape classes (Normal, Tapered, Pyriform and Amorphous). In addition, a new set of features, called elliptic features, is proposed and added to the original features to improve the classification results. Both feature sets are used with a Linear Discriminant Analysis (LDA) classifier. It is shown that adding the new features significantly improves discrimination between these classes of sperm shapes.
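The abstract does not define the proposed elliptic features. As a hypothetical illustration only, one common way to describe a head's elliptical shape is the ellipse with the same second-order central moments as its binary mask. That ellipse gives major and minor axis lengths, eccentricity and orientation. The sketch assumes NumPy and a segmented head mask. It is not the paper's feature set.

```python
import numpy as np


def elliptic_features(mask):
    """Ellipse with the same second-order central moments as a binary head mask.

    Returns (major_axis, minor_axis, eccentricity, orientation) with axis lengths in
    pixels and orientation in radians from the image x-axis.
    """
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, rows]).astype(float)   # x = column, y = row
    cov = np.cov(coords, bias=True)                 # 2x2 central-moment matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    # A filled ellipse with semi-axis a has variance a^2 / 4 along that axis.
    minor = 4.0 * np.sqrt(eigvals[0])
    major = 4.0 * np.sqrt(eigvals[1])
    eccentricity = np.sqrt(max(0.0, 1.0 - eigvals[0] / eigvals[1]))
    vx, vy = eigvecs[:, 1]
    orientation = np.arctan2(vy, vx)
    return float(major), float(minor), float(eccentricity), float(orientation)
```

Round heads give eccentricities near 0, while elongated (e.g. tapered) heads approach 1. Features of this kind could be appended to an existing feature vector before LDA.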
ISBN: 9781509064540 (print)
Recent studies in multimodal data fusion demonstrate that incorporating the prior spatial structure and group information available from many years of intensive neuroimaging research can increase the interpretability and detection accuracy of hidden phenomena within multimodal datasets. Although recent functional neuroimaging analyses indicate that certain brain regions may participate in multiple groups (networks), the extent of group overlap has never been considered in data fusion frameworks. To address this issue, we propose a group-structured sparse canonical correlation analysis (gssCCA) method that employs groupness and sparsity constraints in a unified fusion framework. To investigate the performance of the proposed algorithm, we compare overlapping and disjoint gssCCA in a simulation study, evaluating their ability to detect multimodal data associations. The results demonstrate that an overlapping group-structure constraint can increase the sensitivity and specificity of detecting the true associations between multimodal datasets.
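The disjoint-group case can be sketched with alternating updates in the style of penalized-matrix-decomposition sparse CCA. Each canonical weight vector is updated from the cross-covariance, shrunk group-wise by the group-lasso proximal operator, and renormalized. This is a swapped-in, simplified technique that treats within-modality covariance as identity. It is not the paper's gssCCA, and the overlapping-group constraint that is the paper's contribution needs a different proximal step that is not shown. The example assumes NumPy. The synthetic data, groups and penalty values are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 for disjoint index groups."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]   # shrink the whole group toward zero
    return out


def group_sparse_cca(X, Y, groups_x, groups_y, lam_x, lam_y, n_iter=50):
    """One pair of group-sparse canonical weight vectors by alternating updates.

    X (n, p) and Y (n, q) are column-centred data from the two modalities. Returns
    unit-norm (u, v); u or v comes back as zeros if the penalty removes every group.
    """
    C = X.T @ Y                                      # cross-covariance (unscaled)
    v = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = group_soft_threshold(C @ v, groups_x, lam_x)
        if np.linalg.norm(u) == 0:
            break
        u /= np.linalg.norm(u)
        v = group_soft_threshold(C.T @ u, groups_y, lam_y)
        if np.linalg.norm(v) == 0:
            break
        v /= np.linalg.norm(v)
    return u, v


# Usage: only the first group of each modality shares a latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = 0.5 * rng.normal(size=(200, 6))
X[:, :2] += z[:, None]
Y = 0.5 * rng.normal(size=(200, 6))
Y[:, :2] += z[:, None]
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
groups = [[0, 1], [2, 3], [4, 5]]
u, v = group_sparse_cca(X, Y, groups, groups, lam_x=60.0, lam_y=60.0)
```

Groups whose contribution to the cross-covariance falls below the penalty are zeroed as a block. This is how a group constraint yields interpretable, region-level associations.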