ISBN (print): 9798400710759
This paper presents a self-supervised learning (SSL) framework designed specifically for Handwritten Mathematical Expression Recognition (HMER). The proposed approach incorporates a momentum encoding technique and a non-linear projection head into the image encoder to address the common issue of dimensional collapse in SSL methods. Our approach consists of two main steps. First, we use self-supervised pre-training to train the image encoder to obtain strong feature representations from HME images. Second, we fine-tune the model using a Transformer network to predict LaTeX sequences from HME images. The evaluation shows that our SSL framework outperforms other existing SSL frameworks as well as several supervised methods. The integration of momentum encoding and a non-linear projection head in the image encoder is shown to improve the robustness and efficiency of the feature representations, leading to superior performance in HMER tasks. Our approach achieves an expression recognition rate (ExpRate) of 62.17%, 61.03%, and 64.8% on the CROHME 2014, 2016, and 2019 test datasets, respectively. The result on the CROHME 2019 test set is the highest of the three and is state-of-the-art (SOTA). This success is achieved by overcoming the challenges of dimensional collapse and leveraging the advantages of both self-supervised and supervised learning.
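The abstract does not spell out how the momentum encoding is implemented. A minimal sketch, assuming it follows the common exponential-moving-average (EMA) scheme in which a slowly updated copy of the encoder tracks the trained one, could look like this. The function name `momentum_update`, the flat parameter lists, and the coefficient `m` are all hypothetical and chosen only for illustration.

```python
def momentum_update(online, target, m=0.99):
    """Return updated momentum-encoder parameters.

    Assumed EMA rule: new_target = m * target + (1 - m) * online.
    `online` and `target` are flat lists of floats standing in for
    the weights of the trained encoder and of its momentum copy.
    """
    return [m * t + (1.0 - m) * o for o, t in zip(online, target)]


# Hypothetical toy weights, not values from the paper.
online_params = [1.0, 2.0, 3.0]
target_params = [0.0, 0.0, 0.0]

# One update step: the momentum copy moves 10% of the way to the online weights.
target_params = momentum_update(online_params, target_params, m=0.9)
print(target_params)
```

With `m` close to 1, the momentum copy changes slowly, which is the usual motivation for such a scheme: it gives the trained encoder a stable target to match.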
ISBN (print): 9781467385640
In this paper, a complete database of handwritten atomic Odia characters is proposed. The first version of the database is named OHCSv1.0 (Odia handwritten character set). The database comprises 17,100 handwritten characters, each collected twice from 150 unique people at different points in time, so that each character has 300 occurrences. The character images are standardized to a size of 64 × 64 pixels. A novel framework for recognizing handwritten Odia characters from this database is also proposed. The character images are grouped into clusters based on their shape features using an incremental spectral clustering algorithm. During testing, the affinity of a probe character to a cluster is decided first. Subsequently, the trained classifier recognizes the character within the cluster. Suitable simulations have been carried out to validate the scheme.
ISBN (print): 9781467385640
Tracking dense features has become one of the most popular methods for human action recognition. Proper descriptors should be used to capture the motion information contained in these trajectories, and the motion boundary histogram (MBH), which encodes velocity information, gives the best performance among state-of-the-art action recognition descriptors. In this paper, we propose a new descriptor, the histogram of spatial gradient of acceleration (HSGA), to be used in combination with MBH to describe actions. Our new descriptor combination is based on studies which reveal that acceleration is as important as velocity in motion description. HSGA is computed by taking the histogram of orientations of the spatial gradient of optical acceleration in a 3D space-time block, divided into cells, around dense trajectories. Optical acceleration is obtained by taking the time derivative of optical flow. This combination of descriptors gave good performance on a variety of datasets. Combining these motion descriptors with a scene descriptor such as HOG further improved the recognition accuracy on realistic action datasets.
ISBN (print): 9781479915880
We present a simple and powerful scheme to allow CSG of implicit surfaces on the GPU. We decompose the Boolean expression of surfaces into sum-of-products form. Our algorithm then renders each product term; the sum of products is obtained automatically by enabling the depth test. Our approximate CSG uses the adaptive marching points algorithm for finding ray-surface intersections. Once root isolation finds an interval in which a root exists, that interval is used as evidence of an intersection. We perform root refinement only for the uncomplemented terms in the product. Exact CSG is done by using the discriminant of the ray-surface intersection to test for the presence of a root. We can then evaluate the product expression by checking that all uncomplemented terms are true and all complemented terms are false. If this condition is met, we take the maximum of all the roots among the uncomplemented terms as the solution. Our algorithm is linear in the number of terms, O(n). We achieve real-time rates for 4-5 terms in the product for approximate CSG, and more than real-time rates for exact CSG. Because our primitives are implicit surfaces, we can achieve fairly complex results with fewer terms.
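The product-term rule described above can be sketched on the CPU as follows. This is only an illustration of the evaluation logic, not the authors' GPU implementation: the names `eval_product` and `eval_sum_of_products`, the `(root, complemented)` encoding of a term, and the use of `min` to stand in for the depth test are assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

# One term of a product: (root, complemented).
# `root` is the ray parameter of the ray-surface intersection, or None
# when the ray misses that surface (hypothetical encoding).
Term = Tuple[Optional[float], bool]


def eval_product(terms: Sequence[Term]) -> Optional[float]:
    """Evaluate one product term along a single ray.

    The product holds when every uncomplemented term has a root and no
    complemented term has one; the solution is then the maximum root
    among the uncomplemented terms. Runs in O(n) for n terms.
    """
    roots: List[float] = []
    for root, complemented in terms:
        hit = root is not None
        if complemented:
            if hit:            # complemented term must be false
                return None
        else:
            if not hit:        # uncomplemented term must be true
                return None
            roots.append(root)
    return max(roots) if roots else None


def eval_sum_of_products(products: Sequence[Sequence[Term]]) -> Optional[float]:
    """Combine product terms by keeping the nearest hit.

    This emulates what the depth test does when each product term is
    rendered separately (assumption: smaller ray parameter = nearer).
    """
    hits = [t for t in (eval_product(p) for p in products) if t is not None]
    return min(hits) if hits else None


# Toy ray: two uncomplemented surfaces are hit, the complemented one is missed.
print(eval_product([(1.0, False), (2.5, False), (None, True)]))
```

A single pass over the terms is enough, which is consistent with the linear cost stated in the abstract.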
ISBN (print): 9781467385640
Existing approaches to newborn identification focus on recognizing newborns using face, inked footprints, and palm prints. While palm prints and inked footprints are intrusive modalities, the face modality suffers from the non-cooperative nature of newborns. In this research, we investigate the use of the binocular region for recognizing newborns, as this region is considered relatively stable in the face biometrics literature. We collect a database consisting of 402 face images of 50 babies less than 6 months of age. A set of experiments is performed with various descriptors, including local binary patterns, dense scale-invariant feature transform, and Gabor features, along with subspace learning using principal component analysis, linear discriminant analysis, and independent component analysis. The recognition performance of the various approaches is compared across the face and binocular modalities. Verification results are reported in terms of receiver operating characteristic (ROC) curves. The results show that the binocular region can outperform the face as a modality for newborn recognition.
ISBN (print): 9781467385640
Development of computer-aided diagnosis (CAD) systems for early detection of the pathological brain is essential to save medical resources. In recent years, a variety of techniques have been proposed to improve the performance of such systems. In this paper, a new automatic CAD system for brain magnetic resonance (MR) image classification is proposed. The method uses the two-dimensional discrete wavelet transform to extract features from the MR images. The dimensionality of the features is reduced using principal component analysis (PCA) and linear discriminant analysis (LDA) to retain the more significant features. Finally, the reduced set of features is fed to a random forest classifier to classify the brain as normal or pathological. A standard dataset, Dataset-255, consisting of 255 images (35 normal and 220 pathological), is used to validate the proposed scheme. To improve the generalization capability of the scheme, a 5-fold stratified cross-validation procedure is used. The experimental results show that the proposed scheme is superior to other state-of-the-art techniques in terms of classification accuracy, with a substantially reduced number of features.
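To make the stratified 5-fold split concrete, here is a minimal pure-Python sketch using the class counts stated above (35 normal, 220 pathological). The helper `stratified_folds` is hypothetical and deals indices round-robin without shuffling; the paper's actual partitioning procedure is not described in the abstract.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Split sample indices into k folds that preserve class proportions.

    Indices of each class are dealt round-robin across the folds, so every
    fold receives (nearly) the same number of samples from each class.
    No shuffling is done here, to keep the example deterministic.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Class counts taken from the abstract; the label strings are illustrative.
labels = ["normal"] * 35 + ["pathological"] * 220
folds = stratified_folds(labels, k=5)

for i, fold in enumerate(folds):
    normal = sum(1 for idx in fold if labels[idx] == "normal")
    print(f"fold {i}: {len(fold)} images, {normal} normal, {len(fold) - normal} pathological")
```

Because both class counts are divisible by 5, each fold holds 51 images with 7 normal and 44 pathological cases, so every test fold has the same class imbalance as the full dataset.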
ISBN (print): 9781450347532
Skin colour detection under poor or varying illumination conditions is a major challenge for various image processing and human-computer interaction applications. In this paper, a novel skin detection method that uses the image pixel distribution in a given colour space is proposed. The pixel distribution of an image can provide a better localization of the actual skin colour distribution of that image. Hence, a local skin distribution model (LSDM) is derived using the image pixel distribution model and its similarity with the global skin distribution model (GSDM). A fusion-based skin model is then obtained using both the GSDM and the LSDM. Finally, a dynamic region growing method is employed to improve the overall detection rate. Experimental results show that the proposed skin detection method can significantly improve detection accuracy in the presence of varying illumination conditions.
ISBN (print): 9781424442195
In this paper we address the problem of unsupervised learning of usual patterns of activities in an area under surveillance and detecting deviant patterns. We use video epitomes for segmenting foreground objects from background and obtain an approximate shape, trajectory and temporal information in the form of space-time patches. We apply pLSA for finding correlations among these patches to learn usual activities in the scene. We also extend pLSA to classify a novel video as usual or unusual.
ISBN (print): 9781467385640
In this paper, a real-time multi-view human activity recognition model using an RGB-D (Red, Green, Blue, Depth) sensor is proposed. The method receives RGB-D data streams as input in real time from a Kinect for Windows V2 sensor. Initially, a skeleton-tracking algorithm is applied, which gives 3D information for 25 unique joints. The presented approach uses a weighted version of Fast Dynamic Time Warping that weighs the importance of each skeleton joint in the Dynamic Time Warping (DTW) similarity cost. To recognize multi-view human activities, the weighted DTW warps a time sequence of joint positions to reference time sequences and produces a similarity value. Experimental results demonstrate that the proposed method is robust, flexible, and efficient with respect to multi-view activity recognition and to scale and phase variations of activities in different realistic scenes.
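The idea of weighting each joint in the DTW cost can be sketched as below. This is a plain dynamic-programming DTW, not the Fast DTW approximation the abstract refers to, and the per-frame cost (a weighted sum of Euclidean joint distances) is an assumption; the names `frame_cost` and `weighted_dtw` and the toy two-joint skeletons are hypothetical.

```python
import math


def frame_cost(frame_a, frame_b, weights):
    """Weighted sum of Euclidean distances between corresponding joints."""
    return sum(w * math.dist(p, q) for w, p, q in zip(weights, frame_a, frame_b))


def weighted_dtw(seq_a, seq_b, weights):
    """Classic O(n*m) DTW over two sequences of skeleton frames.

    Each frame is a list of 3D joint positions; `weights` holds one
    importance weight per joint. Returns the accumulated alignment cost
    (lower means more similar).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(seq_a[i - 1], seq_b[j - 1], weights)
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy skeletons with two joints per frame (a real Kinect V2 frame has 25).
f0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
f1 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
weights = [1.0, 0.5]  # illustrative joint weights

# The same motion performed more slowly (first frame repeated) aligns at zero cost.
print(weighted_dtw([f0, f1], [f0, f0, f1], weights))
```

Lowering the weight of a joint reduces its influence on the alignment cost, which is how less informative joints can be de-emphasized for a given activity.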
ISBN (print): 9781450347532
Dictionary learning has been used to solve inverse problems in imaging and as an unsupervised feature extraction tool in vision. The main disadvantage of dictionary learning for applications in vision is the relatively long feature extraction time during testing, owing to the requirement of solving an iterative optimization problem (l0-minimization). The newly developed analysis framework of transform learning does not suffer from this shortcoming; feature extraction only requires a matrix-vector multiplication. This work proposes an alternative formulation for transform learning that improves the accuracy even further. Experiments on benchmark databases show that our proposed transform learning yields better results than dictionary learning, autoencoder (AE), and restricted Boltzmann machine (RBM). Feature extraction is as fast as with AE and RBM.