ISBN: (Print) 9798400710759
This paper presents a self-supervised learning (SSL) framework designed specifically for Handwritten Mathematical Expression Recognition (HMER). The proposed approach incorporates a momentum encoding technique and a non-linear projection head into the image encoder to address dimensional collapse, a common issue in SSL methods. Our approach consists of two main steps: first, we use self-supervised pre-training to train the image encoder to obtain strong feature representations from HME images. Subsequently, we fine-tune the model using a Transformer network to predict LaTeX sequences from HME images. The evaluation demonstrates that our SSL framework outperforms other existing SSL frameworks as well as several supervised methods. The integration of momentum encoding and a non-linear projection head in the image encoder is shown to enhance the robustness and efficiency of feature representations, leading to superior performance in HMER tasks. Our experiments show that our approach achieves an expression recognition rate (ExpRate) of 62.17%, 61.03%, and 64.8% on the CROHME 2014, 2016, and 2019 test datasets, respectively. The result on the CROHME 2019 test set is the highest of the three and is state-of-the-art (SOTA). This success is achieved by overcoming the challenges of dimensional collapse and leveraging the advantages of both self-supervised and supervised learning.
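The abstract does not state the exact rule behind its momentum encoding. As a rough illustration only, the sketch below assumes the exponential-moving-average (EMA) update commonly used in momentum-encoder SSL methods, where a slowly moving target encoder tracks the trained online encoder. The function name `momentum_update`, the flat weight lists, and the momentum value are hypothetical and not taken from the paper.

```python
def momentum_update(online, target, m=0.99):
    """Move the target weights toward the online weights by one EMA step.

    Assumed rule (not confirmed by the abstract):
        target <- m * target + (1 - m) * online
    """
    if len(online) != len(target):
        raise ValueError("parameter lists must have the same length")
    return [m * t + (1.0 - m) * o for o, t in zip(online, target)]


# Toy weights standing in for encoder parameters (hypothetical values).
online_weights = [1.0, 2.0, 3.0]
target_weights = [0.0, 0.0, 0.0]

# With m = 0.5 the target closes half of the remaining gap at every step.
for _ in range(3):
    target_weights = momentum_update(online_weights, target_weights, m=0.5)

print(target_weights)
```

A momentum close to 1 makes the target encoder change slowly, which is the usual reason such an update is paired with a projection head to stabilise the learned features.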
ISBN: (Print) 9798400710759
Contextual information plays a critical role in object recognition models within computer vision, where changes in context can significantly affect accuracy, underscoring models' dependence on contextual cues. This study investigates how context manipulation influences both model accuracy and feature attribution, providing insights into the reliance of object recognition models on contextual information as understood through the lens of feature attribution methods. We employ a range of feature attribution techniques to decipher the reliance of deep neural networks on context in object recognition tasks. Using the ImageNet-9 and our curated ImageNet-CS datasets, we conduct experiments to evaluate the impact of contextual variations, analyzed through feature attribution methods. Our findings reveal several key insights: (a) Correctly classified images predominantly emphasize object volume attribution over context volume attribution. (b) The dependence on context remains relatively stable across different context modifications, irrespective of classification accuracy. (c) Context change exerts a more pronounced effect on model performance than context perturbations. (d) Surprisingly, context attribution in 'no-information' scenarios is non-trivial. Our research moves beyond traditional methods by assessing the implications of broad-level modifications on object recognition, either in the object or its context. Code available at https://***/nineRishav/Lost-In-Context
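One way to read "object volume attribution" versus "context volume attribution" is as the share of total attribution mass that falls inside or outside the object mask. The sketch below follows that reading, which is an assumption: the abstract does not define the quantity, and the function name `attribution_volumes` and the toy inputs are hypothetical.

```python
def attribution_volumes(attribution, object_mask):
    """Split the absolute attribution mass into object and context shares.

    attribution: 2D list of per-pixel attribution scores.
    object_mask: 2D list of 0/1 flags, 1 where the pixel belongs to the object.
    Returns (object_share, context_share); both are 0.0 if there is no mass.
    """
    obj = 0.0
    ctx = 0.0
    for attr_row, mask_row in zip(attribution, object_mask):
        for score, is_object in zip(attr_row, mask_row):
            if is_object:
                obj += abs(score)
            else:
                ctx += abs(score)
    total = obj + ctx
    if total == 0:
        return 0.0, 0.0
    return obj / total, ctx / total


# Toy 2x2 attribution map; the left column is the object (hypothetical data).
attribution = [[1.0, 0.0],
               [2.0, 1.0]]
object_mask = [[1, 0],
               [1, 0]]

object_share, context_share = attribution_volumes(attribution, object_mask)
print(object_share, context_share)
```

Under this reading, finding (a) would correspond to `object_share` exceeding `context_share` for correctly classified images.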
Existing super-resolution (SR) models require separate training for different scales of SR because of the fixed upsampling levels in their architecture. We propose a novel one-time training approach for multi-scale SR...
Advances in networking and digital technologies have led to the widespread usage of Online Signature Verification (OSV) frameworks in real-time settings to validate a user's identity. Because of the superior perfo...
In this paper, we propose a geometric feature and frame segmentation based approach for video summarization. Video summarization aims to generate a summarized video with all the salient activities of the input video. ...
This paper proposes an efficient method of character segmentation for handwritten text. The main challenge in character segmentation of handwritten text is the varied size of each letter in different documents, conne...
Detection of falls of elderly people is a trivial yet immediate problem due to the growing age of the population. This calls for autonomous self-care systems that provide quick assistance. The three b...
This paper presents an approach for classification of hyperspectral imagery (HSI). In remote sensing, the HSI sensor acquires hundreds of images with very narrow but continuous spectral width in visible and near-infr...
ISBN: (Digital) 9789811513879
ISBN: (Print) 9789811513862
Biometric systems commonly utilize multi-biometric approaches where a person is verified or identified based on multiple biometric traits. However, deployed systems usually require verification or identification from a large number of enrolled candidates. This is possible only if there are efficient methods that retrieve relevant candidates in a multi-biometric system. To solve this problem, we analyze the use of hashing techniques that are available for retrieval. Based on our analysis, we specifically recommend the use of supervised hashing techniques over deep learned features as a possible common technique to solve this problem. Our investigation includes a comparison of some of the supervised and unsupervised methods, viz. Principal Component Analysis (PCA), Locality Sensitive Hashing (LSH), Locality-sensitive binary codes from shift-invariant kernels (SKLSH), Iterative quantization: A procrustean approach to learning binary codes (ITQ), Binary Reconstructive Embedding (BRE) and Minimum loss hashing (MLH), that represent the prevalent classes of such systems, and we present our analysis for the following biometric data: Face, Iris, and Fingerprint, for a number of standard datasets. The main technical contributions of this work are as follows: (a) Proposing a Siamese network based deep learned feature extraction method. (b) Analysis of common feature extraction techniques for multiple biometrics with respect to a reduced feature space representation. (c) Advocating the use of supervised hashing for obtaining a compact feature representation across different biometric traits. (d) Analysis of the performance of deep representations against shallow representations in a practical reduced feature representation framework. Through experimentation with multiple biometric traits, feature representations, and hashing techniques, we can conclude that current deep learned features when retrieved using supervised hashing can be a standard pipeline adopted ...
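To make the retrieval setting concrete, the sketch below hashes feature vectors to short binary codes and ranks enrolled candidates by Hamming distance. It uses random-hyperplane LSH, one of the unsupervised baselines named in the abstract, and not the supervised hashing the authors recommend. All function names, the feature vectors, and the bit count are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query, gallery, planes, k=1):
    """Return the k gallery ids whose codes are closest to the query code."""
    q = hash_code(query, planes)
    ranked = sorted(
        gallery, key=lambda name: hamming(q, hash_code(gallery[name], planes))
    )
    return ranked[:k]


# Toy enrolled features (hypothetical); "b" points opposite to "a".
planes = make_hyperplanes(dim=3, n_bits=8)
gallery = {"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}
best = retrieve([1.0, 2.0, 3.0], gallery, planes, k=1)
print(best)
```

In a real system the codes would be precomputed and indexed, so that a query only compares compact bit strings and not full feature vectors.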
ISBN: (Print) 9781450366151
In this work, a memory-efficient topological map generation algorithm using local descriptors is proposed. A topological map is a graphical data structure where each node signifies an area within an environment. These nodes are connected by links which ensure the presence of a physical path between the pair. Experiments have been conducted with feature descriptors using a vocabulary based approach. These approaches take huge memory and time. To deal with these, a KD-tree based map generation algorithm has been proposed where each node in the tree stores a descriptor and a table of occurrence. This table stores node ids of the locations where the corresponding descriptor is present. The map generation algorithm is a two-stage algorithm. In the first stage, visual similarity based position identification is conducted in order to check for loop closures. It is followed by a corrective step that validates the loop-closure decision, if any. The table of occurrence keeps track of the presence of each descriptor. The least occurring descriptors are pruned at regular intervals, making the algorithm memory-efficient. The approach has been tested on several benchmark datasets.
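The data structure described above can be sketched as a small KD-tree whose nodes carry a descriptor and a table of occurrence. This is a simplified illustration under stated assumptions, not the authors' implementation: descriptors are matched by exact equality where a real system would use nearest-neighbour matching, pruning uses a fixed count threshold and rebuilds the tree, and all names (`Node`, `insert`, `prune`, `min_occurrences`) are hypothetical.

```python
class Node:
    """KD-tree node holding a descriptor and its table of occurrence."""

    def __init__(self, descriptor, location_id):
        self.descriptor = descriptor      # tuple of numbers
        self.occurrences = {location_id}  # ids of locations showing it
        self.left = None
        self.right = None


def insert(root, descriptor, location_id, depth=0):
    """Insert a descriptor, or record a new location for an existing one."""
    if root is None:
        return Node(descriptor, location_id)
    if root.descriptor == descriptor:
        root.occurrences.add(location_id)
        return root
    axis = depth % len(descriptor)  # cycle through the dimensions
    if descriptor[axis] < root.descriptor[axis]:
        root.left = insert(root.left, descriptor, location_id, depth + 1)
    else:
        root.right = insert(root.right, descriptor, location_id, depth + 1)
    return root


def nodes(root):
    """Return all nodes of the tree in in-order sequence."""
    if root is None:
        return []
    return nodes(root.left) + [root] + nodes(root.right)


def prune(root, min_occurrences):
    """Rebuild the tree, keeping only descriptors seen often enough."""
    new_root = None
    for node in nodes(root):
        if len(node.occurrences) >= min_occurrences:
            for location_id in sorted(node.occurrences):
                new_root = insert(new_root, node.descriptor, location_id)
    return new_root


# Toy observations: (descriptor, location id), hypothetical values.
observations = [((1, 2), 0), ((3, 4), 0), ((1, 2), 1), ((5, 0), 2)]
root = None
for descriptor, location_id in observations:
    root = insert(root, descriptor, location_id)

pruned = prune(root, min_occurrences=2)
print([n.descriptor for n in nodes(pruned)])
```

Rebuilding on prune keeps the sketch short; a production version would more likely delete nodes in place or rebalance the tree periodically.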