1 *** Activity Recognition (GAR), which aims to identify activities performed collectively in videos, has gained significant attention *** conventional action recognition centered on single individuals, GAR explores the complex interactions between multiple individuals.
Text Sentiment Classification, a significant task in Natural Language Processing, aims to comprehend user needs and expectations by categorizing the sentiments of texts posted on platforms. Despite their utility, exis...
ISBN: 9798331509910 (electronic)
ISBN: 9798331509927 (print)
This paper presents a Myanmar Optical Character Recognition (OCR) system named myOCR. It uses a synthetic text-image dataset of 25,790 text images covering 14 different font styles. The system combines Convolutional Neural Networks (CNNs) for feature extraction, Bidirectional Long Short-Term Memory (BiLSTM) networks for sequence modeling, and Connectionist Temporal Classification (CTC) for decoding, and it is evaluated across several training iteration counts (3,000, 6,000, 9,000) and hidden-state sizes (64, 128, 256). Statistical post-OCR correction uses N-grams (N = 3, 4, 5) and edit distances with the Symmetric Delete spelling correction algorithm (SymSpell). For Neural Machine Translation-based correction, BiLSTM and Transformer models are employed, while the mT5-base and mBART-50 models are used for LLM-based correction. The best base (optical) model, trained for 9,000 iterations, achieved a chrF++ score of over 97.90 and a Word Error Rate (WER) of 9.18%. Transformer-based correction raised its chrF++ score to 99.31 and reduced the WER to 0.66%.
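The abstract names CTC decoding and reports WER without spelling out either. The following is a minimal Python sketch, not the myOCR implementation: it assumes greedy (best-path) CTC decoding over per-frame argmax labels and whitespace-tokenized words for WER. The function names and the blank index 0 are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks.

    `frame_labels` is assumed to be the argmax class index per time step;
    a real model would output a probability distribution per frame.
    """
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Splitting on whitespace is an assumption made for illustration.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    # Guard against division by zero for an empty reference.
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Blank = 0: repeated 1s merge, the blank separates a second 1.
    print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))  # prints [1, 1, 2]
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The blank symbol is what lets CTC emit the same character twice in a row; without it, the two 1s in the example would collapse into one.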
While significant progress has been made in multi-modal learning driven by large-scale image-text datasets, there is still a noticeable gap in the availability of such datasets within the facial domain. To facilitate ...
Recommendation technologies can help users solve the problem of information overload. In the academic and educational fields, the application of intelligent recommendation technology has greatly improved the effect...
With the continuous penetration of information technology into scientific research, information resources with diverse structures have accumulated within scientific research teams. Facing the needs of scientific re...
Crack damage classification in concrete structures is important for road safety. This paper proposes a new method based on a broad neural network for crack damage classification in concrete structures. It includes thre...
In this paper, a double closed-loop control method is proposed for three-dimensional optimal trajectory tracking control of underactuated autonomous underwater vehicles (AUVs). First, a five-degree-of-freedom mathem...
With the development of computer vision technology and smart agriculture, deep learning techniques have been widely applied to crop pest identification tasks. However, existing studies do not consider the problem of l...
Multivariate Time Series Classification (MTSC) enables the analysis of complex temporal data and thus serves as a cornerstone in various real-world applications, ranging from healthcare to finance. Since the relation...