3D Gaussian Splatting (3DGS) has attracted significant attention for its potential to revolutionize 3D representation, rendering, and interaction. Despite the rapid growth of 3DGS research, its direct application to E...
Speaker recognition (SR) systems are particularly vulnerable to adversarial example (AE) attacks. To mitigate these attacks, AE detection systems are typically integrated into SR systems. To overcome the limitations o...
ISBN (electronic): 9798331510831
ISBN (print): 9798331510848
Despite recent significant advances in Handwritten Document Recognition (HDR), efficiently and accurately recognizing text against complex backgrounds, diverse handwriting styles, and varying document layouts remains a practical challenge. Moreover, this issue is seldom addressed in academic research, particularly in scenarios with minimal annotated data. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is that it uses test-time training to adapt the model to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines meta-learning with a self-supervised Masked Autoencoder (MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss. During training, we learn the model parameters within a meta-learning framework, so that they can adapt effectively to each new input. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.
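The core of test-time training is an inner loop that updates representation parameters on a single test input using only a self-supervised loss. The sketch below illustrates that loop under heavy simplifying assumptions: the "visual representation" is a single scalar `w`, and the hypothetical `masked_recon_loss_and_grad` only mimics MAE-style masked reconstruction. The meta-learning outer loop, which would learn a good starting value for `w`, is not shown. None of this is the DocTTT implementation.

```python
import random


def masked_recon_loss_and_grad(w, x, mask):
    """Toy stand-in for an MAE loss (hypothetical, not the paper's model).

    Each masked entry of x is "reconstructed" as w * (mean of the visible
    entries). Returns the mean squared error over the masked entries and
    its derivative with respect to the single parameter w.
    """
    visible = [xi for xi, m in zip(x, mask) if not m]
    masked = [xi for xi, m in zip(x, mask) if m]
    ctx = sum(visible) / len(visible)          # context from unmasked entries
    residuals = [w * ctx - xi for xi in masked]
    loss = sum(r * r for r in residuals) / len(residuals)
    grad = sum(2.0 * r * ctx for r in residuals) / len(residuals)
    return loss, grad


def adapt_at_test_time(w, x, steps=20, lr=0.1, mask_ratio=0.5, seed=0):
    """Adapt the representation parameter w to one test input x using only
    the self-supervised masked-reconstruction loss (no labels needed)."""
    rng = random.Random(seed)
    # Mask at least one entry but always leave at least one visible.
    n_mask = min(len(x) - 1, max(1, int(len(x) * mask_ratio)))
    for _ in range(steps):
        idx = set(rng.sample(range(len(x)), n_mask))   # fresh random mask
        mask = [i in idx for i in range(len(x))]
        _, grad = masked_recon_loss_and_grad(w, x, mask)
        w -= lr * grad                                  # one SGD step on this input
    return w  # adapted value; the caller's original w is not modified
```

For example, `adapt_at_test_time(0.0, [2.0] * 8)` moves `w` to approximately 1.0, where every masked entry is reconstructed exactly. Because the loss needs no labels, each test document can be adapted to individually before the task prediction is made.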
ISBN (electronic): 9798331511890
ISBN (print): 9798331511906
In today's digital world, online video lectures have become a crucial tool for learning. Yet these resources remain inaccessible to a large part of the world's population: people who are deaf or hard of hearing. This study presents a new system that aims to make these resources accessible to everyone by converting the spoken words in video lectures into sign language gestures. The system uses advanced video processing to convert the spoken words into text. It then translates this text into British Sign Language (BSL) glosses, which are transcribed in the Hamburg Notation System (HamNoSys). Using the Signing Gesture Markup Language (SiGML), the system drives a virtual avatar that can perform complex sign language gestures. The core of this research lies in combining cutting-edge technologies to offer a rich learning experience that is not limited by hearing ability and to make online education accessible to everyone. Early tests suggest that the system has the potential to close the current gap in education, highlighting the importance of inclusion in the digital age.
Location-based service (LBS) applications are increasingly popular for travel. The public transit scenario is very common in urban areas, yet there is a lack of effective privacy protection mechanisms to safeguard...
Forecasting human mobility is of great significance in the simulation and control of infectious diseases like COVID-19. To get a clear picture of potential future outbreaks, it is necessary to forecast multi-step Ori...
Cerebral stroke is a major global health issue, contributing to high mortality and long-term disability. Early identification of individuals at high risk of stroke can significantly improve preventive care outcomes. W...
In a conversational system, dynamically generating follow-up questions based on context can help users explore information and provide a better user experience. Humans are usually able to ask questions that involve so...
ISBN (electronic): 9798331521691
ISBN (print): 9798331521707
Text classification is a crucial technology that helps extract and organize textual data by automatically identifying its type based on its content. It can be applied to categorize Bengali news content into various categories. In this research paper, we employed a voting classification technique that combines multiple classifiers: Logistic Regression, Support Vector Classification (SVC), Multinomial Naive Bayes (NB), Bernoulli NB, Stochastic Gradient Descent (SGD) Classifier, AdaBoost Classifier, Decision Tree Classifier, LogisticRegressionCV, CalibratedClassifierCV, and Random Forest Classifier. The model was trained and tested on a gold-standard dataset of Bengali news content and classified the data into eight distinct categories: Economy, Education, Entertainment, International, National, Science and Technology, Sports, and Politics. Our results indicate that the hard voting approach achieved an accuracy of 89%, while the soft voting approach achieved an accuracy of 93%. This paper demonstrates the effectiveness of text classification techniques for categorizing Bengali news content and highlights their potential for application in other languages and domains.
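Hard voting counts each classifier's top label, while soft voting averages the predicted class probabilities and picks the highest. The plain-Python sketch below shows the two rules disagreeing on a single sample. The probabilities are invented, three classifiers and two classes stand in for the paper's ten classifiers and eight categories, and the lowest-index tie-breaking rule is an assumption. The classifier names in the abstract suggest scikit-learn's `VotingClassifier` (`voting="hard"` vs `voting="soft"`), but that is not confirmed by the source.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each classifier's predicted class (argmax).

    probas: one list of class probabilities per classifier, for one sample.
    Ties go to the lowest class index (assumed tie-breaking rule).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    counts = Counter(votes)
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


def soft_vote(probas):
    """Average the class probabilities across classifiers, then take the argmax."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Hypothetical outputs of three classifiers for one article, two classes.
probas = [
    [0.45, 0.55],  # weakly prefers class 1
    [0.40, 0.60],  # weakly prefers class 1
    [0.95, 0.05],  # strongly prefers class 0
]
print(hard_vote(probas))  # 1: two of three classifiers vote for class 1
print(soft_vote(probas))  # 0: average probability of class 0 is about 0.6
```

Soft voting lets one confident classifier outweigh two hesitant ones, which is one plausible reason it can do better when the base classifiers output reasonably calibrated probabilities. The 89% vs 93% gap reported above is the paper's result; this toy does not reproduce it.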
The increasing demand for programmers has led to a surge in participants in programming courses, making it increasingly challenging for instructors to assess student code manually. As a result, automated programming a...