ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Handwritten Text Recognition (HTR) is an open problem at the intersection of computer vision and natural language processing. The main challenges when dealing with historical manuscripts are the state of preservation of the paper support, the variability of the handwriting (even of the same author over a wide time span), and the scarcity of data from ancient, poorly represented languages. With the aim of fostering research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available. For both configurations, we analyze quantitative and qualitative characteristics, also with respect to other line-level HTR benchmarks, and present the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at https://***/go/lam.
This paper introduces a screen control mechanism for long-distance interaction between human and computer. It provides a solution for controlling the screen elements of different applications and browsers throug...
ISBN (print): 9783031705427; 9783031705434
Document image binarization is a well-known problem in document analysis and computer vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can vary greatly throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short- and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
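The local-versus-global contrast drawn in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' model and not a full Fast Fourier Convolution (which also applies learned convolutions and nonlinearities in the spectral domain); it only assumes the core idea that a pointwise operation on Fourier coefficients touches every spatial position. The function names, the 16x16 patch size, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np

def local_conv3x3(x, k):
    """Plain 3x3 cross-correlation with zero padding (a local operator)."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def spectral_pointwise(x, weight):
    """Simplified spectral step: FFT, per-frequency scaling, inverse FFT."""
    spec = np.fft.rfft2(x)
    return np.fft.irfft2(spec * weight, s=x.shape)

def affected_pixels(op, shape=(16, 16), pos=(8, 8)):
    """Count output pixels that change when one input pixel changes."""
    x = np.zeros(shape)
    y0 = op(x)
    x[pos] = 1.0
    y1 = op(x)
    return int(np.sum(np.abs(y1 - y0) > 1e-9))

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # hypothetical 3x3 weights
# rfft2 of a 16x16 input has shape (16, 9); one complex weight per frequency
freq_weight = rng.normal(size=(16, 9)) + 1j * rng.normal(size=(16, 9))

# The 3x3 kernel reaches only the 3x3 neighbourhood of the changed pixel
n_local = affected_pixels(lambda x: local_conv3x3(x, kernel))
# The spectral operator spreads the change across the whole patch
n_global = affected_pixels(lambda x: spectral_pointwise(x, freq_weight))
print(n_local, n_global)
```

A single layer of the spectral operator already has an image-wide receptive field, whereas stacked 3x3 convolutions grow theirs by only two pixels per layer.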
Healthcare monitoring for humans is important due to several factors, including quality of life and early detection of health-related problems. Human activity pattern recognition is one of the most promising ways to monitor hum...
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Artificial Intelligence, particularly through recent advancements in deep learning (DL), has achieved exceptional performance in many tasks in fields such as natural language processing and computer vision. For certain high-stakes domains, in addition to desirable performance metrics, a high level of interpretability is often required in order for AI to be reliably utilized. Unfortunately, the black-box nature of DL models prevents researchers from providing explanatory descriptions of a DL model's reasoning process and decisions. In this work, we propose a novel framework utilizing Adversarial Inverse Reinforcement Learning that can provide global explanations for decisions made by a Reinforcement Learning model and capture intuitive tendencies that the model follows by summarizing the model's decision-making process.
The citrus industry plays a pivotal role in the global agricultural sector, contributing significantly to the economy. Approximately 161 million tonnes of citrus fruits were produced worldwide. One third of the total ...
Semantic image segmentation facilitates a multitude of real-world applications, ranging from autonomous driving and industrial process supervision to vision aids for human beings. These models are usually tr...
This review paper surveys a range of hyperparameter optimization techniques employed in the context of transformer models for facial expression recognition. Transformers have proven to be highly effective in various n...
Although image inpainting, or the art of restoring old and degraded photographs and images, has been around for a long time, it has lately gained popularity as a consequence of technical advancements in image processing...
The spread of rumors has had many negative impacts on society. Nowadays, rumors on social media platforms often exist in the form of both images and text. In response to this, many methods for multimodal rumor det...