This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for hi...
ISBN (print): 9798400716256
The enhancement of historical document images is critical for improving the quality and legibility of scanned or captured document images. Convolution-based techniques have previously produced competitive results for document image binarization; however, due to their inherent locality, these models are often limited in explicitly modeling long-range dependencies. Vision Transformers (ViT) have emerged as an alternative design with a global self-attention mechanism to tackle this issue, but they can suffer from restricted localization capability owing to a lack of low-level detail. To address this problem, we propose TransDocUNet, a CNN-Transformer hybrid U-Net architecture for document image binarization that combines the strengths of attention and convolution within a U-Net design and serves as a strong alternative to existing solutions. Experimental results on the DIBCO/H-DIBCO datasets show that the proposed method outperforms existing competing methods in both objective quality metrics and visual quality assessment, achieving state-of-the-art performance in document image binarization. In addition, we conduct an ablation study to understand the role of dilation in the CNN for capturing feature dependencies while also reducing computational cost. The findings guided the final model and provide valuable insights into the importance of capturing both global and local contextual information for tasks such as document image enhancement.
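The abstract describes the architecture only at a high level; a minimal sketch of how a convolutional encoder, a ViT-style self-attention bottleneck, and a decoder with skip connections can be combined in a U-Net for binarization is given below. Module names, channel sizes, and the dilation setting are illustrative assumptions, not the TransDocUNet reference implementation.

```python
# Hypothetical sketch of a CNN-Transformer hybrid U-Net for document binarization.
# Layer names, channel sizes, and patch handling are assumptions for illustration.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, dilation=1):
    # Dilated convolutions enlarge the receptive field at little extra cost.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class HybridUNet(nn.Module):
    def __init__(self, ch=64, heads=4, depth=2):
        super().__init__()
        self.enc1 = conv_block(1, ch)                  # local, low-level details
        self.enc2 = conv_block(ch, ch * 2, dilation=2)
        self.pool = nn.MaxPool2d(2)
        # ViT-style bottleneck: global self-attention over spatial tokens.
        layer = nn.TransformerEncoderLayer(d_model=ch * 2, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec = conv_block(ch * 2, ch)              # fuse the skip connection
        self.head = nn.Conv2d(ch, 1, 1)                # per-pixel binarization logits

    def forward(self, x):
        s1 = self.enc1(x)                              # B x ch x H x W
        f = self.enc2(self.pool(s1))                   # B x 2ch x H/2 x W/2
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # B x (H*W/4) x 2ch
        f = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        d = self.dec(torch.cat([self.up(f), s1], dim=1))
        return torch.sigmoid(self.head(d))             # foreground/background map

# Usage: out = HybridUNet()(torch.rand(1, 1, 128, 128))  # values in [0, 1]
```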
In this work, we focus on a special group of human body language — the micro-gesture (MG), which differs from ordinary illustrative gestures in that it is not an intentional behavior performed to convey...
ISBN (print): 9781665428132
Self-supervised multi-view stereo (MVS) with a pretext task of image reconstruction has achieved significant progress recently. However, previous methods are built upon intuition and lack a comprehensive explanation of why the pretext task is effective in self-supervised MVS. To this end, we propose to estimate epistemic uncertainty in self-supervised MVS, accounting for what the model ignores. Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background. To address these issues, we propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate ambiguous supervision in the foreground, we incorporate an extra correspondence prior through a flow-depth consistency loss: the dense 2D correspondences from optical flow are used to regularize the 3D stereo correspondences in MVS. To handle invalid supervision in the background, we use Monte-Carlo Dropout to obtain an uncertainty map and filter out unreliable supervision signals in invalid regions. Extensive experiments on the DTU and Tanks & Temples benchmarks show that our U-MVS framework achieves the best performance among unsupervised MVS methods, with performance competitive with its supervised counterparts.
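The abstract names Monte-Carlo Dropout as the mechanism for filtering unreliable self-supervision; a minimal sketch of that idea is shown below. The function names, sample count, and threshold are assumptions for illustration, not the U-MVS reference code.

```python
# Hypothetical sketch of Monte-Carlo Dropout uncertainty filtering for the
# self-supervised photometric loss. Model, threshold, and loss weighting are
# illustrative assumptions, not the U-MVS implementation.
import torch

def mc_dropout_uncertainty(model, images, n_samples=8):
    """Run the depth network several times with dropout active and use the
    per-pixel variance of the predicted depth as an epistemic-uncertainty map."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        depths = torch.stack([model(images) for _ in range(n_samples)], dim=0)
    return depths.mean(dim=0), depths.var(dim=0)

def masked_photometric_loss(photo_error, uncertainty, threshold=0.05):
    """Drop the self-supervised photometric error where the model is uncertain,
    e.g. in invalid background regions."""
    valid = (uncertainty < threshold).float()
    return (photo_error * valid).sum() / valid.sum().clamp(min=1.0)
```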
Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance. To tackle this challenge, w...
Due to changes in human mindset and lifestyle, the number of diverse marriages is increasing all around the world, irrespective of race, color, religion, and culture. As a result, it is challenging for resea...
This work reviews the results of the NTIRE 2023 Challenge on Image Shadow Removal. The described set of solutions was proposed for a novel dataset that captures a wide range of object-light interactions. It consist...
ISBN (print): 9781450366151
Haze during bad weather drastically degrades the visibility of the scene. The degradation of scene visibility varies with the transmission coefficient/map (Tc) of the scene, so accurate estimation of Tc is the key step in reconstructing the haze-free scene. Previously, both local and global priors were proposed to estimate Tc. We, on the other hand, propose an integration of local and global approaches to learn both point-level and object-level Tc. The proposed local encoder-decoder network (LEDNet) estimates the scene transmission map in two stages. In the first stage, the network estimates the point-level Tc using parallel convolutional filters and spatially invariant filtering. The second stage comprises a two-level encoder-decoder architecture that estimates the object-level Tc. We also propose a local air-light estimation (LAE) algorithm that obtains the air-light component of the outdoor scene. The combination of LEDNet and LAE improves the accuracy of the haze model in recovering scene radiance. Structural similarity index, mean squared error, and peak signal-to-noise ratio are used to evaluate the performance of the proposed approach for single-image haze removal. Experiments on benchmark datasets show that LEDNet outperforms existing state-of-the-art methods for single-image haze removal.
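The recovery step the abstract relies on is the standard atmospheric scattering model, I = J·t + A·(1 − t); a short sketch of how an estimated transmission map and air-light would be combined to recover scene radiance is given below. The function signature and clipping constant are assumptions for illustration, with the transmission and air-light estimators standing in for LEDNet and LAE.

```python
# Hypothetical sketch of scene-radiance recovery from an estimated transmission
# map (Tc) and air-light, following the standard haze model I = J*t + A*(1 - t).
import numpy as np

def recover_radiance(hazy, transmission, airlight, t_min=0.1):
    """hazy: HxWx3 float image in [0, 1]; transmission: HxW map in (0, 1];
    airlight: length-3 vector. Returns the dehazed image J."""
    t = np.clip(transmission, t_min, 1.0)[..., np.newaxis]  # avoid division blow-up
    J = (hazy - airlight) / t + airlight                     # invert the haze model
    return np.clip(J, 0.0, 1.0)
```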