The latest diffusion-based methods outperform traditional models on many image restoration tasks, but they suffer from long inference times. To tackle this, this paper proposes a Wavelet-Based Diffusion Model (Wav...
Automated wildlife re-identification has attracted increasing attention in recent years because it provides a non-invasive tool to identify and track individual wild animals over time. In this paper, the first steps are taken towards the automatic photo-identification of Ladoga ringed seals (Pusa hispida ladogensis). A method is proposed that takes as input a sequence of images, each containing multiple individuals, and produces cropped images of seals grouped so that each group contains a single individual. The method starts by detecting each seal in the images and proceeds to match the individual seals between the images. It is shown that high grouping accuracy can be obtained with a general-purpose image retrieval method on an image sequence taken from the same location within a relatively short period of time. Each resulting group contains multiple images of one individual with slight variations, for example, in pose and illumination. Utilizing these images simultaneously provides more information for individual re-identification than the traditional approach, which utilizes just one image at a time. It is further demonstrated that a convolutional neural network based method can be used to extract the unique pelage patterns of the seals despite the low contrast. Finally, a method is proposed and experiments with the novel Ladoga ringed seal data are carried out to provide a proof of concept for individual re-identification.
The recognition of human emotions remains a challenging task for social media images. This is because distortions introduced by different social media platforms conflict with the minute changes in facial expression. This study pres...
Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to m...
Pose estimation of a pedestrian helps to gather information about the current activity or the instant behaviour of the subject. Such information is useful for autonomous vehicles, augmented reality, video surveillance...
Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel mutual information optimization-based loss function for contrastive learning. We model...
Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, attribution analysis shows that these networks can only utilize a limited spatial range of input information. This implies that the potential of the Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines channel attention and window-based self-attention, thus making use of their complementary advantages: the ability to utilize global statistics and a strong local fitting capability. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that performance on this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB.
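The two ingredients named in this abstract can be illustrated with a minimal sketch. It is not the HAT implementation: the function names `window_bounds` and `channel_gates`, the `overlap_ratio` parameter, and the use of a plain sigmoid of the channel mean as a gate are all assumptions made for illustration, and the learned layers of a real network are omitted. The sketch only shows how widening windows lets neighboring windows share positions, and how a per-channel weight can be derived from a global statistic.

```python
import math


def window_bounds(size, window, overlap_ratio=0.0):
    """Index ranges (start, stop) of windows along one axis.

    With overlap_ratio=0 the windows tile the axis without overlap.
    A positive ratio widens every window on both sides (clipped at the
    borders), so neighbouring windows share positions.
    Hypothetical helper, not taken from the paper.
    """
    pad = int(window * overlap_ratio / 2)
    return [
        (max(0, start - pad), min(size, start + window + pad))
        for start in range(0, size, window)
    ]


def channel_gates(channels):
    """One sigmoid gate per channel, computed from that channel's global mean.

    Simplified stand-in for channel attention: a real module would pass the
    pooled statistics through learned layers before the sigmoid.
    """
    gates = []
    for channel in channels:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates
```

For an axis of length 8 and a window of 4, `window_bounds(8, 4)` yields two disjoint windows, whereas `window_bounds(8, 4, 0.5)` yields two windows that share two positions, which is the kind of neighborhood overlap the cross-attention module relies on.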
The Background Linking task is a problem that focuses on providing users with suggestions for articles to read next, when the user is reading a news article. The suggested articles should provide adequate context and ...
The relevance of machine learning (ML) in our daily lives is closely intertwined with its explainability. Explainability can allow end-users to have a transparent and humane reckoning of an ML scheme's capability a...
ISBN: 9781450397056 (print)
Although advanced technologies exist for character recognition, automatic evaluation of descriptive answers remains an open challenge for the document image analysis community because of the large diversity of handwriting and of answers to a given question. This paper presents a novel method for detecting anomalous handwritten text in students' written responses to questions. The method rests on the observation that students who are confident in an answer usually write legibly and neatly, whereas students who are not confident tend to write sloppily, which may be hard for the reader to understand. To detect such anomalous handwritten text, we explore a new combination of the Fourier transform and a deep learning model for edge detection; the resulting edge image preserves the structure of the handwritten text. To extract features for classifying anomalous and normal text, the proposed method studies the behavior of the writing style, especially the variation at ascenders and descenders. To this end, the proposed work draws for each edge image a principal axis, which is invariant to rotation and scaling and, to some extent, to distortion. With respect to the principal axis, the method draws a medial axis using the uppermost and lowermost points. The distances between the medial axis points and the principal axis form the feature vector, which is passed to an artificial neural network for classification of anomalous text. The proposed method is evaluated on our own dataset, a standard dataset for gender identification (IAM), and a handwritten forgery detection dataset (ACPR 2019). The results on the different datasets show that the proposed work outperforms existing methods.
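The feature construction described above can be sketched as follows. This is one possible reading of the abstract, not the authors' code: it assumes the edge image is given as a list of (x, y) pixel coordinates, takes the principal axis to be the dominant direction of the coordinate covariance, takes each medial point to be the midpoint between the uppermost and lowermost edge pixel in a column, and measures perpendicular distance to the axis. All function names are hypothetical.

```python
import math


def principal_axis(points):
    """Centroid and orientation (radians) of the dominant axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return mx, my, angle


def medial_points(points):
    """Midpoint between the uppermost and lowermost edge pixel of each column."""
    columns = {}
    for x, y in points:
        low, high = columns.get(x, (y, y))
        columns[x] = (min(low, y), max(high, y))
    return [(x, (low + high) / 2.0) for x, (low, high) in sorted(columns.items())]


def axis_distance_features(points):
    """Perpendicular distance from every medial point to the principal axis."""
    mx, my, angle = principal_axis(points)
    # Unit normal of the axis; projecting onto it gives the signed distance.
    nx, ny = -math.sin(angle), math.cos(angle)
    return [abs((x - mx) * nx + (y - my) * ny) for x, y in medial_points(points)]
```

Under these assumptions, a text line whose upper and lower contours are symmetric about its axis produces distances close to zero, while irregular ascenders and descenders push the medial points away from the axis and enlarge the feature values. The resulting list would then be fed to a classifier, which is outside the scope of this sketch.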