Thermal Image Super-Resolution (TISR) is a technique for converting Low-Resolution (LR) thermal images to High-Resolution (HR) thermal images. This technique has recently become a research hotspot due to its ability t...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Thermal Image Super-Resolution (TISR) is a technique for converting Low-Resolution (LR) thermal images to High-Resolution (HR) thermal images. This technique has recently become a research hotspot due to its ability to reduce sensor costs and improve visual perception. However, current research does not provide an effective solution for multi-sensor data training, possibly driven by pixel mismatch and simple degradation setting issues. In this paper, we proposed a Camera Internal Parameters Perception Network (CIPPSRNet) for LR thermal image enhancement. The camera internal parameters (CIP) were explicitly modeled as a feature representation, the LR features were transformed into the intermediate domain containing the internal parameters information by perceiving CIP representation. The mapping between the intermediate domain and the spatial domain of the HR features was learned by CIPPSRNet. In addition, we introduced contrastive learning to optimize the pretrained Camera Internal Parameters Representation Network and the feature encoders. Our proposed network is capable of achieving a more efficient transformation from the LR to the HR domains. Additionally, the use of contrastive learning can improve the network's adaptability to misalignment data with insufficient pixel matching and its robustness. Experiments on PBVS2022 TISR Dataset show that our network has achieved state-of-the-art performance for the Thermal SR task.
Human emotions recognization contributes to the development of human-computer interaction. The machines understanding human emotions in the real world will significantly contribute to life in the future. This paper in...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Human emotions recognization contributes to the development of human-computer interaction. The machines understanding human emotions in the real world will significantly contribute to life in the future. This paper introduces the 3rd Affective Behavior Analysis in-the-wild (ABAW3) 2022 challenge. We focused on solving the problem of the Valence-Arousal (VA) estimation and Action Unit (AU) detection. For valence-arousal estimation, we conducted two stages: creating new features from multimodel and temporal learning to predict valence-arousal. First, we make new features;the Gated Recurrent Unit (GRU) and Transformer are combined using a Regular Networks (RegNet) feature, which is extracted from the image. The next step is the GRU combined with local attention to predict valence-arousal. The Concordance Correlation Coefficient (CCC) was used to evaluate the model. The result achieved 0.450 for valence and 0.445 for arousal on the test set, outperforming the baseline method with a corresponding CCC of 0.180 for valence and 0.170 for arousal. We also performed additional experiments on the action unit task with simple transformer blocks. We achieved a score of 49.04 on the test set in terms of F-1 score, which outperforms the baseline method with a corresponding F1 score of 36.50. Our submission to ABAW3 2022 ranks 3rd for both tasks.
Polyp segmentation is a crucial step towards computer-aided diagnosis of colorectal cancer. However, most of the polyp segmentation methods require pixel-wise annotated datasets. Annotated datasets are tedious and tim...
详细信息
Growing interest in conversational agents promote two-way human-computer communications involving asking and answering visual questions have become an active area of research in AI. Thus, generation of visual question...
详细信息
Image processing is a very fundamental technique in the field of low-level vision. However, with the development of deep learning over the past five years, most low-level vision methods tend to ignore this technique. ...
详细信息
Recent advancements in learning-based novel view synthesis enable users to synthesize light field from a monocular image without special equipment. Moreover, the state-of-the-art techniques including multiplane image ...
详细信息
The aim of this paper is to propose a large scale dataset for image restoration (LSDIR). Recent work in image restoration has been focused on the design of deep neural networks. The datasets used to train these networ...
详细信息
We propose a novel depth-aware joint attention target estimation framework that estimates the attention target in 3D space. Our goal is to mimic human's ability to understand where each person is looking in their ...
详细信息
With the large-scale video-text datasets being collected, learning general visual-textual representation has gained increasing attention. While recent methods are designed with the assumption that the alt-text descrip...
详细信息
This paper presents our Facial Action Units (AUs) detection submission to the fifth Affective Behavior Analysis in-the-wild Competition (ABAW). Our approach consists of three main modules: (i) a pre-trained facial rep...
详细信息
暂无评论