ISBN (Print): 9781665445092
Invertible networks have various benefits for image denoising since they are lightweight, information-lossless, and memory-saving during back-propagation. However, applying invertible models to remove noise is challenging because the input is noisy and the reversed output is clean, following two different distributions. We propose an invertible denoising network, InvDN, to address this challenge. InvDN transforms the noisy input into a low-resolution clean image and a latent representation containing noise. To discard the noise and restore the clean image, InvDN replaces the noisy latent representation with one sampled from a prior distribution during reversion. InvDN's denoising performance is better than that of all existing competitive models, achieving a new state-of-the-art result on the SIDD dataset with a shorter run time. Moreover, InvDN is far smaller, containing only 4.2% of the parameters of the recently proposed DANet. Further, by manipulating the noisy latent representation, InvDN can also generate noise more similar to the original. Our code is available at: https://***/Yang-Liu1082/***.
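The denoise-by-latent-replacement step described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: `inv_net` is a hypothetical invertible network whose `forward` splits a noisy image into a low-resolution clean image and a noise-bearing latent, and whose `inverse` maps that pair back to image space; a standard Gaussian prior is assumed.

```python
import torch

def denoise(inv_net, noisy_img):
    """Sketch of InvDN-style denoising by latent replacement (hypothetical API)."""
    # Forward pass: split the noisy input into a low-resolution clean
    # image and a latent representation that carries the noise.
    low_res_clean, noisy_latent = inv_net.forward(noisy_img)
    # Discard the noise-bearing latent and resample from the prior
    # (assumed to be a standard Gaussian here).
    z = torch.randn_like(noisy_latent)
    # Reverse pass: reconstruct a full-resolution clean image.
    return inv_net.inverse(low_res_clean, z)
```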
ISBN (Print): 9781665445092
Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments. Recently, there has been a growing interest in estimating class-agnostic motion directly from point clouds. Current motion estimation methods usually require vast amounts of annotated training data from self-driving scenes. However, manually labeling point clouds is notoriously difficult, error-prone, and time-consuming. In this paper, we seek to answer the research question of whether abundant unlabeled data collections can be utilized for accurate and efficient motion learning. To this end, we propose a learning framework that leverages free supervisory signals from point clouds and paired camera images to estimate motion purely via self-supervision. Our model combines a point-cloud-based structural consistency, augmented with probabilistic motion masking, with a cross-sensor motion regularization to realize the desired self-supervision. Experiments reveal that our approach performs competitively with supervised methods and achieves the state-of-the-art result when combining our self-supervised model with supervised fine-tuning.
ISBN (Print): 9781665401913
We propose an end-to-end model-based cross-view gait recognition method that employs pose sequences and shapes extracted by human model fitting. Specifically, we consider a problem setting where gait sequences from single, different views are given as a pair to match in the test phase, while asynchronous multi-view gait sequences are given for each subject in the training phase. This work exploits the multi-view constraint in the training phase to extract pose sequences that are more consistent across views in the test phase, which existing methods do not consider. For this purpose, given asynchronous multi-view gait sequences, we introduce a phase synchronization step in the training phase so that we can impose pose consistency at each synchronized phase in a temporally up-sampled phase domain. We then train our network end-to-end by minimizing a loss function based on the synchronized multi-view pose constraint as well as shape consistency, temporal pose smoothness, and recognition accuracy. We also introduce the synchronization step in the test phase to reduce intra-subject variations caused by asynchronous pose features. Experimental results on the OU-MVLP and CASIA-B datasets show that the proposed method achieves state-of-the-art performance in both gait identification and verification scenarios, with a particularly large improvement in the pose representations.
ISBN (Print): 9781665445092
Person re-identification (ReID) methods always learn through a stationary domain that is fixed by the choice of a given dataset. In many contexts (e.g., lifelong learning), those methods are ineffective because the domain is continually changing, in which case incremental learning over multiple domains is potentially required. In this work we explore a new and challenging ReID task, namely lifelong person re-identification (LReID), which enables learning continuously across multiple domains and even generalising to new and unseen domains. Following the cognitive processes of the human brain, we design an Adaptive Knowledge Accumulation (AKA) framework that is endowed with two crucial abilities: knowledge representation and knowledge operation. Our method alleviates catastrophic forgetting on seen domains and demonstrates the ability to generalize to unseen domains. Correspondingly, we also provide a new and large-scale benchmark for LReID. Extensive experiments demonstrate that our method outperforms other competitors by a margin of 5.8% mAP in the generalisation evaluation.
ISBN (Digital): 9798350365474
ISBN (Print): 9798350365481
This work presents a novel two-stage architecture designed to enhance degraded images affected by environmental factors such as haze, blur, fog, and rain. Despite the dominance of deep Convolutional Neural Networks (CNNs) and Transformers in single-image restoration tasks, existing methods neglect intrinsic priors on the physical properties of degradation. To enhance the generalization ability of image restoration models, we propose a Fourier prior based on a key observation: substituting the Fourier amplitude of degraded images with that of clean images effectively mitigates the degradation. Thus, the amplitude contains the degradation information, while the phase retains the background structures. Consequently, a two-stage model is proposed, consisting of an Amplitude Refinement Unit (ARU) and a Phase Refinement Unit (PRU), which restore the amplitude and phase information, respectively. ARU and PRU leverage a CNN-Transformer-based architecture to extract local and global features, overcoming the computational constraints posed by large image sizes in Transformers. Additionally, a multi-scale approach in ARU refines amplitude features at coarse and fine levels, improving restoration efficiency. Experimental results across multiple image restoration tasks, such as image deraining, dehazing, and low-light enhancement, indicate that the proposed architecture improves performance in terms of PSNR, SSIM, and computational efficiency compared to state-of-the-art Transformer approaches.
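The key observation behind the Fourier prior, swapping amplitudes while keeping phase, is easy to verify directly. A minimal PyTorch sketch (function name is ours, not from the paper):

```python
import torch

def swap_amplitude(degraded, clean):
    """Recombine the clean image's Fourier amplitude with the degraded
    image's phase; per the paper's observation, this largely removes the
    degradation while preserving background structure."""
    f_deg = torch.fft.fft2(degraded)
    f_cln = torch.fft.fft2(clean)
    amplitude = torch.abs(f_cln)   # degradation lives in the amplitude
    phase = torch.angle(f_deg)     # background structure lives in the phase
    return torch.fft.ifft2(amplitude * torch.exp(1j * phase)).real
```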
ISBN (Print): 9781665445092
Current model extraction attacks assume that the adversary has access to a surrogate dataset with characteristics similar to the proprietary data used to train the victim model. This requirement precludes the use of existing model extraction techniques on valuable models, such as those trained on rare or hard-to-acquire datasets. In contrast, we propose data-free model extraction methods that do not require a surrogate dataset. Our approach adapts techniques from the area of data-free knowledge transfer to model extraction. As part of our study, we identify that the choice of loss is critical to ensuring that the extracted model is an accurate replica of the victim model. Furthermore, we address the difficulties arising from the adversary's limited access to the victim model in a black-box setting. For example, we recover the model's logits from its probability predictions to approximate gradients. We find that the proposed data-free model extraction approach achieves high accuracy with reasonable query complexity: 0.99× and 0.92× the victim model's accuracy on the SVHN and CIFAR-10 datasets given 2M and 20M queries, respectively.
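The logit-recovery trick mentioned above follows from softmax being invariant to a per-sample additive shift, so log-probabilities equal the victim's logits up to an unknown constant. A minimal sketch under that assumption (naming is ours):

```python
import torch

def probs_to_logits(probs, eps=1e-12):
    """Recover logits from softmax probabilities, up to an additive shift.
    Since softmax(z) = softmax(z + c), log(p) = z - logsumexp(z), i.e. the
    true logits minus a per-sample constant that is unidentifiable anyway."""
    logits = torch.log(probs.clamp_min(eps))
    # Center per sample so downstream gradient approximation is stable.
    return logits - logits.mean(dim=-1, keepdim=True)
```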
ISBN (Digital): 9798350365474
ISBN (Print): 9798350365481
Affective Behavior Analysis aims to make technology emotionally intelligent, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of emotional behavior analysis techniques in natural environments, the 6th competition on Affective Behavior Analysis in-the-wild (ABAW) uses the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets to set up five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation. In this paper, we present our method designs for the VA estimation, expression recognition, and AU detection tracks. Specifically, our framework includes three main components: 1) To achieve high-quality facial feature representations, we employ a Masked Autoencoder as the visual feature extraction model and fine-tune it on our facial dataset. 2) We utilize a transformer-based feature fusion module to fully integrate the emotional information provided by audio signals, visual images, and transcripts, offering high-quality expression features for the downstream tasks. 3) Considering the complexity of the video collection scenes, we conduct a more detailed dataset division based on scene characteristics and train a classifier for each scene. Extensive experiments demonstrate the superiority of our designs. Our work won the championship in the AU, EXPR, and VA tracks at the ABAW6 competition.
ISBN (Print): 9781665445092
In this paper, we consider the absorption effect in the problem of single image reflection removal. We show that the absorption effect can be numerically approximated by the average of the refractive amplitude coefficient map. We then reformulate the image formation model and propose a two-step solution that explicitly takes the absorption effect into account. The first step estimates the absorption effect from a reflection-contaminated image, while the second step recovers the transmission image by taking the reflection-contaminated image and the estimated absorption effect as input. Experimental results on four public datasets show that our two-step solution not only successfully removes reflection artifacts, but also faithfully restores the intensity distortion caused by the absorption effect. Our ablation studies further demonstrate that our method achieves superior performance in recovering the overall intensity and has good model generalization capacity. The code is available at https://***/q-zh/absorption.
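To make the two-step structure concrete, here is a hedged sketch; `absorption_net` and `restore_net` are hypothetical stand-ins for the paper's two stages, and the averaging follows the stated approximation of the absorption effect.

```python
import torch

def remove_reflection(absorption_net, restore_net, contaminated):
    """Two-step pipeline sketch (hypothetical nets): estimate the absorption
    effect as the average of a predicted refractive amplitude coefficient
    map, then recover the transmission image conditioned on that estimate."""
    amp_map = absorption_net(contaminated)                 # (B, 1, H, W) coefficient map
    absorption = amp_map.mean(dim=(-2, -1), keepdim=True)  # one scalar per image
    return restore_net(contaminated, absorption)
```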
ISBN (Print): 9781665445092
Prototype learning is extensively used for few-shot segmentation. Typically, a single prototype is obtained from the support feature by averaging the global object information. However, using one prototype to represent all the information may lead to ambiguities. In this paper, we propose two novel modules, named superpixel-guided clustering (SGC) and guided prototype allocation (GPA), for multiple prototype extraction and allocation. Specifically, SGC is a parameter-free and training-free approach that extracts more representative prototypes by aggregating similar feature vectors, while GPA is able to select matched prototypes to provide more accurate guidance. By integrating SGC and GPA together, we propose the Adaptive Superpixel-guided Network (ASGNet), a lightweight model that adapts to object scale and shape variation. In addition, our network easily generalizes to k-shot segmentation with substantial improvement and no additional computational cost. In particular, our evaluations on COCO demonstrate that ASGNet surpasses the state-of-the-art method by 5% in 5-shot segmentation.
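A minimal sketch of the allocation idea in GPA, picking the most similar of the extracted prototypes at each query location; this is our simplification for illustration, and the paper's module is more involved:

```python
import torch
import torch.nn.functional as F

def allocate_prototypes(query_feat, prototypes):
    """query_feat: (C, H, W) query features; prototypes: (K, C) support
    prototypes (e.g., from SGC). Returns a (C, H, W) guidance map built
    from the best-matching prototype at each spatial location."""
    c, h, w = query_feat.shape
    q = F.normalize(query_feat.reshape(c, -1), dim=0)  # (C, HW), unit columns
    p = F.normalize(prototypes, dim=1)                 # (K, C), unit rows
    sim = p @ q                                        # cosine similarity, (K, HW)
    idx = sim.argmax(dim=0)                            # best prototype per pixel
    return prototypes[idx].t().reshape(c, h, w)        # gathered guidance map
```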
ISBN (Print): 9781665445092
Humans can easily infer the underlying 3D geometry and texture of an object from only a single 2D image. Current computer vision methods can do this, too, but suffer from view generalization problems: the models inferred tend to make poor predictions of appearance in novel views. As with generalization problems in machine learning, the difficulty is balancing single-view accuracy (cf. training error; bias) with novel-view accuracy (cf. test error; variance). We describe a class of models whose geometric rigidity is easily controlled to manage this tradeoff. We describe a cycle consistency loss that improves view generalization (roughly, a model from a generated view should predict the original view well). View generalization of textures requires that models share texture information, so that a car seen from the back still has headlights because other cars have headlights. We describe a cycle consistency loss that encourages model textures to be aligned, so as to encourage sharing. We compare our method against the state-of-the-art method and show both qualitative and quantitative improvements.
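The view-generalization cycle loss ("a model from a generated view should predict the original view well") can be sketched as follows; `infer` and `render` are hypothetical interfaces, not the authors' API.

```python
import torch.nn.functional as F

def view_cycle_loss(model, image, cam, novel_cam):
    """Cycle consistency sketch: infer a 3D model, render it from a novel
    camera, re-infer from that rendering, then require the re-inferred
    model to reproduce the original view."""
    mesh = model.infer(image, cam)                  # hypothetical single-view inference
    novel_view = model.render(mesh, novel_cam)      # generate a novel view
    mesh_cycled = model.infer(novel_view, novel_cam)
    recon = model.render(mesh_cycled, cam)          # back to the original camera
    return F.l1_loss(recon, image)
```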