ISBN (digital): 9798350365474
ISBN (print): 9798350365481
High-performing out-of-distribution (OOD) detection, both anomaly and novel class detection, is an important prerequisite for the practical use of classification models. In this paper we focus on the species recognition task in images, concerned with large databases, a large number of fine-grained hierarchical classes, severe class imbalance, and varying image quality. We propose a framework for combining individual OOD measures into one combined OOD (COOD) measure using a supervised model. The individual measures are several existing state-of-the-art measures and several novel OOD measures developed with novel class detection and hierarchical class structure in mind. COOD was extensively evaluated on three large-scale (500k+ images) biodiversity datasets in the context of anomaly and novel class detection. We show that COOD outperforms individual OOD measures, including state-of-the-art ones, by a large margin in terms of TPR@1%FPR in the majority of experiments, e.g., improving detection of ImageNet images (OOD) from 54.3% to 83.3% for the iNaturalist 2018 dataset. SHAP (feature contribution) analysis shows that different individual OOD measures are essential for various tasks, indicating that multiple OOD measures and combinations are needed to generalize. Additionally, we show that explicitly considering ID images that are incorrectly classified for the original (species) recognition task is important for constructing high-performing OOD detection methods and for practical applicability. The framework can easily be extended or adapted to other tasks and media modalities.
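The combination step described in the abstract can be illustrated with a small supervised combiner over individual OOD scores. The sketch below fits a tiny logistic-regression combiner in pure Python; the two input measures (a max-softmax probability and an energy score) and all numeric values are illustrative assumptions, not the paper's actual feature set or model.

```python
import math
import random

def fit_cood(features, labels, lr=0.5, epochs=2000):
    """Fit a minimal logistic-regression combiner: features are per-image
    vectors of individual OOD scores, labels are 1 for OOD and 0 for ID."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))           # numeric safety clamp
            err = 1.0 / (1.0 + math.exp(-z)) - y   # sigmoid(z) - label
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n

    def cood(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))          # combined OOD score in [0, 1]
    return cood

# Toy data: hypothetical (max-softmax, energy) scores for ID vs. OOD images.
random.seed(0)
id_feats = [[0.90 + random.gauss(0, 0.03), -8.0 + random.gauss(0, 1)] for _ in range(50)]
ood_feats = [[0.55 + random.gauss(0, 0.05), -2.0 + random.gauss(0, 1)] for _ in range(50)]
cood = fit_cood(id_feats + ood_feats, [0] * 50 + [1] * 50)
```

In a real system the feature vector would hold all individual measures (including the hierarchical ones the paper proposes), and a stronger supervised model could replace the logistic regression without changing the interface.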
ISBN (digital): 9781665487399
ISBN (print): 9781665487405
Humans have a remarkable ability to organize, compress and retrieve episodic memories throughout their daily life. Current AI systems, however, lack comparable capabilities as they are mostly constrained to an analysis with access to the raw input sequence, assuming an unlimited amount of data storage which is not feasible in realistic deployment scenarios. For instance, existing Video Question Answering (VideoQA) models typically reason over the video while already being aware of the question, thus requiring the complete video to be stored in case the question is not known in advance.

In this paper, we address this challenge with three main contributions: First, we propose the Episodic Memory Question Answering (EMQA) task as a specialization of VideoQA. Specifically, EMQA models are constrained to keep only a constant-sized representation of the video input, thus automatically limiting the computation requirements at query time. Second, we introduce a new egocentric VideoQA dataset called QaEgo4D, far larger than existing egocentric VideoQA datasets and featuring video length unprecedented in VideoQA datasets in general. Third, we present extensive experiments on the new dataset, comparing various baseline models in both the VideoQA and the EMQA setting. To facilitate future research on egocentric VideoQA as well as episodic memory representation and retrieval, we publish our code and dataset.
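The constant-sized-representation constraint can be made concrete with a toy memory structure. The sketch below keeps at most a fixed number of slots and, when full, merges the two most similar adjacent entries; the class name, slot count, and merge rule are illustrative assumptions, not the EMQA baselines from the paper.

```python
class ConstantSizeMemory:
    """Toy constant-size episodic memory over per-frame feature vectors.
    When capacity is exceeded, the adjacent pair of slots with the smallest
    L2 distance is merged into a count-weighted average."""

    def __init__(self, slots=8):
        self.slots = slots
        self.mem = []  # list of (feature_vector, merged_frame_count)

    def write(self, feat):
        self.mem.append((feat, 1))
        if len(self.mem) > self.slots:
            # Find the adjacent pair with the smallest squared L2 distance.
            best, bi = float("inf"), 0
            for i in range(len(self.mem) - 1):
                a, b = self.mem[i][0], self.mem[i + 1][0]
                d = sum((x - y) ** 2 for x, y in zip(a, b))
                if d < best:
                    best, bi = d, i
            (fa, ca), (fb, cb) = self.mem[bi], self.mem[bi + 1]
            merged = [(x * ca + y * cb) / (ca + cb) for x, y in zip(fa, fb)]
            self.mem[bi:bi + 2] = [(merged, ca + cb)]
```

However long the input video, the memory never exceeds `slots` entries, which is exactly the property that bounds query-time computation in the EMQA setting.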
ISBN (digital): 9781665487399
ISBN (print): 9781665487405
GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often being harder to train. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork from overfitting, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which effectively works on very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks. Our code is available at https://***/hecoding/Hyper-Modulation.
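The idea of a hypernetwork conditioning a frozen generator can be sketched in a few lines. Below, a class embedding is mapped to FiLM-style (scale, shift) parameters that modulate one layer's activations; the sizes, zero-bias initialization, and function names are illustrative assumptions that loosely echo, but do not reproduce, the paper's architecture.

```python
import random

random.seed(0)

def linear(w, b, x):
    """Plain matrix-vector product plus bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

# Hypothetical sizes: the hypernetwork maps a class embedding to
# (scale, shift) parameters for one frozen generator layer.
emb_dim, feat_dim = 4, 6
hyper_w = [[random.gauss(0, 0.1) for _ in range(emb_dim)]
           for _ in range(2 * feat_dim)]
hyper_b = [0.0] * (2 * feat_dim)  # zero bias: zero embedding => identity map

def modulate(features, class_emb):
    """Apply hypernetwork-predicted modulation to a layer's activations.
    Scales are offsets around 1, so an uninformative embedding keeps the
    pretrained generator's behaviour (loosely imitating the idea of
    starting close to the unconditional model)."""
    out = linear(hyper_w, hyper_b, class_emb)
    scale, shift = out[:feat_dim], out[feat_dim:]
    return [(1.0 + s) * f + t for s, f, t in zip(scale, features, shift)]
```

Because the conditioning lives entirely in the small hypernetwork, the pretrained generator weights stay shared across classes, which is the transfer-friendly property the abstract emphasizes.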
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Exposure correction tasks are dedicated to recovering the brightness and structural information of overexposed or underexposed images. Areas with different exposure levels differ in recovery difficulty: severely exposed areas suffer heavier structural information loss and are therefore harder to recover than commonly exposed ones. However, existing methods focus on the simultaneous recovery of global brightness and structure, ignoring that the recovery difficulty varies between areas. To address this issue, we propose a novel exposure correction strategy named "Inpainting Assisted Exposure Correction" (IAEC), which pre-performs image structure repair on severely exposed areas to guide the exposure correction process. This method is based on the observation that the contextual semantic information contained in the image structure can effectively help overall image recovery, and that this information is badly degraded in severely exposed areas. The pre-performed structural repair by the inpainting model can supplement the contextual semantic information lost to severe exposure. Therefore, we use an inpainting model to perform pre-structure repair on severely exposed areas to obtain supplementary contextual semantic information and then align the structure-repaired image with the improperly exposed input at the feature level. Extensive experiments demonstrate that our method achieves superior results compared to state-of-the-art methods and has the potential to be applied to other tasks with similar context-loss problems.
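The two-stage strategy can be outlined as: detect severely exposed pixels, repair their structure with an inpainting model, then run exposure correction guided by the repaired image. In the sketch below, the intensity thresholds and both model callables (`inpaint_model`, `correct_model`) are hypothetical placeholders, not the paper's components.

```python
def severe_exposure_mask(img, low=0.05, high=0.95):
    """Flag pixels near the clipping limits as severely exposed.
    img is a grayscale image as nested lists of intensities in [0, 1];
    the thresholds are illustrative, not values from the paper."""
    return [[1 if (p <= low or p >= high) else 0 for p in row]
            for row in img]

def iaec_pipeline(img, inpaint_model, correct_model):
    """Sketch of the inpainting-assisted strategy: structure repair on
    severely exposed areas first, then exposure correction that takes the
    repaired image as guidance. Both models are hypothetical callables."""
    mask = severe_exposure_mask(img)
    repaired = inpaint_model(img, mask)   # pre-performed structure repair
    return correct_model(img, repaired)   # feature-level alignment happens here
```

The point of the sketch is the ordering: the inpainting result is computed before correction so the corrector can borrow contextual structure that the clipped regions no longer contain.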
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent advancements, achieving real-time on-device optical flow estimation remains a complex challenge. First, an optical flow model must be sufficiently lightweight to meet computation and memory constraints to ensure real-time performance on devices. Second, the necessity for real-time on-device operation imposes constraints that weaken the model's capacity to adequately handle ambiguities in flow estimation, thereby intensifying the difficulty of preserving flow accuracy.

This paper introduces two synergistic techniques, Self-Cleaning Iteration (SCI) and Regression Focal Loss (RFL), designed to enhance the capabilities of optical flow models, with a focus on addressing optical flow regression ambiguities. These techniques prove particularly effective in mitigating error propagation, a prevalent issue in optical flow models that employ iterative refinement. Notably, these techniques add negligible to zero overhead in model parameters and inference latency, thereby preserving real-time on-device performance.

The effectiveness of our proposed SCI and RFL techniques, collectively referred to as SciFlow for brevity, is demonstrated across two distinct lightweight optical flow model architectures in our experiments. Remarkably, SciFlow enables substantial reduction in error metrics (EPE and Fl-all) over the baseline models by up to 6.3% and 10.5% for in-domain scenarios and by up to 6.2% and 13.5% for cross-domain scenarios on the Sintel and KITTI 2015 datasets, respectively.
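To make the focal-loss idea concrete for a regression target, the sketch below up-weights large residuals so that ambiguous, hard pixels dominate the gradient. This is a plausible focal-style form for illustration only; it is not claimed to be the exact RFL definition used by SciFlow.

```python
import math

def regression_focal_loss(pred, target, gamma=2.0):
    """Illustrative focal-style regression loss over per-pixel flow errors:
    each residual is weighted by (1 - exp(-|e|))**gamma, which is near 0
    for easy (small-error) pixels and near 1 for hard ones."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        focal = (1.0 - math.exp(-err)) ** gamma  # ~0 for easy pixels
        total += focal * err
    return total / len(pred)
```

Compared with a plain L1 loss, easy pixels contribute almost nothing here, so training capacity is spent on the ambiguous regions the abstract highlights.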
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Building and maintaining High-Definition (HD) maps represents a large barrier to autonomous vehicle deployment. This, along with advances in modern online map detection models, has sparked renewed interest in the online mapping problem. However, effectively predicting online maps at a high enough quality to enable safe, driverless deployments remains a significant challenge. Recent work on these models proposes training robust online mapping systems using low-quality map priors with synthetic perturbations in an attempt to simulate out-of-date HD map priors. In this paper, we investigate how models trained on these synthetically perturbed map priors generalize to deployment-scale, real-world map changes. We present a large-scale experimental study to determine which synthetic perturbations are most useful in generalizing to real-world HD map changes, evaluated using multiple years of real-world autonomous driving data. We show there is still a substantial sim2real gap between synthetic prior perturbations and observed real-world changes, which limits the utility of current prior-informed HD map prediction models.
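A synthetic prior perturbation of the kind described can be sketched as randomly dropping map elements (simulating removed roads) and jittering geometry (simulating drift). The representation of lanes as point lists, the probabilities, and the function name are illustrative assumptions, not the perturbations studied in the paper.

```python
import random

def perturb_map_prior(lanes, drop_prob=0.2, jitter=0.5, rng=None):
    """Toy synthetic perturbation of an HD-map prior, where each lane is a
    list of (x, y) points: whole lanes are dropped with probability
    drop_prob, and surviving points are jittered uniformly in [-jitter,
    jitter] on each axis."""
    rng = rng or random.Random(0)
    out = []
    for lane in lanes:
        if rng.random() < drop_prob:
            continue  # lane deleted: simulates an out-of-date prior
        out.append([(x + rng.uniform(-jitter, jitter),
                     y + rng.uniform(-jitter, jitter)) for x, y in lane])
    return out
```

The study's question, in these terms, is whether a model trained against such synthetic edits is also robust to the qualitatively different changes that accumulate in real deployments.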
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Data augmentations are widely used in training medical image deep learning models to increase the diversity and size of sparse datasets. However, commonly used augmentation techniques can result in loss of clinically relevant information from medical images, leading to incorrect predictions at inference time. We propose the Interactive Medical Image Learning (IMIL) framework, a novel approach for improving the training of medical image analysis algorithms that enables clinician-guided intermediate training data augmentations on misprediction outliers, focusing the algorithm on relevant visual information. To prevent the model from using irrelevant features during training, IMIL 'blacks out' clinician-designated irrelevant regions and replaces the original images with the augmented samples. This ensures that for originally mispredicted samples, the algorithm subsequently attends only to relevant regions and correctly correlates them with the respective diagnosis. We validate the efficacy of IMIL using radiology residents and compare its performance to state-of-the-art data augmentations. A 4.2% improvement in accuracy over ResNet-50 was observed when using IMIL on only 4% of the training set. Our study demonstrates the utility of clinician-guided interactive training to achieve meaningful data augmentations for medical image analysis algorithms.
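The blackout augmentation itself is simple to sketch: zero out a clinician-designated rectangle so the model cannot rely on it. The image-as-nested-lists representation and rectangular region are simplifying assumptions; a real pipeline would operate on image tensors and arbitrary region shapes.

```python
def blackout(image, region):
    """Zero the clinician-designated irrelevant rectangle (r0:r1, c0:c1)
    in a copy of the image, leaving the original untouched."""
    r0, r1, c0, c1 = region
    out = [row[:] for row in image]  # shallow copy of each row
    for r in range(r0, r1):
        for c in range(c0, c1):
            out[r][c] = 0
    return out
```

In the IMIL setting this would be applied only to mispredicted training samples, with the region chosen interactively by a clinician rather than hard-coded.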
In scenes containing specular objects, the image motion observed by a moving camera may be an intermixed combination of optical flow resulting from diffuse reflectance (diffuse flow) and specular reflection (specular flow). Here, with few assumptions, we formalize the notion of specular flow, show how it relates to the 3D structure of the world, and develop an algorithm for estimating scene structure from 2D image motion. Unlike previous work on isolated specular highlights we use two image frames and estimate the semi-dense flow arising from the specular reflections of textured scenes. We parametrically model the image motion of a quadratic surface patch viewed from a moving camera. The flow is modeled as a probabilistic mixture of diffuse and specular components and the 3D shape is recovered using an Expectation-Maximization algorithm. Rather than treating specular reflections as noise to be removed or ignored, we show that the specular flow provides additional constraints on scene geometry that improve estimation of 3D structure when compared with reconstruction from diffuse flow alone. We demonstrate this for a set of synthetic and real sequences of mixed specular-diffuse objects.
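The probabilistic mixture in the abstract can be illustrated with the E-step of a two-component EM: for each observed flow vector, compute the posterior responsibility of the specular component given the current diffuse and specular flow predictions. The Gaussian likelihoods, shared sigma, and function name `e_step` are simplifying assumptions for illustration.

```python
import math

def e_step(flows, diffuse_pred, specular_pred, sigma=1.0, prior=0.5):
    """Posterior probability that each observed 2D flow vector came from
    the specular component, under isotropic Gaussian likelihoods with a
    shared sigma and a fixed mixing prior."""
    resp = []
    for obs, d, s in zip(flows, diffuse_pred, specular_pred):
        ld = math.exp(-((obs[0] - d[0]) ** 2 + (obs[1] - d[1]) ** 2)
                      / (2 * sigma ** 2))
        ls = math.exp(-((obs[0] - s[0]) ** 2 + (obs[1] - s[1]) ** 2)
                      / (2 * sigma ** 2))
        p_spec = prior * ls / (prior * ls + (1 - prior) * ld + 1e-12)
        resp.append(p_spec)
    return resp
```

The corresponding M-step would re-fit the quadratic surface parameters using these responsibilities as weights, which is where the specular flow contributes its extra geometric constraints rather than being discarded as noise.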
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
The necessity for a Person ReID system for rapidly evolving urban surveillance applications is severely challenged by domain shifts: variations in data distribution that occur across different environments or times. In this paper, we provide the first empirical review of domain shift in person ReID, which includes three settings, namely Unsupervised Domain Adaptation ReID, Domain Generalizable ReID, and Lifelong ReID. We observe that existing approaches only tackle domain shifts caused by the cross-dataset setting, while ignoring intra-dataset attribute domain shifts caused by changes in clothing, shape, or gait, which are very common in ReID. Thus, we enhance research directions in this field by redefining domain shift in ReID as the combination of attribute domain shift with cross-dataset domain shift. With a focus on Lifelong ReID methods, we conduct an extensive comparison on a fair experimental setup and provide an in-depth analysis of these methods under both non-cloth-changing and cloth-changing ReID scenarios. Insights into the strengths and limitations of these methods based on their performance are studied. This paper outlines future research directions and paves the way for the development of more adaptive, resilient, and enduring cross-domain ReID systems. Code is available here.
ISBN (print): 0769525210
To be able to develop and test robust affective multimodal systems, researchers need access to novel databases containing representative samples of human multimodal expressive behavior. The creation of such databases requires a major effort in the definition of representative behaviors, the choice of expressive modalities, and the collection and labeling of large amounts of data. At present, public databases only exist for single expressive modalities such as facial expression analysis. There also exist a number of gesture databases of static and dynamic hand postures and dynamic hand gestures. However, there is no readily available database combining affective face and body information in a genuine bimodal manner. Accordingly, in this paper, we present a bimodal database recorded by two high-resolution cameras simultaneously for use in automatic analysis of human nonverbal affective behavior.