ISBN (print): 9781665487399
In clinical settings, where acquisition conditions and patient populations change over time, continual learning is key for ensuring the safe use of deep neural networks. Yet most existing work focuses on convolutional architectures and image classification. Radiologists, by contrast, prefer to work with segmentation models that outline specific regions of interest, for which Transformer-based architectures are gaining traction. The self-attention mechanism of Transformers could potentially mitigate catastrophic forgetting, opening the way for more robust medical image segmentation. In this work, we explore how recently proposed Transformer mechanisms for semantic segmentation behave in sequential learning scenarios, and analyse how best to adapt continual learning strategies for this setting. Our evaluation on hippocampus segmentation shows that Transformer mechanisms mitigate catastrophic forgetting for medical image segmentation compared to purely convolutional architectures, and demonstrates that regularising ViT modules should be done with caution.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Under complex viewing conditions, human perception relies on generating hypotheses and revising them in an iterative fashion. We developed novel visual stimuli to study such iterative inference in humans and AI. In these stimuli, called "constellations", all local information about the object has been removed, so the object can only be recognized by taking the global pattern into account. Here we describe the dataset and demonstrate that humans indeed use an iterative process of generating hypotheses and refining them to solve these images. We also provide code that allows researchers to create their own constellation images. The constellation dataset allows researchers to develop sketching algorithms for guessing the hidden object. Because the algorithms humans use for this appear to be iterative in nature, this dataset will facilitate the study of iterative inference in minds and machines.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
This paper reviews the NTIRE 2022 challenge on learning the super-resolution space. The challenge aims to raise awareness that the super-resolution problem is ill-posed. Since many high-resolution images map to the same low-resolution image, we asked the participants to create methods that, given a low-resolution image, sample diverse super-resolved images from the space of possible high-resolution images. For evaluation, we use the same protocol as in last year's super-resolution space challenge at NTIRE 2021. We compare the submissions of the participating teams and relate them to the approaches from last year. The challenge contains two tracks: 4x and 8x scale factors. In total, 3 teams competed in the final testing phase.
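The ill-posedness mentioned above can be illustrated with a toy example. The sketch below is not the challenge's degradation model: it assumes a simple 2x2 average-pooling downsampler (the function name `downsample_2x` is hypothetical) and shows two different high-resolution patches that collapse to the same low-resolution pixel.

```python
def downsample_2x(image):
    """Average-pool a 2D list of numbers with a 2x2 window.

    Assumption: average pooling stands in for the real degradation,
    which this listing does not specify. Height and width must be even.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Two visibly different high-resolution patches (checkerboard vs. flat).
hr_a = [[0, 4], [4, 0]]
hr_b = [[2, 2], [2, 2]]

# Both map to the same single low-resolution pixel.
lr_a = downsample_2x(hr_a)
lr_b = downsample_2x(hr_b)
print(lr_a, lr_b)
```

Because the downsampler discards the difference between the two patches, no method can tell from the low-resolution pixel alone which one was the input, which is why sampling several plausible outputs is a reasonable goal.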
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Under the new norm of working from home, demand for fitness from home is on the rise. Different exercise forms solve different fitness needs for different people. Yoga gives flexibility and relieves stress. Pilates strengthens the muscles. Kung Fu brings balance. It is not feasible for everyone to hire a personal trainer. In this paper, we develop Pose Tutor, an AI-based explainable pose recognition and correction system. Pose Tutor combines vision and pose skeleton models in a novel coarse-to-fine framework to obtain pose class predictions. An angle-likelihood mechanism is used to explain which human joints maximally caused the pose class predictions and also correct any wrongly formed joints. Even without keypoint level training, Pose Tutor shows promising results on Yoga-82, Pilates-32, and Kungfu-7 datasets. Additionally, user studies conducted with multiple domain experts validate the explanations provided by our framework.
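The abstract does not describe how the angle-likelihood mechanism is computed. As a loosely related illustration only, the sketch below shows the basic geometric step such a mechanism would plausibly build on: the angle at a joint given three 2D keypoints. The function name `joint_angle` and the keypoint layout are assumptions, not part of Pose Tutor.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b formed by keypoints a-b-c.

    Hypothetical helper: a, b, c are (x, y) tuples, e.g. shoulder,
    elbow, wrist. Not taken from the Pose Tutor paper.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints a and c must differ from joint b")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# A right-angle bend and a fully extended limb.
bent = joint_angle((1, 0), (0, 0), (0, 1))
straight = joint_angle((0, 0), (1, 0), (2, 0))
print(bent, straight)
```

Comparing such angles against a reference pose is one simple way to flag a wrongly formed joint, though the paper's actual likelihood model may differ.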
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Dataset distillation is the task of synthesizing a small dataset such that a model trained on the synthetic set will match the test accuracy of the model trained on the full dataset. In this paper, we propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data across many training steps. Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data. To efficiently obtain the initial and target network parameters for large-scale datasets, we pre-compute and store training trajectories of expert networks trained on the real dataset. Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
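The objective described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it uses 1-D least squares instead of a neural network, only evaluates the matching objective (the gradient with respect to the distilled data would need automatic differentiation through the unrolled steps, which is not shown), and normalizes the distance by how far the expert moved, which is an assumption not stated in the abstract. All function names are hypothetical.

```python
def sgd_steps(theta, data, lr, steps):
    """Run full-batch gradient steps for y ~ w * x + b on (x, y) pairs."""
    w, b = theta
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def sq_dist(p, q):
    """Squared Euclidean distance between two parameter tuples."""
    return sum((u - v) ** 2 for u, v in zip(p, q))


def matching_loss(expert_start, expert_target, synthetic, lr, steps):
    """Distance between synthetically trained and expert parameters.

    The student starts from an expert checkpoint, trains on the
    synthetic data, and is compared to a later expert checkpoint.
    Assumption: normalized by the expert's own movement (must be nonzero).
    """
    student = sgd_steps(expert_start, synthetic, lr, steps)
    return sq_dist(student, expert_target) / sq_dist(expert_start, expert_target)


# Pre-compute an "expert" trajectory on real data drawn from y = 2x + 1.
real = [(x, 2 * x + 1) for x in (-2, -1, 0, 1, 2)]
trajectory = [(0.0, 0.0)]
for _ in range(20):
    trajectory.append(sgd_steps(trajectory[-1], real, lr=0.1, steps=1))

# Two candidate synthetic sets: one consistent with the real data, one not.
good = [(-1, -1), (1, 3)]
bad = [(-1, 1), (1, -3)]
loss_good = matching_loss(trajectory[0], trajectory[20], good, lr=0.1, steps=5)
loss_bad = matching_loss(trajectory[0], trajectory[20], bad, lr=0.1, steps=5)
print(loss_good, loss_bad)
```

Storing `trajectory` once and reusing its checkpoints mirrors the idea of pre-computing expert trajectories so that start and target parameters are cheap to obtain during distillation.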
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Image-based virtual try-on has recently gained a lot of attention in both the scientific and fashion industry communities due to its challenging setting and practical real-world applications. While pure convolutional approaches have been explored to solve the task, Transformer-based architectures have not received significant attention yet. Following the intuition that self- and cross-attention operators can deal with long-range dependencies and hence improve the generation, in this paper we extend a Transformer-based virtual try-on model by adding a dual-branch collaborative module that can exploit cross-modal information at generation time. We perform experiments on the VITON dataset, which is the standard benchmark for the task, and on a recently collected virtual try-on dataset with multi-category clothing, Dress Code. Experimental results demonstrate the effectiveness of our solution over previous methods and show that Transformer-based architectures can be a viable alternative for virtual try-on.
ISBN (print): 9781665487399
In this study, we propose a novel minutia patch embedding network (MinNet) model for the latent fingerprint recognition task. Embedding vectors generated for a fixed-size patch extracted around a minutia are used in the local similarity assignment algorithm to produce a global similarity match score. Unlike earlier minutia embedding models, which aim to discriminate between latent image and sensor image minutia pair embeddings using the L2 distance between the embedding vectors during training, the MinNet model jointly optimizes the spatial and angular distribution of neighboring minutiae and the ridge flows of the patches. Even though the proposed model is trained using weakly labeled training data, it produces state-of-the-art results thanks to its ability to generate discriminative embeddings. The proposed method has been evaluated on several public and private datasets and compared to popular latent fingerprint recognition methods presented in earlier studies. Our proposed method significantly outperforms existing methods on all three databases utilized in our study.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Naturalistic driving studies using computer vision techniques have become an emerging research topic. The objective is to classify distracted driver behaviors. Specifically, the problem is treated as temporal action localization (TAL) in untrimmed videos, a challenging task in video analysis. TAL remains one of the most challenging unsolved problems in computer vision because it requires not only recognizing each action but also localizing its start and end times. Most state-of-the-art approaches adopt complex architectures that are expensive to train and slow at inference. In this study, we propose a new framework for untrimmed naturalistic driving videos that uses the results of 3D action recognition with video clip classification to capture short-range temporal and spatial correlation. Simple data-driven post-processing then captures long-range temporal correlation in untrimmed videos. The proposed method is evaluated on the AI City Challenge 2022 dataset for Naturalistic Driving Action Recognition, where it ranks first on the public leaderboard of the challenge.
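The abstract does not detail the post-processing step. As one plausible illustration of turning per-clip classifications into localized actions, the sketch below merges consecutive clips with the same predicted label into segments with start and end times. The function name `clips_to_segments`, the background label, and the minimum-length filter are all assumptions for illustration, not the authors' method.

```python
def clips_to_segments(clip_labels, clip_seconds, background=0, min_clips=2):
    """Merge runs of identical clip labels into (label, start, end) segments.

    Assumptions (hypothetical, not from the paper): clips are
    non-overlapping and of equal duration, label `background` means
    "no distracted action", and runs shorter than `min_clips` are
    treated as noise and dropped.
    """
    segments = []
    start = 0
    for i in range(1, len(clip_labels) + 1):
        # Close the current run at the end of the list or on a label change.
        if i == len(clip_labels) or clip_labels[i] != clip_labels[start]:
            label = clip_labels[start]
            if label != background and i - start >= min_clips:
                segments.append((label, start * clip_seconds, i * clip_seconds))
            start = i
    return segments


# Ten 2-second clips: action 3 spans three clips, action 5 is a one-clip
# blip that gets filtered out, and action 2 spans the last two clips.
labels = [0, 0, 3, 3, 3, 0, 5, 0, 2, 2]
segments = clips_to_segments(labels, clip_seconds=2.0)
print(segments)
```

A rule of this kind keeps the recognition model simple while still producing start and end times, at the cost of a temporal resolution limited by the clip length.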
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
State-of-the-art object recognition methods do not generalize well to unseen domains. Work in domain generalization has attempted to bridge domains by increasing feature compatibility, but has focused on standard, appearance-based representations. We show the potential of shape-based representations to increase domain robustness. We compare two types of shape-based representations: one trains a convolutional network over edge features, and another computes a soft, dense medial axis transform. We show the complementary strengths of these representations for different types of domains, and the effect of the amount of texture that is preserved. We show that our shape-based techniques better leverage data augmentations for domain generalization, and are more effective at texture bias mitigation than shape-inducing augmentations. Finally, we show that when the convolutional network in state-of-the-art domain generalization methods is replaced with one that explicitly captures shape, we obtain improved results.
The 5th ABAW Competition is part of the respective Workshop held in conjunction with IEEE CVPR 2023 and is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017 Co...