Unsupervised domain adaptation for person re-identification (Person Re-ID) is the task of transferring the learned knowledge on the labeled source domain to the unlabeled target domain. Most of the recent papers that ...
详细信息
ISBN:
(纸本)9781665487399
Unsupervised domain adaptation for person re-identification (Person Re-ID) is the task of transferring the learned knowledge on the labeled source domain to the unlabeled target domain. Most of the recent papers that address this problem adopt an offline training setting. More precisely, the training of the Re-ID model is done assuming that we have access to the complete training target domain data set. In this paper, we argue that the target domain generally consists of a stream of data in a practical realworld application, where data is continuously increasing from the different network's cameras. The Re-ID solutions are also constrained by confidentiality regulations stating that the collected data can be stored for only a limited period, hence the model can no longer get access to previously seen target images. Therefore, we present a new yet practical online setting for Unsupervised Domain Adaptation for person Re-ID with two main constraints: Online Adaptation and Privacy Protection. We then adapt and evaluate the state-of-the-art UDA algorithms on this new online setting using the well-known Market-1501, Duke, and MSMT17 benchmarks.
Network pruning is an effective method that reduces the computation of neural networks while maintaining high performance. This enables the operation of deep neural networks in resource-limited environments. In a gene...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Network pruning is an effective method that reduces the computation of neural networks while maintaining high performance. This enables the operation of deep neural networks in resource-limited environments. In a general large network, the roles of each channel often inevitably overlap with those of others. Therefore, for more effective pruning, it is important to observe the correlation between features in the network. In this paper, we propose a novel channel pruning method, namely, the linear combination approximation of features (LCAF). We approximate each feature map by a linear combination of other feature maps in the same layer, and then remove the most approximated one. Additionally, by exploiting the linearity of the convolution operation, we propose a supporting method called weight modification, to further reduce the loss change that occurs during pruning. Extensive experiments show that LCAF achieves state-of-the-art performance in several benchmarks. Furthermore, ablations on the LCAF demonstrate the effectiveness of our approach in a variety of ways.
Beyond possessing large enough size to feed data hungry machines (eg, transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify amon...
详细信息
ISBN:
(纸本)9781665487399
Beyond possessing large enough size to feed data hungry machines (eg, transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify among their relative existences? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA) which overcomes these limitations and is more suitable for real world applications. In addition, we perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect. This reveals if the actions in these datasets really need the motion information of an actor, or whether they predict the occurrence of an action even by looking at a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show existence of biases which have managed to creep into existing methods inspite of careful modeling.
Momentum contrast [16] (MoCo) for unsupervised visual representation learning has a close performance to supervised learning, but it sometimes possesses excess parameters. Extracting a subnetwork from an over-paramete...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Momentum contrast [16] (MoCo) for unsupervised visual representation learning has a close performance to supervised learning, but it sometimes possesses excess parameters. Extracting a subnetwork from an over-parameterized unsupervised network without sacrificing performance is of particular interest to accelerate inference speed. Typical pruning methods are not applicable for MoCo, because in the fine-tune stage after pruning, the slow update of the momentum encoder will undermine the pretrained encoder. In this paper, we propose a Momentum Contrastive Pruning (MCP) method, which prunes the momentum encoder instead to obtain a momentum subnet. It maintains an unpruned momentum encoder as a smooth transition scheme to alleviate the representation gap between the encoder and momentum subnet. To fulfill the sparsity requirements of the encoder, alternating direction method of multipliers [40] (ADMM) is adopted. Experiments prove that our MCP method can obtain a momentum subnet that has almost equal performance as the over-parameterized MoCo when transferred to downstream tasks, meanwhile has much less parameters and float operations per second (FLOPs).
vision (image & video) - Language (VL) pre-training is the recent popular paradigm that achieved state-of-the-art results on multi-modal tasks like image-retrieval, video-retrieval, visual question answering etc. ...
详细信息
Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglec...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from a main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this research activity, we introduce Dress Code, a novel dataset which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024 x 768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich in details, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at pixel-level instead of image- or patch-level. The Dress Code dataset is publicly available at https://***/aimagelab/dress-code.
Video anomaly detection (VAD) addresses the problem of automatically finding anomalous events in video data. The primary data modalities on which current VAD systems work on are monochrome or RGB images. Using depth d...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Video anomaly detection (VAD) addresses the problem of automatically finding anomalous events in video data. The primary data modalities on which current VAD systems work on are monochrome or RGB images. Using depth data in this context instead is still hardly explored in spite of depth images being a popular choice in many other computervision research areas and the increasing availability of inexpensive depth camera hardware. We evaluate the application of existing autoencoder-based methods on depth video and propose how the advantages of using depth data can be leveraged by integration into the loss function. Training is done unsupervised using normal sequences without need for any additional annotations. We show that depth allows easy extraction of auxiliary information for scene analysis in the form of a foreground mask and demonstrate its beneficial effect on the anomaly detection performance through evaluation on a large public dataset, for which we are also the first ones to present results on.
Semi-supervised object detection methods are widely used in autonomous driving systems, where only a fraction of objects are labeled. To propagate information from the labeled objects to the unlabeled ones, pseudo-lab...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Semi-supervised object detection methods are widely used in autonomous driving systems, where only a fraction of objects are labeled. To propagate information from the labeled objects to the unlabeled ones, pseudo-labels for unlabeled objects must be generated. Although pseudo-labels have proven to improve the performance of semisupervised object detection significantly, the applications of image-based methods to video frames result in numerous miss or false detections using such generated pseudo-labels. In this paper, we propose a new approach, PseudoProp, to generate robust pseudo-labels by leveraging motion continuity in video frames. Specifically, PseudoProp uses a novel bidirectional pseudo-label propagation approach to compensate for misdetection. A feature-based fusion technique is also used to suppress inference noise. Extensive experiments on the large-scale Cityscapes dataset demonstrate that our method outperforms the state-of-the-art semi-supervised object detection methods by 7.4% on mAP(75).
Learning continually from non-stationary data streams is a challenging research topic of growing popularity in the last few years. Being able to learn, adapt, and generalize continually in an efficient, effective, and...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Learning continually from non-stationary data streams is a challenging research topic of growing popularity in the last few years. Being able to learn, adapt, and generalize continually in an efficient, effective, and scalable way is fundamental for a sustainable development of Artificial Intelligent systems. However, an agent-centric view of continual learning requires learning directly from raw data, which limits the interaction between independent agents, the efficiency, and the privacy of current approaches. Instead, we argue that continual learning systems should exploit the availability of compressed information in the form of trained models. In this paper, we introduce and formalize a new paradigm named "Ex-Model Continual Learning" (ExML), where an agent learns from a sequence of previously trained models instead of raw data. We further contribute with three ex-model continual learning algorithms and an empirical setting comprising three datasets (MNIST, CIFAR-10 and CORe50), and eight scenarios, where the proposed algorithms are extensively tested. Finally, we highlight the peculiarities of the ex-model paradigm and we point out interesting future research directions.
Gait recognition is a promising biometric with unique properties for identifying individuals from a long distance by their walking patterns. In recent years, most gait recognition methods used the person's silhoue...
详细信息
ISBN:
(纸本)9781665487399
Gait recognition is a promising biometric with unique properties for identifying individuals from a long distance by their walking patterns. In recent years, most gait recognition methods used the person's silhouette to extract the gait features. However, silhouette images can lose fine-grained spatial information, suffer from (self) occlusion, and be challenging to obtain in real-world scenarios. Furthermore, these silhouettes also contain other visual clues that are not actual gait features and can be used for identification, but also to fool the system. Model-based methods do not suffer from these problems and are able to represent the temporal motion of body joints, which are actual gait features. The advances in human pose estimation started a new era for model-based gait recognition with skeleton-based gait recognition. In this work, we propose an approach based on Graph Convolutional Networks (GCNs) that combines higher-order inputs, and residual networks to an efficient architecture for gait recognition. Extensive experiments on the two popular gait datasets, CASIA-B and OUMVLP-Pose, show a massive improvement (3x) of the state-of-the-art (SotA) on the largest gait dataset OUMVLP-Pose and strong temporal modeling capabilities. Finally, we visualize our method to understand skeleton-based gait recognition better and to show that we model real gait features.
暂无评论