ISBN (print): 9781665448994
Accurately identifying different cargoes on vehicles in scanned images is a difficult problem. We propose an unsupervised image decomposition method based on a novel dual-stage double-DIP (DDIP) network, named Quad-DIP, which decomposes an X-ray scan of a cargo vehicle into separate vehicle and goods components without ground-truth data. The model can be trained effectively because, first, the structural content of vehicles of the same type is similar across images and, second, the goods carried by different vehicles differ and are independent of each other. Our work focuses on the content-wise correlation between these components. Given two inputs containing the same type of vehicle, the vehicle structure can be identified, and after training Quad-DIP the image is accurately decomposed into a vehicle-structure component and a cargo-information component. We evaluate the accuracy of this method on a collected X-ray cargo-vehicle dataset; the decompositions produced by Quad-DIP are more accurate than those of previously published methods.
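For a concrete picture, the sketch below illustrates the general DIP-style idea of decomposing two scans that share a vehicle layer but carry different cargo; the tiny networks, the multiplicative mixing model, and the loss weights are illustrative assumptions, not the exact Quad-DIP architecture.

```python
# Minimal sketch (PyTorch) of DIP-style two-component decomposition: a shared
# generator reconstructs the common "vehicle" layer, and per-image generators
# reconstruct the cargo layers. All sizes and weights are assumptions.
import torch
import torch.nn as nn

def tiny_generator(ch=32):
    # A very small convolutional generator standing in for a DIP network.
    return nn.Sequential(
        nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
    )

# Two X-ray scans assumed to contain the same vehicle type but different cargo.
x1 = torch.rand(1, 1, 128, 256)
x2 = torch.rand(1, 1, 128, 256)

g_vehicle = tiny_generator()                       # shared vehicle-structure generator
g_cargo1, g_cargo2 = tiny_generator(), tiny_generator()  # per-image cargo generators
z = torch.rand(1, 3, 128, 256)                     # fixed noise inputs, as in Deep Image Prior
z1, z2 = torch.rand_like(z), torch.rand_like(z)

params = (list(g_vehicle.parameters())
          + list(g_cargo1.parameters())
          + list(g_cargo2.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(200):
    v = g_vehicle(z)                               # common vehicle layer
    c1, c2 = g_cargo1(z1), g_cargo2(z2)
    # Multiplicative X-ray attenuation model (assumption) for recombining layers.
    recon = ((v * c1 - x1) ** 2).mean() + ((v * c2 - x2) ** 2).mean()
    indep = (c1 * c2).mean()                       # crude penalty discouraging correlated cargo layers
    loss = recon + 0.1 * indep
    opt.zero_grad(); loss.backward(); opt.step()
```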
ISBN (print): 9781665448994
Continual learning aims to learn tasks sequentially, with (often severe) constraints on the storage of old learning samples, without suffering from catastrophic forgetting. In this work, we propose prescient continual learning, a novel experimental setting, to incorporate existing information about the classes prior to any training data. Usually, each task in a traditional continual learning setting evaluates the model on present and past classes, the latter with a limited number of training samples. Our setting adds future classes, with no training samples at all. We introduce the Ghost Model, a representation-learning-based model for continual learning that uses ideas from zero-shot learning. A generative model of the representation space, in concert with a careful adjustment of the losses, allows us to exploit insights from future classes to constrain the spatial arrangement of the past and current classes. Quantitative results on the AwA2 and aP&Y datasets and detailed visualizations showcase the interest of this new setting and the method we propose to address it.
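As a rough illustration of the "ghost" idea, the following sketch generates prototypes for future classes from attribute vectors and penalises current-class features that fall too close to them; the attribute dimensionality, prototype generator, and margin are assumptions for illustration only, not the Ghost Model's exact losses.

```python
# Minimal sketch: zero-shot-style "ghost" prototypes for future classes
# constrain where current-class features may lie.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, attr_dim = 64, 16
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
attr_to_proto = nn.Linear(attr_dim, feat_dim)   # generative mapping: attributes -> prototype

x = torch.rand(8, 3, 32, 32)                    # current-task images (dummy)
y = torch.randint(0, 5, (8,))                   # current-task labels
current_protos = torch.randn(5, feat_dim)       # prototypes of classes seen so far
future_attrs = torch.rand(3, attr_dim)          # known attributes of unseen (future) classes

feats = backbone(x)
ghost_protos = attr_to_proto(future_attrs)      # "ghost" prototypes, no training images needed
logits = -torch.cdist(feats, current_protos)    # nearest-prototype classification
cls_loss = F.cross_entropy(logits, y)

# Repulsion term: keep current features at least `margin` away from ghost prototypes.
margin = 5.0
dist_to_ghost = torch.cdist(feats, ghost_protos)
ghost_loss = F.relu(margin - dist_to_ghost).mean()
loss = cls_loss + 0.1 * ghost_loss
loss.backward()
```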
ISBN (print): 9781665448994
Soccer broadcast video understanding has been drawing a lot of attention in recent years among data scientists and industrial companies. This is mainly due to the lucrative potential unlocked by effective deep learning techniques developed in the field of computer vision. In this work, we focus on the topic of camera calibration and on its current limitations for the scientific community. More precisely, we tackle the absence of a large-scale calibration dataset and of a public calibration network trained on such a dataset. Specifically, we distill a powerful commercial calibration tool into a recent neural network architecture on the large-scale SoccerNet dataset, composed of untrimmed broadcast videos of 500 soccer games. We further release our distilled network and leverage it to provide three ways of representing the calibration results along with player localization. Finally, we exploit those representations within the current best architecture for the action spotting task of SoccerNet-v2 and achieve new state-of-the-art performance.
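A minimal distillation loop of the kind described might look like the sketch below, where a black-box teacher tool provides per-frame calibration targets for a small student network; the 8-parameter homography target and the student backbone are assumptions, not the released network.

```python
# Minimal sketch of distilling a calibration tool into a student network:
# the teacher provides pseudo-labels per frame, and the student regresses them.
import torch
import torch.nn as nn

def teacher_calibration(frames):
    # Placeholder for the commercial calibration tool; here it returns one
    # 8-dim homography parameterisation per frame (assumption).
    return torch.rand(frames.size(0), 8)

student = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 8),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

frames = torch.rand(4, 3, 224, 398)           # dummy batch of broadcast frames
targets = teacher_calibration(frames)         # pseudo-labels from the teacher
loss = nn.functional.mse_loss(student(frames), targets)
opt.zero_grad(); loss.backward(); opt.step()
```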
ISBN (print): 9781665448994
Neural image compression (NIC) is a new coding paradigm in which coding capabilities are captured by deep models learned from data. This data-driven nature enables new potential functionalities. In this paper, we study the adaptability of codecs to custom domains of interest. We show that NIC codecs are transferable and can be adapted with relatively few target-domain images. However, naive adaptation interferes with the solution optimized for the original source domain, causing the codec to forget its original coding capabilities in that domain and potentially breaking compatibility with previously encoded bitstreams. To address these problems, we propose Codec Adaptation without Forgetting (CAwF), a framework that avoids them by adding a small number of custom parameters while the source codec remains embedded and unchanged during the adaptation process. Experiments demonstrate its effectiveness and provide useful insights into the characteristics of catastrophic interference in NIC.
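The sketch below illustrates one way such adaptation without forgetting can be structured: the source encoder is frozen and only a small residual adapter is trained on target-domain images. The adapter placement and the proxy loss are assumptions, not the exact CAwF design.

```python
# Minimal sketch: a frozen source codec plus a small trainable adapter.
import torch
import torch.nn as nn

class AdaptedEncoder(nn.Module):
    def __init__(self, source_encoder, ch):
        super().__init__()
        self.source = source_encoder
        for p in self.source.parameters():
            p.requires_grad = False           # source codec stays unchanged
        self.adapter = nn.Conv2d(ch, ch, 1)   # small set of custom parameters
        nn.init.zeros_(self.adapter.weight)   # start as an identity-like residual
        nn.init.zeros_(self.adapter.bias)

    def forward(self, x):
        y = self.source(x)
        return y + self.adapter(y)            # residual correction for the new domain

source_encoder = nn.Conv2d(3, 64, 5, stride=2, padding=2)  # stand-in for a trained NIC encoder
model = AdaptedEncoder(source_encoder, ch=64)
opt = torch.optim.Adam(model.adapter.parameters(), lr=1e-4)

x = torch.rand(2, 3, 64, 64)                  # target-domain images (dummy)
latent = model(x)
loss = latent.abs().mean()                    # proxy term; real training uses a rate-distortion loss
opt.zero_grad(); loss.backward(); opt.step()
```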
ISBN (print): 9781665448994
This paper proposes two novel knowledge transfer techniques for class-incremental learning (CIL). First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model. In conventional generative replay, the generative model is pre-trained on old data and shared in extra memory for later incremental learning. In our proposed DF-GR, we train a generative model from scratch without using any training data, based on the pre-trained classification model from the past, so we curtail the cost of sharing pre-trained generative models. Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student. In CIL, we use DT-ID to learn new classes incrementally based on the pre-trained model for old classes and another model (pre-)trained on the new data for the new classes. We implemented the proposed schemes on top of one of the state-of-the-art CIL methods and show performance improvements on the CIFAR-100 and ImageNet datasets.
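The dual-teacher distillation idea can be sketched as follows: the student's old-class logits are matched to the old-class teacher and its new-class logits to the new-data teacher. The dimensions, temperature, and equal loss weighting here are illustrative assumptions, not the paper's exact DT-ID settings.

```python
# Minimal sketch of dual-teacher information distillation (two teachers, one student).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_old, n_new, d = 10, 5, 32
teacher_old = nn.Linear(d, n_old)             # pre-trained on old classes (frozen)
teacher_new = nn.Linear(d, n_new)             # trained on the new data (frozen)
student = nn.Linear(d, n_old + n_new)

x = torch.rand(16, d)                         # features of new-task (or replayed) samples
T = 2.0                                       # distillation temperature (assumption)
with torch.no_grad():
    p_old = F.softmax(teacher_old(x) / T, dim=1)
    p_new = F.softmax(teacher_new(x) / T, dim=1)

s = student(x)
kd_old = F.kl_div(F.log_softmax(s[:, :n_old] / T, dim=1), p_old, reduction="batchmean")
kd_new = F.kl_div(F.log_softmax(s[:, n_old:] / T, dim=1), p_new, reduction="batchmean")
loss = kd_old + kd_new
loss.backward()
```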
ISBN (print): 9781665448994
Among the different deepfake generation techniques, flow-based methods appear to be natural candidates. Owing to their invertibility, flow-based methods eliminate the need for person-specific training and can reconstruct any input image almost perfectly to human perception. We present a method for deepfake generation based on facial expression transfer using flow-based generative models. Our approach relies on simple latent-vector operations, akin to those used for attribute manipulation, but applied to transferring expressions between identity source-target pairs. We show the feasibility of this approach using a pre-trained Glow model and small sets of source and target images that were not necessarily seen during prior training. We also provide an evaluation pipeline for the generated images in terms of similarities between identities and the Action Units encoding the expression to be transferred. Our results show that efficient expression transfer is feasible with the proposed approach, setting a first precedent in deepfake content creation, and its evaluation, independent of the training identities.
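A minimal sketch of the latent-vector operation involved is shown below: an expression direction estimated from source images is added to the target's latent code. The encode/decode functions are placeholders standing in for a pre-trained invertible model such as Glow, and the transfer strength is an assumption.

```python
# Minimal sketch of expression transfer via latent arithmetic in an invertible model.
import torch

def encode(images):
    # Placeholder for the model's forward pass (image -> latent); Glow would be used in practice.
    return images.flatten(1)

def decode(latents):
    # Placeholder for the model's inverse pass (latent -> image).
    return latents.view(-1, 3, 64, 64)

src_expr = torch.rand(5, 3, 64, 64)      # source images showing the expression
src_neutral = torch.rand(5, 3, 64, 64)   # source images without it
target = torch.rand(1, 3, 64, 64)        # target-identity image

# Expression direction estimated from the source set, then applied to the target latent.
direction = encode(src_expr).mean(0) - encode(src_neutral).mean(0)
alpha = 1.0                              # transfer strength (assumption)
z_edit = encode(target) + alpha * direction
result = decode(z_edit)                  # target identity with the transferred expression
```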
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Multi-Camera People Tracking is a multifaceted issue that requires the integration of several computer vision tasks, such as Object Detection, Multiple Object Tracking, and Person Re-identification. This study presents a multi-camera people tracking method that comprises four main processes: (1) single-camera people tracking based on overlap suppression clustering, (2) representative image extraction using pose estimation for re-identification, (3) re-identification using hierarchical clustering with average linkage, and (4) low-identifiability tracklet ***. The RIIPS team achieved the highest Higher Order Tracking Accuracy (HOTA) of 71.9446% in Track 1 of the 2024 AI City Challenge.
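Step (3) can be sketched with off-the-shelf hierarchical clustering: tracklet appearance embeddings are agglomerated with average linkage and cut at a distance threshold to assign global identities. The embedding source and the threshold below are assumptions, not the team's exact settings.

```python
# Minimal sketch of cross-camera re-identification via average-linkage clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# One appearance embedding per single-camera tracklet (dummy data).
tracklet_embeddings = np.random.rand(20, 128)

dists = pdist(tracklet_embeddings, metric="cosine")
Z = linkage(dists, method="average")                     # average-linkage agglomeration
global_ids = fcluster(Z, t=0.4, criterion="distance")    # tracklets sharing a global identity
print(global_ids)                                         # cross-camera identity per tracklet
```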
ISBN (print): 9781665448994
We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses an editing style learned from a single example video to coherently combine clips. The editing model is useful for tasks such as generating news-clip montages and highlight reels given a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE and human-edited videos in terms of matching the text query and the level of interest each generates, suggesting that CMVE effectively integrates semantic information across visual and textual modalities and creates perceptually coherent videos of the quality typical of human video editors. We publicly release an online demonstration of our method.
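As a loose illustration of query-driven clip selection by cross-modal similarity, the sketch below scores clips against a text-query embedding and keeps the best matches in storyline order; the embedding function is a placeholder, not CMVE's actual models or editing-style component.

```python
# Minimal sketch of selecting clips whose embeddings best match a text query.
import numpy as np

def embed(items):
    # Placeholder embedding; a real system would use learned visual/text encoders.
    rng = np.random.default_rng(0)
    return rng.random((len(items), 64))

clips = ["clip_00", "clip_01", "clip_02", "clip_03"]
query = ["flood damage in the city centre"]

clip_emb = embed(clips)
q = embed(query)[0]
scores = clip_emb @ q / (np.linalg.norm(clip_emb, axis=1) * np.linalg.norm(q))
selected = [clips[i] for i in np.argsort(-scores)[:2]]   # top-matching clips
edit_order = sorted(selected)                            # keep the original storyline order
print(edit_order)
```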
ISBN (print): 9781665448994
Image colourisation is an ill-posed problem with multiple correct solutions, which depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone are not able to learn entity-level semantics unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration both the image-level features obtained through convolutions and the entity-level features captured by means of capsules. Then, through skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions. Code available at https://***/Riretta/Image_Colourisation_WiCV_2021.
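Posing colourisation as a classification task can be sketched as below: a network predicts a distribution over quantised colour bins per pixel from the luminance channel. The bin count and the tiny network are assumptions; UCapsNet additionally fuses capsule (entity-level) features through skip connections, which this sketch omits.

```python
# Minimal sketch of colourisation as per-pixel classification over quantised colour bins.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_bins = 64                                    # quantised ab-colour bins (assumption)
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, n_bins, 1),                  # per-pixel logits over colour bins
)

L = torch.rand(2, 1, 64, 64)                   # luminance input (dummy)
target_bins = torch.randint(0, n_bins, (2, 64, 64))  # quantised target colours per pixel
loss = F.cross_entropy(net(L), target_bins)
loss.backward()
```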
ISBN (print): 9781665448994
We consider the training of binary neural networks (BNNs) using the stochastic relaxation approach, which leads to stochastic binary networks (SBNs). We identify that a severe obstacle to training deep SBNs without skip connections lies already in the initialization phase. While smaller models can be trained from a random (possibly data-driven) initialization, for deeper models and large datasets it becomes increasingly difficult to obtain non-vanishing, low-variance gradients when initializing randomly. In this work, we initialize SBNs from real-valued networks with ReLU activations. Real-valued networks are well established, easier to train, and benefit from many techniques that improve their generalization properties. We propose that closely approximating their internal features can provide a good initialization for the SBN. We transfer features incrementally, layer by layer, accounting for noise in the SBN, exploiting equivalent reparametrizations of ReLU networks, and using a novel transfer-loss formulation. We demonstrate experimentally that, with the proposed initialization, binary networks can be trained faster and achieve higher accuracy than when initialized randomly.
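The layer-wise feature transfer can be sketched as follows: a binary layer (here relaxed with a straight-through estimator) is fitted so its output matches the corresponding layer of a trained ReLU network. The straight-through relaxation and the MSE transfer loss are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of layer-wise feature transfer from a ReLU layer to a binary layer.
import torch
import torch.nn as nn

d = 128
relu_layer = nn.Linear(d, d)                   # trained real-valued layer (frozen)
for p in relu_layer.parameters():
    p.requires_grad = False
bin_layer = nn.Linear(d, d)                    # real-valued proxy weights for the binary layer
opt = torch.optim.Adam(bin_layer.parameters(), lr=1e-3)

def binarize(w):
    # Straight-through sign(): binary values in the forward pass, identity gradient backward.
    return w + (torch.sign(w) - w).detach()

for step in range(100):
    x = torch.randn(64, d)                     # inputs mimicking the previous layer's statistics
    with torch.no_grad():
        target = torch.relu(relu_layer(x))     # real-valued features to imitate
    out = torch.relu(nn.functional.linear(x, binarize(bin_layer.weight), bin_layer.bias))
    loss = nn.functional.mse_loss(out, target) # transfer loss for this layer
    opt.zero_grad(); loss.backward(); opt.step()
```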