Many real-world machine learning systems require the ability to continually learn new knowledge. Class incremental learning receives increasing attention recently as a solution towards this goal. However, existing met...
详细信息
ISBN:
(纸本)9781728193601
Many real-world machine learning systems require the ability to continually learn new knowledge. Class incremental learning receives increasing attention recently as a solution towards this goal. However, existing methods often introduce some assumptions to simplify the problem setting, which rarely holds in real-world scenarios. In this paper, we formulate a Generalized Class Incremental Learning (GCIL) framework to systematically alleviate these restrictions, and introduce several novel realistic incremental learning scenarios. In addition, we propose a simple yet effective method, namely ReMix, which combines Exemplar Replay (ER) and Mixup to deal with different challenges in realistic GCIL setups. We demonstrate on CIFAR-100 that ReMix outperforms the state-of-the-art methods in different GCIL setups by significant margins without introducing additional computation cost.
Voronoi diagrams are highly compact representations that are used in various Graphics applications. In this work, we show how to embed a differentiable version of it - via a novel deep architecture - into a generative...
详细信息
ISBN:
(纸本)9781728193601
Voronoi diagrams are highly compact representations that are used in various Graphics applications. In this work, we show how to embed a differentiable version of it - via a novel deep architecture - into a generative deep network. By doing so, we achieve a highly compact latent embedding that is able to provide much more detailed reconstructions, both in 2D and 3D, for various shapes. In this tech report, we introduce our representation and present a set of preliminary results comparing it with recently proposed implicit occupancy networks.
We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network MV-Residual is designed to extract the ensembled features of motion repres...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network MV-Residual is designed to extract the ensembled features of motion representations and residual information by treating the two successive frames as inputs. The prior probability of the latent representations is modeled by a hyperprior auto-encoder and trained jointly with the MV-Residual network. Specially, the spatially-displaced convolution is applied for video frame prediction, in which a motion kernel for each pixel is learned to generate predicted pixel by applying the kernel at a displaced location in the source image. Finally, novel rate allocation and post-processing strategies are used to produce the final compressed bits, considering the bits constraint of the challenge. The experimental results on validation set show that the proposed optimized framework can generate the highest MS-SSIM for P-frame compression competition.
The performance of deep face recognition depends heavily on the training data. Recently, larger and larger datasets have been developed for the training of deep models. However, most face recognition training sets suf...
详细信息
ISBN:
(纸本)9781728193601
The performance of deep face recognition depends heavily on the training data. Recently, larger and larger datasets have been developed for the training of deep models. However, most face recognition training sets suffer from the class imbalance problem, and most studies ignore the benefit of optimizing dataset structures. In this paper, we study how class-balanced training can promote face recognition performance. A medium-scale face recognition training set BUPT-CBFace is built by exploring the optimal data structure from massive data. This publicly available dataset is characterized by the uniformly distributed sample size per class, as well as the balance between the number of classes and the number of samples in one class. Experimental results show that deep models trained with BUPT-CBFace can not only achieve comparable results to larger-scale datasets such as MS-Celeb-1M but also alleviate the problem of recognition bias.
In real-world environments, such as the vehicle cabin, we have to deal with novel concepts as they arise. To this end, we introduce ZS-Drive&Act - the first zero-shot activity classification benchmark specifically...
详细信息
ISBN:
(纸本)9781728193601
In real-world environments, such as the vehicle cabin, we have to deal with novel concepts as they arise. To this end, we introduce ZS-Drive&Act - the first zero-shot activity classification benchmark specifically aimed at recognizing previously unseen driver behaviors. ZS-Drive&Act is unique due to its focus on fine-grained activities and presence of activity-driven attributes, which are automatically derived from a hierarchical annotation scheme. We adopt and evaluate multiple off-the-shelf zero-shot learning methods on our benchmark, showcasing the difficulties of such models when moving to our application-specific task. We further extend the prominent method based on feature generating Wasserstein GANs with a fusion strategy for linking semantic attributes and word vectors representing the behavior labels. Our experiments demonstrate the effectiveness of leveraging both semantic spaces simultaneously, improving the recognition rate by 2.79%.
Identifying prescription medications is a frequent task for patients and medical professionals;however, this is an error-prone task as many pills have similar appearances (e.g. white round pills), which increases the ...
详细信息
ISBN:
(纸本)9781728193601
Identifying prescription medications is a frequent task for patients and medical professionals;however, this is an error-prone task as many pills have similar appearances (e.g. white round pills), which increases the risk of medication errors. In this paper, we introduce ePillID, the largest public benchmark on pill image recognition, composed of 13k images representing 8184 appearance classes (two sides for 4092 pill types). For most of the appearance classes, there exists only one reference image, making it a challenging low-shot recognition setting. We present our experimental setup and evaluation results of various baseline models on the benchmark. The best baseline using a multi-head metric-learning approach with bilinear features performed remarkably well;however, our error analysis suggests that they still fail to distinguish particularly confusing classes.
Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leve...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of . This requires simultaneous localization of the subject and ob...
详细信息
ISBN:
(纸本)9781728193601
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of . This requires simultaneous localization of the subject and object entities in a specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Different from the existing methods such as SSAS, our method can generate a high-resolution result while reducing its complexity and ambiguity. Our method is composed of two modules: a category-based proposal generation module to select the proposals related to the entities and a predicate analysis module to score the compatibility of pairs of selected proposals. We show state-of-the-art performance on the referring relationship task on two public datasets: Visual Relationship Detection and Visual Genome.
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the different nature of a data-driven deep neural nets (D...
详细信息
ISBN:
(纸本)9781728193601
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the different nature of a data-driven deep neural nets (DNN) to conventional software is a challenge for practical software verification. In this work, we show how existing methods from software engineering provide benefits for the development of a DNN and in particular for dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged for generating test sets providing coverage guarantees with respect to important environmental features and their interaction. Additionally, we show how our approach can be used for growing a dataset, i.e. to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
The goal of few-shot image learning is to utilize a very small amount of training examples in order to train a machine learning model to recognize a given number of image classes. While humans can perform such a task ...
详细信息
ISBN:
(纸本)9781728193601
The goal of few-shot image learning is to utilize a very small amount of training examples in order to train a machine learning model to recognize a given number of image classes. While humans can perform such a task pretty much effortlessly, applying the same mechanism to deep learning visual recognition systems is a much more difficult task, having a wide range of real-world visual recognition applications. In this paper, we investigate the behavior of such few-shot methods in the context of drone vision cinematography for sports event filming, in order to recognize new image classes by taking into consideration the fact that this new class we wish to identify is a subclass of an already known class. More specifically we use UAV footage to recognize certain types of athletes, belonging to a subset of an original athlete class, utilizing only a handful of recorded images of this athlete subclass. We examine the effects of such methods on image recognition accuracy while proposing a novel approach for accuracy optimizations. The overall task is evaluated on actual cycling race UAV footage.
暂无评论