Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leve...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
Artificial, CNN-generated images are now of such high quality that humans have trouble distinguishing them from real images. Several algorithmic detection methods have been proposed, but these appear to generalize poo...
详细信息
ISBN:
(纸本)9781728193601
Artificial, CNN-generated images are now of such high quality that humans have trouble distinguishing them from real images. Several algorithmic detection methods have been proposed, but these appear to generalize poorly to data from unknown sources, making them infeasible for real-world scenarios. In this work, we present a framework for evaluating detection methods under real-world conditions, consisting of cross-model, cross-data, and post-processing evaluation, and we evaluate state-of-the-art detection methods using the proposed framework. Furthermore, we examine the usefulness of commonly used image pre-processing methods. Lastly, we evaluate human performance on detecting CNN-generated images, along with factors that influence this performance, by conducting an online survey. Our results suggest that CNN-based detection methods are not yet robust enough to be used in real-world scenarios.
In this paper, we address a key limitation of existing 2D face recognition methods: robustness to occlusions. To accomplish this task, we systematically analyzed the impact of facial attributes on the performance of a...
详细信息
ISBN:
(纸本)9781728193601
In this paper, we address a key limitation of existing 2D face recognition methods: robustness to occlusions. To accomplish this task, we systematically analyzed the impact of facial attributes on the performance of a state-of-the-art face recognition method and through extensive experimentation, quantitatively analyzed the performance degradation under different types of occlusion. Our proposed Occlusion-aware face recognition (OREO) approach learned discriminative facial templates despite the presence of such occlusions. First, an attention mechanism was proposed that extracted local identity-related region. The local features were then aggregated with the global representations to form a single template. Second, a simple, yet effective, training strategy was introduced to balance the non-occluded and occluded facial images. Extensive experiments demonstrated that OREO improved the generalization ability of face recognition under occlusions by 10.17% in a single-image-based setting and outperformed the baseline by approximately 2% in terms of rank-1 accuracy in an image-set-based scenario.
The goal of few-shot image learning is to utilize a very small amount of training examples in order to train a machine learning model to recognize a given number of image classes. While humans can perform such a task ...
详细信息
ISBN:
(纸本)9781728193601
The goal of few-shot image learning is to utilize a very small amount of training examples in order to train a machine learning model to recognize a given number of image classes. While humans can perform such a task pretty much effortlessly, applying the same mechanism to deep learning visual recognition systems is a much more difficult task, having a wide range of real-world visual recognition applications. In this paper, we investigate the behavior of such few-shot methods in the context of drone vision cinematography for sports event filming, in order to recognize new image classes by taking into consideration the fact that this new class we wish to identify is a subclass of an already known class. More specifically we use UAV footage to recognize certain types of athletes, belonging to a subset of an original athlete class, utilizing only a handful of recorded images of this athlete subclass. We examine the effects of such methods on image recognition accuracy while proposing a novel approach for accuracy optimizations. The overall task is evaluated on actual cycling race UAV footage.
We present a new visual parsing method based on convolutional neural networks for handwritten mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parsing model employs multi-task learning, and uses...
详细信息
ISBN:
(纸本)9781728193601
We present a new visual parsing method based on convolutional neural networks for handwritten mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parsing model employs multi-task learning, and uses a single feature representation for locating, classifying, and relating symbols. First, a Line-Of-Sight (LOS) graph is computed over the handwritten strokes in a formula. Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph. Our preliminary results show that this is a promising new approach for visual parsing of handwritten formulas. Our data and source code are publicly available.
Despite remarkable improvements in speed and accuracy, convolutional neural networks (CNNs) still typically operate as monolithic entities at inference time. This poses a challenge for resource-constrained practical a...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
Despite remarkable improvements in speed and accuracy, convolutional neural networks (CNNs) still typically operate as monolithic entities at inference time. This poses a challenge for resource-constrained practical applications, where both computational budgets and performance needs can vary with the situation. To address these constraints, we propose the Any-Width Network (AWN), an adjustable-width CNN architecture and associated training routine that allow for fine-grained control over speed and accuracy during inference. Our key innovation is the use of lower-triangular weight matrices which explicitly address width-varying batch statistics while being naturally suited for multi-width operations. We also show that this design facilitates an efficient training routine based on random width sampling. We empirically demonstrate that our proposed AWNs compare favorably to existing methods while providing maximally granular control during inference.
Automatic understanding of facial behavior is hampered by factors such as occlusion, illumination, non-frontal head pose, low image resolution, or limitations in labeled training data. The EmotioNet 2020 Challenge add...
详细信息
ISBN:
(数字)9781728193601
ISBN:
(纸本)9781728193601
Automatic understanding of facial behavior is hampered by factors such as occlusion, illumination, non-frontal head pose, low image resolution, or limitations in labeled training data. The EmotioNet 2020 Challenge addresses these issues through a competition on recognizing facial action units on in-the-wild data. We propose to combine multi-task and self-training to make best use of the small manually / fully labeled and the large weakly / partially labeled training datasets provided by the challenge organizers. With our approach (and without using additional data) we achieve the second place in the 2020 challenge - with a performance gap of only 0.05% to the challenge winner and of 5.9% to the third place. On the 2018 challenge evaluation data our method outperforms all other known results.
Recent methods for people detection in overhead, fisheye images either use radially-aligned bounding boxes to represent people, assuming people always appear along image radius or require significant pre-/post-process...
详细信息
ISBN:
(纸本)9781728193601
Recent methods for people detection in overhead, fisheye images either use radially-aligned bounding boxes to represent people, assuming people always appear along image radius or require significant pre-/post-processing which radically increases computational complexity. In this work, we develop an end-to-end rotation-aware people detection method, named RAPiD, that detects people using arbitrarily-oriented bounding boxes. Our fully-convolutional neural network directly regresses the angle of each bounding box using a periodic loss function, which accounts for angle periodicities. We have also created a new dataset(1) with spatio-temporal annotations of rotated bounding boxes, for people detection as well as other vision tasks in overhead fisheye videos. We show that our simple, yet effective method outperforms state-of-the-art results on three fisheye-image datasets. The source code for RAPiD is publicly available(2).
Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within socie...
详细信息
ISBN:
(纸本)9781728193601
Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within society. In this study, we propose a novel adversarial derived data augmentation methodology that aims to enable dataset balance at a per-subject level via the use of image-to-image transformation for the transfer of sensitive racial characteristic facial features. Our aim is to automatically construct a synthesised dataset by transforming facial images across varying racial domains, while still preserving identity-related features, such that racially dependant features subsequently become irrelevant within the determination of subject identity. We construct our experiments on three significant face recognition variants: Softmax, CosFace and ArcFace loss over a common convolutional neural network backbone. In a side-by-side comparison, we show the positive impact our proposed technique can have on the recognition performance for (racial) minority groups within an originally imbalanced training dataset by reducing the per-race variance in performance.
Human pose estimation is a well-known problem in computervision to locate joint positions. Existing datasets for learning of poses are observed to be not challenging enough in terms of pose diversity, object occlusio...
详细信息
ISBN:
(纸本)9781728193601
Human pose estimation is a well-known problem in computervision to locate joint positions. Existing datasets for learning of poses are observed to be not challenging enough in terms of pose diversity, object occlusion and view points. This makes the pose annotation process relatively simple and restricts the application of the models that have been trained on them. To handle more variety in human poses, we propose the concept of fine-grained hierarchical pose classification, in which we formulate the pose estimation as a classification task, and propose a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes. Yoga-82 consists of complex poses where fine annotations may not be possible. To resolve this, we provide hierarchical labels for yoga poses based on the body configuration of the pose. The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names. We present the classification accuracy of the state-of-the-art convolutional neural network architectures on Yoga-82. We also present several hierarchical variants of DenseNet in order to utilize the hierarchical labels.
暂无评论