ISBN: 9781728193601 (print)
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, data-driven deep neural networks (DNNs) differ fundamentally from conventional software, which poses a challenge for practical software verification. In this work, we show how existing methods from software engineering benefit the development of a DNN, in particular dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged to generate test sets that provide coverage guarantees with respect to important environmental features and their interactions. Additionally, we show how our approach can be used to grow a dataset, i.e., to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
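To make the coverage idea concrete, the following is a minimal sketch of pairwise (2-way) coverage analysis over a domain model. The feature names, their values, and the helper functions are hypothetical and chosen only for illustration; the abstract does not specify the authors' domain model or tooling.

```python
from itertools import combinations, product

# Hypothetical domain model (assumed for illustration): feature -> allowed values.
DOMAIN = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["day", "night"],
    "road_type": ["urban", "highway"],
}

def required_pairs(domain):
    """Every value pair across every pair of distinct features (2-way coverage)."""
    pairs = set()
    for f1, f2 in combinations(sorted(domain), 2):
        for v1, v2 in product(domain[f1], domain[f2]):
            pairs.add(((f1, v1), (f2, v2)))
    return pairs

def covered_pairs(samples):
    """Value pairs that actually occur in the annotated samples."""
    pairs = set()
    for sample in samples:
        for f1, f2 in combinations(sorted(sample), 2):
            pairs.add(((f1, sample[f1]), (f2, sample[f2])))
    return pairs

def missing_pairs(domain, samples):
    """Feature interactions the dataset does not yet contain."""
    return required_pairs(domain) - covered_pairs(samples)

# One annotated image covers three of the sixteen required pairs.
samples = [{"weather": "clear", "time_of_day": "day", "road_type": "urban"}]
gaps = missing_pairs(DOMAIN, samples)
```

The entries of `gaps` name the feature combinations for which data could be collected next, which is one plausible reading of "growing a dataset" in the abstract.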
Attribution methods can provide powerful insights into the reasons for a classifier's decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters, which are often randomly set or empirically tuned. High sensitivity to arbitrary hyperparameter choices not only impedes reproducibility but also calls into question the correctness of an explanation and impairs the trust of end-users. In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods. We found an alarming trend: many methods are highly sensitive to changes in their common hyperparameters; for example, even changing a random seed can yield a different explanation. Interestingly, such sensitivity is not reflected in the average explanation accuracy scores over the dataset as commonly reported in the literature. In addition, explanations generated for robust classifiers (i.e., classifiers trained to be invariant to pixel-wise perturbations) are surprisingly more robust than those generated for regular classifiers.
Explainable artificial intelligence (XAI) methods rely on access to model architecture and parameters, which is not always feasible for users, practitioners, and regulators. Inspired by cognitive psychology, we present a case for response times (RTs) as a technique for XAI. RTs are observable without access to the model. Moreover, dynamic inference models performing conditional computation generate variable RTs for visual learning tasks depending on hierarchical representations. We show that MSDNet, a conditional computation model with an early-exit architecture, exhibits slower RTs for images with more complex features in the ObjectNet test set, and that it also exhibits the human phenomenon of scene grammar, where object recognition depends on intra-scene object-object relationships. These results cast light on MSDNet's feature space without opening the black box and illustrate the promise of RT methods for XAI.
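As a rough illustration of why an early-exit model yields variable response times, the sketch below returns a prediction at the first exit whose confidence reaches a threshold and reports the exit depth as an RT proxy. The function name, the threshold value, and the use of exit depth as a stand-in for RT are assumptions made for this example, not details taken from the abstract or from MSDNet's code.

```python
def early_exit(exit_probs, threshold=0.9):
    """Return (predicted_label, exit_depth) for one input.

    exit_probs: list of class-probability lists, one per exit, ordered
    from the shallowest to the deepest classifier (hypothetical format).
    """
    if not exit_probs:
        raise ValueError("at least one exit is required")
    last = len(exit_probs)
    for depth, probs in enumerate(exit_probs, start=1):
        confidence = max(probs)
        # Stop as soon as the classifier is confident, or at the final exit.
        if confidence >= threshold or depth == last:
            return probs.index(confidence), depth

# An "easy" input exits at depth 1; a "hard" one runs to the last exit.
easy = [[0.05, 0.95], [0.01, 0.99]]
hard = [[0.5, 0.5], [0.6, 0.4], [0.2, 0.8]]
easy_result = early_exit(easy)
hard_result = early_exit(hard)
```

Under these assumptions, the exit depth can be recorded from the outside as elapsed computation, without inspecting weights or activations.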
In real-world environments, such as the vehicle cabin, we have to deal with novel concepts as they arise. To this end, we introduce ZS-Drive&Act, the first zero-shot activity classification benchmark specifically aimed at recognizing previously unseen driver behaviors. ZS-Drive&Act is unique due to its focus on fine-grained activities and the presence of activity-driven attributes, which are automatically derived from a hierarchical annotation scheme. We adopt and evaluate multiple off-the-shelf zero-shot learning methods on our benchmark, showcasing the difficulties such models face when moving to our application-specific task. We further extend the prominent method based on feature-generating Wasserstein GANs with a fusion strategy for linking semantic attributes and word vectors representing the behavior labels. Our experiments demonstrate the effectiveness of leveraging both semantic spaces simultaneously, improving the recognition rate by 2.79%.
Image classification has been studied extensively, but there has been limited work on using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that the availability of such external semantic information, in conjunction with the visual semantics from images, boosts overall performance. Taking a step further in this direction, we model the label-label and label-image interactions more explicitly using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
Artificial, CNN-generated images are now of such high quality that humans have trouble distinguishing them from real images. Several algorithmic detection methods have been proposed, but these appear to generalize poorly to data from unknown sources, making them infeasible for real-world scenarios. In this work, we present a framework for evaluating detection methods under real-world conditions, consisting of cross-model, cross-data, and post-processing evaluation, and we evaluate state-of-the-art detection methods using the proposed framework. Furthermore, we examine the usefulness of commonly used image pre-processing methods. Lastly, we evaluate human performance on detecting CNN-generated images, along with factors that influence this performance, by conducting an online survey. Our results suggest that CNN-based detection methods are not yet robust enough to be used in real-world scenarios.
In this paper, we address a key limitation of existing 2D face recognition methods: robustness to occlusions. To accomplish this task, we systematically analyzed the impact of facial attributes on the performance of a state-of-the-art face recognition method and, through extensive experimentation, quantitatively analyzed the performance degradation under different types of occlusion. Our proposed Occlusion-aware face recognition (OREO) approach learned discriminative facial templates despite the presence of such occlusions. First, an attention mechanism was proposed that extracted local identity-related regions. The local features were then aggregated with the global representations to form a single template. Second, a simple yet effective training strategy was introduced to balance the non-occluded and occluded facial images. Extensive experiments demonstrated that OREO improved the generalization ability of face recognition under occlusions by 10.17% in a single-image-based setting and outperformed the baseline by approximately 2% in terms of rank-1 accuracy in an image-set-based scenario.
The goal of few-shot image learning is to train a machine learning model to recognize a given number of image classes from a very small number of training examples. While humans perform such a task almost effortlessly, applying the same mechanism to deep learning visual recognition systems is much more difficult, even though the task has a wide range of real-world visual recognition applications. In this paper, we investigate the behavior of few-shot methods in the context of drone cinematography for sports event filming, recognizing new image classes while exploiting the fact that each new class is a subclass of an already known class. More specifically, we use UAV footage to recognize certain types of athletes, belonging to a subset of an original athlete class, utilizing only a handful of recorded images of this athlete subclass. We examine the effects of such methods on image recognition accuracy and propose a novel approach for accuracy optimization. The overall task is evaluated on actual cycling race UAV footage.
We present a new visual parsing method based on convolutional neural networks for handwritten mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parsing model employs multi-task learning, and uses a single feature representation for locating, classifying, and relating symbols. First, a Line-Of-Sight (LOS) graph is computed over the handwritten strokes in a formula. Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph. Our preliminary results show that this is a promising new approach for visual parsing of handwritten formulas. Our data and source code are publicly available.
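The final step, extracting a maximum spanning tree from a weighted graph, can be sketched with Kruskal's algorithm run over edges in descending weight order. This is an undirected simplification chosen for brevity; the abstract does not say which MST algorithm the authors use or whether their graph is directed, and the edge weights below are made-up scores standing in for relationship confidences between strokes.

```python
def maximum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    edges: iterable of (weight, u, v) tuples over nodes 0..num_nodes-1.
    Returns the list of edges kept in the tree.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge creates no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

# Hypothetical line-of-sight graph over four strokes.
los_edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.1, 2, 3), (0.7, 1, 3)]
tree = maximum_spanning_tree(4, los_edges)
```

The tree keeps the three highest-scoring edges that connect all four nodes without forming a cycle.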
Despite remarkable improvements in speed and accuracy, convolutional neural networks (CNNs) still typically operate as monolithic entities at inference time. This poses a challenge for resource-constrained practical applications, where both computational budgets and performance needs can vary with the situation. To address these constraints, we propose the Any-Width Network (AWN), an adjustable-width CNN architecture and associated training routine that allow for fine-grained control over speed and accuracy during inference. Our key innovation is the use of lower-triangular weight matrices which explicitly address width-varying batch statistics while being naturally suited for multi-width operations. We also show that this design facilitates an efficient training routine based on random width sampling. We empirically demonstrate that our proposed AWNs compare favorably to existing methods while providing maximally granular control during inference.
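One reason a lower-triangular weight matrix suits multi-width operation is that output unit i depends only on input units 0 through i, so running at a narrower width reproduces the leading outputs of the full-width computation exactly. The sketch below demonstrates this property for a single linear layer in plain Python; it is an illustration under that assumption only and does not reproduce the AWN architecture, its batch-normalization handling, or its training routine.

```python
def lower_triangular(weights):
    """Zero out entries above the diagonal of a square weight matrix."""
    return [
        [w if j <= i else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]

def forward(weights, x, width):
    """Apply the top-left width x width block to the first `width` inputs."""
    return [
        sum(weights[i][j] * x[j] for j in range(width))
        for i in range(width)
    ]

# Hypothetical 3-unit layer and input.
W = lower_triangular([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
x = [1.0, 2.0, 3.0]
full = forward(W, x, 3)    # all three units
narrow = forward(W, x, 2)  # only the first two units
```

Here `narrow` matches the first two entries of `full`, so the width can be chosen at inference time without changing what the retained units compute.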