Equivariance w.r.t. geometric transformations in neural networks improves data efficiency, parameter efficiency and robustness to out-of-domain perspective shifts. When equivariance is not designed into a neural netwo...
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Zero-shot learning (ZSL) tackles the problem of recognizing unseen classes using only semantic descriptions, e.g., attributes. Current zero-shot learning techniques all assume that a single vector of attributes suffices to describe each category. We show that this assumption is incorrect. Many classes in real-world problems have multiple modes of appearance: male and female birds vary in appearance, for instance. Domain experts know this and can provide attribute descriptions of the chief modes of appearance for each class. Motivated by this, we propose the task of multimodal zero-shot learning, where the learner must learn from these multimodal attribute descriptions. We present a technique for addressing this problem of multimodal ZSL that significantly outperforms its unimodal counterpart. We posit that multimodal ZSL is more practical for real-world problems where complex intra-class variation is common.
Temporal action localization for untrimmed videos is a difficult problem in computer vision. It is challenging to accurately infer the start and end of activity instances on small-scale datasets covering multi-view information. In this paper, we propose an effective activity temporal localization and classification method to localize the temporal boundaries and predict the class label of activities for naturalistic driving. Our approach includes (i) a distraction behavior recognition and localization method for naturalistic driving videos on small-scale datasets, (ii) a strategy that uses a multi-branch network to make full use of information from different channels, and (iii) a post-processing method for selecting and correcting temporal ranges to ensure that our system finds accurate boundaries. In addition, frame-level object detection information is also utilized. Extensive experiments demonstrate the effectiveness of our method, and we rank 6th on Test-A2 of Track 3 of the 6th AI City Challenge.
A false negative in object detection describes an object that was not correctly localised and classified by a detector. In prior work, we introduced five 'false negative mechanisms' that identify the specific component inside the detector architecture that failed to detect the object. Using these mechanisms, we explore how different computer vision datasets and their inherent characteristics can influence object detector failures. Specifically, we investigate the false negative mechanisms of Faster R-CNN and RetinaNet across five computer vision datasets, namely Microsoft COCO, Pascal VOC, ExDark, ObjectNet, and COD10K. Our results show that object size and class influence the false negative mechanisms of object detectors. We also show that comparing the false negative mechanisms of a single object class across different datasets can highlight potentially unknown biases in datasets.
Neural Architecture Search (NAS) can automatically design model architectures with better performance. Existing work either searches for a local, block-like architecture that is then stacked to construct the entire model, or searches the entire model on the basis of a manually designed benchmark module. No existing method directly searches the architecture of the global (entire) model at the operation level. The purpose of this article is to search the entire model directly in the operation-level search space. We analyze the search space of past methods that search for local architectures, and then propose a working mode for global model architecture search named CAM. CAM decouples the architectural parameters of the entire model, so that the search over the entire model architecture can be completed with few architecture parameters. In experiments, the proposed method obtains a test error of 2.68% on CIFAR-10 at the global architecture level, which is comparable with state-of-the-art local architecture search methods.
The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.
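To make the contrast in the abstract above concrete, the following is a minimal sketch of the baseline AdderNet idea only: a conventional layer response computed by multiply-accumulate versus a response computed from absolute differences between inputs and weights. The function names and the toy values are hypothetical, and the sketch does not attempt to reproduce the CAddNet conjugate-pair formulation, whose details the abstract does not give.

```python
from typing import Sequence


def mac_response(x: Sequence[float], w: Sequence[float]) -> float:
    """Conventional response: multiply-accumulate (dot product) of inputs and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))


def adder_response(x: Sequence[float], w: Sequence[float]) -> float:
    """AdderNet-style response: negative sum of absolute differences.

    Only subtraction, absolute value and addition are needed; the result is
    largest (zero) when the input matches the weights exactly.
    """
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))


# Toy input patch and filter weights (illustrative values only).
patch = [1.0, -2.0, 0.5]
weights = [0.5, -1.0, 0.5]

mac_out = mac_response(patch, weights)      # uses multiplications
adder_out = adder_response(patch, weights)  # multiplication-free
```

In this sketch a perfect match between input and weights gives a response of zero and any mismatch gives a negative value, so the adder response behaves as a similarity score without using multiplications.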
We introduce a lightweight simulation and modeling framework, HMIway-env, for studying human-machine teaming in the context of driving. The goal of the framework is to accelerate the development of adaptive AI systems which can respond to individual driver states, traits, and preferences, by serving as a data-generation engine and training environment for learning personalized human-AI teaming policies. We extend highway-env, an OpenAI Gym-based simulator environment, to enable specification of human driver behavior, and design of vehicle-driver interactions and outcomes. We describe one instance of our framework incorporating models for distracted and cautious driving, which we validate through crowd-sourced feedback, and show early experimental results toward the training of better intervention policies.
The Visual Genome Dataset is the de facto standard dataset used in scene graph generation. It contains a large collection of images with corresponding object and relationship labels. We explore the lingual aspect of the relationship predicates and find that very few symmetric/inverse relationships are represented in the dataset (for example, 'above' and 'under'). We believe this is linked to human spatial cognition, and posit that labelling bias stemming from human representations of relationships creates asymmetric relationship labels that span the whole dataset. We also perform a 2D topological analysis of the bounding boxes linked by different relationship predicates. This analysis sheds light on certain classes and their ambiguity, wherein more frequent classes are semantically overloaded and therefore quite confusing. Finally, we show that when the dataset is reduced to more lingually and topologically well-defined spatial relationships, the performance of scene graph generation algorithms improves tremendously, but scene graph generators are still far from perfect.
Multi-input multi-output (MIMO) architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetworks fail to share even generic features, which limits their applicability on smaller mobile and AR/VR devices. We posit that this behavior stems from an ill-posed part of the multi-input multi-output framework. To solve this issue, we propose a novel unmixing step in MIMO architectures that allows subnetworks to properly share features. Preliminary experiments on CIFAR-100 show that our adjustments allow feature sharing and improve model performance for small architectures.
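The ensembling step described above, averaging the predictions of the subnetworks, can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-head logits are made-up toy values, the helper names are hypothetical, and averaging softmax probabilities (rather than logits) is an assumption not specified in the abstract. The proposed unmixing step is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_subnetwork_predictions(per_head_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities across the output heads of the subnetworks."""
    probs = [softmax(head) for head in per_head_logits]
    n_heads = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_heads for c in range(n_classes)]


# Toy logits from two subnetwork heads over three classes (illustrative values).
head_logits = [
    [2.0, 0.5, 0.1],
    [1.5, 1.0, 0.2],
]

ensemble = average_subnetwork_predictions(head_logits)
predicted_class = max(range(len(ensemble)), key=ensemble.__getitem__)
```

Because every head is produced by the same base network, the averaged prediction comes at roughly the cost of a single forward pass, which is the sense in which the abstract describes ensembling "for free".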
Recent 3D room layout recovery approaches mostly concentrate on Manhattan layouts, where the vertical walls are orthogonal to each other, even though there are many rooms with non-Manhattan layouts in the real world. This paper presents a room layout recovery method that generalizes across Manhattan and non-Manhattan worlds. Without introducing additional supervision, we extend current Manhattan layout recovery methods by predicting an extra surface normal feature, which is then used in an adaptive post-processing step to reconstruct layouts of arbitrary shapes. Experimental results show that our method greatly improves performance on non-Manhattan layouts while remaining capable of generalizing across Manhattan and non-Manhattan layouts.