Details
ISBN:
(print) 9781665487399
Neural Architecture Search (NAS) defines the design of neural networks as a search problem. Unfortunately, NAS is computationally intensive because the search space grows with the number of elements in the design and the possible connections between them. In this work, we extensively analyze the role of dataset size, based on several sampling approaches for reducing it (unsupervised and supervised cases), as an agnostic way to reduce search time. We compared these techniques with four common NAS approaches in NAS-Bench-201 in roughly 1,400 experiments on CIFAR-100. One of our surprising findings is that in most cases we can reduce the amount of training data to 25%, consequently also reducing search time to 25%, while maintaining the same accuracy as training on the full dataset. In addition, some designs derived from subsets outperform designs derived from the full dataset by up to 22 p.p. in accuracy.
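The subset idea described above can be made concrete with a minimal sketch: sample the same fraction from every class so the label distribution survives the reduction. The helper below is a hypothetical illustration, not the authors' code.

```python
import random

def stratified_subset(labels, fraction=0.25, seed=0):
    """Return indices of a class-balanced subset of a dataset.

    Keeping the per-class proportions intact while dropping
    (1 - fraction) of the samples cuts training time, and hence
    NAS search time, roughly in proportion to the fraction.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    subset = []
    for idxs in by_class.values():
        k = max(1, int(len(idxs) * fraction))
        subset.extend(rng.sample(idxs, k))
    return sorted(subset)

# Toy labels: 100 samples over 4 classes -> 24 kept (6 per class)
labels = [i % 4 for i in range(100)]
subset = stratified_subset(labels)
print(len(subset))  # 24
```

The supervised case above uses labels for stratification; the unsupervised variants in the paper would replace the per-class grouping with, e.g., clustering in feature space.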
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Vessels move 90% of international cargo by volume, and the marine economy contributes 5.1% of global GDP. As one of the oldest industries, the marine industry has yet to embrace innovations in modern technology to safeguard the blue economy. Situational awareness from intelligent vessel systems can enable enhanced safety and decision-making for mariners. As the foundation for these intelligent systems, advanced perception technology requires sufficient real-world operational data to leverage recent AI technologies. In this work, we introduce the Sea Situational Awareness (SeaSAw) dataset, a novel dataset comprising 1.9 million images with 14.6 million objects associated with 20.4 million attributes from 12 object classes, making it the largest maritime dataset for object detection, fine-grained classification and tracking. Furthermore, this dataset draws on 9 sources in combination with various RGB cameras, mounted on different moving vessels, operating in different geographic locations globally, with variations in scenario, weather and illumination conditions. This data collection took place across 4 years with rigorous efforts on data selection, annotation, management and analysis to advance marine perception technology.
Details
ISBN:
(print) 9781665487399
Beyond possessing a size large enough to feed data-hungry machines (e.g., transformers), what attributes measure the quality of a dataset? Assuming that definitions of such attributes exist, how do we quantify their relative presence? Our work explores these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA), which overcomes these limitations and is more suitable for real-world applications. In addition, we perform a bias study which analyzes a key property differentiating videos from static images: the temporal aspect. This reveals whether the actions in these datasets really need the motion information of an actor, or whether an action's occurrence can be predicted even from a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show the existence of biases which have managed to creep into existing methods in spite of careful modeling.
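The temporal-ordering probe described above can be sketched generically: feed a model the clip and a shuffled copy of its frames, and check whether the prediction changes. Everything below (the probe, the toy model) is a hypothetical illustration, not the paper's experimental code.

```python
import random

def temporal_order_probe(predict, clip, seed=0):
    """Probe temporal-ordering bias: does a model's prediction
    survive a random shuffle of the clip's frames?

    `predict` is any clip -> label function. If predictions agree
    on most clips, the model is not using temporal order.
    """
    rng = random.Random(seed)
    shuffled = clip[:]
    rng.shuffle(shuffled)
    return predict(clip) == predict(shuffled)

# A toy order-blind model: it only looks at the multiset of frames,
# so shuffling cannot change its answer.
biased_model = lambda frames: max(set(frames), key=frames.count)
print(temporal_order_probe(biased_model, ["run", "run", "jump"]))  # True
```

A model that genuinely exploits motion would fail this probe on a meaningful fraction of clips, which is exactly the signal the bias study looks for.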
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Multiple datasets and open challenges for object detection have been introduced in recent years. To build more general and powerful object detection systems, in this paper we construct a new large-scale benchmark termed BigDetection. Our goal is simply to leverage the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. Specifically, we generate a new taxonomy which unifies the heterogeneous label spaces from different sources. Our BigDetection dataset has 600 object categories and contains over 3.4M training images with 36M bounding boxes. It is much larger in multiple dimensions than previous benchmarks, which offers both opportunities and challenges. Extensive experiments demonstrate its validity as a new benchmark for evaluating different object detection methods and its effectiveness as a pre-training dataset. The code and models are available at https://***/amazonresearch/bigdetection.
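The label-space unification step can be pictured as a per-source mapping into one shared taxonomy. The class names and mapping below are a toy example, not BigDetection's actual taxonomy.

```python
# Hypothetical per-source label mappings into a unified taxonomy.
SOURCE_TO_UNIFIED = {
    "lvis": {"sedan": "car", "taxi": "car", "puppy": "dog"},
    "openimages": {"Car": "car", "Dog": "dog"},
    "objects365": {"car": "car", "dog": "dog"},
}

def unify(source, label):
    """Map a source-specific label into the unified taxonomy,
    or None if the class is not covered by the taxonomy."""
    return SOURCE_TO_UNIFIED.get(source, {}).get(label)

print(unify("lvis", "taxi"))       # car
print(unify("openimages", "Dog"))  # dog
```

In practice the hard part is deciding which source classes are synonyms, hypernyms, or genuinely distinct; the lookup itself stays this simple once the taxonomy is fixed.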
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Image-based virtual try-on strives to transfer the appearance of a clothing item onto the image of a target person. Existing literature focuses mainly on upper-body clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body items. This shortcoming arises from one main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. In this work, we introduce Dress Code, a novel dataset which contains images of multi-category clothes. Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024 x 768) with front-view, full-body reference models. To generate HD try-on images with high visual quality and rich detail, we propose to learn fine-grained discriminating features. Specifically, we leverage a semantic-aware discriminator that makes predictions at pixel level instead of image or patch level. The Dress Code dataset is publicly available at https://***/aimagelab/dress-code.
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
This paper reports our approach for the 2022 AI City Challenge - Naturalistic Driving Action Recognition (Track 3), where the objective is to detect when and what kinds of actions a driver performs in a long, untrimmed video. Our solution is built upon the single-stage ActionFormer detector, in which temporal location and classification are predicted simultaneously for efficiency. The input feature for the detector is extracted offline using our proposed backbone, which we named "ConvNeXt-Video". However, due to the small size of the dataset, training the model while avoiding over-fitting is challenging. To address this problem, we focus on training techniques that can improve the generalization of the underlying features. Specifically, we utilize two methods: "learning without forgetting" and semi-weakly supervised learning on the unlabeled data A2. Finally, we also add a second-stage classifier (SSC) using our ConvNeXt-Video backbone. The SSC is designed to combine information from multiple clips and multi-view cameras to improve prediction precision. Our best result achieves a 29.1 F1 score on the public test set. Our source code is released at link.
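The multi-clip, multi-view fusion in the second stage can be sketched as simple late fusion: average class probabilities across views (or clips) and take the argmax. This is a minimal stand-in, not the authors' SSC implementation, and the camera names are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_predictions(logits_per_view):
    """Average class probabilities over clips/camera views and
    return the index of the winning action class."""
    probs = [softmax(l) for l in logits_per_view]
    n_views, n_classes = len(probs), len(probs[0])
    fused = [sum(p[c] for p in probs) / n_views for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

views = [[2.0, 0.5, 0.1],   # dashboard camera (hypothetical)
         [1.5, 0.2, 0.0],   # right-side camera
         [0.3, 1.9, 0.2]]   # rear-view camera
print(fuse_predictions(views))  # 0
```

Averaging probabilities rather than raw logits keeps one over-confident view from dominating the fused decision.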
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Semi-supervised object detection methods are widely used in autonomous driving systems, where only a fraction of objects are labeled. To propagate information from the labeled objects to the unlabeled ones, pseudo-labels for unlabeled objects must be generated. Although pseudo-labels have proven to improve the performance of semi-supervised object detection significantly, applying image-based methods to video frames results in numerous missed or false detections using such generated pseudo-labels. In this paper, we propose a new approach, PseudoProp, to generate robust pseudo-labels by leveraging motion continuity in video frames. Specifically, PseudoProp uses a novel bidirectional pseudo-label propagation approach to compensate for misdetection. A feature-based fusion technique is also used to suppress inference noise. Extensive experiments on the large-scale Cityscapes dataset demonstrate that our method outperforms state-of-the-art semi-supervised object detection methods by 7.4% on mAP(75).
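The motion-continuity intuition behind the propagation can be sketched crudely: a detection missed at frame t can be filled in from the detections at t-1 and t+1. The helper below is a toy linear interpolation, not PseudoProp's bidirectional, feature-fused method.

```python
def propagate_pseudo_labels(tracks):
    """Fill single-frame detection gaps by linear box interpolation.

    `tracks` maps frame index -> box (x, y, w, h), or None where the
    detector missed the object. A gap at frame t is filled from the
    boxes at t-1 and t+1, exploiting motion continuity between
    consecutive video frames.
    """
    for t in sorted(tracks):
        if tracks[t] is None and tracks.get(t - 1) and tracks.get(t + 1):
            prev, nxt = tracks[t - 1], tracks[t + 1]
            tracks[t] = tuple((a + b) / 2 for a, b in zip(prev, nxt))
    return tracks

dets = {0: (10, 10, 20, 20), 1: None, 2: (14, 12, 20, 20)}
print(propagate_pseudo_labels(dets)[1])  # (12.0, 11.0, 20.0, 20.0)
```

The real method additionally propagates in both temporal directions over longer gaps and fuses feature maps to suppress noisy pseudo-boxes; the sketch only shows why neighboring frames carry usable signal.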
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Current neural networks are designed for high-performance GPUs/CPUs. However, implementing neural networks on emerging embedded sensors for inference is challenging due to such a sensor's unique hardware architecture and stringent computing resources. With this in mind, this work presents new methods to implement fully convolutional neural networks (FCNs) on Pixel Processor Array (PPA) sensors, with many techniques to fully use the limited resources on the sensor. Specifically, we, for the first time, design and train a binarized FCN with both binary weights and activations, using batch norm, group convolution, and a learnable threshold for binarization, producing networks small enough to be embedded on the focal plane of the PPA, with limited local memory resources, and using parallel elementary add/subtract, shifting, and bit operations only. We demonstrate the first implementation of an FCN on a PPA device, performing three convolution layers entirely in the pixel-level processors. We use this architecture to demonstrate inference generating heat maps for object segmentation and localisation at over 280 FPS using the SCAMP-5 PPA vision chip.
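Why binarization fits add/subtract-only hardware can be shown in a few lines: once weights and activations are in {+1, -1}, a dot product is just sums and differences. This is an illustrative sketch of the arithmetic, not the paper's trained network; the threshold stands in for the learned per-layer binarization threshold.

```python
def binarize(values, threshold=0.0):
    """Binarize values against a (learnable) threshold into {+1, -1}."""
    return [1 if v >= threshold else -1 for v in values]

def binary_dot(w_bits, x_bits):
    """Dot product of binarized weights and activations.

    With +/-1 operands every product is +1 or -1, so the whole
    reduction is adds and subtracts only - exactly the elementary
    operations a PPA's pixel-level processors provide.
    """
    return sum(w * x for w, x in zip(w_bits, x_bits))

acts = binarize([0.7, -0.2, 0.1, -0.9])
wts = binarize([0.3, 0.4, -0.5, -0.1])
print(acts)                   # [1, -1, 1, -1]
print(binary_dot(wts, acts))  # 0
```

On real hardware the +/-1 values are packed as bits and the products become XNOR-and-popcount; the arithmetic identity is the same.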
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Neural Radiance Fields (NeRF) has emerged as the state-of-the-art method for novel view generation of complex scenes, but is very slow during inference. Recently, there have been multiple works on speeding up NeRF inference, but the state-of-the-art methods for real-time NeRF inference rely on caching the neural network output, which occupies several gigabytes of disk space and limits their real-world applicability. As caching the output of the original NeRF network is not feasible, Garbin et al. proposed "FastNeRF", which factorizes the problem into 2 subnetworks - one which depends only on the 3D coordinate of a sample point and one which depends only on the 2D camera viewing direction. Although this factorization enables them to reduce the cache size and perform inference at over 200 frames per second, the memory overhead is still substantial. In this work, we propose SqueezeNeRF, which is more than 60 times more memory-efficient than the sparse cache of FastNeRF and is still able to render at more than 190 frames per second on a high-spec GPU during inference.
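The memory argument behind the factorization can be made concrete with back-of-the-envelope arithmetic: caching a joint function of position and direction needs a 5D grid, while the factorized form needs one 3D grid plus one 2D grid. The resolutions below are toy assumptions, not the settings used by either paper.

```python
def grid_entries(resolutions):
    """Number of cache entries in a dense grid of the given shape."""
    n = 1
    for r in resolutions:
        n *= r
    return n

R = 256  # hypothetical samples per spatial axis (x, y, z)
D = 64   # hypothetical samples per viewing-direction axis

# Joint cache: one entry per (x, y, z, theta, phi) combination.
joint = grid_entries([R, R, R, D, D])
# Factorized cache: a 3D position grid plus a 2D direction grid.
factored = grid_entries([R, R, R]) + grid_entries([D, D])
print(joint // factored)  # 4095 (~ D*D reduction factor)
```

The reduction factor is roughly D*D because the direction grid's cost becomes negligible next to the position grid's; SqueezeNeRF's further factorization pushes the same idea another step.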
Details
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Existing continual learning techniques focus on either the task incremental learning (TIL) or the class incremental learning (CIL) problem, but not both. CIL and TIL differ mainly in that the task-id is provided for each test sample in TIL, but not in CIL. Continual learning methods intended for one problem have limitations on the other. This paper proposes a novel unified approach based on out-of-distribution (OOD) detection and task masking, called CLOM, to solve both problems. The key novelty is that each task is trained as an OOD detection model rather than a traditional supervised learning model, and a task mask is trained to protect each task from forgetting. Our evaluation shows that CLOM outperforms existing state-of-the-art baselines by large margins. The average TIL/CIL accuracy of CLOM over six experiments is 87.6/67.9%, while that of the best baselines is only 84.4/55.0%.
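How one set of per-task OOD models serves both settings can be sketched at inference time: with a task-id (TIL), classify within that task's model; without one (CIL), take the highest-scoring class across all task models. The score values and class names below are hypothetical, and this is a stand-in for the idea, not CLOM's implementation.

```python
def til_predict(ood_scores, task_id):
    """TIL inference: the task-id is given, so classify
    within that task's model only."""
    scores = ood_scores[task_id]
    return task_id, max(scores, key=scores.get)

def cil_predict(ood_scores):
    """CIL inference: no task-id, so pick the class with the
    highest score across all per-task OOD models. In-distribution
    samples score high only under their own task's model."""
    best = None
    for task_id, scores in ood_scores.items():
        for cls, s in scores.items():
            if best is None or s > best[2]:
                best = (task_id, cls, s)
    return best[0], best[1]

# Hypothetical per-task class scores from two task models.
scores = {0: {"cat": 0.9, "dog": 0.4},
          1: {"car": 0.7, "bus": 0.95}}
print(til_predict(scores, 0))  # (0, 'cat')
print(cil_predict(scores))     # (1, 'bus')
```

The OOD training objective is what makes the CIL branch work: a conventionally trained task model would assign confident scores to samples from other tasks, breaking the cross-task argmax.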