A biometric detection system named finger vein recognition studies the vein patterns in human fingers that are concealed under the skin's surface through patternrecognition algorithms. It involves comparing the v...
详细信息
We propose a new type of full-body human avatars, which combines parametric mesh-based body model with a neural texture. We show that with the help of neural textures, such avatars can successfully model clothing and ...
详细信息
ISBN:
(纸本)9781665445092
We propose a new type of full-body human avatars, which combines parametric mesh-based body model with a neural texture. We show that with the help of neural textures, such avatars can successfully model clothing and hair, which usually poses a problem for mesh-based approaches. We also show how these avatars can be created from multiple frames of a video using backpropagation. We then propose a generative model for such avatars that can be trained from datasets of images and videos of people. The generative model allows us to sample random avatars as well as to create dressed avatars of people from one or few images. The code for the project is available at ***/style-people.
In this paper, a new adaptive quantization algorithm for generalized posit format is presented, to optimally represent the dynamic range and distribution of deep neural network parameters. Adaptation is achieved by mi...
详细信息
ISBN:
(纸本)9781665448994
In this paper, a new adaptive quantization algorithm for generalized posit format is presented, to optimally represent the dynamic range and distribution of deep neural network parameters. Adaptation is achieved by minimizing the intra-layer posit quantization error with a compander. The efficacy of the proposed quantization algorithm is studied within a new low-precision framework, ALPS, on ResNet-50 and EfficientNet models for classification tasks. Results assert that the accuracy and energy dissipation of low-precision DNNs using generalized posits outperform other well-known numerical formats, including standard posits.
Artificial intelligence (AI) technologies have a significant impact on developments in research and creative processes. With the quick development of computervision, the extensive need for interaction with machines a...
详细信息
In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a ...
详细信息
ISBN:
(纸本)9781665445092
In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more frames for complex ones. To achieve this, we employ a cascade of gating modules to automatically determine the earliest point in processing where an inference is sufficiently reliable. We generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost. Our proposed model outperforms competing methods on three large-scale video benchmarks. In particular, on ActivityNet1.3 and mini-kinetics, we outperform the state-of-the-art efficient video recognition methods with 1.3x and 2.1x less GFLOPs, respectively. Additionally, our method sets a new state of the art for efficient video understanding on the HVU benchmark.
When a compact light source illuminates a horizontal shiny ground plane at an oblique angle, the resulting highlight is vertically oriented and highly elongated. We refer to such highlights as specular streaks. Specul...
详细信息
ISBN:
(数字)9781665490627
ISBN:
(纸本)9781665490627
When a compact light source illuminates a horizontal shiny ground plane at an oblique angle, the resulting highlight is vertically oriented and highly elongated. We refer to such highlights as specular streaks. Specular streaks occur commonly on wet roadways, especially at night, for example the reflections of street lamps or car headlights. Here we present a 3D model of specular streaks seen by a binocular observer. We show that specular streaks produce binocular disparities that are consistent with 3D near-vertical columns of light beneath the roadway. We validate this model with results on both synthetic and real images. Specular streaks are an important problem for vision-in-badweather, in particular, for autonomous driving systems or driver assistance systems that rely on stereo to estimate scene depth.
Being able to detect irrelevant test examples with respect to deployed deep learning models is paramount to properly and safely using them. In this paper, we address the problem of rejecting such out-of-distribution (...
详细信息
ISBN:
(纸本)9781665448994
Being able to detect irrelevant test examples with respect to deployed deep learning models is paramount to properly and safely using them. In this paper, we address the problem of rejecting such out-of-distribution (OOD) samples in a fully sample-free way, i.e., without requiring any access to in-distribution or OOD samples. We propose several indicators which can be computed alongside the prediction with little additional cost, assuming white-box access to the network. These indicators prove useful, stable and complementary for OOD detection on frequently-used architectures. We also introduce a surprisingly simple, yet effective summary OOD indicator. This indicator is shown to perform well across several networks and datasets and can furthermore be easily tuned as soon as samples become available. Lastly, we discuss how to exploit this summary in real-world settings.
Multi-task learning (MTL) is a learning paradigm that aims at joint optimization of multiple tasks using a single neural network for better performance and generalization. In practice, MTL rests on the inherent assump...
详细信息
ISBN:
(纸本)9781665448994
Multi-task learning (MTL) is a learning paradigm that aims at joint optimization of multiple tasks using a single neural network for better performance and generalization. In practice, MTL rests on the inherent assumption of availability of common datasets with ground truth labels for each of the downstream tasks. However, collecting such a common annotated dataset is laborious for complex computervision tasks such as the saliency estimation which would require the eye fixation points as the ground truth data. To this end, we propose a novel MTL framework in the absence of common annotated dataset for joint estimation of important downstream tasks in computervision - object detection and saliency estimation. Unlike many state-of-the-art methods, that rely on common annotated datasets for training, we consider the annotations from different datasets for jointly training different tasks, calling this setting as cross-domain MTL. We adapt MUTAN [3] framework to fuse features from different datasets to learn domain invariant features capturing the relatedness of different tasks. We demonstrate the improvement in the performance and generalizability of our MTL architecture. We also show that the proposed MTL network offers a 13% reduction in memory footprint due to parameter sharing between the related tasks.
A video can be represented by the composition of appearance and motion. Appearance (or content) expresses the information invariant throughout time, and motion describes the time-variant movement. Here, we propose sel...
详细信息
ISBN:
(纸本)9781665445092
A video can be represented by the composition of appearance and motion. Appearance (or content) expresses the information invariant throughout time, and motion describes the time-variant movement. Here, we propose self-supervised approaches for video Generative Adversarial Networks (GANs) to achieve the appearance consistency and motion coherency in videos. Specifically, the dual discriminators for image and video individually learn to solve their own pretext tasks;appearance contrastive learning and temporal structure puzzle. The proposed tasks enable the discriminators to learn representations of appearance and temporal context, and force the generator to synthesize videos with consistent appearance and natural flow of motions. Extensive experiments in facial expression and human action public benchmarks show that our method outperforms the state-of-the-art video GANs. Moreover, consistent improvements regardless of the architecture of video GANs confirm that our framework is generic.
patternrecognition and classification standards are important for data interpretation and decision-making processes in many fields. This article provides a comprehensive review of these two intersections, exploring t...
详细信息
暂无评论