Reduced precision hardware-based matrix multiplication accelerators are commonly employed to reduce power consumption of neural network inference. Multiplier designs used in such accelerators possess an interesting pr...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Reduced precision hardware-based matrix multiplication accelerators are commonly employed to reduce power consumption of neural network inference. Multiplier designs used in such accelerators possess an interesting property: When the same bit is 0 for two consecutive compute cycles, the multiplier consumes less power. In this paper we show that this effect can be used to reduce power consumption of neural networks by simulating low bit-width quantization on higher bit-width hardware. We show that simulating 4 bit quantization on 8 bit hardware can yield up to 17% relative reduction in power consumption on commonly used networks. Furthermore, we show that in this context, bit operations (BOPs) are a good proxy for power efficiency, and that learning mixed-precision configurations that target lower BOPs can achieve better trade-offs between accuracy and power efficiency.
This paper presents a study on the use of Convolutional Neural Networks for camera relocalisation and its application to map compression. We follow state of the art visual relocalisation results and evaluate the respo...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
This paper presents a study on the use of Convolutional Neural Networks for camera relocalisation and its application to map compression. We follow state of the art visual relocalisation results and evaluate the response to different data inputs. We use a CNN map representation and introduce the notion of map compression under this paradigm by using smaller CNN architectures without sacrificing relocalisation performance. We evaluate this approach in a series of publicly available datasets over a number of CNN architectures with different sizes, both in complexity and number of layers. This formulation allows us to improve relocalisation accuracy by increasing the number of training trajectories while maintaining a constant-size CNN.
Most team sports such as hockey involve periods of active play interleaved with breaks in play. When watching a game remotely, many fans would prefer an abbreviated game showing only periods of active play. Here we ad...
详细信息
ISBN:
(纸本)9781665448994
Most team sports such as hockey involve periods of active play interleaved with breaks in play. When watching a game remotely, many fans would prefer an abbreviated game showing only periods of active play. Here we address the problem of identifying these periods in order to produce a time-compressed viewing experience. Our approach is based on a hidden Markov model of play state driven by deep visual and optional auditory cues. We find that our deep visual cues generalize well across different cameras and that auditory cues can improve performance but only if unsupervised methods are used to adapt emission distributions to domain shift across games. Our system achieves temporal compression rates of 20-50% at a recall of 96%.
Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus...
详细信息
ISBN:
(纸本)9781665448994
Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of shared representations, while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE two image modalities of MNIST and Google Street View House Number (SVHN) datasets as well as image and text modalities from the Oxford-102 Flowers dataset. Our experiments indicate the essence of retaining the private representation as well as the private-shared disentanglement to effectively direct the information across multiple analysis-synthesis conduits.
The accuracy of finger vein recognition systems gets degraded due to low and uneven contrast between veins and surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement tec...
详细信息
ISBN:
(纸本)9781665487399
The accuracy of finger vein recognition systems gets degraded due to low and uneven contrast between veins and surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein-structures, and train the proposed ResFPN for detection of veins across scales. With enhanced presentations, our experiments indicate a reduction upto 5% in the average recognition errors for commonly used recognition pipeline over two publicly available datasets. These improvements are persistent even in cross-dataset scenario where the dataset used to train the ResFPN is different from the one used for recognition.
In this work, we address the issues of the missing modalities that have arisen from the Visual Question Answer-Difference prediction task and find a novel method to solve the task at hand. We address the missing modal...
详细信息
ISBN:
(纸本)9781665448994
In this work, we address the issues of the missing modalities that have arisen from the Visual Question Answer-Difference prediction task and find a novel method to solve the task at hand. We address the missing modality-the ground truth answers-that are not present at test time and use a privileged knowledge distillation scheme to deal with the issue of the missing modality. In order to efficiently do so, we first introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline, then use a combination of models to distill knowledge to a target network (student) that only takes the image/question pair as its inputs. We experiment our models on the VizWiz and VQA-V2 Answer Difference datasets and show through extensive experimentation and ablation the performance of our method and a diverse possibility for future research.
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data...
详细信息
ISBN:
(纸本)9781665448994
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data in either range view (RV) or bird's eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets, achieving state-of-the-art results. Furthermore, we show that MVFusenet scales well to large operating ranges while maintaining real-time performance.
Due to the success of generative flows to model data distributions, they have been explored in inverse problems. Given a pre-trained generative flow, previous work proposed to minimize the 2-norm of the latent variabl...
详细信息
ISBN:
(纸本)9781665487399
Due to the success of generative flows to model data distributions, they have been explored in inverse problems. Given a pre-trained generative flow, previous work proposed to minimize the 2-norm of the latent variables as a regularization term. The intuition behind it was to ensure high likelihood latent variables that produce the closest restoration. However, high-likelihood latent variables may generate unrealistic samples as we show in our experiments. We therefore propose a solver to directly produce high-likelihood reconstructions. We hypothesize that our approach could make generative flows a general purpose solver for inverse problems. Furthermore, we propose 1x1 coupling functions to introduce permutations in a generative flow. It has the advantage that its inverse does not require to be calculated in the generation process. Finally, we evaluate our method for denoising, deblurring, inpainting, and colorization. We observe a compelling improvement of our method over prior works.
Few-shot learning aims to build classifiers for new classes from a small number of labeled examples and is commonly facilitated by access to examples from a distinct set of 'base classes'. The difference in da...
详细信息
ISBN:
(纸本)9781665448994
Few-shot learning aims to build classifiers for new classes from a small number of labeled examples and is commonly facilitated by access to examples from a distinct set of 'base classes'. The difference in data distribution between the test set (novel classes) and the base classes used to learn an inductive bias often results in poor generalization on the novel classes. To alleviate problems caused by the distribution shift, previous research has explored the use of unlabeled examples from the novel classes, in addition to labeled examples of the base classes, which is known as the transductive setting. In this work, we show that, surprisingly, off-the-shelf self-supervised learning outperforms transductive few-shot methods by 3.9% for 5-shot accuracy on miniImageNet without using any base class labels. This motivates us to examine more carefully the role of features learned through self-supervision in few-shot learning. Comprehensive experiments are conducted to compare the transferability, robustness, efficiency, and the complementarity of supervised and self-supervised features.
We introduce a multimodal vision framework for precision livestock farming, harnessing the power of GroundingDINO, HQSAM, and ViTPose models. This integrated suite enables comprehensive behavioral analytics from video...
详细信息
ISBN:
(纸本)9798350365474
We introduce a multimodal vision framework for precision livestock farming, harnessing the power of GroundingDINO, HQSAM, and ViTPose models. This integrated suite enables comprehensive behavioral analytics from video data without invasive animal tagging. GroundingDINO generates accurate bounding boxes around livestock, while HQSAM segments individual animals within these boxes. ViTPose estimates key body points, facilitating posture and movement analysis. Demonstrated on a sheep dataset with grazing, running, sitting, standing, and walking activities, our framework extracts invaluable insights: activity and grazing patterns, interaction dynamics, and detailed postural evaluations. Applicable across species and video resolutions, this framework revolutionizes non-invasive livestock monitoring for activity detection, counting, health assessments, and posture analyses. It empowers data-driven farm management, optimizing animal welfare and productivity through AI-powered behavioral understanding.
暂无评论