ISBN: 9781665448994 (print)
Being able to detect test examples that are irrelevant to a deployed deep learning model is paramount to using it properly and safely. In this paper, we address the problem of rejecting such out-of-distribution (OOD) samples in a fully sample-free way, i.e., without requiring access to any in-distribution or OOD samples. We propose several indicators that can be computed alongside the prediction at little additional cost, assuming white-box access to the network. These indicators prove useful, stable, and complementary for OOD detection on frequently used architectures. We also introduce a surprisingly simple, yet effective, summary OOD indicator. This indicator is shown to perform well across several networks and datasets, and it can easily be tuned as soon as samples become available. Lastly, we discuss how to exploit this summary in real-world settings.
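The abstract does not spell out its indicators, which rely on white-box access to the network. As a point of reference only, the sketch below computes two standard output-based indicators that need no samples at inference time: maximum softmax probability and predictive entropy. These are swapped-in baselines, not the paper's indicators. The function names are hypothetical, and turning either score into a reject decision still requires a threshold.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_ood_indicators(logits):
    """Return (predicted class, msp, entropy) computed from one forward pass.

    msp: maximum softmax probability (a lower value suggests OOD).
    entropy: predictive entropy in nats (a higher value suggests OOD).
    """
    probs = softmax(logits)
    msp = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return probs.index(msp), msp, entropy

if __name__ == "__main__":
    print(output_ood_indicators([8.0, 0.5, 0.1]))  # confident: msp near 1, low entropy
    print(output_ood_indicators([1.0, 1.0, 1.0]))  # flat: msp = 1/3, entropy = ln 3
```

Both scores come for free with the prediction, which matches the "little additional cost" setting. Indicators built from internal activations, as in the paper, would need hooks into the network.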
Most team sports, such as hockey, involve periods of active play interleaved with breaks in play. When watching a game remotely, many fans would prefer an abbreviated game that shows only periods of active play. Here we address the problem of identifying these periods in order to produce a time-compressed viewing experience. Our approach is based on a hidden Markov model of play state driven by deep visual and, optionally, auditory cues. We find that our deep visual cues generalize well across different cameras. Auditory cues can improve performance, but only if unsupervised methods are used to adapt the emission distributions to domain shift across games. Our system achieves temporal compression rates of 20-50% at a recall of 96%.
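A hidden Markov model of play state can be decoded with the Viterbi algorithm. The minimal sketch below rests on several assumptions:

- There are two states, BREAK and PLAY.
- The "sticky" transition probabilities are hypothetical.
- Hand-written per-frame likelihoods stand in for the deep visual and auditory emission models.

The example shows how the HMM smooths over a single noisy frame inside a play segment.

```python
import math

def _log(x):
    """Logarithm that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")

def viterbi(emissions, trans, prior):
    """Most likely hidden-state path of an HMM.

    emissions[t][s]: likelihood of frame t's cues under state s
    trans[r][s]:     probability of moving from state r to state s
    prior[s]:        probability of starting in state s
    """
    n = len(prior)
    score = [_log(prior[s]) + _log(emissions[0][s]) for s in range(n)]
    backptrs = []
    for frame in emissions[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # best predecessor state for landing in state s at this frame
            r = max(range(n), key=lambda q: score[q] + _log(trans[q][s]))
            new_score.append(score[r] + _log(trans[r][s]) + _log(frame[s]))
            ptr.append(r)
        score = new_score
        backptrs.append(ptr)
    # trace back from the best final state
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

BREAK, PLAY = 0, 1
trans = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical: states tend to persist
prior = [0.5, 0.5]
# hypothetical per-frame likelihoods [p(cues | BREAK), p(cues | PLAY)];
# frame 2 is a noisy frame that weakly favours BREAK
emissions = [[0.2, 0.8], [0.2, 0.8], [0.6, 0.4], [0.2, 0.8], [0.9, 0.1], [0.9, 0.1]]
path = viterbi(emissions, trans, prior)
kept_fraction = sum(1 for s in path if s == PLAY) / len(path)
```

Here the decoded path is [1, 1, 1, 1, 0, 0], so the noisy frame stays in the play segment. Keeping only PLAY frames retains 4 of the 6 frames.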
Human action recognition in the dark is a significant task with various applications, e.g., night surveillance and self-driving at night. However, the lack of video datasets of human actions in the dark hinders its development. Recently, a public dataset, ARID, has been introduced to stimulate progress on human action recognition in dark videos. Multiple existing models perform well for action recognition in videos shot under normal illumination; however, research shows that these methods may not be effective at recognizing actions in dark videos. In this paper, we construct a novel neural network architecture, DarkLight Networks, which involves two components:

(i) a dual-pathway structure in which both dark videos and their brightened counterparts are utilized for effective video representation;
(ii) a self-attention mechanism that fuses and extracts corresponding and complementary features from the two pathways.

Our approach achieves state-of-the-art results on ARID.
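The abstract does not say how the brightened counterpart is produced. Gamma intensity correction is assumed here as a simple, common choice. The sketch only builds the (dark, brightened) input pair for the two pathways and leaves out the networks and the self-attention fusion. Frames are assumed to be nested lists of intensities in [0, 1], and the function names are hypothetical.

```python
def gamma_brighten(frame, gamma=0.5):
    """Brighten a frame of intensities in [0, 1] with gamma correction.

    gamma < 1 lifts dark values more than bright ones; 0 and 1 map to themselves.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [[p ** gamma for p in row] for row in frame]

def dual_pathway_inputs(video, gamma=0.5):
    """Pair each dark frame with its brightened counterpart, one per pathway."""
    return [(frame, gamma_brighten(frame, gamma)) for frame in video]
```

For example, with gamma = 0.5 an intensity of 0.25 becomes 0.5, while pure black stays at 0.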
Although recent years have witnessed great advances in stereo image super-resolution (SR), the beneficial information provided by binocular systems has not been fully exploited. Since stereo images are highly symmetric under the epipolar constraint, in this paper we improve the performance of stereo image SR by exploiting symmetry cues in stereo image pairs. Specifically, we propose a symmetric bi-directional parallax attention module (biPAM) and an inline occlusion handling scheme to enable effective interaction of cross-view information. Then, we design a Siamese network equipped with a biPAM to super-resolve both views in a highly symmetric manner. Finally, we design several illuminance-robust losses to enhance stereo consistency. Experiments on four public datasets demonstrate the superior performance of our method.
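For a rectified stereo pair, correspondences lie on the same image row, so parallax attention can restrict matching to each row. The sketch below is a minimal, unlearned version of that idea. It assumes:

- dot-product similarity between features;
- no learned projections, occlusion handling, or residual connections;
- feature maps given as nested lists shaped [H][W][C].

Running the module in both directions gives a crude stand-in for bi-directional attention, not the paper's biPAM.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def parallax_attention(src, ref):
    """Warp ref-view features to the src view, one epipolar line (row) at a time."""
    warped = []
    for row_src, row_ref in zip(src, ref):
        out_row = []
        for feat in row_src:
            # similarity of this src pixel to every pixel on the same row of ref
            scores = [sum(a * b for a, b in zip(feat, other)) for other in row_ref]
            weights = softmax(scores)
            # attention-weighted average of the ref features along the row
            out_row.append([sum(w * other[c] for w, other in zip(weights, row_ref))
                            for c in range(len(feat))])
        warped.append(out_row)
    return warped

def bi_parallax_attention(left, right):
    """Apply the same module in both directions (right->left and left->right)."""
    return parallax_attention(left, right), parallax_attention(right, left)
```

Because the weights are a softmax over one row, each output pixel is a convex combination of features on its epipolar line. A sharply peaked score therefore selects the matching pixel at the correct disparity.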
Multi-modal generative models represent an important family of deep models whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on inferring shared representations while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that uses a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE on two image modalities, from the MNIST and Google Street View House Numbers (SVHN) datasets, as well as on image and text modalities from the Oxford-102 Flowers dataset. Our experiments show that retaining the private representation, together with the private-shared disentanglement, is essential for effectively directing information across multiple analysis-synthesis conduits.
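The abstract does not say how DMVAE combines the shared posteriors inferred from different modalities. A product of diagonal Gaussian experts is one common choice in multi-modal VAEs and is assumed here. The product of Gaussians N(mu_i, var_i) is Gaussian, with precision equal to the sum of the precisions and mean equal to the precision-weighted average of the means. The function names and the latent layout passed to each decoder are hypothetical.

```python
def product_of_gaussian_experts(means, variances):
    """Fuse per-modality diagonal Gaussians over the shared latent.

    means[i][d], variances[i][d]: expert (modality) i's mean and variance for dimension d.
    Returns the fused mean and variance, dimension by dimension.
    """
    fused_mean, fused_var = [], []
    for d in range(len(means[0])):
        precisions = [1.0 / var[d] for var in variances]
        total = sum(precisions)
        fused_mean.append(sum(mu[d] * p for mu, p in zip(means, precisions)) / total)
        fused_var.append(1.0 / total)
    return fused_mean, fused_var

def decoder_input(private_latent, shared_latent):
    """Hypothetical layout: a modality's decoder sees its own private code plus the shared code."""
    return list(private_latent) + list(shared_latent)
```

A more confident expert (smaller variance) pulls the fused mean toward its own estimate. Keeping the private code out of the product lets modality-specific detail bypass the shared space.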
Sign language is the dominant, yet non-primary, form of communication used in the deaf and hearing-impaired community. To enable easy, mutual communication between the hearing-impaired and hearing communities, it is fundamental to build a robust system capable of translating spoken language into sign language and vice versa. To this end, sign language recognition and production are the two necessary components of such a two-way system, and both must cope with several critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. The survey aims to briefly summarize recent achievements in SLP and to discuss their advantages, limitations, and future directions of research.
Multistage, or serial, fusion refers to algorithms that sequentially fuse an increasing number of matching results. At each step, the algorithm decides whether to accept the match hypothesis, reject it, or proceed to the next step. Such fusion methods are beneficial in situations where running the additional matching algorithms needed for later stages is time-consuming or expensive. Constructing multistage fusion methods is challenging, since it requires both learning the fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose using a single neural network to learn the multistage fusion. In addition, we discuss the choice of performance measures for the trained algorithms and the selection of optimization criteria for network training. We perform experiments using three face matching algorithms and the IJB-A and IJB-C databases.
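The decision logic described above can be sketched as a loop that runs matchers lazily and stops as soon as the fused score crosses a stage's accept or reject threshold. The mean-score fusion rule, the threshold values, and the helper fake_matcher are hypothetical placeholders. In the paper, a single neural network learns the fusion instead.

```python
from typing import Callable, Sequence, Tuple

def multistage_match(matchers: Sequence[Callable[[], float]],
                     accept: Sequence[float],
                     reject: Sequence[float]) -> Tuple[str, int]:
    """Serial fusion: returns (decision, number of matchers actually run)."""
    if not matchers:
        raise ValueError("need at least one matcher")
    scores = []
    for stage, matcher in enumerate(matchers):
        scores.append(matcher())  # the next (possibly expensive) matcher runs only if needed
        fused = sum(scores) / len(scores)  # hypothetical fusion function: mean score
        last = stage == len(matchers) - 1
        if fused >= accept[stage]:
            return "accept", stage + 1
        if fused <= reject[stage] or last:  # the final stage must decide either way
            return "reject", stage + 1

calls = []

def fake_matcher(score):
    """Hypothetical stand-in for an expensive face matcher returning a similarity score."""
    def run():
        calls.append(score)
        return score
    return run

decision = multistage_match([fake_matcher(0.9), fake_matcher(0.8), fake_matcher(0.7)],
                            accept=[0.95, 0.8, 0.5], reject=[0.1, 0.2, 0.5])
```

Here decision is ("accept", 2): the fused score after two stages (about 0.85) clears the second accept threshold, so the third matcher is never run. Learning per-stage thresholds is what trades accuracy against this saved cost.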
Deep-learning-based generative models have proven capable of achieving excellent results in numerous image processing tasks with a wide range of applications. One significant advantage of deep-learning approaches over traditional approaches is their ability to regenerate semantically coherent images while relying only on an input with limited information. This advantage becomes even more crucial when the input is only a very small proportion of the output size. Such image expansion tasks can be especially challenging, as the missing area may originally have contained many semantic features that are critical in judging the quality of an image. In this paper, we propose an edge-guided generative network model that produces semantically consistent output from a small input image. Our experiments show that the proposed network is able to regenerate high-quality images even when some structural features are missing from the input.
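The abstract does not describe how the edge guidance is obtained. One classical way to compute an edge map from the available input is the Sobel operator, sketched below in plain Python purely as an assumption. An edge-guided network would presumably consume such a map, or a predicted one for the missing region, alongside the image. Images are assumed to be nested lists of grayscale intensities, and border pixels are left at zero.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image (list of rows) via 3x3 Sobel kernels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal gradient: right column minus left column, centre row weighted by 2
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # vertical gradient: bottom row minus top row, centre column weighted by 2
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

On a vertical step from 0 to 1, the interior pixels next to the step get magnitude 4 and flat regions get 0. This is the kind of structural cue an edge branch could hand to the generator.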
In this paper, a new adaptive quantization algorithm for the generalized posit format is presented to optimally represent the dynamic range and distribution of deep neural network parameters. Adaptation is achieved by minimizing the intra-layer posit quantization error with a compander. The efficacy of the proposed quantization algorithm is studied within a new low-precision framework, ALPS, on ResNet-50 and EfficientNet models for classification tasks. Results show that, in both accuracy and energy dissipation, low-precision DNNs using generalized posits outperform other well-known numerical formats, including standard posits.
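To make the format concrete, the sketch below decodes standard posit<n, es> bit patterns, builds the table of representable values, and rounds weights to the nearest one. The generalized posit format and the paper's compander are not described in the abstract. As a clearly swapped-in stand-in for per-layer adaptation, best_scale picks a scale factor from a hypothetical candidate list that minimizes the layer's mean squared quantization error. All function names are hypothetical.

```python
def decode_posit(bits, n, es):
    """Decode an n-bit standard posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR ("not a real")
    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # two's complement gives the magnitude pattern
    body = format(bits, f"0{n}b")[1:]  # the n-1 bits after the sign bit
    run = len(body) - len(body.lstrip(body[0]))  # regime run length
    k = run - 1 if body[0] == "1" else -run
    rest = body[run + 1:]  # skip the regime terminator bit, if present
    # exponent bits cut off by the end of the word are read as 0
    e = int(rest[:es].ljust(es, "0"), 2) if es > 0 else 0
    frac = rest[es:]
    f = int(frac, 2) / (1 << len(frac)) if frac else 0.0
    useed = 2 ** (2 ** es)
    value = useed ** k * 2 ** e * (1 + f)
    return -value if negative else value

def posit_table(n, es):
    """All finite values of posit<n, es>, sorted (NaR excluded)."""
    return sorted(decode_posit(b, n, es) for b in range(1 << n) if b != 1 << (n - 1))

def quantize(x, table):
    """Round x to the nearest representable value (ties go to the first one found)."""
    return min(table, key=lambda v: abs(v - x))

def best_scale(weights, table, candidates):
    """Hypothetical per-layer adaptation (not the paper's compander): choose the scale s
    minimizing the mean squared error of quantizing w / s and rescaling by s."""
    def mse(s):
        return sum((quantize(w / s, table) * s - w) ** 2 for w in weights) / len(weights)
    return min(candidates, key=mse)

POSIT8_1 = posit_table(8, 1)
```

In posit<8, 1>, the pattern 0x40 decodes to 1.0, 0x60 to 4.0, and 0x7F to the largest value, 4096. Precision is densest near 1 and tapers toward the extremes, which is why matching a layer's weight distribution to that region (here, crudely, by scaling) reduces quantization error.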
Imaging in low light is extremely challenging due to low photon counts. Using sensitive CMOS cameras, it is currently possible to take videos at night under moonlight (0.05-0.3 lux illumination). In this paper, we dem...