ISBN (digital): 9781665487399
ISBN (print): 9781665487399
The recent success of self-supervised learning relies on its ability to learn representations from self-defined pseudo-labels that can be applied to several downstream tasks. Motivated by this ability, we present a deep image compression technique that learns the lossy reconstruction of raw images from the self-supervised learned representation of a SimCLR ResNet-50 architecture. Our framework uses a feature pyramid to achieve variable-rate compression of the image, with a self-attention map for the optimal allocation of bits. The paper examines the effects of contrastive self-supervised representations and the self-attention map on the distortion and perceptual quality of the reconstructed image. Experiments on different classes of images show that the proposed method outperforms other variable-rate deep compression models without compromising the perceptual quality of the images.
ISBN (print): 9781665448994
Sign language is the dominant yet non-primary form of communication used in the deaf and hearing-impaired community. To enable easy, mutual communication between the hearing-impaired and hearing communities, it is fundamental to build a robust system capable of translating spoken language into sign language and vice versa. To this end, sign language recognition and production are two necessary parts of such a two-way system, and both need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. The survey briefly summarizes recent achievements in SLP and discusses their advantages, limitations, and future directions of research.
ISBN (print): 9781728193601
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the different nature of data-driven deep neural networks (DNNs) compared with conventional software is a challenge for practical software verification. In this work, we show how existing methods from software engineering benefit the development of a DNN, in particular dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged to generate test sets that provide coverage guarantees with respect to important environmental features and their interactions. Additionally, we show how our approach can be used for growing a dataset, i.e., to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
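The idea of checking a dataset against pairwise (2-way) combinatorial coverage can be illustrated with a minimal sketch. The domain model, feature names, and samples below are hypothetical and are not taken from the paper; the abstract does not specify the coverage strength or the tooling the authors use.

```python
from itertools import combinations, product


def pairwise_gaps(domain, samples):
    """Return the feature-value pairs that no sample covers (2-way coverage)."""
    features = sorted(domain)
    # Every value combination of every pair of features must appear at least once.
    required = set()
    for f1, f2 in combinations(features, 2):
        for v1, v2 in product(domain[f1], domain[f2]):
            required.add(((f1, v1), (f2, v2)))
    # Collect the pairs that the existing samples actually exhibit.
    covered = set()
    for sample in samples:
        for f1, f2 in combinations(features, 2):
            covered.add(((f1, sample[f1]), (f2, sample[f2])))
    return required - covered


# Hypothetical domain model for a driving scene (illustrative only).
domain = {
    "weather": ["clear", "rain"],
    "time": ["day", "night"],
    "road": ["urban", "rural"],
}
samples = [
    {"weather": "clear", "time": "day", "road": "urban"},
    {"weather": "rain", "time": "night", "road": "rural"},
]

gaps = pairwise_gaps(domain, samples)
for gap in sorted(gaps):
    print("missing:", gap)
```

The returned gaps point at combinations for which data could be collected next, which is one plausible reading of "growing a dataset" in the abstract.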
ISBN (print): 9781665448994
Multistage, or serial, fusion refers to algorithms that sequentially fuse an increasing number of matching results at each step and decide whether to accept the match hypothesis, reject it, or proceed to the next step. Such fusion methods are beneficial in situations where running the additional matching algorithms needed for later stages is time-consuming or expensive. The construction of multistage fusion methods is challenging, since it requires both learning fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose the use of a single neural network for learning the multistage fusion. In addition, we discuss the choices of performance measures for the trained algorithms and of optimization criteria for network training. We perform experiments using three face matching algorithms and the IJB-A and IJB-C databases.
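The accept / reject / continue control flow of serial fusion can be sketched as follows. This is only an illustration of the decision loop: the mean-score fusion function, the threshold values, and the matcher scores are all assumptions, whereas the paper learns the fusion with a single neural network.

```python
def serial_fusion(matchers, thresholds, fuse=lambda s: sum(s) / len(s)):
    """Run matchers one at a time and stop once the fused score is decisive.

    matchers:   callables returning a match score (later ones assumed costlier)
    thresholds: one (reject_below, accept_from) pair per stage
    fuse:       combines all scores seen so far (mean is an assumption here)
    """
    scores = []
    for stage, (matcher, (low, high)) in enumerate(zip(matchers, thresholds), start=1):
        scores.append(matcher())  # only pay for this matcher if we got this far
        fused = fuse(scores)
        if fused >= high:
            return "accept", stage
        if fused < low:
            return "reject", stage
    # Reached only if the last stage leaves a gap between its thresholds.
    return "reject", len(scores)


# Hypothetical scores from three face matchers.
matchers = [lambda: 0.55, lambda: 0.90, lambda: 0.20]
# Equal thresholds in the last stage force a final decision.
thresholds = [(0.3, 0.8), (0.4, 0.7), (0.5, 0.5)]

decision, stages_used = serial_fusion(matchers, thresholds)
print(decision, stages_used)
```

In this example the first score is inconclusive, the fused score after the second matcher crosses the acceptance threshold, and the third matcher is never run.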
ISBN (print): 9781665487399
Rate-distortion optimization (RDO) is responsible for large gains in image and video compression. While RDO is a standard tool in traditional image and video coding, it is not yet widely used in novel end-to-end trained neural methods. The major reason is that the decoding function is trained once and does not have free parameters. In this paper, we present RDONet, a network containing state-of-the-art components, which is perceptually optimized and capable of rate-distortion optimization. With this network, we are able to outperform VVC Intra on MS-SSIM and two different perceptual LPIPS metrics. This paper is part of the CLIC challenge, where we participate under the team name RDONet FAU.
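Rate-distortion optimization is commonly framed as choosing, among candidate encodings, the one that minimizes a Lagrangian cost J = D + λR. The sketch below shows only that generic selection step with made-up candidates; it is not RDONet's procedure, which the abstract does not detail.

```python
def rdo_select(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost D + lam * R.

    candidates: list of (name, rate_bits, distortion) tuples (hypothetical values)
    lam:        Lagrange multiplier trading rate against distortion
    """
    return min(candidates, key=lambda c: c[2] + lam * c[1])


# Two hypothetical encodings of the same block.
candidates = [("coarse", 100, 40.0), ("fine", 400, 10.0)]

low_lambda_choice = rdo_select(candidates, lam=0.05)[0]   # rate is cheap
high_lambda_choice = rdo_select(candidates, lam=0.2)[0]   # rate is expensive
print(low_lambda_choice, high_lambda_choice)
```

A small λ favors the low-distortion candidate, while a large λ favors the low-rate one.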
ISBN (print): 9781665448994
In this paper, a new adaptive quantization algorithm for the generalized posit format is presented, to optimally represent the dynamic range and distribution of deep neural network parameters. Adaptation is achieved by minimizing the intra-layer posit quantization error with a compander. The efficacy of the proposed quantization algorithm is studied within a new low-precision framework, ALPS, on ResNet-50 and EfficientNet models for classification tasks. Results indicate that low-precision DNNs using generalized posits outperform those using other well-known numerical formats, including standard posits, in both accuracy and energy dissipation.
ISBN (print): 9781665448994
Deep-learning-based generative models have proven capable of achieving excellent results in numerous image processing tasks with a wide range of applications. One significant improvement of deep-learning approaches over traditional approaches is their ability to regenerate semantically coherent images from an input with limited information. This advantage becomes even more crucial when the input size is only a very small proportion of the output size. Such image expansion tasks can be more challenging, as the missing area may originally contain many semantic features that are critical in judging the quality of an image. In this paper, we propose an edge-guided generative network model for producing semantically consistent output from a small image input. Our experiments show that the proposed network is able to regenerate high-quality images even when some structural features are missing in the input.
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Before we can obfuscate portions of an image to enhance privacy, we must know what portions are considered sensitive. In this paper, we report results from a study aimed at identifying sensitive content in photos from a human-centered perspective. We collected sensitive photos and/or descriptions of sensitive photos from participants and asked them to identify which elements made each photo sensitive. Using this information, we propose an initial two-level taxonomy of sensitive content categories. This taxonomy may be useful to privacy researchers, online social network designers, policy makers, computer vision researchers, and anyone wishing to identify potentially sensitive content in photos. We conclude by providing insights about how these results may be used to enhance computer vision approaches to protecting image privacy.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Current methods for pruning neural network weights iteratively apply magnitude-based pruning to the model weights and re-train the resulting model to recover lost accuracy. In this work, we show that such strategies do not allow for the recovery of erroneously pruned weights. To enable weight recovery, we propose a simple strategy called cyclical pruning, which requires the pruning schedule to be periodic and allows weights pruned erroneously in one cycle to recover in subsequent ones. Experimental results on both linear models and large-scale deep neural networks show that cyclical pruning outperforms existing pruning algorithms, especially at high sparsity ratios. Our approach is easy to tune and can be readily incorporated into existing pruning pipelines to boost performance.
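A periodic sparsity schedule combined with magnitude-based masking might look like the sketch below. The linear ramp within each cycle, the function names, and the choice to recompute the mask from the dense weights are assumptions made for illustration; the abstract only states that the schedule is periodic.

```python
import numpy as np


def cyclical_sparsity(step, cycle_len, target, start=0.0):
    """Sparsity at `step`: ramps linearly from `start` to `target`, then
    restarts every `cycle_len` steps (the ramp shape is an assumption)."""
    phase = (step % cycle_len) / (cycle_len - 1)  # requires cycle_len > 1
    return start + (target - start) * phase


def magnitude_mask(weights, sparsity):
    """Boolean mask that drops roughly the `sparsity` fraction of weights
    with the smallest magnitudes and keeps the rest."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.abs(weights) > threshold


weights = np.array([0.1, -0.5, 0.3, -0.05, 0.9])
mask = magnitude_mask(weights, sparsity=0.4)

# Sparsity over two cycles of five steps each: it drops back at step 5,
# so a mask recomputed then can re-admit previously pruned weights.
schedule = [cyclical_sparsity(t, cycle_len=5, target=0.8) for t in range(10)]
print(mask, schedule)
```

Because sparsity returns to a low value at the start of each cycle, a weight that was masked out earlier can re-enter the model if its magnitude has grown, provided the underlying dense weights are retained rather than zeroed permanently.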
ISBN (print): 9781665487399
We propose a generic approach to quantization without a codebook in learned image compression called onehot max (OHM, Ω) quantization. It reorganizes the feature space, resulting in an additional dimension along which vector quantization yields one-hot vectors by comparing activations. Furthermore, we show how to integrate Ω quantization into a compression system with bitrate adaptation, i.e., full control over bitrate during inference. We perform experiments on both MNIST and Kodak and report on rate-distortion trade-offs in comparison with the integer rounding reference. For low bitrates (< 0.4 bpp), our proposed quantizer yields better performance while also exhibiting other advantageous training and inference properties. Code is available at https://***/ifnspaml/OHMQ.
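One way to read the description above is that channels are regrouped along a new axis and the largest activation in each group is marked with a one-hot vector. The NumPy sketch below follows that reading; the function name, the grouping of adjacent channels, and the `group_size` parameter are assumptions, and the sketch covers only the forward pass, not how gradients are handled in training or how the authors' released code is organized.

```python
import numpy as np


def onehot_max(features, group_size):
    """Reshape the last axis into groups of `group_size` and replace each
    group by a one-hot vector marking its largest activation."""
    *lead, channels = features.shape
    if channels % group_size:
        raise ValueError("channel count must be divisible by group_size")
    # The extra trailing axis is the 'additional dimension' being quantized.
    grouped = features.reshape(*lead, channels // group_size, group_size)
    winners = grouped.argmax(axis=-1)
    # Index rows of an identity matrix to turn winner indices into one-hot vectors.
    return np.eye(group_size, dtype=features.dtype)[winners]


features = np.array([[0.2, 0.9, -0.1, 0.4, 0.3, 0.1]])
symbols = onehot_max(features, group_size=3)
print(symbols)
```

No codebook lookup is involved: each symbol is determined purely by comparing activations within its group, and a group of size K can take one of K values.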