ISBN: 9781538607336 (print)
In this paper, we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we have coined speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions for how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and to explore in depth the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN: 9781665448994 (print)
Recent research has shown that faces can be obfuscated in large-scale datasets with a minimal performance impact on image classification and downstream tasks like object recognition. In this paper, we investigate the role of face obfuscation in video classification datasets and quantify a more significant reduction in performance caused by face blurring. To reduce such performance effects, we propose a generalized distillation approach in which a privacy-preserving action recognition network is trained with privileged information given by face identities. We show, through experiments performed on Kinetics-400, that the proposed approach can fully close the performance gap caused by face anonymization.
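The abstract does not spell out the training objective. As a minimal sketch, assuming the standard generalized distillation form (a weighted sum of a hard-label term and a term that imitates the teacher's temperature-softened outputs), the student loss could look like the following. The function names, the `imitation` weight, and the `temperature` value are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Mean cross-entropy between target and predicted distributions."""
    return float(-(target_probs * np.log(pred_probs + eps)).sum(axis=-1).mean())

def generalized_distillation_loss(student_logits, teacher_logits, labels,
                                  imitation=0.5, temperature=2.0):
    """Assumed loss: (1 - imitation) * CE(labels) + imitation * CE(soft teacher targets).

    In this sketch the teacher stands for a network that saw privileged
    information (for example, unblurred faces); the student sees only
    anonymized input.
    """
    student_probs = softmax(student_logits)
    soft_targets = softmax(teacher_logits, temperature)
    n_classes = student_probs.shape[-1]
    hard_targets = np.eye(n_classes)[np.asarray(labels)]  # one-hot labels
    hard_term = cross_entropy(hard_targets, student_probs)
    soft_term = cross_entropy(soft_targets, student_probs)
    return (1.0 - imitation) * hard_term + imitation * soft_term
```

Setting `imitation=0.0` reduces the sketch to ordinary supervised training, which makes the contribution of the teacher term easy to isolate in an ablation.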
ISBN: 0780342364 (print)
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating data from novel content classes and style to a known style or content. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.
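To make the two-factor structure concrete, here is a minimal sketch of fitting an asymmetric bilinear model, in which each observation is modeled as a style-specific linear map applied to a content vector. It assumes one observation per style-content pair and fits by a truncated SVD of the stacked data; the function name, argument layout, and `rank` parameter are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def fit_asymmetric_bilinear(observations, rank):
    """Fit y[s, c] ~= style_maps[s] @ content_vectors[:, c].

    observations: array of shape (n_styles, n_contents, dim), holding one
    dim-dimensional observation per (style, content) pair.
    Returns style_maps of shape (n_styles, dim, rank) and content_vectors
    of shape (rank, n_contents).
    """
    n_styles, n_contents, dim = observations.shape
    # Stack the data so each column is one content class and each block of
    # `dim` rows belongs to one style.
    stacked = observations.transpose(0, 2, 1).reshape(n_styles * dim, n_contents)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    style_maps = (u[:, :rank] * s[:rank]).reshape(n_styles, dim, rank)
    content_vectors = vt[:rank, :]
    return style_maps, content_vectors

def reconstruct(style_maps, content_vectors):
    """Render every style-content combination; shape (n_styles, n_contents, dim)."""
    return np.einsum("skj,jc->sck", style_maps, content_vectors)
```

With the factors separated, a known content vector can be rendered under any fitted style by a single matrix-vector product, which is the basic operation behind the extrapolation and translation tasks listed above.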
ISBN: 9781665448994 (print)
We present an approach to supervised action recognition in the dark and report our results on the ARID dataset [60]. Most previous works only evaluate performance on large, well-illuminated datasets like Kinetics and HMDB51. We demonstrate that our approach achieves a very low error rate while being trained on a much smaller dataset of dark videos. We also explore a variety of training and inference strategies, including domain transfer methodologies, and propose a simple but useful frame selection strategy. Our empirical results demonstrate that we beat previously published baseline models by 11%.
ISBN: 0780342364 (print)
We present a method for computing dense visual correspondence based on general assumptions about scene geometry. Our algorithm does not rely on correlation, and uses a variable region of support. We assume that images consist of a number of connected sets of pixels with the same disparity, which we call disparity components. Using maximum likelihood arguments, at each pixel we compute a small set of plausible disparities. A pixel is assigned a disparity d based on connected components of pixels, where each pixel in a component considers d to be plausible. Our implementation chooses the largest plausible disparity component; however, global contextual constraints can also be applied. While the algorithm was originally designed for visual correspondence, it can also be used for other early vision problems such as image restoration. It runs in a few seconds on traditional benchmark images with standard parameter settings, and gives quite promising results.
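The assignment rule described above can be sketched as follows, assuming the per-pixel plausibility test has already been run and is given as a boolean array `plausible[d, y, x]`. The 4-connectivity, the array layout, the `-1` marker for unassigned pixels, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """Label 4-connected components of a boolean mask.

    Returns (labels, sizes): labels is 0 outside the mask, and sizes[k] is
    the pixel count of component k (sizes[0] is 0).
    """
    height, width = mask.shape
    labels = np.zeros((height, width), dtype=int)
    sizes = [0]
    for y in range(height):
        for x in range(width):
            if not mask[y, x] or labels[y, x]:
                continue
            current = len(sizes)
            labels[y, x] = current
            queue = deque([(y, x)])
            count = 0
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                count += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            sizes.append(count)
    return labels, np.array(sizes)

def assign_disparities(plausible):
    """Give each pixel the disparity whose plausible component is largest.

    Pixels with no plausible disparity keep the value -1.
    """
    n_disparities, height, width = plausible.shape
    best_size = np.zeros((height, width), dtype=int)
    disparity = np.full((height, width), -1, dtype=int)
    for d in range(n_disparities):
        labels, sizes = label_components(plausible[d])
        size_here = sizes[labels]  # size of each pixel's component at disparity d
        better = size_here > best_size
        disparity[better] = d
        best_size[better] = size_here[better]
    return disparity
```

Because the region of support is whatever connected component a pixel falls into, it varies in shape and size across the image rather than being a fixed window.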
ISBN: 9781665448994 (print)
This paper introduces a novel dataset for video enhancement and studies the state-of-the-art methods of the NTIRE 2021 challenge on quality enhancement of compressed video. The challenge is the first NTIRE challenge in this direction, with three competitions, hundreds of participants and tens of proposed solutions. Our newly collected Large-scale Diverse Video (LDV) dataset is employed in the challenge. In our study, we analyze the solutions of the challenges and several representative methods from previous literature on the proposed LDV dataset. We find that the NTIRE 2021 challenge advances the state-of-the-art of quality enhancement on compressed video.
ISBN: 9781424423392 (print)
We show how to outsource data annotation to Amazon Mechanical Turk. Doing so has produced annotations in quite large numbers relatively cheaply. The quality is good, and can be checked and controlled. Annotations are produced quickly. We describe results for several different annotation problems. We describe some strategies for determining when the task is well specified and properly priced.
ISBN: 0780342364 (print)
Several vision problems can be reduced to the problem of fitting a linear surface of low dimension to data, including the problems of structure-from-affine-motion, and of characterizing the intensity images of a Lambertian scene by constructing the intensity manifold. For these problems, one must deal with a data matrix with some missing elements. In structure-from-motion, missing elements will occur if some point features are not visible in some frames. When constructing the intensity manifold, missing matrix elements will arise when the surface normals of some scene points do not face the light source in some images. We propose a novel method for fitting a low-rank matrix to a matrix with missing elements. We show experimentally that our method produces good results in the presence of noise. These results can be either used directly, or can serve as an excellent starting point for an iterative method.
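The abstract does not describe the proposed method itself, so the sketch below shows only the kind of generic iterative refinement it alludes to in its last sentence: repeatedly fill the missing entries with the current low-rank estimate and re-fit by truncated SVD. This is a swapped-in, well-known imputation scheme and not the authors' algorithm; the function name, arguments, and zero initialization are assumptions.

```python
import numpy as np

def fit_low_rank_with_missing(data, observed, rank, n_iters=200):
    """Iterative SVD imputation (a generic baseline, not the paper's method).

    data: 2-D array; values where `observed` is False are ignored.
    observed: boolean mask of the same shape, True where data is known.
    Returns a matrix of rank at most `rank` that approximates the known entries.
    """
    data = np.asarray(data, dtype=float)
    filled = np.where(observed, data, 0.0)  # start with zeros in the gaps
    low_rank = filled
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep known entries fixed; update only the missing ones.
        filled = np.where(observed, data, low_rank)
    return low_rank
```

A scheme like this converges only to a local solution, which is why a good starting point of the sort the abstract promises matters in practice.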
ISBN: 0769523722 (print)
The choice of a color space is of great importance for many computer vision algorithms (e.g. edge detection and object recognition). It induces the equivalence classes to the actual algorithms. However, the problem is how to automatically select the color space that produces the best result for a particular task. The subsequent difficulty then is how to obtain a proper weighting scheme for the algorithms so that the results are combined in an optimal setting. To achieve proper color space selection and fusion of feature detectors, in this paper we propose a method that exploits non-perfect correlation between the color models derived from the principles of diversification. As a consequence, the weighting scheme yields maximal color discrimination. The method is verified experimentally for two different feature detectors. The experimental results show that the model provides feature detection results with a discriminative power 30 percent higher than that of the standard weighting scheme.
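One way to read "principles of diversification" is as minimum-variance weighting borrowed from portfolio theory: given the covariance of the responses from each color model, choose weights that sum to one and minimize the variance of the fused response. The sketch below assumes that reading; the abstract does not give the exact objective, and the function name and input are hypothetical.

```python
import numpy as np

def minimum_variance_weights(covariance):
    """Weights w with sum(w) == 1 that minimize w^T C w.

    covariance: (n, n) covariance matrix of the n color-model responses,
    assumed to be positive definite.
    Closed form: w = C^{-1} 1 / (1^T C^{-1} 1).
    """
    covariance = np.asarray(covariance, dtype=float)
    ones = np.ones(covariance.shape[0])
    raw = np.linalg.solve(covariance, ones)  # C^{-1} 1 without forming the inverse
    return raw / raw.sum()
```

When the responses are imperfectly correlated, the fused variance under these weights is never larger than that of the best single color model, which is the benefit that diversification is meant to capture.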
ISBN: 0769506623 (print)
Towards the goal of realizing a generic automatic human activity recognition system, a new formalism is proposed. Activities are described by a chained hierarchical representation using three types of entities: image features, mobile object properties and scenarios. Taking image features of tracked moving regions from an image sequence as input, mobile object properties are first computed by specific methods while noise is suppressed by statistical methods. Scenarios are recognized from mobile object properties based on Bayesian analysis. A sequential occurrence of several scenarios is recognized by an algorithm using a probabilistic finite-state automaton (a variant of structured HMM). The demonstration of the optimality of these recognition methods is discussed. Finally, the validity and the effectiveness of our approach are demonstrated on both real-world and perturbed data.
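The abstract gives no detail on the automaton, so the following is only a sketch of how a sequence of scenarios might be scored, using the standard HMM forward recursion as a stand-in for the paper's variant. The state layout, the per-frame likelihood input, and the function name are assumptions.

```python
import numpy as np

def sequence_likelihood(initial, transition, frame_likelihoods):
    """Forward recursion over a probabilistic finite-state automaton.

    initial: (n_states,) probabilities of starting in each state.
    transition: (n_states, n_states), where transition[i, j] = P(next = j | current = i).
    frame_likelihoods: (n_frames, n_states) likelihood of each frame's
    observation under each state (for example, from a Bayesian scenario classifier).
    Returns the total likelihood of the observed sequence.
    """
    initial = np.asarray(initial, dtype=float)
    transition = np.asarray(transition, dtype=float)
    frame_likelihoods = np.asarray(frame_likelihoods, dtype=float)
    alpha = initial * frame_likelihoods[0]
    for frame in frame_likelihoods[1:]:
        # Propagate one step through the automaton, then weight by the evidence.
        alpha = (alpha @ transition) * frame
    return float(alpha.sum())
```

For long sequences the running `alpha` vector would normally be rescaled at each step, or the recursion carried out in log space, to avoid numerical underflow.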