ISBN: 9781538607336 (print)
In this paper we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we call speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions for how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and deeply explore the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN: 9780769549903 (print)
The detection and classification of human movements, as a joint field of computer vision and pattern recognition, is increasingly used in applications designed to describe human activity. Such applications require efficient methods and tools for the automatic analysis and classification of motion capture data, which constitute an active field of research. To facilitate the development and benchmarking of methods for action recognition, several video collections have previously been proposed. In this paper, we present a new video database that can be used for an objective comparison and evaluation of different motion analysis and classification methods. The database contains video clips that capture the 3D motion of individuals. More specifically, the set consists of 8374 video clips, which contain 12 different types of tennis actions performed by 55 individuals, captured by Kinect. Kinect provides the depth map of the motion data and helps to extract the 3D skeletal joint connections. Experiments with state-of-the-art algorithms show that the database is very challenging. It contains actions that are very similar to one another, offering an opportunity to develop and test algorithms dedicated to gaming and athletics. The database is freely available for research purposes.
ISBN: 9780769549903 (print)
Exergames combine exercising with game play by requiring the users to perform some kind of physical activity (and exercise) in order to score points in the game. In this paper, we present a novel mobile exergaming framework, which requires the users to physically move and jump in order to score points in a game that is played on a smartphone. Our system uses a custom-designed Exercising Pad (called ExerPad) to track the user's physical movement, and then automatically updates the corresponding game character's position on the screen. The ExerPad contains images of different shapes, which are captured by the smartphone's built-in camera and automatically detected by our shape detection algorithm. We also use the smartphone's built-in accelerometer and gyroscope to detect other physical movements of the user, such as jumping and turning. The experimental results show that the proposed mobile exergame helps its users burn calories and have fun at the same time.
ISBN: 9781538661000 (digital), 9781538661000 (print)
Given a machine learning model, adversarial perturbations transform images such that the model classifies them as an attacker-chosen class. Most research in this area has focused on adversarial perturbations that are imperceptible to the human eye. However, recent work has considered attacks that are perceptible but localized to a small region of the image. Under this threat model, we discuss both defenses that remove such adversarial perturbations and attacks that can bypass these defenses.
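The localized threat model described above can be illustrated with a minimal sketch. It only shows what "perceptible but localized" means: a visible patch overwrites a small region while the rest of the image is untouched. The function names `apply_patch` and `changed_fraction`, the toy 8 x 8 image, and the patch values are all hypothetical; no attack optimization or defense from the paper is reproduced here.

```python
def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`.

    Both arguments are 2D lists of pixel intensities in [0, 1].
    """
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def changed_fraction(original, perturbed):
    """Fraction of pixels whose value differs between the two images."""
    total = sum(len(row) for row in original)
    changed = sum(
        1
        for row_a, row_b in zip(original, perturbed)
        for a, b in zip(row_a, row_b)
        if a != b
    )
    return changed / total


# Toy example (assumed values): a uniform grey image and a 2 x 2 patch.
image = [[0.5] * 8 for _ in range(8)]
patch = [[1.0, 0.0], [0.0, 1.0]]
adv = apply_patch(image, patch, top=3, left=4)
print(changed_fraction(image, adv))
```

In a real attack the patch contents would be optimized against the target model; here they are fixed by hand to keep the sketch self-contained.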
ISBN: 9781665448994 (print)
This work presents AFRIFASHION1600, an openly accessible contemporary African fashion image dataset containing 1600 samples labelled into 8 classes representing some African fashion styles. Each sample is a colour image of size 128 x 128. This is a niche dataset that aims to improve the visibility, inclusion, and familiarity of African fashion in computer vision. The AFRIFASHION1600 dataset is publicly available.
ISBN: 9780769549903 (print)
In this paper we consider the problem of face recognition in imagery captured in uncooperative environments using PTZ cameras. For each subject enrolled in the gallery, we acquire a high-resolution 3D model from which we generate a series of rendered face images of varying viewpoint. The result of regularly sampling face pose for all subjects is a redundant basis that over-represents each target. To recognize an unknown probe image, we perform a sparse reconstruction of SIFT features extracted from the probe using a basis of SIFT features from the gallery. While directly collecting images over varying pose for all enrolled subjects is prohibitive at enrollment, the use of high-speed 3D acquisition systems allows our face recognition system to quickly acquire a single model and generate synthetic views offline. Finally, we show, using two publicly available datasets, how our approach performs when using rendered gallery images to recognize 2D rendered probe images and 2D probe images acquired using PTZ cameras.
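The idea of reconstructing a probe feature sparsely from a labelled gallery basis can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the abstract does not name the sparse solver or the decision rule, so the sketch substitutes plain matching pursuit and picks the subject whose atoms carry the most coefficient energy. The toy 4-dimensional vectors stand in for SIFT descriptors, and all names (`matching_pursuit`, `identify`, `atoms`, `labels`) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse coding: return one coefficient per (unit-norm) atom."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        coeffs[best] += corr[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs


def identify(signal, atoms, labels, n_iter=3):
    """Assumed decision rule: subject with the largest coefficient energy."""
    coeffs = matching_pursuit(signal, atoms, n_iter)
    energy = {}
    for c, label in zip(coeffs, labels):
        energy[label] = energy.get(label, 0.0) + c * c
    return max(energy, key=energy.get)


# Toy gallery (assumed values): two atoms per subject, as if from two rendered poses.
atoms = [
    normalize([1.0, 0.0, 0.0, 0.0]),
    normalize([1.0, 1.0, 0.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 1.0]),
]
labels = ["A", "A", "B", "B"]

probe_a = [0.9, 0.8, 0.05, 0.0]
probe_b = [0.0, 0.1, 0.7, 0.6]
print(identify(probe_a, atoms, labels), identify(probe_b, atoms, labels))
```

A redundant basis, as described in the abstract, corresponds here to having several atoms per subject, so that a probe at an intermediate pose can still be explained by a few atoms of the correct subject.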
ISBN: 9781665487399 (print)
The aim of this paper is to demonstrate that a state-of-the-art feature matcher (LoFTR) can be made more robust to rotations by simply replacing the backbone CNN with a steerable CNN that is equivariant to translations and image rotations. It is experimentally shown that this boost is obtained without reducing performance on ordinary illumination and viewpoint matching sequences.
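The equivariance property mentioned above means that rotating the input and then applying the layer gives the same result as applying the layer and then rotating the output. The sketch below checks this for 90-degree rotations only, using a single hand-written filter instead of a steerable CNN; the helper names and the toy image are hypothetical. A kernel that is itself unchanged by 90-degree rotation (the Laplacian) passes the check, while a direction-sensitive kernel (Sobel) does not.

```python
def rot90(img):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]


def filter2d(img, kernel):
    """Cross-correlate a square image with an odd-sized kernel, zero padding."""
    n = len(img)
    k = len(kernel) // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n and 0 <= c < n:
                        acc += kernel[di + k][dj + k] * img[r][c]
            out[i][j] = acc
    return out


def is_rot90_equivariant(img, kernel):
    """Check filter(rotate(x)) == rotate(filter(x)) on one image."""
    return filter2d(rot90(img), kernel) == rot90(filter2d(img, kernel))


LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]    # unchanged by 90-degree rotation
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to horizontal gradients only

image = [[4 * r + c for c in range(4)] for r in range(4)]
print(is_rot90_equivariant(image, LAPLACIAN), is_rot90_equivariant(image, SOBEL_X))
```

Steerable CNNs generalize this idea by constraining learned filters so that the property holds for a chosen rotation group; the sketch only demonstrates the property being tested.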
ISBN: 9781665448994 (print)
In the context of variational auto-encoders, learning disentangled latent variable representations remains a challenging problem. In this abstract, we consider the semi-supervised setting, in which the factors of variation are labelled for a small fraction of our samples. We examine how the quality of learned representations is affected by the dimension of the unsupervised component of the latent space. We also consider a variational lower bound for the mutual information between the data and the semi-supervised component of the latent space, and analyze its role in the context of disentangled representation learning.
ISBN: 9781665448994 (print)
Self-attention is a cornerstone of transformer models. However, our analysis shows that self-attention in vision transformer inference is extremely sparse. When applying a sparsity constraint, our experiments on image (ImageNet-1K) and video (Kinetics-400) understanding show that we can achieve 95% sparsity on the self-attention maps while keeping the performance drop below 2 points. This motivates us to rethink the role of self-attention in vision transformer models.
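To make the notion of "95% sparsity on the self-attention maps" concrete, the sketch below sparsifies a toy attention map and measures the fraction of zeroed entries. The abstract does not say how the sparsity constraint is imposed, so per-row top-k masking is an assumption made here for illustration; the function names and the 20-token toy scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify_row(weights, keep):
    """Keep the `keep` largest weights, zero the rest, and renormalize."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = set(order[:keep])
    mass = sum(weights[i] for i in kept)
    return [w / mass if i in kept else 0.0 for i, w in enumerate(weights)]


def sparsity(rows):
    """Fraction of attention entries that are exactly zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(1 for row in rows for w in row if w == 0.0)
    return zeros / total


# Toy attention map over 20 tokens: each token attends mostly to its neighbours.
n_tokens = 20
dense = [softmax([-abs(i - j) for j in range(n_tokens)]) for i in range(n_tokens)]

# Keeping 1 of 20 entries per row zeroes 19/20 of the map.
sparse = [sparsify_row(row, keep=1) for row in dense]
print(sparsity(dense), sparsity(sparse))
```

With 20 tokens, keeping a single entry per row corresponds to the 95% figure quoted in the abstract; whether accuracy survives such pruning is the empirical question the paper addresses and is not something this sketch can show.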
ISBN: 9780769549903 (print)
We present a video demo of the prototype of an egocentric-vision-based assistive co-robot system. In this co-robot system, the user wears a pair of glasses with a forward-looking camera and is actively engaged in the control loop of the robot in navigational tasks. The egocentric vision glasses serve two purposes. First, they serve as a source of visual input to request the robot to find a certain object in the environment. Second, the motion patterns computed from the egocentric video, which are associated with a specific set of head movements, are exploited to guide the robot to find the object. These are especially helpful for quadriplegic individuals who do not have the hand functionality needed for control with other modalities (e.g., a joystick). In our co-robot system, when the robot does not fulfill the object-finding task within a pre-specified time window, it actively solicits user controls for guidance. The user can then use the egocentric-vision-based gesture interface to orient the robot towards the direction of the object. After that, the robot automatically navigates towards the object until it finds it. Our experiments validated the efficacy of the closed-loop design in engaging the human in the loop.