ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and child transient channels that are initiated and terminated on demand to handle detailed interactive actions. The short-lived transient sessions are managed by a proposed Transient Switch. The neural framework is trained to discover the structure of the duality automatically. Our model shows superior performance in human-object interaction motion prediction.
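The on-demand life cycle of the transient channels can be illustrated with a toy controller. This is a minimal sketch under loud assumptions: the names `TransientSwitch` and `run`, the per-object interaction scores, and the fixed hysteresis thresholds are all hypothetical. In the paper the switching is learned by the neural framework rather than hand-set, and the abstract does not describe how.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TransientSwitch:
    """Hypothetical hand-set controller; the paper learns this behavior instead."""
    on_threshold: float = 0.5   # assumed: open a child channel above this score
    off_threshold: float = 0.3  # assumed: close it below this score (hysteresis)
    active: Dict[str, int] = field(default_factory=dict)  # object id -> start step

    def step(self, t: int, scores: Dict[str, float]) -> List[str]:
        """Open/close child sessions at step t; return ids of open child channels."""
        for obj, score in scores.items():
            if obj not in self.active and score > self.on_threshold:
                self.active[obj] = t   # initiate a transient channel on demand
            elif obj in self.active and score < self.off_threshold:
                del self.active[obj]   # terminate the short-lived session
        return sorted(self.active)


def run(score_seq: List[Dict[str, float]]) -> Tuple[int, Dict[str, int]]:
    """Count how many steps the parent and each child channel are active."""
    switch = TransientSwitch()
    parent_steps = 0
    child_steps: Dict[str, int] = {}
    for t, scores in enumerate(score_seq):
        parent_steps += 1  # the persistent parent channel runs at every step
        for obj in switch.step(t, scores):
            child_steps[obj] = child_steps.get(obj, 0) + 1
    return parent_steps, child_steps
```

For the score sequence 0.1, 0.6, 0.4, 0.2, 0.2 on a single `cup` object, the parent runs for all 5 steps while the `cup` channel exists only from the step its score rises above 0.5 until the step it drops below 0.3, i.e. for 2 steps.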
ISBN (print): 9781424439942
We investigate the problem of recognizing words fingerspelled in video using the British Sign Language (BSL) fingerspelling alphabet. This is a challenging task since the BSL alphabet involves both hands occluding each other and contains signs that are ambiguous from the observer's viewpoint. The main contributions of our work include: (i) recognition based on hand shape alone, not requiring motion cues; (ii) robust visual features for hand shape recognition; (iii) scalability to large-lexicon recognition with no re-training. We report results on a dataset of 1,000 low-quality webcam videos of 100 words. The proposed method achieves a word recognition accuracy of 98.9%.
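Contribution (iii), large-lexicon recognition without re-training, follows naturally when a word is scored against per-frame letter probabilities rather than learned as its own class. The sketch below assumes a hypothetical per-frame classifier that outputs letter probabilities and uses generic Viterbi-style monotonic alignment; it illustrates lexicon-constrained decoding and is not the authors' method. The names `word_score` and `recognize` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frames = Sequence[Dict[str, float]]  # hypothetical per-frame letter probabilities


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def word_score(frames: Frames, word: str) -> float:
    """Best log-probability of a monotonic alignment of frames to the letters of
    `word`, where each letter covers one or more consecutive frames."""
    n, m = len(frames), len(word)
    if m == 0 or n < m:
        return -math.inf
    # dp[j]: best score with the current frame assigned to letter j
    dp = [-math.inf] * m
    dp[0] = _log(frames[0].get(word[0], 0.0))
    for t in range(1, n):
        new = [-math.inf] * m
        for j in range(m):
            stay = dp[j]                                  # same letter continues
            advance = dp[j - 1] if j > 0 else -math.inf   # move to the next letter
            best = max(stay, advance)
            if best > -math.inf:
                new[j] = best + _log(frames[t].get(word[j], 0.0))
        dp = new
    return dp[m - 1]


def recognize(frames: Frames, lexicon: List[str]) -> str:
    """Pick the lexicon word with the highest alignment score; no re-training needed."""
    return max(lexicon, key=lambda w: word_score(frames, w))
```

Adding a word to `lexicon` only adds one more alignment to evaluate; the frame-level classifier is left untouched.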
ISBN (electronic): 9781538661000
ISBN (print): 9781538661000
Material recognition is researched in both the computer vision and vision science fields. In this paper, we investigated how humans observe material images and found that eye-fixation information improves the performance of material image classification models. We first collected eye-tracking data from human observers and used it to fine-tune a generative adversarial network for saliency prediction (SalGAN). We then fused the predicted saliency maps with the material images and fed them to CNN models for material classification. The experimental results show that classification accuracy improves compared with using the original images. This indicates that human visual cues could benefit computational models as priors.
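The abstract does not specify how the saliency map is fused with the image. One simple option, shown here purely as an assumption, is to reweight each pixel by its min-max normalized saliency with a floor `alpha`, so that non-salient regions are dimmed rather than erased; channel concatenation would be another plausible choice. `fuse` and `alpha` are hypothetical names.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C, values in [0, 1]
Saliency = List[List[float]]     # H x W, any non-negative scale


def fuse(image: Image, saliency: Saliency, alpha: float = 0.5) -> Image:
    """Hypothetical fusion: scale each pixel by alpha + (1 - alpha) * s,
    where s is the min-max normalized saliency at that pixel."""
    flat = [v for row in saliency for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo if hi > lo else 1.0  # a uniform map gives s = 0 everywhere
    fused: Image = []
    for y, row in enumerate(image):
        out_row = []
        for x, pixel in enumerate(row):
            s = (saliency[y][x] - lo) / span
            weight = alpha + (1.0 - alpha) * s  # never below alpha
            out_row.append([c * weight for c in pixel])
        fused.append(out_row)
    return fused
```

Setting `alpha = 1.0` returns the original image, which makes it easy to compare against the unfused baseline.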
ISBN (print): 9780769549903
Action recognition is one of the major challenges of computer vision. Several approaches have been proposed using different descriptors and multi-class models. In this paper, we focus on binary ranking models and formulate action recognition as a ranking problem. A binary ranking model is trained for each action and used to recognize the test videos of that action. The binary ranking models are constructed using dense SIFT (DSIFT) descriptors and histogram of oriented gradients / histogram of optical flow (HOG/HOF) descriptors. We show that ranking models can achieve higher recognition accuracy than a baseline based on multi-class models on two recent and challenging benchmark datasets: the Human Motion Database (HMDB) and the Action Similarity Labeling (ASLAN) dataset.
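One binary ranking model per action can be sketched with a pairwise ranking perceptron, a simple stand-in for RankSVM-style learners. The abstract does not say which ranking learner was used, and the feature vectors here are hypothetical placeholders for DSIFT or HOG/HOF encodings. `train_ranker`, `train_per_action`, and `rank_videos` are illustrative names.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_ranker(pos: List[Vector], neg: List[Vector],
                 epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Pairwise ranking perceptron: update w whenever a negative video
    scores at least as high as a positive one."""
    w = [0.0] * len(pos[0])
    for _ in range(epochs):
        for p in pos:
            for n in neg:
                if dot(w, p) - dot(w, n) <= 0:  # pair mis-ordered or tied
                    w = [wi + lr * (pi - ni) for wi, pi, ni in zip(w, p, n)]
    return w


def train_per_action(features: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """One-vs-rest: a separate binary ranking model for every action label."""
    models = {}
    for action in sorted(set(labels)):
        pos = [f for f, y in zip(features, labels) if y == action]
        neg = [f for f, y in zip(features, labels) if y != action]
        models[action] = train_ranker(pos, neg)
    return models


def rank_videos(w: Vector, videos: List[Vector]) -> List[int]:
    """Indices of test videos, most likely to show the action first."""
    return sorted(range(len(videos)), key=lambda i: dot(w, videos[i]), reverse=True)
```

Each action's model is used only to order the test videos for that action, mirroring the per-action recognition described above.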
ISBN (print): 9780769549903
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human-computer interaction method that integrates mobile computing and 3D visualization techniques, with applications in free-viewpoint visualization and 3D rendering for interactive and realistic environments. In particular, this approach focuses on augmented reality and home entertainment, and it was developed and tested on mobile devices, particularly tablet computers. Finally, we present a mechanism for evaluating the accuracy of this interaction system.
ISBN (print): 9780769549903
Understanding human actions in videos has been a central research theme in computer vision for decades, and much progress has been achieved over the years. Much of this progress was demonstrated on standard benchmarks used to evaluate novel techniques. These benchmarks and their evolution provide a unique perspective on the growing capabilities of computerized action recognition systems. They demonstrate how far machine vision systems have come while also underscoring the gap that still remains between state-of-the-art performance and the needs of real-world applications. In this paper we provide a comprehensive survey of these benchmarks, from early examples such as the Weizmann set [1] to recently presented, contemporary benchmarks. This paper further summarizes the results obtained in the last couple of years on the recent ASLAN benchmark [12], which was designed to reflect the many challenges modern action recognition systems are expected to overcome.
ISBN (print): 9781665448994
The NTIRE 2021 workshop features a Multi-modal Aerial View Object Classification Challenge. Its focus is on multi-sensor imagery classification to improve the performance of automatic target recognition (ATR) systems. In this paper we describe our entry in this challenge, a method focused on efficiency and low computational time while maintaining a high level of accuracy. The method is a convolutional neural network with 11 convolutional layers, 1 max-pooling layer, and 3 residual blocks, with a total of 373,130 parameters. The method ranks 3rd in Track 2 (SAR+EO) of the challenge.
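To see where a parameter budget of this size comes from, each layer's count follows a simple formula: a k x k convolution from C_in to C_out channels with bias has k*k*C_in*C_out + C_out learnable weights, and max pooling has none. The sketch below applies that formula; the layer list in the usage is hypothetical and does not reproduce the challenge entry, whose channel widths the abstract does not give. Normalization layers, if any, are not counted here.

```python
from typing import List, Tuple


def conv_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def count_params(layers: List[Tuple]) -> int:
    """Sum parameters over a simple layer spec such as ("conv", 3, 32) or ("maxpool",)."""
    total = 0
    for kind, *args in layers:
        if kind == "conv":
            total += conv_params(*args)
        elif kind == "maxpool":
            total += 0  # pooling has no learnable weights
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return total
```

For example, a hypothetical stem `[("conv", 3, 32), ("maxpool",), ("conv", 32, 32), ("conv", 32, 32)]` (the last two forming one residual block) has 896 + 2 * 9,248 = 19,392 parameters; residual additions themselves add none.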
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
Understanding the complex relationship between emotions and facial expressions is important for both psychologists and computer scientists. A large body of research in psychology investigates facial expressions, emotions, and how emotions are perceived from facial expressions. As computer scientists look to incorporate this research into automatic emotion perception systems, it is important to understand the nature and limitations of human emotion perception. These principles of emotion science affect the way datasets are created, methods are implemented, and results are interpreted in automated emotion perception. This paper aims to distill and align prior work in automated and human facial emotion perception to facilitate future discussions and research at the intersection of the two disciplines.
ISBN (electronic): 9781538661000
ISBN (print): 9781538661000
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification and activity detection in continuous videos. We experimentally compare various recognition approaches that capture temporal structure in activity videos, first by classifying segmented videos and then by extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
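Moving from segmented classification to detection in continuous video typically requires turning per-frame scores into temporal intervals. The sketch below shows one common, simple post-processing step (thresholding plus a minimum-length filter); it is an assumption for illustration, not the procedure used in the paper, and `frames_to_segments`, `threshold`, and `min_len` are hypothetical.

```python
from typing import List, Optional, Tuple


def frames_to_segments(probs: List[float], threshold: float = 0.5,
                       min_len: int = 2) -> List[Tuple[int, int]]:
    """Group consecutive frames with probability >= threshold into
    half-open [start, end) intervals, dropping runs shorter than min_len."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, p in enumerate(probs):
        if p >= threshold and start is None:
            start = t                      # an activity run begins
        elif p < threshold and start is not None:
            if t - start >= min_len:       # keep only sufficiently long runs
                segments.append((start, t))
            start = None
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs)))  # run reaching the last frame
    return segments
```

The minimum-length filter suppresses single-frame spikes, which matters for short, fine-grained events such as individual pitches.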
ISBN (electronic): 9781665487399
ISBN (print): 9781665487399
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the attention information from such models can also be used to measure the importance of each agent with respect to the ego vehicle's planned future trajectory. Our experimental results on the nuPlan dataset show that our method can effectively find and rank surrounding agents by their impact on the ego vehicle's plan.
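Reading agent importance off attention weights can be sketched as follows, assuming (hypothetically) that the model exposes, for each attention head, the raw logits of the ego-plan query over the surrounding agents. Averaging the per-head softmax weights is one assumed aggregation choice; the abstract does not state how the paper aggregates heads or layers. `rank_agents` and the agent ids are illustrative.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rank_agents(logits_per_head: List[List[float]],
                agent_ids: List[str]) -> List[Tuple[str, float]]:
    """Average the ego->agent attention distribution over heads (assumed
    aggregation) and return agents sorted from most to least attended."""
    n_heads = len(logits_per_head)
    avg = [0.0] * len(agent_ids)
    for head in logits_per_head:
        for i, w in enumerate(softmax(head)):
            avg[i] += w / n_heads
    return sorted(zip(agent_ids, avg), key=lambda kv: kv[1], reverse=True)
```

Because each head's weights sum to one, the averaged importances also sum to one and can be read as a distribution over agents.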