This paper presents a novel 360 degrees High-Dynamic Range (HDR) camera for real-time 3D 360 degrees panoramic computervision. The camera consists of (1) a pair of bio-inspired dynamic vision line sensors (1024 pixel...
详细信息
ISBN:
(纸本)9781479943098
This paper presents a novel 360 degrees High-Dynamic Range (HDR) camera for real-time 3D 360 degrees panoramic computervision. The camera consists of (1) a pair of bio-inspired dynamic vision line sensors (1024 pixel each) asynchronously generating events at high temporal resolution with on-chip time stamping (1 mu s resolution), having a high dynamic range and the sparse visual coding of the information, (2) a high-speed mechanical device rotating at up to 10 revolutions per sec (rps) on which the pair of sensor is mounted and (3) a processing unit for the configuration of the detector chip and transmission of its data through a slip ring and a gigabit Ethernet communication to the user. Within this work, we first present the new camera, its individual components and resulting panoramic edge map. In a second step, we developed a method for reconstructing the intensity images out of the event data generated by the sensors. The algorithm maps the recorded panoramic views into graylevel images by using a transform coefficient. In the last part of this work, anaglyph representation and 3D reconstruction results out of the stereo images are shown. The experimental results show the capabilities of the new camera to generate 10 x 3D panoramic views per second in real-time at an image resolution of 5000x1024 pixel and intra-scene dynamic range of more than 120 dB under natural illuminations. The camera potential for 360 degrees depth imaging and mobile computervision is briefly highlighted.
Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power c...
详细信息
ISBN:
(纸本)9781479943098
Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and children transi...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and children transient channels that are initiated and terminated on-demand to handle detailed interactive actions. The short-lived transient sessions are managed by a proposed Transient Switch. The neural framework is trained to discover the structure of the duality automatically. Our model shows superior performances in human-object interaction motion prediction.
In this paper we present a flash game that aims at generating easily ground truth for testing object detection algorithms. Flash the Fish is an online game where the user is shown videos from underwater environments a...
详细信息
ISBN:
(纸本)9780769549903
In this paper we present a flash game that aims at generating easily ground truth for testing object detection algorithms. Flash the Fish is an online game where the user is shown videos from underwater environments and has to take photos of fish by clicking on them. The initial ground truth is provided by object detection algorithms and, subsequent, cluster analysis and user evaluation techniques, allow for the generation of ground truth based on the weighted combination of these "photos". Evaluation of the platform and comparison of the obtained results against a hand drawn ground truth confirmed that reliable ground truth generation is not necessarily a cumbersome task both in terms of effort and time needed.
Understanding the complex relationship between emotions and facial expressions is important for both psychologists and computer scientists. A large body of research in psychology investigates facial expressions, emoti...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Understanding the complex relationship between emotions and facial expressions is important for both psychologists and computer scientists. A large body of research in psychology investigates facial expressions, emotions, and how emotions are perceived from facial expressions. As computer scientists look to incorporate this research into automatic emotion perception systems, it is important to understand the nature and limitations of human emotion perception. These principles of emotion science affect the way datasets are created, methods are implemented, and results are interpreted in automated emotion perception. This paper aims to distill and align prior work in automated and human facial emotion perception to facilitate future discussions and research at the intersection of the two disciplines.
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the a...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the attention information from such models can also be used to measure the importance of each agent with respect to the ego vehicle's future planned trajectory. Our experiment results on the nuPlans dataset show that our method can effectively find and rank surrounding agents by their impact on the ego's plan.
Recent research has shown that faces can be obfuscated in large-scale datasets with a minimal performance impact on image classification and downstream tasks like object recognition. In this paper, we investigate the ...
详细信息
ISBN:
(纸本)9781665448994
Recent research has shown that faces can be obfuscated in large-scale datasets with a minimal performance impact on image classification and downstream tasks like object recognition. In this paper, we investigate the role of face obfuscation in video classification datasets and quantify a more significant reduction in performance caused by face blurring. To reduce such performance effects, we propose a generalized distillation approach in which a privacy-preserving action recognition network is trained with privileged information given by face identities. We show, through experiments performed on Kinetics-400, that the proposed approach can fully close the performance gap caused by face anonymization.
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human computer interaction method that integrates mobile computing a...
详细信息
ISBN:
(纸本)9780769549903
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human computer interaction method that integrates mobile computing and 3D visualization techniques with applications on free viewpoint visualization and 3D rendering for interactive and realistic environments. Especially this approach is focused on augmented reality and home entertainment and it was developed and tested on mobiles and particularly on tablet computers. Finally, an evaluation mechanism on the accuracy of this interaction system is presented.
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in cont...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
Automatic age and gender classification has become relevant to an increasing amount of applications, particularly since the rise of social platforms and social media. Nevertheless, performance of existing methods on r...
详细信息
ISBN:
(纸本)9781467367592
Automatic age and gender classification has become relevant to an increasing amount of applications, particularly since the rise of social platforms and social media. Nevertheless, performance of existing methods on real-world images is still significantly lacking, especially when compared to the tremendous leaps in performance recently reported for the related task of face recognition. In this paper we show that by learning representations through the use of deep-convolutional neural networks (CNN), a significant increase in performance can be obtained on these tasks. To this end, we propose a simple convolutional net architecture that can be used even when the amount of learning data is limited. We evaluate our method on the recent Adience benchmark for age and gender estimation and show it to dramatically outperform current state-of-the-art methods.
暂无评论