ISBN: 9780769549897 (print)
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique that works for general isotropic materials. Our data capture setup is simple, consisting of only a digital camera and a handheld light source. From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera. We collect this information from multiple viewpoints and combine it with structure-from-motion to obtain a precise reconstruction of the complete 3D shape. The spatially varying isotropic bidirectional reflectance distribution function (BRDF) is captured by simultaneously inferring a set of basis BRDFs and their mixing weights at each surface point. According to our experiments, the captured shapes are accurate to 0.3 millimeters. The captured reflectance has a relative root-mean-square error (RMSE) of 9%.
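The basis-and-weights representation mentioned above can be illustrated with a minimal sketch. It assumes each basis BRDF is tabulated as a list of reflectance samples over the same set of angle bins; the function name `mix_brdf` and the sample values are hypothetical, and the sketch shows only the forward mixing step, not how the abstract's method infers the bases or weights.

```python
def mix_brdf(weights, basis):
    """Reflectance samples at one surface point as a weighted sum of basis BRDFs.

    weights: one mixing weight per basis BRDF (hypothetical values).
    basis:   list of basis BRDFs, each a list of samples over the same angle bins.
    """
    n_bins = len(basis[0])
    return [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(n_bins)]


# Two made-up basis BRDFs: a glossy lobe and a flat diffuse term.
glossy = [1.0, 0.5, 0.2]
diffuse = [0.1, 0.1, 0.1]

# A surface point that is 25% glossy and 75% diffuse.
point_brdf = mix_brdf([0.25, 0.75], [glossy, diffuse])
```

Storing a few weights per surface point and a small shared set of basis tables is what keeps a spatially varying BRDF compact in this kind of representation.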
ISBN: 9780769549897 (print)
We deal with the problem of recognizing social roles played by people in an event. Social roles are governed by human interactions and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interaction between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person-specific social descriptors. We develop tractable variational inference to simultaneously infer model weights as well as role assignments for all people in the videos. We also present a novel YouTube social roles dataset with ground truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.
ISBN: 9780769549903 (print)
Understanding human actions in videos has been a central research theme in computer vision for decades, and much progress has been achieved over the years. Much of this progress was demonstrated on standard benchmarks used to evaluate novel techniques. These benchmarks, and their evolution, provide a unique perspective on the growing capabilities of computerized action recognition systems. They demonstrate just how far machine vision systems have come while also underscoring the gap that still remains between existing state-of-the-art performance and the needs of real-world applications. In this paper we provide a comprehensive survey of these benchmarks: from early examples, such as the Weizmann set [1], to recently presented, contemporary benchmarks. This paper further provides a summary of the results obtained in the last couple of years on the recent ASLAN benchmark [12], which was designed to reflect the many challenges modern action recognition systems are expected to overcome.
ISBN: 9780769549897 (print)
We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand-drawn contours in images. Patches of human-generated contours are clustered to form sketch token classes, and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200x faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
ISBN: 9780769549897 (print)
Due to their high fault tolerance, ease of installation, and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multi-target tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Since most cameras are directional sensors, it is often the case that neighboring sensors are not sensing the same target. Sensors that do not have information about a target are termed "naive" with respect to that target. In this paper, we propose consensus-based distributed multi-target tracking algorithms in a camera network that are designed to address this issue of naivety. The estimation errors in tracking and data association, as well as the effect of naivety, are jointly addressed, leading to the development of an information-weighted consensus algorithm, which we term the Multi-target Information Consensus (MTIC) algorithm. The incorporation of the probabilistic data association mechanism makes the MTIC algorithm very robust to false measurements and clutter. Experimental analysis is provided to support the theoretical results.
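To make the idea of information weighting concrete, here is a minimal scalar sketch. It is not the MTIC algorithm: it omits target dynamics and data association, and only shows how plain average consensus on information-weighted quantities lets a naive node (information 0) converge to the same fused estimate as its neighbors. The function name, the step size `eps`, and the three-camera line topology are assumptions made for illustration.

```python
def information_weighted_consensus(estimates, information, neighbors,
                                   eps=0.3, iters=100):
    """Fuse scalar estimates by average consensus on information-weighted values.

    estimates:   local estimate at each node (ignored where information is 0).
    information: confidence (inverse variance) of each local estimate;
                 a naive node that does not see the target has information 0.
    neighbors:   neighbors[i] lists the nodes that node i can talk to.
    eps:         consensus step size; should be below 1 / (max node degree).
    """
    v = [w * x for w, x in zip(information, estimates)]  # information vector
    u = list(information)                                # information "matrix"
    n = len(v)
    for _ in range(iters):
        # Each node nudges its values toward those of its neighbors only.
        v = [v[i] + eps * sum(v[j] - v[i] for j in neighbors[i]) for i in range(n)]
        u = [u[i] + eps * sum(u[j] - u[i] for j in neighbors[i]) for i in range(n)]
    # The ratio of the averaged quantities is the information-weighted mean.
    return [vi / ui for vi, ui in zip(v, u)]


# Three cameras in a line: 0 - 1 - 2. Camera 2 is naive about the target.
fused = information_weighted_consensus(
    estimates=[2.0, 4.0, 0.0],
    information=[1.0, 3.0, 0.0],
    neighbors=[[1], [0, 2], [1]],
)
```

In this toy setup every node, including the naive one, ends up near the weighted mean (1·2 + 3·4) / (1 + 3) = 3.5, whereas unweighted averaging of the raw estimates would let the naive node's placeholder value drag the result toward 0.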
ISBN: 9780769549903 (print)
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human-computer interaction method that integrates mobile computing and 3D visualization techniques, with applications in free viewpoint visualization and 3D rendering for interactive and realistic environments. In particular, this approach focuses on augmented reality and home entertainment, and it was developed and tested on mobile devices, particularly tablet computers. Finally, a mechanism for evaluating the accuracy of this interaction system is presented.
ISBN: 9780769549903 (print)
We have been researching three-dimensional (3D) ground-truth systems for performance evaluation of vision and perception systems in the fields of smart manufacturing and robot safety. In this paper we first present an overview of different systems that have been used to provide ground-truth (GT) measurements, and then we discuss the advantages of physically sensed ground-truth systems for our applications. Then we discuss in detail the three ground-truth systems that we have used in our experiments: ultra wide-band, indoor GPS, and a camera-based motion capture system. Finally, we discuss three different perception-evaluation experiments where we have used these GT systems.
ISBN: 9780769549897 (print)
The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large-scale datasets has revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling; however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space, and afterwards applies linear classifiers on top of this representation to approximate decision boundaries in a locally linear fashion. In this paper, we go a step beyond these approaches and focus purely on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids overfitting to noise and, since only comparisons to the derived prototypes are required during testing, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
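The test-time side of 1-nearest prototype classification can be sketched in a few lines. The prototype construction below is a deliberately simple stand-in (one class-mean vector per class), not the discriminative prototype learning the abstract proposes; the function names and toy data are hypothetical.

```python
from math import dist


def class_mean_prototypes(samples, labels):
    """Stand-in prototype construction: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return [([s / counts[y] for s in sums[y]], y) for y in sums]


def predict(x, prototypes):
    """Return the label of the single closest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda p: dist(x, p[0]))[1]


samples = [(0, 0), (0, 2), (10, 10), (10, 12)]
labels = ["a", "a", "b", "b"]
prototypes = class_mean_prototypes(samples, labels)

near_a = predict((1, 1), prototypes)
near_b = predict((9, 9), prototypes)
```

Prediction compares the query against the prototypes only, so its cost grows with the number of prototypes rather than with the size of the training database, which is the efficiency argument made above.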
ISBN: 9780769549897 (print)
We propose a detection and segmentation algorithm for the purposes of fine-grained recognition. The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation. Apart from segmenting the object, we can also "zoom in" on the object, i.e., center it and normalize it for scale, and thus discount the effects of the background. We then show that combining this with a state-of-the-art classification algorithm leads to significant improvements in performance, especially for datasets that are considered particularly hard for recognition, e.g., bird species. The proposed algorithm is much more efficient than other known methods in similar scenarios [4, 21]. Our method is also simpler, and we apply it here to different classes of objects, e.g., birds, flowers, cats, and dogs. We tested the algorithm on a number of benchmark datasets for fine-grained categorization. It outperforms all the known state-of-the-art methods on these datasets, sometimes by as much as 11%. It improves the performance of our baseline algorithm by 3-4%, consistently on all datasets. We also observed more than a 4% improvement in recognition performance on a challenging large-scale flower dataset containing 578 species of flowers and 250,000 images.
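The "zoom in" step can be pictured as cropping the image to the bounding box of the segmentation mask. The sketch below assumes the image and mask are equally sized lists of rows and that nonzero mask entries mark the object; the function name `crop_to_mask` and the `margin` parameter are hypothetical, and rescaling the crop to a fixed size (the scale normalization) is left out.

```python
def crop_to_mask(image, mask, margin=0):
    """Crop `image` to the bounding box of the nonzero entries of `mask`."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    if not rows:
        return image  # empty mask: nothing to zoom in on
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(mask) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, len(mask[0]) - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# 4x4 toy image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
zoomed = crop_to_mask(image, mask)
```

Feeding the classifier the cropped region instead of the full frame is one simple way to discount background pixels, in the spirit of the description above.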
ISBN: 9780769549903 (print)
In this paper we present a Flash game that aims at easily generating ground truth for testing object detection algorithms. Flash the Fish is an online game where the user is shown videos from underwater environments and has to take photos of fish by clicking on them. The initial ground truth is provided by object detection algorithms; subsequently, cluster analysis and user evaluation techniques allow for the generation of ground truth based on the weighted combination of these "photos". Evaluation of the platform and comparison of the obtained results against a hand-drawn ground truth confirmed that reliable ground truth generation is not necessarily a cumbersome task, in terms of either effort or time needed.