ISBN: 9780769549903 (print)
Understanding human actions in videos has been a central research theme in computer vision for decades, and much progress has been achieved over the years. Much of this progress was demonstrated on standard benchmarks used to evaluate novel techniques. These benchmarks and their evolution provide a unique perspective on the growing capabilities of computerized action recognition systems. They demonstrate just how far machine vision systems have come while also underscoring the gap that still remains between existing state-of-the-art performance and the needs of real-world applications. In this paper we provide a comprehensive survey of these benchmarks: from early examples, such as the Weizmann set [1], to recently presented, contemporary benchmarks. This paper further provides a summary of the results obtained in the last couple of years on the recent ASLAN benchmark [12], which was designed to reflect the many challenges modern action recognition systems are expected to overcome.
ISBN: 9781424439928 (print)
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using Arbelaez et al., CVPR 2009. Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.'s 67.2%), and achieves competitive performance on the Caltech 101 database.
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrix multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
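For readers unfamiliar with the baseline this abstract criticizes, the following is a minimal sketch of im2col followed by a matrix product, written in plain Python for a single-channel image and a single filter. It illustrates only the conventional method, not the approach proposed in the paper; the function names `im2col`, `conv2d_im2col`, and `conv2d_direct` are hypothetical, and stride 1 with no padding is assumed.

```python
def im2col(image, kh, kw):
    """Pack every kh x kw patch of a 2-D image into one column of a matrix."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1  # stride 1, no padding (assumption)
    # One row per kernel element, one column per output position.
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(kh):
                for dj in range(kw):
                    cols[di * kw + dj][i * out_w + j] = image[i + di][j + dj]
    return cols, out_h, out_w


def conv2d_im2col(image, kernel):
    """CNN-style convolution (cross-correlation) as weights x im2col matrix."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    weights = [k for row in kernel for k in row]  # 1 x (kh*kw) weight matrix
    flat = [sum(wt * cols[r][c] for r, wt in enumerate(weights))
            for c in range(out_h * out_w)]
    return [flat[i * out_w:(i + 1) * out_w] for i in range(out_h)]


def conv2d_direct(image, kernel):
    """Reference sliding-window implementation used to check the result."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[di][dj] * image[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_im2col(image, kernel))  # [[-4, -4], [-4, -4]]
```

In this toy case the im2col buffer holds 16 values for a 9-pixel image, because each output position stores its own copy of the kh*kw pixels it reads; this duplication is the memory overhead named in drawback (a).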
ISBN: 9781665448994 (print)
We present a new state-of-the-art result on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin. Moreover, state-of-the-art results are achieved using a single model and without finetuning. This multidomain generalisation is achieved by a proper combination of different video caption datasets. We show that our practical approach for training on different datasets can improve the test results on each of them. Additionally, we check the intersection between many popular datasets and show that MSRVTT and ActivityNet each contain a significant overlap between their test and training parts. More details are available at https://***/papermsucode/mdmmt.
ISBN: 0769523722 (print)
We present an automotive-grade, real-time, vision-based Driver State Monitor. Upon detecting and tracking the driver's facial features, the system analyzes eye closures and head pose to infer his/her fatigue or distraction. This information is used to warn the driver and to modulate the actions of other safety systems. The purpose of this monitor is to increase road safety by preventing drivers from falling asleep or from being overly distracted, and to improve the effectiveness of other safety systems.
ISBN: 0818684976 (print)
Manufacturing flaws of all types, shapes, and sizes can be exhaustively detected as abnormal pixels, if process and noise variations can be learned at every pixel in the inspection area. This statistical template approach to automated visual inspection is extremely fast, effective, and flexible, while achieving a false negative rate < 10^-6. Critical to this approach are the following novel features: 1) represent both geometry *** process information in a model template; 2) align 3D surfaces with subpixel accuracy; 3) compensate for local deformation and texture; 4) estimate bimodal distribution robustly. This novel paradigm was applied to the automatic screening of X-ray images of turbine blades. It has been validated with over 50,000 images and shown to outperform regular inspectors looking at high-pass filtered images.
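The core idea of learning variation at every pixel and flagging outliers can be sketched roughly as follows. This is a deliberately simplified illustration, not the paper's method: it assumes the training images are already aligned and defect-free, models each pixel with a single mean and standard deviation rather than the robust bimodal estimate the abstract mentions, and uses hypothetical names and parameters (`learn_template`, `abnormal_pixels`, `k`, `min_sigma`).

```python
from statistics import mean, pstdev


def learn_template(images):
    """Per-pixel mean and standard deviation over aligned, defect-free images."""
    h, w = len(images[0]), len(images[0][0])
    mu = [[mean([img[r][c] for img in images]) for c in range(w)]
          for r in range(h)]
    sigma = [[pstdev([img[r][c] for img in images]) for c in range(w)]
             for r in range(h)]
    return mu, sigma


def abnormal_pixels(image, mu, sigma, k=4.0, min_sigma=1.0):
    """Return (row, col) of pixels deviating more than k sigmas from the template.

    k and min_sigma are illustrative values, not taken from the paper.
    """
    flagged = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            s = max(sigma[r][c], min_sigma)  # floor avoids division by zero
            if abs(value - mu[r][c]) / s > k:
                flagged.append((r, c))
    return flagged


# Three tiny 2x2 "good" images and one test image with a bright flaw at (1, 0).
training = [
    [[100, 100], [50, 50]],
    [[102, 98], [50, 52]],
    [[98, 102], [50, 48]],
]
mu, sigma = learn_template(training)
test_image = [[100, 101], [60, 50]]
print(abnormal_pixels(test_image, mu, sigma))  # [(1, 0)]
```

Because the threshold is applied per pixel, regions with naturally high variation tolerate larger deviations than stable regions, which is what lets a single template cover flaws of different types and sizes.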
ISBN: 0769523722 (print)
Land-use classification is an important problem in the remote sensing field. It can be used in a wide range of applications. In this paper we propose a hybrid method fusing edge and region information for the land-use classification of multispectral images. It mainly includes the steps of image pre-processing, initial segmentation and region merging. In particular, a novel spatial mean shift procedure is proposed so that some information can be extracted and used in the successive steps. For multispectral image processing, we also design a band weighting strategy that gives a proper weight to each band adaptively according to the region to be processed. Experimental results on the Landsat TM and ETM+ images validate the performance of the proposed method.
ISBN: 9781467367592 (print)
In this paper, we study the problem of reproducing the light from a single image of an object covered with random specular microfacets on the surface. We show that such reflectors can be interpreted as a randomized mapping from the lighting to the image. Such specular objects have very different optical properties from both diffuse surfaces and smooth specular objects like metals, so we design a special imaging system to robustly and effectively photograph them. We present simple yet reliable algorithms to calibrate the proposed system and perform the inference. We conduct experiments to verify the correctness of our model assumptions and prove the effectiveness of our pipeline.
ISBN: 9780769549903 (print)
Translation symmetry is one of the most important pattern characteristics in natural and man-made environments. Detecting translation symmetry is a grand challenge in computer vision. This has a large spectrum of real-world applications from industrial settings to design, arts, entertainment and education. This paper describes the algorithm we have submitted for the Symmetry Detection Competition 2013. We introduce two new concepts in our symmetric repetitive pattern detection algorithm. The first concept is the bottom-up detection-inference approach. This extends the versatility of current detection methods to a higher-level segmentation. The second concept is the framework of a new theoretical analysis of invariant repetitive patterns. This is crucial in symmetry/non-symmetry structure extraction but has less coverage in the previous literature on pattern detection and classification.
ISBN: 0769523722 (print)
We describe a method for training object detectors using a generalization of the cascade architecture, which results in a detection rate and speed comparable to that of the best published detectors while allowing for easier training and a detector with fewer features. In addition, the method allows for quickly calibrating the detector for a target detection rate, false positive rate or speed. One important advantage of our method is that it enables systematic exploration of the ROC Surface, which characterizes the trade-off between accuracy and speed for a given classifier.