ISBN: 9781479943098 (print)
Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes four high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s and a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance-per-power improvement of 10 to 100 times over conventional mobile and desktop processors.
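The three operations attributed to a collection (convolution, a non-linear function, subsampling) can be illustrated in software. The following is a minimal NumPy sketch, not the nn-X hardware design: the function names, the choice of tanh as the non-linearity, max-pooling as the subsampling step, and the order of the stages are all assumptions made for illustration.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Plain 2D convolution (kernel flipped) without padding."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

def subsample_max(feature_map, factor):
    """Max over non-overlapping factor x factor windows (assumed pooling type)."""
    h = feature_map.shape[0] // factor * factor
    w = feature_map.shape[1] // factor * factor
    blocks = feature_map[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3))

def collection_forward(image, kernel, factor=2):
    """Hypothetical software analogue of one collection:
    convolution -> non-linearity (tanh, assumed) -> subsampling."""
    return subsample_max(np.tanh(convolve2d_valid(image, kernel)), factor)

if __name__ == "__main__":
    image = np.ones((6, 6))
    kernel = np.ones((3, 3))
    print(collection_forward(image, kernel).shape)
```

A 6x6 input with a 3x3 kernel gives a 4x4 convolution output, which 2x2 subsampling reduces to 2x2.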
ISBN: 9781479943098 (print)
Distracted driving due to cell phone usage is an increasingly costly problem in terms of lost lives and damaged property. Motivated by its impact on public safety and property, several state and federal governments have enacted regulations that prohibit driver mobile phone usage while driving. These regulations have created a need for cell phone usage detection for law enforcement. In this paper, we propose a computer vision based method for determining driver cell phone usage using a near infrared (NIR) camera system directed at the vehicle's front windshield. The developed method consists of two stages. First, we localize the driver's face region within the front windshield image using the deformable part model (DPM). Next, we utilize a local aggregation based image classification technique to classify a region of interest (ROI) around the driver's face to detect cell phone usage. We propose two classification architectures, using full face and half face images for classification, and compare their performance in terms of accuracy, specificity, and sensitivity. We also present a comparison of various local aggregation based image classification methods using bag-of-visual-words (BOW), vector of locally aggregated descriptors (VLAD) and Fisher vectors (FV). A data set of 1500 images was collected on a public roadway and is used to perform the experiments.
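As an illustration of one of the local aggregation methods named above, here is a minimal NumPy sketch of standard VLAD encoding. It is not the authors' pipeline: the function name, the toy codebook, and the final L2 normalization are assumptions, and the abstract does not say which local descriptors or codebook size were used.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Encode local descriptors (n, d) against a codebook (k, d) as a VLAD vector.

    Each descriptor is assigned to its nearest codeword, the residuals
    (descriptor minus codeword) are summed per codeword, and the
    concatenated result is L2-normalized.
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    k, dim = codebook.shape
    vlad = np.zeros((k, dim))
    for idx in range(k):
        members = descriptors[nearest == idx]
        if len(members):
            vlad[idx] = (members - codebook[idx]).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])      # toy codebook (assumed)
    descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
    print(vlad_encode(descriptors, codebook))
```

The resulting fixed-length vector would then be passed to a classifier; the choice of classifier is outside what the abstract states.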
ISBN: 9781479943098 (print)
Recognizing activities in wide aerial/overhead imagery remains a challenging problem due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two unsynchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training data to index a large archive of FMV. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing back a summary of associated text (ACO) and activity video segments of targets of interest (TOIs) (in both pixel and geo-coordinates). Such tools help the end user to quickly search, browse, and prepare mission reports from multi-source data.
ISBN: 9781479943098 (print)
In this paper we demonstrate that the current state-of-the-art social grouping methodology can be enhanced with the use of visual attention estimation. In a surveillance environment it is possible to extract the gazing direction of pedestrians, a feature which can be used to improve social grouping estimation. We implement a state-of-the-art motion-based social grouping technique to obtain a baseline for social grouping, and implement the same grouping with the addition of the visual attention feature. By comparing how well the two techniques find social groups, we evaluate the effectiveness of including the visual attention feature. We test both methods on two datasets containing busy surveillance scenes. We find that the inclusion of visual interest improves the motion-based social grouping capability. For the Oxford data, we see a 5.6% improvement in true positives and a 28.5% reduction in false positives. We see up to a 50% reduction in false positives in other datasets. The strength of the visual feature is demonstrated by the association of social connections that are otherwise missed by the motion-only social grouping technique.
ISBN: 9781479943098 (print)
The autoencoder algorithm and its deep version, as traditional dimensionality reduction methods, have achieved great success via the powerful representability of neural networks. However, they just use each instance to reconstruct itself and do not explicitly model the data relation so as to discover the underlying effective manifold structure. In this paper, we propose a dimensionality reduction method by manifold learning, which iteratively explores the data relation and uses the relation to pursue the manifold structure. The method is realized by a so-called "generalized autoencoder" (GAE), which extends the traditional autoencoder in two aspects: (1) each instance $x_i$ is used to reconstruct a set of instances $\{x_j\}$ rather than itself; (2) the reconstruction error of each instance, $\|x_j - x_i'\|^2$, is weighted by a relational function of $x_i$ and $x_j$ defined on the learned manifold. Hence, the GAE captures the structure of the data space by minimizing the weighted distances between reconstructed instances and the original ones. The generalized autoencoder provides a general neural network framework for dimensionality reduction. In addition, we propose a multilayer architecture of the generalized autoencoder, called the deep generalized autoencoder, to handle highly complex datasets. Finally, to evaluate the proposed methods, we perform extensive experiments on three datasets. The experiments demonstrate that the proposed methods achieve promising performance.
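The weighted reconstruction objective described above can be written out concretely. The sketch below is a hedged NumPy illustration, not the authors' implementation: it assumes the reconstructions and the pairwise relational weights are already available as arrays, and the function and argument names are hypothetical. How the relational function is computed on the learned manifold is not specified in the abstract, so the weights are simply taken as input; a zero weight excludes a pair from the set being reconstructed.

```python
import numpy as np

def gae_weighted_loss(originals, reconstructions, weights):
    """Sum over pairs (i, j) of weights[i, j] * ||originals[j] - reconstructions[i]||^2.

    originals:       (n, d) array, row j is instance j
    reconstructions: (n, d) array, row i is the reconstruction produced from instance i
    weights:         (n, n) array, relational weight between instances i and j
    """
    # diffs[i, j] = originals[j] - reconstructions[i]
    diffs = originals[None, :, :] - reconstructions[:, None, :]
    squared_errors = (diffs ** 2).sum(axis=2)
    return float((weights * squared_errors).sum())

if __name__ == "__main__":
    x = np.array([[0.0, 0.0], [1.0, 0.0]])
    # With identity weights this reduces to the ordinary autoencoder loss.
    print(gae_weighted_loss(x, x, np.eye(2)))
    # With uniform weights each reconstruction is also compared to the other instance.
    print(gae_weighted_loss(x, x, np.ones((2, 2))))
```

Setting the weight matrix to the identity recovers the traditional autoencoder objective, in which each instance reconstructs only itself.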
ISBN: 9789897581335 (print)
For finger-vein recognition, many successful methods, such as Line Tracking (LT), Maximum Curvature (MC) and Wide Line Detector (WL), have been proposed. Among these, LT has a very slow matching and feature-extraction phase, and LT, MC and WL are translation and rotation dependent. Moreover, as we show in the paper, they are affected by noise. To overcome these drawbacks, we propose using popular feature descriptors widely used for several computer vision or pattern recognition (CVPR) problems in the literature. The CVPR descriptors we test include Histogram of Oriented Gradients (HOG), Fourier Descriptors (FD), Zernike Moments (ZM), Local Binary Patterns (LBP) and Global Binary Patterns (GBP), which have not been applied to the finger-vein recognition problem before. We compare these descriptors against LT, MC, and WL and evaluate their running times, performance and resilience against noise, rotation and translation. We report that the accuracies of the LT and WL methods are comparable, with WL giving the best accuracy, and that LT is the slowest method. Our results indicate that WL can be used together with ZM and GBP in case of rotation and noise, respectively.
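To make one of the listed descriptors concrete, the following is a minimal NumPy sketch of the basic 3x3 Local Binary Patterns operator and its histogram. It is an illustration only: the abstract does not state which LBP variant, neighborhood, or parameters were used, and the function names and neighbor ordering here are assumptions.

```python
import numpy as np

def lbp_codes(image):
    """Basic 8-neighbor LBP code for every interior pixel of a 2D array.

    A neighbor contributes a 1 bit when its value is >= the center value.
    The neighbor order (clockwise from top-left) is an arbitrary choice.
    """
    h, w = image.shape
    center = image[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = image[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    counts = np.bincount(lbp_codes(image).ravel(), minlength=256)
    return counts / counts.sum()

if __name__ == "__main__":
    flat = np.full((5, 5), 7)
    print(lbp_histogram(flat)[255])
```

Two images can then be compared through a distance between their histograms; the matching rule is not given in the abstract.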
ISBN: 9781479943098 (print)
Different from previous 3D face modeling approaches that consider the whole facial area, the proposed method reconstructs 3D facial components for handling cross-pose recognition. It has two phases: component reconstruction and component-based recognition. In the reconstruction phase, we first extract four component regions, namely two eyes, nose and mouth, from each gallery face using the pose-invariant landmarks obtained by a modified version of a landmark detection algorithm. A 3D model of each component region is reconstructed using a constrained minimization scheme with a gender and ethnicity oriented 3D model as the reference. In the recognition phase, the pose of a given probe is determined by a set of landmarks, which guides the rotation of the reconstructed components so that they can be aligned to the probe components. The match is determined by the components instead of the whole faces, so that different components can be considered at different poses. Experiments on the PIE and Multi-PIE databases show that the proposed component-based approach not only outperforms its holistic counterpart, but is also competitive with many contemporary methods.
ISBN: 9781479951178 (print)
The proceedings contain 539 papers. The topics discussed include: fast and accurate image matching with cascade hashing for 3D reconstruction; minimal solvers for relative pose with a single unknown radial distortion; spectral graph reduction for efficient image and streaming video segmentation; video motion segmentation using new adaptive manifold denoising model; event detection using multi-level relevance labels and multiple features; full-angle quaternions for robustly matching vectors of 3D rotations; semi-supervised spectral clustering for image set classification; learning mid-level filters for person re-identification; DeepReID: deep filter pairing neural network for person re-identification; NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization; beyond comparing image pairs: setwise active learning for relative attributes; and histograms of pattern sets for image classification and object recognition.
This paper introduces a Chinese chess recognition algorithm based on computer vision and image processing. In order to simplify processing and enhance efficiency, the images of the chessboard and chessmen need preproces...
Welcome to the 13th International Conference on Document Analysis and Recognition (ICDAR 2015), hosted by the *** the Association of Sustainable Innovation in Tunisia (Tunisian Chapter of IAPR), will be held in Tunis (Tunisia) from August 23-26th, *** 2015 is sponsored by the International Association for Pattern Recognition (IAPR) and technically co-sponsored by TC-10 (Graphics Recognition), TC-11 (Reading Systems), IEEE Computer Society (pending approval).