Search Results - Inner Mongolia University Library

Pupil detection for head-mounted eye tracking in the wild: an evaluation of the state of the art

Machine Vision and Applications, 2016, Vol. 27, No. 8, pp. 1275-1288

Authors: Fuhl, Wolfgang; Tonsen, Marc; Bulling, Andreas; Kasneci, Enkelejda (Univ Tubingen, Percept Engn Grp, Tubingen, Germany; Max Planck Inst Informat, Perceptual User Interfaces Grp, Saarbrucken, Germany)

Robust and accurate detection of the pupil position is a key building block for head-mounted eye tracking and a prerequisite for applications built on top of it, such as gaze-based human-computer interaction or attention analysis. Despite a large body of work, detecting the pupil in images recorded under real-world conditions is challenging given significant variability in eye appearance (e.g., illumination, reflections, occlusions), individual differences in eye physiology, and other sources of noise, such as contact lenses or make-up. In this paper we review six state-of-the-art pupil detection methods, namely ElSe (Fuhl et al. in Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, ACM. New York, NY, USA, pp 123-130, 2016), ExCuSe (Fuhl et al. in Computer Analysis of Images and Patterns. Springer, New York, pp 39-51, 2015), Pupil Labs (Kassner et al. in Adjunct Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), pp 1151-1160, 2014. doi: 10.1145/2638728.2641695), SET (Javadi et al. in Front Neuroeng 8, 2015), Starburst (Li et al. in Computer Vision and Pattern Recognition-Workshops, 2005. IEEE Computer Society Conference on CVPR Workshops. IEEE, pp 79-79, 2005), and Swirski (Swirski et al. in Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA). ACM, pp 173-176, 2012. doi: 10.1145/2168556.2168585). We compare their performance on a large-scale data set consisting of 225,569 annotated eye images taken from four publicly available data sets. Our experimental results show that the algorithm ElSe (Fuhl et al. 2016) outperforms the other pupil detection methods by a large margin, thus offering robust and accurate pupil positions on challenging everyday eye images.

Keywords: Pupil detection; Head-mounted eye tracking; Data set; Computer vision; Image processing

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014

ISBN: (print) 9781479951178

The proceedings contain 539 papers. The topics discussed include: fast and accurate image matching with cascade hashing for 3D reconstruction; minimal solvers for relative pose with a single unknown radial distortion; spectral graph reduction for efficient image and streaming video segmentation; video motion segmentation using new adaptive manifold denoising model; event detection using multi-level relevance labels and multiple features; full-angle quaternions for robustly matching vectors of 3D rotations; semi-supervised spectral clustering for image set classification; learning mid-level filters for person re-identification; DeepReID: deep filter pairing neural network for person re-identification; NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization; beyond comparing image pairs: setwise active learning for relative attributes; and histograms of pattern sets for image classification and object recognition.

Uncertain LDA: Including Observation Uncertainties in Discriminative Transforms

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, Vol. 38, No. 7, pp. 1479-88

Authors: Saeidi, Rahim; Astudillo, Ramon Fernandez; Kolossa, Dorothea (Aalto Univ, Dept Signal Proc & Acoust, Espoo, Uusimaa, Finland; INESC ID, Spoken Language Syst Lab, Lisbon, Portugal; Ruhr Univ Bochum, Inst Commun Acoust, Univ Str 150, Bochum, Nrw, Germany)

Linear discriminant analysis (LDA) is a powerful technique in pattern recognition to reduce the dimensionality of data vectors. It maximizes discriminability by retaining only those directions that minimize the ratio of within-class to between-class variance. In this paper, using the same principles as for conventional LDA, we propose to employ uncertainties of the noisy or distorted input data in order to estimate maximally discriminant directions. We demonstrate the efficiency of the proposed uncertain LDA on two applications using state-of-the-art techniques. First, we experiment with an automatic speech recognition task, in which the uncertainty of observations is imposed by real-world additive noise. Next, we examine a full-scale speaker recognition system, considering the utterance duration as the source of uncertainty in authenticating a speaker. The experimental results show that when employing an appropriate uncertainty estimation algorithm, uncertain LDA outperforms its conventional LDA counterpart.

Keywords: Uncertainty; Linear discriminant analysis; LDA; Speaker recognition; Speech recognition
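
The variance-ratio criterion that the abstract attributes to conventional LDA can be illustrated with a minimal sketch. It covers only the two-class, two-dimensional special case (Fisher's linear discriminant), where the retained direction is proportional to the inverse within-class scatter applied to the difference of the class means. It is not the uncertain LDA proposed in the paper, and the function names and toy data are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, mu):
    """2x2 scatter matrix: sum of outer products of deviations from mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - mu[0], p[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to S_w^{-1} (mu_b - mu_a)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    (a, b), (c, d) = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter is singular")
    dx, dy = mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]
    # Closed-form 2x2 inverse applied to the mean difference.
    w = ((d * dx - b * dy) / det, (-c * dx + a * dy) / det)
    norm = math.hypot(w[0], w[1])
    return (w[0] / norm, w[1] / norm)


def project(points, w):
    """Reduce each 2-D point to one dimension along w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical toy data: two classes separated along the first axis.
class_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
class_b = [(4, 0), (5, 0), (4, 1), (5, 1)]
w = fisher_direction(class_a, class_b)
```

Projecting both classes with `project(class_a, w)` and `project(class_b, w)` gives one-dimensional values in which the two classes do not overlap for this toy data.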

Learning Deep Features for Discriminative Localization

IEEE Conference on Computer Vision and Pattern Recognition

Authors: Bolei Zhou; Aditya Khosla; Agata Lapedriza; Aude Oliva; Antonio Torralba (Computer Science and Artificial Intelligence Laboratory, MIT)

ISBN: (print) 9781467388528

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite being trained only to solve a classification task.

Keywords: Region of an image; Pooling; Bounding boxes; Imagery (Psychotherapy); Images; SCRAM; Functional training
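
The abstract gives no formula, so the following is only a sketch of how global average pooling can expose localization, under the assumption that the class score is a linear layer applied to the pooled feature maps. In that setting the same class weights, applied per spatial position, yield a map whose average equals the class score. The function names and the tiny 2x2 feature maps are hypothetical, and upsampling the map to image size is left out.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels: M[y][x] = sum_k w_k * f_k[y][x]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * fmap[y][x]
    return cam


def gap_class_score(feature_maps, class_weights):
    """Class score from global average pooling followed by a linear layer."""
    score = 0.0
    for fmap, w in zip(feature_maps, class_weights):
        pooled = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        score += w * pooled
    return score


def peak_location(cam):
    """(row, column) of the strongest activation in the map."""
    positions = ((y, x) for y in range(len(cam)) for x in range(len(cam[0])))
    return max(positions, key=lambda yx: cam[yx[0]][yx[1]])


# Hypothetical last-layer feature maps (2 channels, 2x2) and class weights.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.5]

cam = class_activation_map(feature_maps, class_weights)
score = gap_class_score(feature_maps, class_weights)
cam_mean = sum(sum(row) for row in cam) / 4
```

Here `peak_location(cam)` points at the bottom-right cell, the position that contributes most to the class score, even though only an image-level score was computed.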

Partial Optimality by Pruning for MAP-Inference with General Graphical Models

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, Vol. 38, No. 7, pp. 1370-82

Authors: Swoboda, Paul; Shekhovtsov, Alexander; Kappes, Joerg Hendrik; Schnoerr, Christoph; Savchynskyy, Bogdan (Heidelberg Univ, Image & Pattern Anal Grp IPA, Speyerer Str 6, D-69115 Heidelberg, Germany; Graz Univ Technol, Inst Comp Graph & Vis ICG, Inffeldgasse 16, A-8010 Graz, Austria; Heidelberg Univ, Heidelberg Collaboratory Image Proc HCI, Speyerer Str 6, D-69115 Heidelberg, Germany)

We consider the energy minimization problem for undirected graphical models, also known as the MAP-inference problem for Markov random fields, which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those that do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Empirically, our method also outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.

Keywords: MAP-inference; Markov random fields; Energy minimization; Persistency; Partial optimality; Local polytope
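
To make the underlying problem concrete, the sketch below states energy minimization for a tiny pairwise model and solves it by exhaustive enumeration. It is not the pruning algorithm of the paper; it only shows what an optimal labeling is and why enumeration does not scale, since the number of labelings grows exponentially with the number of variables. The three-node chain, the costs and the Potts-style pairwise term are hypothetical.

```python
from itertools import product


def energy(labeling, unary, edges, pairwise):
    """Sum of per-node costs plus per-edge costs for one labeling."""
    total = sum(unary[i][label] for i, label in enumerate(labeling))
    total += sum(pairwise(labeling[i], labeling[j]) for i, j in edges)
    return total


def map_by_enumeration(unary, edges, pairwise):
    """Try every labeling; feasible only for very small models."""
    num_labels = len(unary[0])
    candidates = product(range(num_labels), repeat=len(unary))
    return min(candidates, key=lambda x: energy(x, unary, edges, pairwise))


def potts(a, b, weight=0.4):
    """Pairwise cost: zero for equal labels, a fixed penalty otherwise."""
    return 0.0 if a == b else weight


# Hypothetical chain of three binary variables: unary[i][label] is the cost
# of assigning `label` to node i; edges connect neighbouring nodes.
unary = [[0.0, 1.0], [0.5, 0.2], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]

best = map_by_enumeration(unary, edges, potts)
best_energy = energy(best, unary, edges, potts)
```

For this toy model the minimizer is the labeling `(0, 1, 1)`. A partial-optimality method of the kind the abstract describes would aim to certify some of these labels without enumerating all labelings.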

Rethinking the Inception Architecture for Computer Vision

IEEE Conference on Computer Vision and Pattern Recognition

Authors: Christian Szegedy; Vincent Vanhoucke; Sergey Ioffe; Jon Shlens; Zbigniew Wojna (Google Inc.; University College London)

ISBN: (print) 9781467388528

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.

Keywords: Convolution; Computer architecture; Training; Computational efficiency; Computer vision; Benchmark testing; Computational modeling
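
The abstract does not say which factorizations are meant, so the arithmetic below is a sketch under stated assumptions: a 5x5 convolution replaced by two stacked 3x3 convolutions, and a 3x3 convolution replaced by a 1x3 followed by a 3x1. It counts weights only, ignores bias terms, and assumes the same hypothetical channel count (64) at every layer.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one convolution layer (bias terms ignored)."""
    return kernel_h * kernel_w * channels_in * channels_out


def stacked(kernels, channels):
    """Total weights of convolutions applied in sequence with a fixed channel count."""
    return sum(conv_weights(kh, kw, channels, channels) for kh, kw in kernels)


channels = 64  # hypothetical channel count

single_5x5 = stacked([(5, 5)], channels)
two_3x3 = stacked([(3, 3), (3, 3)], channels)      # same 5x5 receptive field
single_3x3 = stacked([(3, 3)], channels)
asymmetric = stacked([(1, 3), (3, 1)], channels)   # same 3x3 receptive field

print(f"5x5 -> two 3x3: {two_3x3 / single_5x5:.2f} of the weights")
print(f"3x3 -> 1x3 + 3x1: {asymmetric / single_3x3:.2f} of the weights")
```

Under these assumptions the first replacement keeps 18 of every 25 weights and the second keeps 6 of every 9, independent of the channel count chosen.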

Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks

IEEE Conference on Computer Vision and Pattern Recognition

Authors: Pavlo Molchanov; Xiaodong Yang; Shalini Gupta; Kihwan Kim; Stephen Tyree; Jan Kautz (NVIDIA)

ISBN: (print) 9781467388528

Automatic detection and classification of dynamic hand gestures in real-world systems intended for human-computer interaction is challenging because: 1) there is a large diversity in how people perform gestures, making detection and classification difficult; 2) the system must work online in order to avoid noticeable lag between performing a gesture and its classification; in fact, a negative lag (classification before the gesture is finished) is desirable, as feedback to the user can then be truly instantaneous. In this paper, we address these challenges with a recurrent three-dimensional convolutional neural network that performs simultaneous detection and classification of dynamic hand gestures from multi-modal data. We employ connectionist temporal classification to train the network to predict class labels from in-progress gestures in unsegmented input streams. In order to validate our method, we introduce a new challenging multimodal dynamic hand gesture dataset captured with depth, color and stereo-IR sensors. On this challenging dataset, our gesture recognition system achieves an accuracy of 83.8%, outperforms competing state-of-the-art algorithms, and approaches human accuracy of 88.4%. Moreover, our method achieves state-of-the-art performance on SKIG and ChaLearn2014 benchmarks.

Keywords: Gestures; Hand; Taxonomy; Computer human interaction; Automatic detection; Neural network

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

IEEE Conference on Computer Vision and Pattern Recognition

Authors: Chuang Gan; Ting Yao; Kuiyuan Yang; Yi Yang; Tao Mei (IIIS, Tsinghua Univ., Beijing, China; Microsoft Res., Beijing, China; QCIS, Univ. of Technol. Sydney, Sydney, NSW, Australia)

ISBN: (print) 9781467388528

Video concept learning often requires a large set of training samples. In practice, however, acquiring noise-free training labels with sufficient positive examples is very expensive. A plausible solution for training data collection is to sample from the vast quantities of images and videos on the Web. Such a solution is motivated by the assumption that the retrieved images or videos are highly correlated with the query. Still, a number of challenges remain. First, Web videos are often untrimmed. Thus, only parts of the videos are relevant to the query. Second, the retrieved Web images are always highly relevant to the issued query. However, thoughtlessly utilizing the images in the video domain may even hurt the performance due to the well-known semantic drift and domain gap problems. As a result, a valid question is how Web images and videos interact for video concept learning. In this paper, we propose a Lead-Exceed Neural Network (LENN), which reinforces the training on Web images and videos in a curriculum manner. Specifically, the training proceeds by inputting frames of Web videos to obtain a network. The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos. In addition, Long Short-Term Memory (LSTM) can be applied on the trimmed videos to explore temporal information. Encouraging results are reported on UCF101, TRECVID 2013 and 2014 MEDTest in the context of both action recognition and event detection. Without using human annotated exemplars, our proposed LENN can achieve 74.4% accuracy on the UCF101 dataset.

Keywords: Conferences; Computer vision; Pattern recognition

Probabilistic Labeling Cost for High-Accuracy Multi-View Reconstruction

27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kostrikov, Ilya; Horbert, Esther; Leibe, Bastian (Rhein Westfal TH Aachen, Comp Vis Grp, Aachen, Germany)

ISBN: (print) 9781479951178

In this paper, we propose a novel labeling cost for multi-view reconstruction. Existing approaches use data terms with specific weaknesses that are vulnerable to common challenges, such as low-textured regions or specularities. Our new probabilistic method implicitly discards outliers and can be shown to become more exact the closer we get to the true object surface. Our approach achieves top results among all published methods on the Middlebury DINO SPARSE dataset and also delivers accurate results on several other datasets with widely varying challenges, for which it works in unchanged form.

Keywords: Pattern recognition

Histograms of Pattern Sets for Image Classification and Object Recognition

27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Voravuthikunchai, Winn; Cremilleux, Bruno; Jurie, Frederic (Univ Caen Basse Normandie, CNRS, UMR 6072, ENSICAEN, Caen, France)

ISBN: (print) 9781479951178

This paper introduces a novel image representation capturing feature dependencies through the mining of meaningful combinations of visual features. This representation leads to a compact and discriminative encoding of images that can be used for image classification, object detection or object recognition. The method relies on (i) multiple random projections of the input space followed by local binarization of projected histograms encoded as sets of items, and (ii) the representation of images as Histograms of Pattern Sets (HoPS). The approach is validated on four publicly available datasets (Daimler Pedestrian, Oxford Flowers, KTH Texture and PASCAL VOC2007), allowing comparisons with many recent approaches. The proposed image representation reaches state-of-the-art performance on each one of these datasets.

Keywords: Computer vision; Data mining; Image classification; Object detection; Visual recognition
