The nine papers in this special section focus on the development of new computer vision techniques for the interpretation of remote sensing images. These papers represent a follow-up to two workshops held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015 in Boston, MA: EarthVision 2015 and MSF 2015. The purpose of both workshops, and of this special issue, is to foster fruitful collaboration among the computer vision, Earth observation, and geospatial analysis communities.
ISBN (Print): 9781467388511
Many computational models of visual attention use image features and machine learning techniques to predict eye fixation locations as saliency maps. Recently, the success of Deep Convolutional Neural Networks (DCNNs) for object recognition has opened a new avenue for computational models of visual attention, due to the tight link between visual attention and object recognition. In this paper, we show that using features from DCNNs trained for object recognition, we can make predictions that enrich the information provided by saliency models. Namely, we can estimate the reliability of a saliency model from the raw image, which serves as a meta-saliency measure that may be used to select the best saliency algorithm for an image. Analogously, the consistency of the eye fixations among subjects, i.e., the agreement between the eye fixation locations of different subjects, can also be predicted and used by a designer to assess whether subjects reach a consensus about salient image locations.
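To make the idea concrete, here is a minimal sketch, assuming a ResNet-18 backbone and a scalar regression head (both illustrative choices, not the authors' configuration), of predicting a saliency model's reliability directly from the raw image:

```python
# A minimal sketch (not the authors' code) of the idea: regress a per-image
# reliability score for a saliency model from object-recognition DCNN features.
# Network choice, feature layer, and the regression head are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class MetaSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # object-recognition DCNN
        # Keep everything up to the global pooling layer as a feature extractor.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        # Regress a scalar in [0, 1]: predicted agreement (e.g. AUC) between
        # the saliency model's map and human fixations for this image.
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(512, 1), nn.Sigmoid())

    def forward(self, image):                      # image: (B, 3, H, W)
        return self.head(self.features(image))     # (B, 1) reliability score

model = MetaSaliency()
score = model(torch.randn(2, 3, 224, 224))         # one reliability score per image
```

The same regression setup would apply to the inter-subject consistency target: only the training label changes from model reliability to fixation agreement.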
ISBN (Print): 9781728132938
Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires fine-grained discrimination of small objects and their manipulation. While some methods rely on strong supervision and attention mechanisms, they are either annotation-hungry or do not take spatio-temporal patterns into account. In this paper we propose LSTA (Long Short-Term Attention), a mechanism that focuses on features from relevant spatial parts while attention is tracked smoothly across the video sequence. We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks.
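As a rough illustration only, the following toy cell carries a spatial attention map recurrently from frame to frame; it is a strong simplification of the paper's LSTA cell, with all layer shapes chosen arbitrarily:

```python
# A simplified sketch (assumptions, not the LSTA paper's exact cell) of
# spatial attention that is carried recurrently across frames, so the
# attended region evolves smoothly over the video.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Score each spatial location from current features + previous attention.
        self.score = nn.Conv2d(channels + 1, 1, kernel_size=3, padding=1)

    def forward(self, frame_feats):                  # (T, C, H, W) clip features
        T, C, H, W = frame_feats.shape
        attn = torch.full((1, H, W), 1.0 / (H * W))  # uniform initial attention
        pooled = []
        for t in range(T):
            x = torch.cat([frame_feats[t], attn], dim=0).unsqueeze(0)
            logits = self.score(x).view(-1)
            attn = F.softmax(logits, dim=0).view(1, H, W)  # tracked attention map
            # Attention-weighted spatial pooling of the frame's features.
            pooled.append((frame_feats[t] * attn).sum(dim=(1, 2)))
        return torch.stack(pooled)                   # (T, C) features for a classifier

module = RecurrentSpatialAttention(channels=64)
out = module(torch.randn(8, 64, 7, 7))               # 8-frame clip -> (8, 64)
```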
ISBN (Print): 9781467388511
When people observe and interact with physical spaces, they are able to associate functionality with regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an egocentric viewpoint. The method we describe enables functionality estimation in large scenes where people have behaved, as well as in novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability of a user to perform activities at various locations. By using an egocentric camera to observe human activities, our method scales with the size of the scene, avoids mounting multiple static surveillance cameras, and is well suited to observing activities up close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our proposed mathematical framework allows for the prediction of Action Maps in new environments. Additionally, we offer a preliminary glimpse of the applicability of Action Maps by demonstrating a proof-of-concept application in which they are used in concert with activity detections to perform localization.
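The sketch below is a deliberately simplified rendering of the idea (a plain linear classifier rather than the paper's formulation): fit a map from per-location appearance features to activity labels using sparse demonstrations, then evaluate it densely over a scene grid:

```python
# A toy sketch of the Action Map idea (my simplification, not the paper's
# formulation): learn a map from per-location appearance features to
# "activity is possible here" from sparse demonstrations, then predict densely.
import torch
import torch.nn as nn

feat_dim, activities = 32, 4                       # illustrative sizes
scorer = nn.Linear(feat_dim, activities)           # appearance -> activity scores
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# Sparse supervision: features at locations where someone acted, plus labels.
demo_feats = torch.randn(20, feat_dim)             # 20 observed locations
demo_labels = torch.randint(0, 2, (20, activities)).float()

for _ in range(200):                               # fit on the sparse demos
    opt.zero_grad()
    loss = loss_fn(scorer(demo_feats), demo_labels)
    loss.backward()
    opt.step()

# Dense prediction: every cell of a scene grid gets an Action Map value,
# including cells (or whole scenes) where no behavior was ever observed.
grid_feats = torch.randn(48, 64, feat_dim)
action_map = torch.sigmoid(scorer(grid_feats))     # (48, 64, activities)
```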
ISBN (Print): 9781467388511
We investigate an efficient strategy for collecting false positives from very large training sets in the context of object detection. Our approach scales up the standard bootstrapping procedure by using a hierarchical decomposition of an image collection that reflects the statistical regularity of the detector's responses. Based on that decomposition, our procedure uses Monte Carlo Tree Search to prioritize sampling toward sub-families of images that have been observed to be rich in false positives, while maintaining a fraction of the sampling on unexplored sub-families. The resulting procedure substantially increases the proportion of false positives among the visited samples compared to naive uniform sampling. We apply this new procedure experimentally to face detection with a collection of approximately 100,000 background images and to pedestrian detection with approximately 32,000 images. We show that for two standard detectors, the proposed strategy halves the number of images to visit in order to obtain the same number of false positives and the same final performance.
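A minimal sketch of the sampling loop, assuming a UCB-style tree policy (the paper's exact policy and decomposition details may differ):

```python
# A minimal sketch of MCTS-style hard-negative mining (illustrative). Leaves
# are clusters of background images; the reward is the false-positive rate
# observed so far, and a UCB term keeps some sampling on unexplored clusters.
import math, random

class Node:
    def __init__(self, children=None, images=None):
        self.children, self.images = children or [], images or []
        self.visits, self.fp_found = 0, 0

def ucb(node, parent_visits, c=1.4):
    if node.visits == 0:
        return float("inf")                        # force initial exploration
    exploit = node.fp_found / node.visits          # observed FP richness
    explore = c * math.sqrt(math.log(parent_visits) / node.visits)
    return exploit + explore

def sample_image(root, count_false_positives):
    path, node = [root], root
    while node.children:                           # descend the hierarchy by UCB
        node = max(node.children, key=lambda n: ucb(n, node.visits + 1))
        path.append(node)
    img = random.choice(node.images)               # visit one leaf image
    fp = count_false_positives(img)                # run the current detector on it
    for n in path:                                 # back-propagate the reward
        n.visits += 1
        n.fp_found += fp
    return img, fp
```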
ISBN (Print): 9781467388511
Recognizing text in natural images is a challenging task with many unsolved problems. Unlike words in documents, words in natural images often possess irregular shapes, caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network consisting of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). At test time, an image is first rectified via a predicted Thin-Plate-Spline (TPS) transformation into a more "readable" image for the subsequent SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, which makes it convenient to train and deploy in practical systems. State-of-the-art or highly competitive performance on several benchmarks demonstrates the effectiveness of the proposed model.
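A structural sketch of the two-stage pipeline appears below; for brevity the rectifier predicts an affine transform as a stand-in for the TPS warp, and all layer sizes are placeholders:

```python
# A structural sketch of the RARE pipeline (rectify, then recognize). The
# affine rectifier stands in for the paper's Thin-Plate-Spline; the
# STN -> sequence-recognition flow is the point being illustrated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Rectifier(nn.Module):                        # the "STN" stage
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 6))
        # Initialize to the identity transform so training starts stably.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0, 0, 0, 1., 0]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)         # predicted transform parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # rectified image

class Recognizer(nn.Module):                       # the "SRN" stage
    def __init__(self, n_classes=37):              # e.g. 26 letters + 10 digits + blank
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d((1, 25)))  # 25 time steps
        self.rnn = nn.LSTM(64, 128, bidirectional=True, batch_first=True)
        self.out = nn.Linear(256, n_classes)       # per-step character logits

    def forward(self, x):
        f = self.conv(x).squeeze(2).transpose(1, 2)   # (B, 25, 64) feature sequence
        return self.out(self.rnn(f)[0])               # (B, 25, n_classes)

img = torch.randn(2, 1, 32, 100)                   # distorted word images
logits = Recognizer()(Rectifier()(img))            # end-to-end trainable chain
```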
ISBN (Print): 9781467388511
Visual location recognition is the task of determining the place depicted in a query image from a given database of geo-tagged images. Location recognition is often cast as an image retrieval problem and recent research has almost exclusively focused on improving the chance that a relevant database image is ranked high enough after retrieval. The implicit assumption is that the number of inliers found by spatial verification can be used to distinguish between a related and an unrelated database photo with high precision. In this paper, we show that this assumption does not hold for large datasets due to the appearance of geometric bursts, i.e., sets of visual elements appearing in similar geometric configurations in unrelated database photos. We propose algorithms for detecting and handling geometric bursts. Although conceptually simple, using the proposed weighting schemes dramatically improves the recall that can be achieved when high precision is required compared to the standard re-ranking based on the inlier count. Our approach is easy to implement and can easily be integrated into existing location recognition systems.
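As an illustration of the weighting idea (a simplification, not the paper's exact scheme), one can down-weight each verified feature by the number of distinct database images it matches:

```python
# An illustrative sketch: instead of ranking by the raw inlier count,
# down-weight each query feature by how many distinct database images it
# matched with geometric consistency, so repeated structures
# ("geometric bursts") cannot dominate the score.
import math
from collections import defaultdict

def burst_weighted_scores(verified_matches):
    """verified_matches: (query_feature_id, db_image_id) pairs that
    survived spatial verification across the retrieval shortlist."""
    images_per_feature = defaultdict(set)
    for feat, img in verified_matches:
        images_per_feature[feat].add(img)
    scores = defaultdict(float)
    for feat, img in verified_matches:
        burst_size = len(images_per_feature[feat])
        # burst_size == 1 -> full weight 1.0 (the standard inlier count);
        # a feature verified in k images contributes only 1/sqrt(k) to each.
        scores[img] += 1.0 / math.sqrt(burst_size)
    return scores  # re-rank the shortlist by these scores instead of inliers

print(burst_weighted_scores([(0, "a"), (0, "b"), (1, "a"), (2, "a")]))
```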
ISBN (Print): 9781467388511
RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide range of application scenarios. While much research has been conducted, most prior work uses hand-crafted features, which struggle to capture high-level semantic structure. Recently, features extracted from deep convolutional neural networks (CNNs) have produced state-of-the-art results for various computer vision tasks, inspiring researchers to incorporate CNN-learned features into RGBD scene understanding. On the other hand, most existing work combines RGB and depth features without adequately exploiting the consistency and complementary information between them. Inspired by recent work on RGBD object recognition using multi-modal feature fusion, we introduce, for the first time, a discriminative multi-modal fusion framework for RGBD scene recognition that simultaneously considers inter- and intra-modality correlations for all samples while regularizing the learned features to be discriminative and compact. The results from the multi-modal layer can be back-propagated to the lower CNN layers, so the parameters of the CNN layers and multi-modal layers are updated iteratively until convergence. Experiments on the recently proposed large-scale SUN RGB-D dataset show that our method achieves state-of-the-art performance without any image segmentation.
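A schematic sketch of such a fusion objective, under assumed layer sizes and with a simple MSE consistency term standing in for the paper's correlation constraints:

```python
# A schematic sketch of discriminative multi-modal fusion (assumptions, not
# the paper's exact objective): two CNN streams for RGB and depth, a shared
# classifier over their fused embeddings, and an auxiliary term encouraging
# the two modalities' embeddings of the same sample to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stream():
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, 64))

rgb_net, depth_net = stream(), stream()            # depth given as a 3-channel encoding
classifier = nn.Linear(128, 19)                    # e.g. 19 SUN RGB-D scene classes

def losses(rgb, depth, labels):
    f_rgb, f_d = rgb_net(rgb), depth_net(depth)    # per-modality embeddings
    logits = classifier(torch.cat([f_rgb, f_d], dim=1))
    cls = F.cross_entropy(logits, labels)          # discriminative term
    consistency = F.mse_loss(f_rgb, f_d)           # inter-modality agreement term
    return cls + 0.1 * consistency                 # both back-propagate into the CNNs

loss = losses(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64),
              torch.randint(0, 19, (4,)))
loss.backward()                                    # updates both streams jointly
```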
ISBN (Print): 9781467388511
We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNNs). Recent work has shown that features such as hand appearance, object attributes, local hand motion, and camera ego-motion are important for characterizing first-person actions. To integrate these ideas under one framework, we propose a twin-stream network architecture, in which one stream analyzes appearance information and the other analyzes motion information. Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects. By visualizing certain neuron activations of our network, we show that our proposed architecture naturally learns features that capture object attributes and hand-object configurations. Our extensive experiments on benchmark egocentric action datasets show that our deep architecture enables recognition rates that significantly outperform state-of-the-art techniques, with an average 6.6% increase in accuracy over all datasets. Furthermore, by learning to recognize objects, actions, and activities jointly, the performance of the individual recognition tasks also increases, by 30% (actions) and 14% (objects). We also include the results of an extensive ablative analysis to highlight the importance of network design decisions.
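A skeletal sketch of the twin-stream layout, with placeholder layer sizes and late score fusion (one of several possible fusion choices):

```python
# A skeletal sketch of the twin-stream idea (layer sizes are placeholders):
# an appearance stream over an RGB frame and a motion stream over stacked
# optical-flow fields, fused for action prediction.
import torch
import torch.nn as nn

def cnn(in_ch, n_actions):
    return nn.Sequential(nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(32, n_actions))

n_actions = 10
appearance = cnn(3, n_actions)       # RGB frame; in the paper this stream is
                                     # also trained to segment hands / localize objects
motion = cnn(2 * 10, n_actions)      # x/y flow fields stacked over 10 frames

frame = torch.randn(1, 3, 112, 112)
flow = torch.randn(1, 20, 112, 112)
scores = appearance(frame) + motion(flow)          # late fusion of the two streams
action = scores.argmax(dim=1)                      # predicted action index
```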
ISBN (Print): 9781467388511
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won 1st place in the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are the foundation of our submissions to the ILSVRC and COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
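The residual reformulation is compact enough to show directly; below is a minimal basic block with an identity shortcut, matching the y = F(x) + x formulation:

```python
# A minimal residual block illustrating the reformulation: the stacked layers
# learn a residual F(x) and the block outputs F(x) + x, so identity mappings
# are easy to represent and very deep stacks remain optimizable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(residual + x)   # identity shortcut: output is F(x) + x

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))              # same shape in and out
```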