ISBN (print): 9781538604571
The registration of 3D models by a Euclidean transformation is a fundamental task at the core of many applications in computer vision. This problem is non-convex due to the presence of rotational constraints, making traditional local optimization methods prone to getting stuck in local minima. This paper addresses finding the globally optimal transformation in various 3D registration problems via a unified formulation that integrates common geometric registration modalities (namely point-to-point, point-to-line and point-to-plane). This formulation renders the optimization problem independent of both the number and nature of the correspondences. The main novelty of our proposal is the introduction of a strengthened Lagrangian dual relaxation for this problem, which surpasses previous similar approaches [32] in effectiveness. In fact, even though there are no theoretical guarantees, exhaustive empirical evaluation in both synthetic and real experiments always resulted in a tight relaxation that allowed us to recover a guaranteed globally optimal solution by exploiting duality theory. Thus, our approach effectively solves 3D registration with global optimality guarantees while running at a fraction of the time required by the state-of-the-art alternative [34], which is based on a more computationally intensive Branch-and-Bound method.
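As background for the registration modalities above: the simplest of them, point-to-point registration with known correspondences, already admits a closed-form globally optimal solution via SVD (the classic Kabsch/Umeyama construction). A minimal sketch of that baseline, not of the paper's strengthened Lagrangian relaxation:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment minimising ||R @ P + t - Q||_F.
    P, Q: (3, N) arrays of corresponding 3D points."""
    p_mean = P.mean(axis=1, keepdims=True)
    q_mean = Q.mean(axis=1, keepdims=True)
    H = (P - p_mean) @ (Q - q_mean).T            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

With noiseless correspondences this recovers the exact Euclidean transformation; the paper's contribution is precisely the harder setting where such a closed form does not exist (mixed modalities, global optimality certification).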
ISBN (print): 9781538604571
In dynamic object detection, it is challenging to construct an effective model that sufficiently characterizes the spatio-temporal properties of the background. This paper proposes a new Spatio-Temporal Self-Organizing Map (STSOM) deep network to detect dynamic objects in complex scenarios. The proposed approach makes several contributions. First, a novel STSOM shared by all pixels in a video frame is presented to efficiently model complex backgrounds. We exploit the fact that the motion of a complex background varies globally in space and locally in time, training the STSOM on whole frames and on the temporal sequence of each pixel to handle this variance. Second, a Bayesian parameter estimation based method is presented to automatically learn per-pixel thresholds for filtering out the background. Last, in order to model complex backgrounds more accurately, we extend the single-layer STSOM to a deep network, so that the background is filtered out layer by layer. Experimental results on the CDnet 2014 dataset demonstrate that the proposed STSOM deep network outperforms numerous recently proposed methods in overall performance and in most categories of scenarios.
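The STSOM builds on the classic self-organizing map. As an illustration of the underlying mechanism only (the paper's spatio-temporal training and deep stacking are not reproduced here), a single online SOM update on a 1-D grid of prototypes might look like:

```python
import numpy as np

def som_update(weights, x, lr=0.2, sigma=1.0):
    """One online SOM step: move the best-matching unit (and its grid
    neighbours, weighted by a Gaussian over grid distance) toward sample x.
    weights: (n_units, dim) prototype vectors on a 1-D grid."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    grid_dist = np.abs(np.arange(len(weights)) - bmu)
    h = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))   # neighbourhood kernel
    return weights + lr * h[:, None] * (x - weights)
```

Repeated updates pull the prototype map toward the data distribution; in the paper's setting the prototypes model background appearance, and pixels far from all prototypes are flagged as foreground.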
ISBN (print): 9781538604571
Over the past few years, softmax and SGD have become a commonly used component and the default training strategy in CNN frameworks, respectively. However, when optimizing CNNs with SGD, the saturation behavior behind softmax often gives the illusion that training is going well and is therefore overlooked. In this paper, we first show that the early saturation behavior of softmax impedes the exploration of SGD, which is sometimes a reason the model converges to a bad local minimum. We then propose Noisy Softmax to mitigate this early saturation issue by injecting annealed noise into softmax during each iteration. This noise-injection operation postpones early saturation and keeps gradients propagating, which significantly encourages the SGD solver to be more exploratory and helps it find a better local minimum. This paper empirically verifies the benefit of early softmax desaturation, and our method indeed improves the generalization ability of the CNN model through regularization. We experimentally find that this early desaturation helps optimization in many tasks, yielding state-of-the-art or competitive results on several popular benchmark datasets.
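A minimal sketch of the noise-injection idea, with a hypothetical linear annealing schedule and noise added directly to the logits (the paper's exact schedule and noise placement may differ):

```python
import math
import random

def noisy_softmax(logits, epoch, total_epochs, noise_scale=1.0, rng=random):
    """Softmax with additive Gaussian noise on the logits, annealed
    linearly to zero over training (the linear schedule is an assumption).
    Early in training the noise perturbs near-saturated outputs, keeping
    gradients alive; by the final epoch it reduces to plain softmax."""
    anneal = max(0.0, noise_scale * (1.0 - epoch / total_epochs))
    z = [v + rng.gauss(0.0, anneal) for v in logits]
    m = max(z)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]
```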
ISBN (print): 9781538604571
The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition, and then only under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single-view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable, changing for different photos of the same subject, or over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state-of-the-art accuracy on the MICC dataset. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.
ISBN (print): 9781538604571
Past research on facial expressions has used relatively limited datasets, which makes it unclear whether current methods can be employed in the real world. In this paper, we present a novel database, RAF-DB, which contains about 30,000 facial images from thousands of individuals. Each image was independently labeled about 40 times, and an EM algorithm was then used to filter out unreliable labels. Crowdsourcing reveals that real-world faces often express compound, or even mixed, emotions. To the best of our knowledge, RAF-DB is the first database that contains compound expressions in the wild. Our cross-database study shows that the action units of basic emotions in RAF-DB are much more diverse than, or even deviate from, those of lab-controlled datasets. To address this problem, we propose a new DLP-CNN (Deep Locality-Preserving CNN) method, which aims to enhance the discriminative power of deep features by preserving locality closeness while maximizing inter-class scatter. Benchmark experiments on the 7-class basic expressions and 11-class compound expressions, as well as additional experiments on the SFEW and CK+ databases, show that the proposed DLP-CNN outperforms state-of-the-art handcrafted features and deep-learning-based methods for expression recognition in the wild.
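The label-filtering step can be illustrated with a simplified one-coin EM in the style of Dawid and Skene: alternate between estimating each annotator's accuracy and re-estimating item labels by accuracy-weighted voting. This is a sketch of the general technique, not necessarily the paper's exact procedure:

```python
import math
from collections import defaultdict

def em_labels(votes, n_classes, iters=10):
    """votes: dict item -> list of (annotator, label).
    One-coin model: each annotator has a single accuracy; labels are
    re-estimated by log-odds-weighted voting."""
    est = {}
    for item, vs in votes.items():              # init: majority vote
        counts = [0] * n_classes
        for _, lab in vs:
            counts[lab] += 1
        est[item] = counts.index(max(counts))
    for _ in range(iters):
        hits, tot = defaultdict(int), defaultdict(int)
        for item, vs in votes.items():          # M-step: annotator accuracy
            for a, lab in vs:
                tot[a] += 1
                hits[a] += (lab == est[item])
        acc = {a: (hits[a] + 1) / (tot[a] + 2) for a in tot}  # smoothed
        for item, vs in votes.items():          # E-step: weighted re-vote
            score = [0.0] * n_classes
            for a, lab in vs:
                score[lab] += math.log(acc[a] / (1 - acc[a]))
            est[item] = score.index(max(score))
    return est, acc
```

Votes from annotators who frequently disagree with the consensus receive low (even negative) weight, so their noisy labels are effectively filtered out.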
ISBN (print): 9781538604571
Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved in recent years by exploiting semantic relations between labels. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervision. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with the original results of a ResNet-101 network, classification performance can be consistently improved. The whole deep neural network is trained end-to-end with only image-level annotations, and thus requires no additional annotation effort. Extensive evaluations on 3 public datasets with different types of labels show that our approach significantly outperforms the state of the art and has strong generalization capability. Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.
Pipelines that recognize 3D objects despite clutter and occlusions usually end with a final verification stage whereby recognition hypotheses are validated or dismissed based on how well they explain sensor measurements. Unlike previous work, we propose a Global Hypothesis Verification (GHV) approach that regards all hypotheses jointly so as to account for mutual interactions. GHV provides a principled framework to tackle the complexity of our visual world by leveraging a plurality of recognition paradigms and cues. Accordingly, we present a 3D object recognition pipeline deploying both global and local 3D features as well as shape and color. Thereby, facilitated by the robustness of the verification process, diverse object hypotheses can be gathered, and weak hypotheses need not be suppressed too early to trade sensitivity for specificity. Experiments demonstrate the effectiveness of our proposal, which significantly improves over the state of the art and attains ideal performance (no false negatives, no false positives) on three of the six most relevant and challenging benchmark datasets.
Distributed algorithms have recently gained immense popularity. Among computer vision applications, distributed multi-target tracking in a camera network is a fundamental problem. The goal is for all cameras to have accurate state estimates for all targets. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Vision-based distributed multi-target state estimation has at least two characteristics that distinguish it from other applications. First, cameras are directional sensors, and neighboring sensors often may not be sensing the same targets, i.e., they are naive with respect to those targets. Second, in the presence of clutter and multiple targets, each camera must solve a data-association problem. This paper presents an information-weighted, consensus-based, distributed multi-target tracking algorithm, referred to as the Multi-target Information Consensus (MTIC) algorithm, that is designed to address both the naivety and the data-association problems. It converges to the centralized minimum mean square error estimate. The proposed MTIC algorithm and its extension to non-linear camera models, termed the Extended MTIC (EMTIC), are robust to false measurements and to limited resources such as power, bandwidth and real-time operational requirements. Simulation and experimental analysis are provided to support the theoretical results.
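The consensus machinery underlying MTIC can be illustrated by the basic average-consensus iteration, in which each node repeatedly moves toward its neighbours' values and the network converges to the global average. MTIC itself additionally weights exchanged estimates by their information content, which is not shown in this sketch:

```python
def consensus_step(states, neighbors, eps=0.2):
    """One synchronous average-consensus update: each node i nudges its
    state toward its neighbours' states. For a connected undirected graph
    and eps < 1/max_degree, repeated application converges to the
    network-wide average of the initial states."""
    return [x + eps * sum(states[j] - x for j in neighbors[i])
            for i, x in enumerate(states)]
```

Because the update only uses each node's immediate neighbours, no camera needs global knowledge of the network, which is exactly the property distributed trackers exploit.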
ISBN (print): 9781467388511
The deep two-stream architecture [23] exhibited excellent performance on video-based action recognition. The most computationally expensive step in this approach is the calculation of optical flow, which prevents it from running in real time. This paper accelerates this architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra computation. However, motion vectors lack fine structure and contain noisy and inaccurate motion patterns, leading to evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vectors are inherently correlated. Transferring the knowledge learned by an optical flow CNN to a motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this: initialization transfer, supervision transfer and their combination. Experimental results show that our method achieves recognition performance comparable to the state of the art, while processing 390.7 frames per second, which is 27 times faster than the original two-stream method.
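Supervision transfer in this teacher-student sense is commonly realised as a cross-entropy between temperature-softened output distributions (the optical flow CNN acting as teacher, the motion vector CNN as student). A sketch under that assumption; the paper's exact loss form and temperature are not specified here:

```python
import math

def _soft(z, T):
    """Temperature-softened softmax of logits z."""
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [x / s for x in e]

def supervision_transfer_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy H(p_teacher, p_student) over softened distributions.
    Minimised when the student reproduces the teacher's soft outputs."""
    p = _soft(teacher_logits, T)
    q = _soft(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Initialization transfer, the other strategy named in the abstract, amounts to simply copying the optical flow CNN's weights into the motion vector CNN before fine-tuning.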
ISBN (print): 9781509033355
Accurate sensor noise propagation is critical for many computer vision and robotic applications. Several probabilistic computer vision techniques require estimates of sensor noise after it has been propagated through one or many non-linear transformations. We investigate the unscented transform as an alternative to the standard linearisation technique for uncertainty propagation in a computer vision framework. An evaluation is performed using synthetic data for two common computer vision sensors, an RGB-D sensor and a stereo camera pair. The unscented transform is shown to outperform linearisation when used to estimate distributions of reconstructed 3D points from image features. Experimental results also indicate that the unscented transform is a viable replacement for linearisation when used in a probabilistic visual odometry framework.
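For a scalar nonlinearity, the unscented transform reduces to propagating three sigma points through the function and recombining them with fixed weights; a minimal 1-D sketch using the standard Julier-Uhlmann kappa parameterisation (the paper applies the multivariate version to RGB-D and stereo geometry):

```python
import math

def unscented_1d(mu, var, f, kappa=2.0):
    """Propagate N(mu, var) through a scalar function f via the unscented
    transform: evaluate f at the mean and at mean +/- sqrt((n+kappa)*var),
    then take the weighted mean and variance of the results (n = 1 here).
    Unlike linearisation, no derivative of f is needed."""
    n = 1
    s = math.sqrt((n + kappa) * var)
    pts = [mu, mu + s, mu - s]
    w = [kappa / (n + kappa), 1.0 / (2 * (n + kappa)), 1.0 / (2 * (n + kappa))]
    ys = [f(x) for x in pts]
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    var_out = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, ys))
    return mean, var_out
```

For f(x) = x**2 with x ~ N(0, 1) this recovers the exact moments (mean 1, variance 2), whereas first-order linearisation about the mean would predict zero output variance; this is the kind of gap the paper's evaluation measures on real camera models.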