ISBN (print): 9781538604571
The registration of 3D models by a Euclidean transformation is a fundamental task at the core of many applications in computer vision. This problem is non-convex due to the presence of rotational constraints, making traditional local optimization methods prone to getting stuck in local minima. This paper addresses finding the globally optimal transformation in various 3D registration problems via a unified formulation that integrates common geometric registration modalities (namely point-to-point, point-to-line and point-to-plane). This formulation renders the optimization problem independent of both the number and nature of the correspondences. The main novelty of our proposal is the introduction of a strengthened Lagrangian dual relaxation for this problem, which surpasses previous similar approaches [32] in effectiveness. In fact, even though there are no theoretical guarantees, exhaustive empirical evaluation in both synthetic and real experiments always resulted in a tight relaxation that allowed us to recover a guaranteed globally optimal solution by exploiting duality theory. Thus, our approach effectively solves 3D registration with global optimality guarantees while running at a fraction of the time required by the state-of-the-art alternative [34], which is based on a more computationally intensive Branch-and-Bound method.
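As background for the registration modalities above: the simplest of them, point-to-point registration with known correspondences, already admits a closed-form globally optimal solution via SVD (the classic Kabsch/Umeyama construction). A minimal sketch of that baseline, not of the paper's strengthened Lagrangian relaxation:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment minimising ||R @ P + t - Q||_F.
    P, Q: (3, N) arrays of corresponding 3D points."""
    p_mean = P.mean(axis=1, keepdims=True)
    q_mean = Q.mean(axis=1, keepdims=True)
    H = (P - p_mean) @ (Q - q_mean).T            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

With noiseless correspondences this recovers the exact Euclidean transformation; the paper's contribution is precisely the harder setting where such a closed form does not exist (mixed modalities, global optimality certification).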
ISBN (print): 9781538604571
In dynamic object detection, it is challenging to construct an effective model that sufficiently characterizes the spatio-temporal properties of the background. This paper proposes a new Spatio-Temporal Self-Organizing Map (STSOM) deep network to detect dynamic objects in complex scenarios. The proposed approach makes several contributions. First, a novel STSOM shared by all pixels in a video frame is presented to efficiently model complex backgrounds. We exploit the fact that the motion of a complex background varies globally in space and locally in time, training the STSOM on whole frames and on the temporal sequence of each pixel to handle this variance. Second, a Bayesian parameter estimation based method is presented to automatically learn per-pixel thresholds for filtering out the background. Last, in order to model complex backgrounds more accurately, we extend the single-layer STSOM to a deep network, so that the background is filtered out layer by layer. Experimental results on the CDnet 2014 dataset demonstrate that the proposed STSOM deep network outperforms numerous recently proposed methods in overall performance and in most categories of scenarios.
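The STSOM builds on the classic self-organizing map. As an illustration of the underlying mechanism only (the paper's spatio-temporal training and deep stacking are not reproduced here), a single online SOM update on a 1-D grid of prototypes might look like:

```python
import numpy as np

def som_update(weights, x, lr=0.2, sigma=1.0):
    """One online SOM step: move the best-matching unit (and its grid
    neighbours, weighted by a Gaussian over grid distance) toward sample x.
    weights: (n_units, dim) prototype vectors on a 1-D grid."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    grid_dist = np.abs(np.arange(len(weights)) - bmu)
    h = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))   # neighbourhood kernel
    return weights + lr * h[:, None] * (x - weights)
```

Repeated updates pull the prototype map toward the data distribution; in the paper's setting the prototypes model background appearance, and pixels far from all prototypes are flagged as foreground.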
ISBN (print): 9781538604571
Over the past few years, softmax and SGD have become a commonly used component and the default training strategy in CNN frameworks, respectively. However, when optimizing CNNs with SGD, the saturation behavior behind softmax often gives the illusion that training is going well and is therefore overlooked. In this paper, we first show that the early saturation behavior of softmax impedes the exploration of SGD, which is sometimes a reason the model converges to a bad local minimum. We then propose Noisy Softmax to mitigate this early saturation issue by injecting annealed noise into softmax during each iteration. This noise-injection operation postpones early saturation and keeps gradients propagating, which significantly encourages the SGD solver to be more exploratory and helps it find a better local minimum. This paper empirically verifies the benefit of early softmax desaturation, and our method indeed improves the generalization ability of the CNN model through regularization. We experimentally find that this early desaturation helps optimization in many tasks, yielding state-of-the-art or competitive results on several popular benchmark datasets.
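A minimal sketch of the noise-injection idea, with a hypothetical linear annealing schedule and noise added directly to the logits (the paper's exact schedule and noise placement may differ):

```python
import math
import random

def noisy_softmax(logits, epoch, total_epochs, noise_scale=1.0, rng=random):
    """Softmax with additive Gaussian noise on the logits, annealed
    linearly to zero over training (the linear schedule is an assumption).
    Early in training the noise perturbs near-saturated outputs, keeping
    gradients alive; by the final epoch it reduces to plain softmax."""
    anneal = max(0.0, noise_scale * (1.0 - epoch / total_epochs))
    z = [v + rng.gauss(0.0, anneal) for v in logits]
    m = max(z)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]
```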
ISBN (print): 9781538604571
The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition, and then only under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single-view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable, changing for different photos of the same subject, or over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state-of-the-art accuracy on the MICC dataset. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.
ISBN (print): 9781538604571
Past research on facial expressions has used relatively limited datasets, which makes it unclear whether current methods can be employed in the real world. In this paper, we present a novel database, RAF-DB, which contains about 30,000 facial images from thousands of individuals. Each image was independently labeled about 40 times, and an EM algorithm was then used to filter out unreliable labels. Crowdsourcing reveals that real-world faces often express compound, or even mixed, emotions. To the best of our knowledge, RAF-DB is the first database that contains compound expressions in the wild. Our cross-database study shows that the action units of basic emotions in RAF-DB are much more diverse than, or even deviate from, those of lab-controlled datasets. To address this problem, we propose a new DLP-CNN (Deep Locality-Preserving CNN) method, which aims to enhance the discriminative power of deep features by preserving locality closeness while maximizing inter-class scatter. Benchmark experiments on the 7-class basic expressions and 11-class compound expressions, as well as additional experiments on the SFEW and CK+ databases, show that the proposed DLP-CNN outperforms state-of-the-art handcrafted features and deep-learning-based methods for expression recognition in the wild.
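The label-filtering step can be illustrated with a simplified one-coin EM in the style of Dawid and Skene: alternate between estimating each annotator's accuracy and re-estimating item labels by accuracy-weighted voting. This is a sketch of the general technique, not necessarily the paper's exact procedure:

```python
import math
from collections import defaultdict

def em_labels(votes, n_classes, iters=10):
    """votes: dict item -> list of (annotator, label).
    One-coin model: each annotator has a single accuracy; labels are
    re-estimated by log-odds-weighted voting."""
    est = {}
    for item, vs in votes.items():              # init: majority vote
        counts = [0] * n_classes
        for _, lab in vs:
            counts[lab] += 1
        est[item] = counts.index(max(counts))
    for _ in range(iters):
        hits, tot = defaultdict(int), defaultdict(int)
        for item, vs in votes.items():          # M-step: annotator accuracy
            for a, lab in vs:
                tot[a] += 1
                hits[a] += (lab == est[item])
        acc = {a: (hits[a] + 1) / (tot[a] + 2) for a in tot}  # smoothed
        for item, vs in votes.items():          # E-step: weighted re-vote
            score = [0.0] * n_classes
            for a, lab in vs:
                score[lab] += math.log(acc[a] / (1 - acc[a]))
            est[item] = score.index(max(score))
    return est, acc
```

Votes from annotators who frequently disagree with the consensus receive low (even negative) weight, so their noisy labels are effectively filtered out.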
ISBN (print): 9781538604571
Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved in recent years by exploiting semantic relations between labels. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervision. Given a multi-label image, our proposed Spatial Regularization Network (SRN) generates attention maps for all labels and captures the underlying relations between them via learnable convolutions. By aggregating the regularized classification results with the original results of a ResNet-101 network, classification performance can be consistently improved. The whole deep neural network is trained end-to-end with only image-level annotations, and thus requires no additional annotation effort. Extensive evaluations on 3 public datasets with different types of labels show that our approach significantly outperforms the state of the art and has strong generalization capability. Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.
Pipelines that recognize 3D objects despite clutter and occlusions usually end with a final verification stage whereby recognition hypotheses are validated or dismissed based on how well they explain sensor measurements. Unlike previous work, we propose a Global Hypothesis Verification (GHV) approach that regards all hypotheses jointly so as to account for mutual interactions. GHV provides a principled framework to tackle the complexity of our visual world by leveraging a plurality of recognition paradigms and cues. Accordingly, we present a 3D object recognition pipeline deploying both global and local 3D features as well as shape and color. Thereby, facilitated by the robustness of the verification process, diverse object hypotheses can be gathered, and weak hypotheses need not be suppressed too early to trade sensitivity for specificity. Experiments demonstrate the effectiveness of our proposal, which significantly improves over the state of the art and attains ideal performance (no false negatives, no false positives) on three of the six most relevant and challenging benchmark datasets.
Distributed algorithms have recently gained immense popularity. Among computer vision applications, distributed multi-target tracking in a camera network is a fundamental problem. The goal is for all cameras to have accurate state estimates for all targets. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Vision-based distributed multi-target state estimation has at least two characteristics that distinguish it from other applications. First, cameras are directional sensors, and neighboring sensors often may not be sensing the same targets, i.e., they are naive with respect to those targets. Second, in the presence of clutter and multiple targets, each camera must solve a data-association problem. This paper presents an information-weighted, consensus-based, distributed multi-target tracking algorithm, referred to as the Multi-target Information Consensus (MTIC) algorithm, that is designed to address both the naivety and the data-association problems. It converges to the centralized minimum mean square error estimate. The proposed MTIC algorithm and its extension to non-linear camera models, termed the Extended MTIC (EMTIC), are robust to false measurements and to limited resources such as power, bandwidth and real-time operational requirements. Simulation and experimental analysis are provided to support the theoretical results.
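The consensus machinery underlying MTIC can be illustrated by the basic average-consensus iteration, in which each node repeatedly moves toward its neighbours' values and the network converges to the global average. MTIC itself additionally weights exchanged estimates by their information content, which is not shown in this sketch:

```python
def consensus_step(states, neighbors, eps=0.2):
    """One synchronous average-consensus update: each node i nudges its
    state toward its neighbours' states. For a connected undirected graph
    and eps < 1/max_degree, repeated application converges to the
    network-wide average of the initial states."""
    return [x + eps * sum(states[j] - x for j in neighbors[i])
            for i, x in enumerate(states)]
```

Because the update only uses each node's immediate neighbours, no camera needs global knowledge of the network, which is exactly the property distributed trackers exploit.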
ISBN (print): 9781467388511
The deep two-stream architecture [23] exhibited excellent performance on video-based action recognition. The most computationally expensive step in this approach is the calculation of optical flow, which prevents it from running in real time. This paper accelerates this architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra computation. However, motion vectors lack fine structure and contain noisy and inaccurate motion patterns, leading to evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vectors are inherently correlated. Transferring the knowledge learned by an optical flow CNN to a motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this: initialization transfer, supervision transfer and their combination. Experimental results show that our method achieves recognition performance comparable to the state of the art, while processing 390.7 frames per second, which is 27 times faster than the original two-stream method.
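Supervision transfer in this teacher-student sense is commonly realised as a cross-entropy between temperature-softened output distributions (the optical flow CNN acting as teacher, the motion vector CNN as student). A sketch under that assumption; the paper's exact loss form and temperature are not specified here:

```python
import math

def _soft(z, T):
    """Temperature-softened softmax of logits z."""
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [x / s for x in e]

def supervision_transfer_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy H(p_teacher, p_student) over softened distributions.
    Minimised when the student reproduces the teacher's soft outputs."""
    p = _soft(teacher_logits, T)
    q = _soft(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Initialization transfer, the other strategy named in the abstract, amounts to simply copying the optical flow CNN's weights into the motion vector CNN before fine-tuning.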
ISBN (print): 9781509033355
Accurate sensor noise propagation is critical for many computer vision and robotic applications. Several probabilistic computer vision techniques require estimates of sensor noise after it has been propagated through one or many non-linear transformations. We investigate the unscented transform as an alternative to the standard linearisation technique for uncertainty propagation in a computer vision framework. An evaluation is performed using synthetic data for two common computer vision sensors, an RGB-D sensor and a stereo camera pair. The unscented transform is shown to outperform linearisation when used to estimate distributions of reconstructed 3D points from image features. Experimental results also indicate that the unscented transform is a viable replacement for linearisation when used in a probabilistic visual odometry framework.
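For a scalar nonlinearity, the unscented transform reduces to propagating three sigma points through the function and recombining them with fixed weights; a minimal 1-D sketch using the standard Julier-Uhlmann kappa parameterisation (the paper applies the multivariate version to RGB-D and stereo geometry):

```python
import math

def unscented_1d(mu, var, f, kappa=2.0):
    """Propagate N(mu, var) through a scalar function f via the unscented
    transform: evaluate f at the mean and at mean +/- sqrt((n+kappa)*var),
    then take the weighted mean and variance of the results (n = 1 here).
    Unlike linearisation, no derivative of f is needed."""
    n = 1
    s = math.sqrt((n + kappa) * var)
    pts = [mu, mu + s, mu - s]
    w = [kappa / (n + kappa), 1.0 / (2 * (n + kappa)), 1.0 / (2 * (n + kappa))]
    ys = [f(x) for x in pts]
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    var_out = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, ys))
    return mean, var_out
```

For f(x) = x**2 with x ~ N(0, 1) this recovers the exact moments (mean 1, variance 2), whereas first-order linearisation about the mean would predict zero output variance; this is the kind of gap the paper's evaluation measures on real camera models.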