In this paper we propose to use lexical semantic networks to extend state-of-the-art object recognition techniques. We use the semantics of image labels to integrate prior knowledge about inter-class relationships into the visual appearance learning. We show how to build and train a semantic hierarchy of discriminative classifiers and how to use it to perform object detection. We evaluate how our approach influences the classification accuracy and speed on the Pascal VOC challenge 2006 dataset, a set of challenging real-world images. We also demonstrate additional features that become available to object recognition due to the extension with semantic inference tools: we can classify high-level categories, such as animals, and we can train part detectors, for example a window detector, by pure inference in the semantic network.
A major shortcoming of discriminative recognition and detection methods is their noise sensitivity, both during training and recognition. This may lead to very sensitive and brittle recognition systems that focus on irrelevant information. This paper proposes a method that selects features that are both generative and discriminative. In particular, we boost classical Haar-like features and use the same features to approximate a generative model (i.e., eigenimages). A modified error function for boosting ensures that only features showing both good discrimination and good reconstruction are selected. This allows robust feature selection using boosting. Thus, we can handle problems where discriminant classifiers fail while still retaining the discriminative power. Our experiments show that we can significantly improve the recognition performance when learning from noisy data. Moreover, the feature type used allows efficient recognition and reconstruction.
We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a feature-pooling layer that computes the max of each filter output within adjacent windows, and a point-wise sigmoid non-linearity. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST, and 54% average recognition rate on Caltech 101 with 30 training samples per category. While the resulting architecture is similar to convolutional networks, the layer-wise unsupervised training procedure alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
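The single-level pipeline described above (convolution filters, max pooling over adjacent windows, then a point-wise sigmoid) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, the filters are supplied by the caller rather than learned without supervision, and the "convolution" is written as an unflipped sliding dot product.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def max_pool(fmap, size):
    """Take the max of each non-overlapping size x size window."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[y * size + i][x * size + j]
                for i in range(size) for j in range(size))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def sigmoid_map(fmap):
    """Apply the logistic sigmoid to every element."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in fmap]


def extract_features(image, kernels, pool_size=2):
    """One feature map per filter: convolve, max-pool, then squash."""
    return [
        sigmoid_map(max_pool(conv2d_valid(image, k), pool_size))
        for k in kernels
    ]
```

The pooling step is what gives tolerance to small shifts: a response that moves by one pixel but stays inside the same pooling window produces an identical output. A second level could be obtained by calling `extract_features` again on patches of the first-level maps, although how the filters are trained is outside the scope of this sketch.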
This paper presents the hardware implementation of a stereo vision core algorithm that runs in real time and is targeted at automotive applications. The algorithm is based on the sum of absolute differences (SAD) and computes the disparity map from 320 × 240 input images with a maximum disparity of 100 pixels. The hardware operates at a frequency of 65 MHz and achieves a frame rate of 425 fps by processing the data in a highly parallel, pipelined fashion. It thus outperforms an implemented, basically optimized software solution running on a 3 GHz Intel Pentium 4 by a factor of 166.
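For readers unfamiliar with SAD matching, the core computation can be sketched in software as below. This is a minimal, unoptimized reference in plain Python under stated assumptions, not the hardware design from the paper: the function name, the square window and its `half_win` parameter, and the convention that a left-image pixel at column x matches column x - d in the right image are all choices made for illustration.

```python
def sad_disparity(left, right, max_disp, half_win=1):
    """Block-matching disparity map using the sum of absolute differences.

    left, right: rectified grayscale images as lists of rows.
    Border pixels that cannot hold a full window keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            best_cost, best_d = None, 0
            # Only test disparities that keep the right window inside the image.
            for d in range(0, min(max_disp, x - half_win) + 1):
                cost = 0
                for dy in range(-half_win, half_win + 1):
                    for dx in range(-half_win, half_win + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                # Keep the smallest disparity among equal-cost candidates.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

The nested loops make the cost of a software implementation clear: every output pixel needs one window comparison per candidate disparity. The figures in the abstract imply 320 × 240 × 425 = 32,640,000 disparity pixels per second, or roughly two clock cycles per pixel at 65 MHz (ignoring any blanking or transfer overhead), which is only achievable if the candidate disparities are evaluated in parallel.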
In this paper, a projection model is presented for cameras moving at constant velocity (which we refer to as Galilean cameras). To that end, we introduce the concept of spacetime projection and show that perspective imaging and linear pushbroom imaging are specializations of the proposed model. The epipolar geometry between two such cameras is developed and we derive the Galilean fundamental matrix. We show how six different "fundamental" matrices can be directly recovered from the Galilean fundamental matrix, including the classic fundamental matrix, the linear pushbroom (LP) fundamental matrix and a fundamental matrix relating epipolar plane images (EPIs). We describe linear algorithms for estimating the parameters of this fundamental matrix and, in the case of planar scenes, the mapping between videos, and we report the experimental performance of these algorithms.
This paper presents our progress on OpenVL, a novel software architecture for computer vision that addresses efficiency (by facilitating hardware acceleration), reusability and scalability. A logical image understanding pipeline is introduced to allow parallel processing. We also discuss our middleware, VLUT, which enables applications to operate transparently over a heterogeneous collection of hardware implementations. OpenVL works as a state machine, with an event-driven mechanism to provide users with application-level interaction. Various explicit or implicit synchronization and communication methods are supported among distributed processes in the logical pipelines. The intent of OpenVL is to allow users to quickly and easily recover useful information from multiple scenes across various software environments and hardware platforms. We implement two different human tracking systems to validate the critical underlying concepts of OpenVL.
In this paper we derive differential equations for evolving radial basis functions (RBFs) to solve segmentation problems. The differential equations result from applying variational calculus to energy functionals designed for image segmentation. Our methodology supports evolution of all parameters of each RBF, including its position, weight, orientation, and anisotropy, if present. Our framework is general and can be applied to numerous RBF interpolants. The resulting approach retains some of the ideal features of implicit active contours, like topological adaptivity, while requiring low storage overhead due to the sparsity of our representation, which is an unstructured list of RBFs. We present the theory behind our technique and demonstrate its usefulness for image segmentation.
We present DIGITABLE, an experimental platform that we hope will lessen the gap between co-present and distant interaction. DIGITABLE combines a multiuser tactile interactive tabletop, a video-communication system enabling eye contact with real-size visualization of the distant user, and a spatialized sound system for speech transmission. Based on a robust computer vision module, it provides a fluid gesture visualization of each distant participant, whether he/she is moving virtual digital objects or is intending to do so. Remote gesture visualization contributes to the efficiency of distant collaboration tasks because it enables coordination between participants' actions and talk. Our main contribution addresses the development and the integration of robust and real-time projector-camera processing techniques in computer supported cooperative work.
We present a new formulation of multi-view stereo that treats the problem as probabilistic 3D segmentation. Previous work has used the stereo photo-consistency criterion as a detector of the boundary between the 3D scene and the surrounding empty space. Here we show how the same criterion can also provide a foreground/background model that can predict whether a 3D location is inside or outside the scene. This model replaces the commonly used naive foreground model based on ballooning, which is known to perform poorly in concavities. We demonstrate how the probabilistic visibility is linked to previous work on depth-map fusion and we present a multi-resolution graph-cut implementation using the new ballooning term that is very efficient in terms of both computation time and memory requirements.
ISBN: (print) 9780912035888
In precision engineering, scanners are widely used for laser beam positioning. Equipped with cameras, scanners enable process monitoring or even position recognition of the parts to be welded. To allow precise welding or position recognition, it is essential to calibrate the welding system. Instead of calibrating the whole system, most approaches only help to adjust the laser beam position. Consequently, the varying lateral offset between the laser's focus point and the camera's field of view, caused by chromatic aberration of the scanner optics, cannot be compensated. Furthermore, these approaches require manual microscopic measurement of weld seams, which has several downsides. This paper proposes two techniques for automatic calibration that avoid these downsides by using the system-incorporated camera. The first technique is calibration at the laser wavelength. To this end, the system automatically creates laser spots, evaluates their positions and possible offsets, and finally fits an affine model for compensation. The second technique is based on a specially coded test pattern, which is used for calibration at camera wavelengths. Experimental results confirm the accuracy of the calibration obtained.
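The affine fitting step mentioned for the first technique can be illustrated with an ordinary least-squares fit. The sketch below is an assumption-laden example in plain Python, not the procedure from the paper: it assumes the model maps commanded spot positions to the positions measured in the camera image, the function names are hypothetical, and at least three non-collinear point pairs are required (collinear input leads to a division by zero).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(src, dst):
    """Least-squares affine map from src points to dst points.

    Returns two rows (a, b, tx) and (c, d, ty) such that
    dst_x ~ a*x + b*y + tx and dst_y ~ c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (R^T R) p = R^T q, solved once per output coordinate.
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    params = []
    for k in range(2):
        rtq = [sum(r[i] * p[k] for r, p in zip(rows, dst)) for i in range(3)]
        params.append(solve_linear(rtr, rtq))
    return params


def apply_affine(params, point):
    """Map a single (x, y) point through the fitted model."""
    x, y = point
    return tuple(p[0] * x + p[1] * y + p[2] for p in params)
```

Once fitted, the model can be applied to any commanded position to predict where the spot will appear, and the difference between prediction and target gives the offset to compensate. Using more spots than the minimum of three averages out measurement noise in the detected spot positions.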