ISBN: 9781728188089 (print)
Localization of objects in cluttered scenes with machine learning methods is a fairly young research area. Despite the high potential of object localization for full process automation in Industry 4.0 and logistics environments, 3D data sets for training machine learning models for such applications are not openly available, and only a few publications have addressed the topic. To the authors' knowledge, this is the first publication that describes a self-supervised and fully automated deep learning approach for object pose estimation using simulated 3D data. The solution covers the simulated generation of training data, the detection of objects in point clouds using a fully convolutional voting network, and the computation of the pose for each detected object instance.
ISBN: 9781728188089 (print)
Machine learning techniques have excelled in the automatic semantic analysis of images, reaching human-level performance on challenging benchmarks. Yet the semantic analysis of videos remains challenging due to the significantly higher dimensionality of the input data and, correspondingly, the significantly higher need for annotated training examples. By studying the automatic recognition of German Sign Language videos, we demonstrate that, on relatively scarce training data of 2,800 videos, modern deep learning architectures for video analysis (such as ResNeXt), along with transfer learning on large gesture recognition tasks, can achieve about 75% character accuracy. Considering that this leaves a probability of under 25% that a five-letter word is spelled correctly, spell-correction systems are crucial for producing readable outputs. The contribution of this paper is a convolutional neural network for spell correction that expects the softmax outputs of the character recognition network (instead of a misspelled word) as input. We demonstrate that learning purely on softmax inputs in combination with scarce training data leads to overfitting, as the network memorizes the inputs. In contrast, training the network on several variants of the logits of the classification output (i.e., scaling by a constant factor, adding random noise, mixing softmax and hardmax inputs, or training purely on hardmax inputs) leads to better generalization while benefiting from the significant information hidden in these outputs (which have 98% top-5 accuracy), yielding readable text despite the comparatively low character accuracy.
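The arithmetic behind the 25% figure and the input variants listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `word_accuracy` assumes that character errors are independent, and `augment`, its parameter names, and its default ranges are hypothetical choices made only for this example.

```python
import math
import random


def word_accuracy(char_accuracy, word_length):
    """Probability that every character is correct, ASSUMING independent errors."""
    return char_accuracy ** word_length


def softmax(logits):
    """Numerically stable softmax over the classes of one character position."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(logits):
    """One-hot vector marking the top-scoring class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def augment(logits, rng, scale_range=(0.5, 2.0), noise_std=0.5, hard_prob=0.3):
    """Return one perturbed input vector (hypothetical parameters).

    Scales the logits by a random constant, adds Gaussian noise, and then
    emits either a hardmax or a softmax vector, so that a training batch
    mixes both input types.
    """
    scale = rng.uniform(*scale_range)
    noisy = [scale * v + rng.gauss(0.0, noise_std) for v in logits]
    if rng.random() < hard_prob:
        return hardmax(noisy)
    return softmax(noisy)


rng = random.Random(0)
example_logits = [2.0, 0.5, -1.0, 0.1]  # made-up logits for four classes
variant = augment(example_logits, rng)
five_letter = word_accuracy(0.75, 5)  # 0.75 ** 5 = 0.2373..., i.e. under 25%
```

Each call to `augment` yields a different view of the same logits, which is one plausible way to keep a small network from memorizing its training inputs.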
ISBN: 9783540749332 (print)
This paper presents a reliable coin recognition system that is based on a registration approach. To optimally align two coins, we search for the rotation that maximizes the number of collinear gradient vectors. The gradient magnitude is completely neglected. After quantization of the gradient directions, the induced similarity measure can be computed efficiently in the Fourier domain. Classification is realized with a simple nearest-neighbor scheme followed by several rejection criteria to meet the demand for a low false positive rate.
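The idea of finding a rotation through a Fourier-domain computation can be illustrated with a strongly simplified sketch. It assumes each coin has already been reduced to a one-dimensional angular signature (a hypothetical representation, not the quantized gradient-direction measure of the paper) and finds the cyclic shift that maximizes the circular cross-correlation. A naive DFT is used to keep the example dependency-free; a real system would use an FFT library.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a short sketch."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for m in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for k, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def best_rotation(reference, candidate):
    """Return the shift s maximizing sum_k reference[k] * candidate[(k + s) % n].

    For real-valued inputs, the circular cross-correlation equals the
    inverse DFT of conj(DFT(reference)) * DFT(candidate).
    """
    ref_f = dft(reference)
    cand_f = dft(candidate)
    product = [a.conjugate() * b for a, b in zip(ref_f, cand_f)]
    correlation = [c.real for c in dft(product, inverse=True)]
    return max(range(len(correlation)), key=lambda s: correlation[s])


# Hypothetical angular signature with 8 bins, and a copy rotated by 3 bins.
reference = [0, 3, 1, 0, 0, 2, 0, 0]
candidate = reference[-3:] + reference[:-3]
shift = best_rotation(reference, candidate)
```

The product in the frequency domain evaluates all candidate rotations at once, which is why this kind of similarity search is cheap compared with testing each rotation separately.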
ISBN: 9781728188089 (print)
This paper considers recognizing text shown in a source language and translating it into a target language without generating intermediate text image recognition results in the source language. We call this problem Cross-Lingual Text Image Recognition (CLTIR). To solve it, we propose a multi-task system that combines the main task of CLTIR with an auxiliary task of Mono-Lingual Text Image Recognition (MLTIR). Two different sequence-to-sequence learning methods, a convolution-based attention model and a Bidirectional Long Short-Term Memory (BLSTM) model with Connectionist Temporal Classification (CTC), are adopted for these tasks, respectively. We evaluate the system on a newly collected Chinese-English bilingual movie subtitle image dataset. Experimental results demonstrate that the multi-task learning framework achieves superior performance in both languages.
ISBN: 3540444122 (print)
We present a method for 3D object modeling and recognition that is robust to scale and illumination changes and to viewpoint variations. The object model is derived from local features extracted and tracked over an image sequence of the object. The recognition phase is based on an SVM classifier. We analyse in depth all the crucial steps of the method and report very promising results on a dataset of 11 objects, which show that the method is also tolerant to occlusions and moderate scene clutter.
ISBN: 9781728188089 (print)
Facial cosmetics can substantially alter facial appearance, which can negatively affect the decisions of a face recognition system. In addition, it was recently shown that the application of makeup can be abused to launch so-called makeup presentation attacks. In such attacks, the attacker might apply heavy makeup in order to achieve the facial appearance of a target subject for the purpose of impersonation. In this work, we assess the vulnerability of a commercial off-the-shelf (COTS) face recognition system to makeup presentation attacks, employing the publicly available Makeup Induced Face Spoofing (MIFS) database. It is shown that makeup presentation attacks might seriously impact the security of the face recognition system. Further, we propose an attack detection scheme that distinguishes makeup presentation attacks from genuine authentication attempts by analysing differences in deep face representations obtained from potential makeup presentation attacks and corresponding target face images. The proposed detection system employs a machine learning-based classifier, which is trained with synthetically generated makeup presentation attacks, utilizing a generative adversarial network for facial makeup transfer in conjunction with image warping. Experimental evaluations conducted using the MIFS database reveal a detection equal error rate of 0.7% for the task of separating genuine authentication attempts from makeup presentation attacks.
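For readers unfamiliar with the metric, a detection equal error rate is the operating point at which the rate of genuine attempts flagged as attacks equals the rate of attacks accepted as genuine. The sketch below approximates it from two score lists; the function name, the score convention (higher means "more likely an attack"), and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumed convention: a sample is classified as an attack when its
    score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(attack_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Fraction of genuine attempts wrongly flagged as attacks.
        false_alarm = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        # Fraction of attacks that pass as genuine.
        miss = sum(s < t for s in attack_scores) / len(attack_scores)
        gap = abs(false_alarm - miss)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (false_alarm + miss) / 2
    return eer


# Made-up scores: one genuine attempt and one attack overlap.
genuine = [0.1, 0.2, 0.3, 0.4]
attacks = [0.35, 0.6, 0.7, 0.8]
eer = equal_error_rate(genuine, attacks)
```

With only a handful of scores the two error curves rarely cross exactly, so the function reports the midpoint at the closest threshold; on large evaluation sets this approximation becomes tight.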
ISBN: 3540444122 (print)
Recognizing categories of articulated objects in real-world scenarios is a challenging problem for today's vision algorithms. Due to the large appearance changes and intra-class variability of these objects, it is hard to define a model that is both general and discriminative enough to capture the properties of the category. In this work, we propose an approach that aims for a suitable trade-off between the two. On the one hand, the approach is made more discriminative by explicitly distinguishing typical object shapes. On the other hand, the method generalizes well and requires relatively few training samples thanks to cross-articulation learning. The effectiveness of the approach is demonstrated and compared to previous approaches on two datasets containing pedestrians with different articulations.
ISBN: 3540408614 (print)
In this paper, a method is proposed that identifies bone positions and the fine structure of bone contours in radiographs by combining active shape models (ASM) and active contours (snakes), resulting in high accuracy and stability. After a coarse estimate of the bone position has been determined by neural nets, an approximation of the contour is obtained by an active shape model. The accuracy of the landmarks and the contour in between is enhanced by applying an iterative active contour algorithm to a set of gray value profiles extracted orthogonally to the interpolation obtained by the ASM. The neural nets acquire knowledge about visual appearance as well as anatomical configuration during a training phase. The active shape model is trained with a set of training shapes, whereas the snake detects the contour with fewer constraints and decreases the influence of a priori knowledge in a controlled manner. This is of particular importance for the assessment of pathological changes of bones, such as erosive destructions caused by rheumatoid arthritis.
ISBN: 3540444122 (print)
We present a novel model for object recognition and detection that follows the widely adopted assumption that objects in images can be represented as a set of loosely coupled parts. In contrast to former models, the presented method can cope with an arbitrary number of object parts. Here, the object parts are modelled by image patches that are extracted at each position and then efficiently stored in a histogram. In addition to the patch appearance, the positions of the extracted patches are considered, which provides a significant increase in recognition performance. Additionally, a new and efficient histogram comparison method that takes inter-bin similarities into account is proposed. The presented method is evaluated on the task of radiograph recognition, where it achieves the best result published so far. Furthermore, it yields very competitive results on the commonly used Caltech object detection tasks.
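The abstract does not specify the proposed histogram comparison, so the following sketch shows a different, well-known measure that also accounts for inter-bin similarities: the quadratic-form distance. It is offered only to illustrate why cross-bin information matters; the similarity matrix and the toy histograms are made up for this example.

```python
import math


def quadratic_form_distance(hist_a, hist_b, similarity):
    """Distance sqrt((a - b)^T S (a - b)) with a bin-similarity matrix S.

    S[i][j] close to 1 means bins i and j hold similar content, so mass
    moved between them is penalized less than mass moved between
    unrelated bins. S is assumed symmetric and positive semidefinite.
    """
    diff = [a - b for a, b in zip(hist_a, hist_b)]
    n = len(diff)
    total = sum(diff[i] * similarity[i][j] * diff[j]
                for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors


# Hypothetical 3-bin setup in which neighbouring bins are half similar.
S = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

query = [1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0]  # mass shifted to the neighbouring bin
far = [0.0, 0.0, 1.0]   # mass shifted to an unrelated bin

d_near = quadratic_form_distance(query, near, S)
d_far = quadratic_form_distance(query, far, S)
```

With the identity matrix the measure reduces to the Euclidean distance and rates `near` and `far` as equally distant from `query`; with `S`, the shift to the neighbouring bin is judged smaller.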
ISBN: 9781728188089 (print)
In this article, we present an approach to detect basic movements of cyclists in real-world traffic situations based on image sequences, optical flow (OF) sequences, and past positions using a multi-stream 3D convolutional neural network (3D-ConvNet) architecture. To resolve occlusions of cyclists by other traffic participants or road structures, we use a wide-angle stereo camera system mounted at a heavily frequented public intersection. We created a large dataset consisting of 1,639 video sequences containing cyclists, recorded in real-world traffic, resulting in over 1.1 million samples. By modeling the cyclists' behavior as a state machine of basic cyclist movements, our approach takes every situation into account and is not limited to certain scenarios. We compare our method to an approach based solely on position sequences. Both methods are evaluated with respect to frame-wise and scene-wise classification results for basic movements and detection times for basic movement transitions; our approach outperforms the position-based approach by producing more reliable detections with shorter detection times. Our code and parts of our dataset are made publicly available.