ISBN: 3540408614 (print)
In this paper we address the problem of building a good speech recognizer when only a small amount of training data is available. The acoustic models can be improved by interpolation with the well-trained models of a second recognizer from a different application scenario. In our case, we interpolate a children's speech recognizer with a recognizer for adults' speech. Each hidden Markov model has its own set of interpolation partners; experiments were conducted with up to 50 partners. The interpolation weights are estimated automatically on a validation set using the EM algorithm. The word accuracy of the children's speech recognizer could be improved from 74.6% to 81.5%. This is a relative improvement of almost 10%.
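The weight estimation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a linear interpolation of per-frame likelihoods with fixed partner models, which is a common setup but is not confirmed by the abstract; the function name `estimate_interpolation_weights` and the input layout are hypothetical.

```python
def estimate_interpolation_weights(likelihoods, iterations=50):
    """EM for the weights of a linear interpolation of fixed models.

    likelihoods[t][k] is the likelihood of validation frame t under
    interpolation partner k (assumed layout, not taken from the paper).
    """
    num_partners = len(likelihoods[0])
    weights = [1.0 / num_partners] * num_partners  # uniform start
    for _ in range(iterations):
        counts = [0.0] * num_partners
        for frame in likelihoods:
            mixture = sum(w * p for w, p in zip(weights, frame))
            if mixture == 0.0:
                continue  # frame carries no information for any partner
            # E-step: posterior responsibility of each partner for this frame
            for k in range(num_partners):
                counts[k] += weights[k] * frame[k] / mixture
        total = sum(counts)
        if total == 0.0:
            break
        # M-step: new weights are the normalized responsibilities
        weights = [c / total for c in counts]
    return weights


# Partner 0 explains every validation frame far better than partner 1.
example_weights = estimate_interpolation_weights([[0.9, 0.1]] * 10)
```

In this toy example the weight of the better-fitting partner approaches 1, which is the behaviour one would expect from EM when one model dominates on the validation data.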
ISBN: 3540408614 (print)
The use of 3D-polar coordinate representations of the RGB colour space is widespread, although many of these representations, such as HLS and HSV, have deficiencies rendering them unsuitable for quantitative image analysis. Three prerequisites for 3D-polar coordinate colour spaces which do not suffer from these deficiencies are suggested, and the results of the derivation of three colour spaces based on these prerequisites are presented. An application which takes advantage of their good properties for the construction of colour histograms is also discussed.
In this paper the authors consider the problem of monitoring the current drawn from the main power rail that powers multiple consumers (servomechanisms, actuators, electromagnets, stepper or DC motors) present in cars, in order to protect the on-board equipment or to increase the reliability of electromechanical systems. The idea is to scan the supply current of a device connected to the main power rail using a very sensitive shunt, to store this data using a development board with a microcontroller, and to build maps of the currents supplied to specific consumers. The board operates in two stages. The first stage involves memorizing the current shapes and storing the data in a memory zone. The second stage supports the equipment while it is in use, with a software block that recognizes the shape of the current. Recognition of the current shape and classification of the current event as nominal or faulty operation is done initially with byte-oriented comparisons and afterwards with a variable error filter based on a "running average", which is described in the paper. Preliminary results were obtained on a 16-bit microcontroller.
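The two-stage idea (store a reference current shape, then compare live samples against it) can be sketched as follows. This is an illustration only: the abstract does not specify the filter, so the window length, the tolerance, and the names `running_average` and `classify_current` are assumptions, and Python stands in for the microcontroller firmware.

```python
def running_average(samples, window):
    """Causal running average; early outputs average over fewer samples."""
    smoothed = []
    total = 0
    for i, sample in enumerate(samples):
        total += sample
        if i >= window:
            total -= samples[i - window]  # drop the sample leaving the window
        smoothed.append(total / min(i + 1, window))
    return smoothed


def classify_current(reference, measured, window=4, tolerance=8):
    """Label a measured current shape as 'nominal' or 'faulty'.

    Both shapes are smoothed, then compared sample by sample; the
    tolerance (in ADC counts) is a hypothetical tuning parameter.
    """
    if len(reference) != len(measured):
        raise ValueError("shapes must have the same number of samples")
    ref_smooth = running_average(reference, window)
    mea_smooth = running_average(measured, window)
    within = all(abs(r - m) <= tolerance for r, m in zip(ref_smooth, mea_smooth))
    return "nominal" if within else "faulty"


stored_shape = [10, 50, 120, 120, 120, 60, 10]   # stage 1: memorized shape
noisy_shape = [12, 47, 123, 118, 121, 62, 9]     # stage 2: normal operation
stalled_shape = [10, 50, 200, 200, 200, 60, 10]  # stage 2: excessive current
```

Smoothing before comparison makes the check less sensitive to single-sample noise than a strict byte-by-byte comparison, at the cost of reacting slightly later to a genuine fault.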
ISBN: 3540408614 (print)
We present a system capable of visually detecting pointing gestures and estimating the 3D pointing direction in real-time. We use Hidden Markov Models (HMMs) trained on different phases of sample pointing gestures to detect the occurrence of a gesture. For estimating the pointing direction, we compare two approaches: 1) the line of sight between head and hand and 2) the forearm orientation. Input features for the HMMs are the 3D trajectories of the person's head and hands. They are extracted from image sequences provided by a stereo camera. In a person-independent test scenario, our system achieved a gesture detection rate of 88%. For 90% of the detected gestures, the correct pointing target (one out of eight objects) was identified.
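The first approach, the line of sight between head and hand, amounts to simple vector geometry. The sketch below is a minimal illustration under assumptions: the target-selection rule (smallest angle between the pointing ray and the hand-to-target direction), the coordinates, and the names `pointing_direction` and `select_target` are hypothetical and not taken from the paper.

```python
import math


def pointing_direction(origin, tip):
    """Unit vector from origin to tip (e.g. from head to hand)."""
    delta = [t - o for o, t in zip(origin, tip)]
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0.0:
        raise ValueError("origin and tip coincide")
    return [c / norm for c in delta]


def select_target(head, hand, targets):
    """Return the name of the target closest in angle to the head-hand ray."""
    direction = pointing_direction(head, hand)
    best_name, best_angle = None, math.inf
    for name, position in targets.items():
        to_target = pointing_direction(hand, position)
        cosine = sum(a * b for a, b in zip(direction, to_target))
        angle = math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding error
        if angle < best_angle:
            best_name, best_angle = name, angle
    return best_name


head_position = (0.0, 0.0, 1.7)   # metres, hypothetical stereo-camera output
hand_position = (0.3, 0.0, 1.4)
candidate_targets = {"lamp": (1.3, 0.0, 0.4), "door": (0.0, 2.0, 1.0)}
chosen = select_target(head_position, hand_position, candidate_targets)
```

The forearm-orientation approach would use the same selection step with the elbow in place of the head as the ray origin.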
ISBN: 3540408614 (print)
Ground target classification in high-resolution SAR data has become increasingly important over the years. Kernel machines like the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM) offer a promising way to solve this problem. However, these kernel machines cannot be customized. The main objective of this work has therefore been to develop a mechanism that controls the trade-off between classification quality and computational effort. The investigations were carried out using the MSTAR public target dataset. The result of this work is an extended RVM, the RVMG. A single parameter controls the robustness of the system. The spectrum ranges from a machine 15 times faster than the SVM with 10% lower quality, through a machine 5 times faster with equal quality, to a machine slightly faster than the SVM and of better quality than the Lagrangian Support Vector Machine (***).
ISBN: 3540408614 (print)
In this paper we analyse dissimilarity measures for probability distributions which are frequently used in pattern recognition, image processing, image indexing and registration, amongst others. Namely, χ², Jensen-Shannon divergence, Fidelity and Trace are discussed. We use those measures to tackle the task of recognising three-dimensional objects from two-dimensional images. The object reference model is defined by (several) feature distributions derived from multiple two-dimensional views of each object. The experiments performed on the Columbia Object Image Library indicate that derivatives of Fidelity used as distance measures perform well in terms of recognition rate. If enough views can be provided for modelling (roughly one view per 20°-30°), a recognition rate of up to 100% is achievable.
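For orientation, the four measures named here can be written down for discrete distributions (normalized histograms). The sketch uses common textbook forms; the abstract does not give the exact definitions or normalization constants used in the paper, so the symmetric χ² variant, the base-2 logarithm, and the function names are assumptions.

```python
import math


def chi_square(p, q):
    """Symmetric chi-square dissimilarity; empty bins are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    def kl(x, m):
        return sum(a * math.log2(a / b) for a, b in zip(x, m) if a > 0)
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


def fidelity(p, q):
    """Fidelity (Bhattacharyya coefficient): 1 = identical, 0 = disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def trace_distance(p, q):
    """Trace distance, i.e. half the L1 distance between the histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


view_a = [0.2, 0.8]
view_b = [0.0, 1.0]
scores = {
    "chi_square": chi_square(view_a, view_b),
    "jensen_shannon": jensen_shannon(view_a, view_b),
    "fidelity": fidelity(view_a, view_b),
    "trace": trace_distance(view_a, view_b),
}
```

Fidelity is a similarity rather than a distance, so using it for nearest-neighbour matching requires a transformation such as `1 - fidelity(p, q)`; which transformations the paper evaluates is not stated in the abstract.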
ISBN: 3540408614 (print)
Face detection using components has been shown to produce superior results due to its robustness to occlusions and to pose and illumination changes. A first level of processing is devoted to the detection of individual components, while a second level deals with the fusion of the component detectors. However, the fusion methods investigated up to now neglect the uncertainties that characterize the component locations. We show that this uncertainty carries important information that, when exploited, leads to increased face localization accuracy. We discuss and compare possible solutions taking into account geometrical constraints. The efficiency and usefulness of the techniques are tested with both synthetic and real-world examples.
ISBN: 9781728188089 (print)
One important and particularly challenging step in the optical character recognition of historical documents with complex layouts, such as newspapers, is the separation of text from non-text content (e.g. page borders or illustrations). This step is commonly referred to as page segmentation. While various rule-based algorithms have been proposed, the applicability of Deep Neural Networks to this task has recently gained a lot of attention. In this paper, we perform a systematic evaluation of 11 different published backbone architectures and 9 different tiling and scaling configurations for separating text, tables or table column lines. We also show the influence of the number of labels and the number of training pages on the segmentation quality, which we measure using the Matthews Correlation Coefficient. Our results show that (depending on the task) Inception-ResNetv2 and EfficientNet backbones work best, vertical tiling is generally preferable to other tiling approaches, and training data that comprises 30 to 40 pages will be sufficient most of the time.
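The quality measure used here, the Matthews Correlation Coefficient, is computed from the four confusion-matrix counts of a binary labelling. The sketch below shows the standard formula on flattened pixel masks; the helper names and the convention of returning 0 when the denominator vanishes are choices made for this illustration, not details from the paper.

```python
import math


def confusion_counts(predicted, truth):
    """Count TP, TN, FP, FN for two equally long binary masks."""
    tp = tn = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif not p and not t:
            tn += 1
        elif p and not t:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 if any marginal count is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


predicted_mask = [1, 1, 0, 0]  # 1 = pixel labelled as text
truth_mask = [1, 0, 0, 0]
mcc = matthews_corrcoef(*confusion_counts(predicted_mask, truth_mask))
```

Unlike plain pixel accuracy, the coefficient stays informative when one class (for example page background) dominates the image.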
ISBN: 3540408614 (print)
We present a hierarchical partitioning of images using a pairwise similarity function on a graph-based representation of an image. This function measures the difference along the boundary of two components relative to a measure of the components' internal differences. This definition attempts to encapsulate the intuitive notion of contrast. Two components are merged if there is a low-cost connection between them. Each component's internal difference is represented by the maximum edge weight of its minimum spanning tree. The external difference is the smallest weight of the edges connecting two components. We use this idea to find region borders quickly and effortlessly in a bottom-up, 'stimulus-driven' way based on local differences in a specific feature, as in preattentive vision. The components are merged ignoring the details in regions of high variability and preserving the details in regions of low variability.
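The merge rule described here (compare the cheapest connecting edge with each component's largest minimum-spanning-tree edge) can be sketched on a single level with a union-find structure. This is not the authors' hierarchical algorithm: the size-dependent `slack` term, which lets single-pixel components merge at all, is an assumption borrowed from related graph-based segmentation methods, and the function name is hypothetical.

```python
def segment_graph(num_vertices, edges, slack):
    """Single-level sketch of contrast-based component merging.

    edges is a list of (weight, u, v) tuples. Returns one label per vertex.
    """
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices  # max MST edge weight per component root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Ascending order: the first edge seen between two components is the
    # cheapest one connecting them, i.e. their external difference.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue
        threshold = min(internal[root_u] + slack / size[root_u],
                        internal[root_v] + slack / size[root_v])
        if weight <= threshold:
            parent[root_v] = root_u
            size[root_u] += size[root_v]
            internal[root_u] = weight  # largest MST edge of the merged component
    return [find(v) for v in range(num_vertices)]


# Six pixels in a row with one strong intensity step between pixels 2 and 3.
row_edges = [(1, 0, 1), (1, 1, 2), (10, 2, 3), (1, 3, 4), (1, 4, 5)]
labels = segment_graph(6, row_edges, slack=2)
```

In the example, the weak edges are merged while the strong step is kept as a region border, which mirrors the notion of contrast in the abstract.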
ISBN: 3540408614 (print)
Automatic unloading of piled boxes of unknown dimensions is undoubtedly of great importance to industry. In this contribution a system addressing this problem is described: a laser range finder mounted on the hand of an industrial robot is used for data acquisition. A vacuum gripper, also mounted on the robot hand, is employed for grasping the objects by their exposed surfaces. We localize the exposed surfaces of the objects via a hypothesis generation and verification framework. Accurate hypotheses about the pose and the dimensions of the boundary of the exposed surfaces are generated from edge information obtained from the input range image, using a variation of the Hough transform. Hypothesis verification is robustly performed using the range points inside the hypothesized boundary. Our system shows a variety of advantages, such as computational efficiency, accuracy and robustness, the combination of which cannot be found in existing approaches.