This paper introduces the theory of factor analysis of the mixture of Auto-Associative Neural Networks (AANNs) with application in speaker verification. First, we formulate the problem of learning a low-dimensional su...
We present a new approach of using Auto-Associative Neural Networks (AANNs) in the conventional GMM speaker verification framework with i-vector feature extraction and PLDA modeling. In this technique, an i-vector fea...
We introduce a new approach to training multilayer perceptrons (MLPs) for large vocabulary continuous speech recognition (LVCSR) in new languages which have only a few hours of annotated in-domain training data (for example, 1 hour of data). In our approach, large amounts of annotated out-of-domain data from multiple languages are used to train multilingual MLP systems without dealing with the different phoneme sets for these languages. Features extracted from these MLP systems are used to train LVCSR systems in the low-resource language, similar to the Tandem approach. In our experiments, the proposed features provide a relative improvement of about 30% in a low-resource LVCSR setting with only one hour of training data.
In the real world, natural conversational speech is an amalgam of speech segments, silences, and environmental/background and channel effects. Labeling the different regions of an acoustic signal according to their information levels would greatly benefit all automatic speech processing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for automatic speaker recognition. The system processes the input acoustic signal along independent streams reflecting various levels of intelligibility and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains performance similar to that obtained with oracle speech segmentation information.
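The score-fusion step described in this abstract can be pictured with a minimal sketch. The abstract does not give the fusion rule, so the normalized weighted sum below is an assumption, and `fuse_scores`, `stream_scores`, and `stream_weights` are hypothetical names introduced only for illustration.

```python
def fuse_scores(stream_scores, stream_weights):
    """Combine per-stream decision scores into one score.

    Assumption: fusion is a weighted average, with each weight standing in
    for that stream's intelligibility contribution. The abstract does not
    state the actual rule used by the authors.
    """
    if len(stream_scores) != len(stream_weights):
        raise ValueError("need exactly one weight per stream")
    total_weight = sum(stream_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(stream_scores, stream_weights))
    return weighted_sum / total_weight
```

With two streams scored 2.0 and 0.0 and weights 3.0 and 1.0, the fused score is pulled toward the more heavily weighted stream.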
We present a system for recognizing online mathematical expressions (ME). Symbol recognition is based on a template elastic matching distance between pen direction features. The structural analysis of the ME is based on extracting the baseline of the ME and then classifying symbols into levels above and below the baseline. The symbols are then sequentially analyzed using six spatial relations, and the corresponding 2D structure is processed to give the resulting MathML representation of the ME. The system was evaluated on the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2011 datasets and demonstrates promising results.
Document image binarization is an initial though critical stage towards the recognition of the text components of a document. This paper describes an efficient method based on mathematical morphology for extracting text regions from degraded handwritten document images. The basic stages of our approach are: (a) top-hat-by-reconstruction to produce a filtered image with a reasonably even background, (b) region growing starting from a set of seed points and attaching to each seed neighboring pixels of similar intensity, and (c) conditional extension of the initially detected text regions based on the values of the second derivative of the filtered image. The method was evaluated on the benchmarking dataset of the International Document Image Binarization Contest (DIBCO 2011) and shows promising results.
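Stage (b) above, region growing from seed points, can be sketched as follows. This is a generic illustration, not the authors' implementation: the 4-connected neighbourhood, the fixed tolerance `tol` measured against the seed's intensity, and the function name `region_grow` are all assumptions.

```python
from collections import deque


def region_grow(image, seeds, tol):
    """Grow regions from seed pixels over 4-connected neighbours.

    image: list of rows of grey levels; seeds: list of (row, col) pairs;
    tol: largest allowed absolute difference from the seed's intensity
    (an assumed similarity criterion). Returns a 0/1 mask of image size.
    """
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    for seed_row, seed_col in seeds:
        if mask[seed_row][seed_col]:
            continue  # already absorbed by an earlier seed's region
        seed_value = image[seed_row][seed_col]
        mask[seed_row][seed_col] = 1
        queue = deque([(seed_row, seed_col)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and not mask[nr][nc]:
                    if abs(image[nr][nc] - seed_value) <= tol:
                        mask[nr][nc] = 1
                        queue.append((nr, nc))
    return mask
```

Comparing each candidate pixel against the seed's value rather than its immediate neighbour keeps a region from drifting gradually across a smooth intensity ramp; other similarity criteria are equally plausible.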
In this paper we review the progress in the design of low-complexity digital correction structures and algorithms for time-interleaved ADCs over the last five years. We devise a discrete-time model, state the design problem, and finally derive the algorithms and structures. In particular, we discuss efficient algorithms to design time-varying correction filters as well as iterative structures utilizing polynomial-based filters. Finally, we give an outlook on future research questions.
Delimiting the most informative voice segments of an acoustic signal is often a crucial initial step for any speech processing system. In the current work, we propose a novel segmentation approach based on a perceptio...
We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the ...
We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique....