Identifying and classifying personal, geographic, institutional or other names in a text is an important task for numerous applications. This paper describes and evaluates a language-independent bootstrapping algorith...
This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and adaptive topic-probability ...
This paper describes and extensively evaluates a system for the automatic routing of submitted papers to reviewers and area committees, without the need for any human annotation from the reviewers or the program chair...
Resnik and Yarowsky (1997) made a set of observations about the state-of-the-art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evalu...
The problem of blindly separating signal mixtures with fewer mixture components than independent signal sources is mathematically ill-defined, and requires suitable prior information on the nature of the sources. Recently, it has been shown that sparse methods for function approximation using a Laplacian prior can be effective, but the method fails to separate a single mixture without further prior information. Other techniques track harmonics, but assume separability in the time-frequency domain. We show that a measure of temporal and spectral coherence provides an effective cue for separating independent acoustical or sonar sources, in the absence of spatial cues in the monaural case. The technique is shown to successfully separate single mixtures of sources with significant spectral overlap.
This paper addresses the problem of obtaining photo-realistic 3D models of a scene from images alone with a structure-from-motion approach. The 3D scene is observed from multiple viewpoints by freely moving a camera around the object. No restrictions on camera movement and internal camera parameters like zoom are imposed, as the camera pose and intrinsic parameters are calibrated from the sequence. The only restrictions on the scene content are the rigidity of the scene objects and opaque, piecewise smooth object surfaces. The approach operates independently of object scale and requires only a single low-cost consumer photo or video camera. The modeling system described here uses a three-step approach. First, the camera pose and intrinsic parameters are calibrated on-line by tracking salient feature points between the different views. Next, consecutive images of the sequence are treated as stereoscopic image pairs and dense correspondence maps are computed by area matching. Finally, dense and accurate depth maps are computed by linking together all correspondences over the viewpoints. The depth maps are converted to triangular surface meshes that are texture mapped for photo-realistic appearance. The resulting surface models are stored in VRML format for easy exchange and visualization. The feasibility of the approach has been tested extensively and will be illustrated on several real scenes. In particular we will demonstrate the generation of realistic 3D models for a virtual exhibition of the archaeological excavation site in Sagalassos, Turkey.
Gopalakrishnan et al. [1] described a method called "growth transform" to optimize rational functions over a domain, which has been found useful for discriminatively training Hidden Markov Models (HMMs) in speech r...
Modeling F0 contours of arbitrarily long and complex sentences of the Greek language may prove to be a difficult task if one considers the various parameters involved, namely focus, position of the prominent vowel wit...
Authors:
R.I. Damper
Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton, UK
Center for Spoken Language Understanding, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, Portland, OR, USA
Important aspects of the voiced/unvoiced categorization of synthetic syllable-initial stop consonants are reproduced by a two-stage biocybernetic simulation of the auditory system. This behavior is emergent - it is not explicitly programmed into the model - and no fine timing information is necessary. Unlike real (human and animal) listeners, the computational auditory model can be systematically manipulated and probed to determine the basis of its behavior. This reveals the importance of the region of first formant onset to the perception of voicing for these stimuli. Spectral analysis of this region in the raw stimuli shows that processing by the first stage of the model, mimicking the functions of the peripheral auditory system, is not essential to the observed behavior. Thus, in this case at least, the phonetic perception of voicing is directly recoverable from both acoustic and auditory representations of the stimuli.
Detection of speech in noisy recordings becomes a challenging problem when the noise does not follow the usual whiteness, stationarity and high signal-to-noise ratio assumptions. A robust speech detector can significantly affect the performance of several speech processing tasks, such as endpoint detection, segmentation, and finally recognition, if we deal with real-life data, as opposed to laboratory or controlled-environment recordings. The detector proposed is based on a Gaussianity test that employs third-order cumulants of the data to decide on the binary hypotheses of noise only versus speech plus noise. Speech intervals are detected by exploiting the third-order information present in the speech signal. The detector can handle a large family of additive noises, thanks to its third-order statistics basis. The sample-adaptive and decision feedback variations proposed provide the detector with a tracking ability both with respect to the time variations of speech and the possible nonstationarity of noise. Experiments carried out using real data, recorded in a moving car interior, show satisfactory performance of the proposed algorithms down to -6 dB signal-to-noise ratio.
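The idea behind a third-order test can be illustrated with a minimal sketch. It rests on the fact that the third-order cumulants of a Gaussian process are zero, so a frame whose estimated cumulant is clearly nonzero is unlikely to be Gaussian noise alone. Everything below is an assumption made for illustration: the abstract does not give the test statistic, the lags used, or the sample-adaptive and decision feedback rules, so this sketch uses only the zero-lag cumulant, a variance normalization, and a fixed threshold. The names `third_order_cumulant`, `detect_frames`, `frame_len` and `threshold` are hypothetical.

```python
def third_order_cumulant(frame):
    """Zero-lag third-order cumulant estimate: mean of (x - mean)^3."""
    n = len(frame)
    mu = sum(frame) / n
    return sum((x - mu) ** 3 for x in frame) / n


def detect_frames(signal, frame_len, threshold):
    """Flag each non-overlapping frame whose normalized |c3| exceeds threshold.

    Assumption: a fixed threshold on the variance-normalized cumulant
    (the sample skewness) stands in for the Gaussianity test.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        n = len(frame)
        mu = sum(frame) / n
        var = sum((x - mu) ** 2 for x in frame) / n
        c3 = third_order_cumulant(frame)
        # Normalize so the decision does not depend on signal scale.
        score = abs(c3) / (var ** 1.5) if var > 0 else 0.0
        flags.append(score > threshold)
    return flags


# A symmetric frame (zero third-order cumulant) followed by a skewed frame.
symmetric = [-1.0, 1.0] * 50
skewed = [0.0, 0.0, 0.0, 3.0] * 25
flags = detect_frames(symmetric + skewed, frame_len=100, threshold=0.5)
```

Here `flags` marks only the second frame. A fixed threshold would not track nonstationary noise, which is presumably what the adaptive variations in the abstract address.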