While a number of studies have investigated speech enhancement and noise suppression schemes, most consider either a single-channel or an array processing framework. There are clear potential advantages in combining the strength of array processing in suppressing noise arriving from directions other than the speaker with the strengths of single-channel methods that include speech spectral constraints or psychoacoustically motivated processing. In this paper, we propose to integrate a combined fixed/adaptive beamforming algorithm (CFA-BF) for speech enhancement with two single-channel methods: one based on speech spectral constrained iterative processing (Auto-LSP), and an auditory masked threshold based method using equivalent rectangular bandwidth filtering (GMMSE-AMT-ERB). After formulating the method, we evaluate performance on a subset of the TIMIT corpus with four real noise sources. We demonstrate a consistent level of noise suppression and voice communication quality improvement using the proposed method, as reflected by an overall average 26 dB increase in SegSNR over the original degraded audio corpus.
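The abstracts in this listing report results as SegSNR (segmental signal-to-noise ratio) gains without stating the exact formula used. The sketch below shows one common definition: the per-frame SNR in dB, clamped to a fixed range and averaged over frames. The function name `segmental_snr`, the frame length of 256 samples, and the clamping limits of -10 dB and 35 dB are assumptions chosen for illustration, not values taken from the papers.

```python
import math


def segmental_snr(clean, processed, frame_len=256, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    frame_len, min_db and max_db are illustrative defaults (assumptions).
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have equal length")

    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        if signal_energy == 0.0:
            continue  # skip frames where the reference is silent
        if noise_energy == 0.0:
            snr_db = max_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, min_db), max_db))

    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Under this definition, a SegSNR improvement would be computed as `segmental_snr(clean, enhanced) - segmental_snr(clean, noisy)`, where all three signals are time-aligned sample sequences.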
Noisy cars are very difficult listening environments for persons with hearing loss. While there have been numerous studies in the field of speech enhancement for car noise environments, the majority have focused on noise reduction for normal-hearing individuals. In this paper, we present recent results in the development of more effective speech capture and enhancement processing for wireless voice interaction for persons with hearing loss in real car environments. We first present a data collection experiment for a proposed FM wireless transmission scenario using a 5-channel microphone array in the car, followed by several alternative speech enhancement algorithms. After formulating six different processing methods, we evaluate their performance by SegSNR improvement using data recorded in a moving car environment. Among the six processing configurations, the combined fixed/adaptive beamforming (CFA-BF) obtains the highest SegSNR improvement, up to 2.65 dB.
This paper presents work on the task of constructing an example base from a given bilingual corpus based on the annotation schema of Translation Corresponding Tree (TCT). Each TCT describes a translation example (a pa...
Unknown word recognition is an important problem in Chinese word segmentation systems. In this paper, we propose an integrated method for Chinese unknown word extraction for offline corpus processing, in which both co...
In this work, we are concerned with a coarse grained semantic analysis over sparse data, which labels all nouns with a set of semantic categories. To get the benefit of unlabeled data, we propose a bootstrapping frame...
Voice Onset Time (VOT) is an important temporal feature in speech perception and speech recognition. It is also beneficial for accent detection [1,2]. Fixed-length, frame-based speech processing inherently ignores VOT. In this paper we propose a more effective VOT detection scheme using a non-linear energy tracking algorithm, the Teager Energy Operator (TEO), across a sub-frequency band partition for unvoiced stops (p, t and k). The VOT detection algorithm is applied to the problem of accent classification. Three different language groups (Indian, Chinese and American English) from the CU-Accent-Corpus are used to compare the VOTs of accented and native American English. Some pathological cases are considered where speakers have breathy voices or there are other issues in the recording procedure. The VOT is detected with less than 10% error when compared to the manually detected VOT. Also, pairwise English accent classification rates are 87% for Chinese accent, 80% for English accent, and 47% for Indian accent (including atypical cases for the Indian group).
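For readers unfamiliar with the Teager Energy Operator named above, its standard discrete-time form is ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below implements only that operator; the sub-band filtering and the VOT decision logic described in the abstract are not reproduced here, and the function name `teager_energy` is an illustrative choice.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    if len(x) < 3:
        return []
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n) the operator is constant and equal to
# A^2 * sin(omega)^2, which is why it can track amplitude and frequency jointly.
tone = [2.0 * math.cos(0.3 * n) for n in range(50)]
energy = teager_energy(tone)
```

Because the operator needs only three neighbouring samples, it responds to abrupt energy changes, such as a stop burst, much faster than a fixed-length frame energy estimate.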
Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that the integration of video and audio information can significantly improve dialog system performance, since the visual modality is not impacted by acoustic noise. In this paper, we propose a robust audio-visual integration system for source tracking and speech enhancement for an in-vehicle speech dialog system. The proposed system integrates both audio and visual information to locate the desired speaker source. Using real data collected in car environments, the proposed system can improve speech accuracy by up to 40.75% compared with audio data alone.
In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and to investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.
We present results of probabilistic tagging of Portuguese texts in order to show how these techniques work for a highly morphologically ambiguous inflective language when a limited corpus is the basic training source. To cope with the ambiguity problem caused by insufficient training data, especially unknown words, we incorporate lexical features into the probabilistic model. Unlike other proposed tagging models, these features are introduced into the word probabilities by means of interpolation. A technique based on a genetic algorithm for determining the optimal set of interpolation parameters is described. Our preliminary results show that we can correctly tag 91.8% of the sentences with our tagging model.
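The abstract does not give the form of the interpolation. As a minimal sketch, assuming plain linear interpolation, a word probability can be formed as a weighted sum of several estimates of P(word | tag), with non-negative weights that sum to one. The function name `interpolate` and the three example estimators (lexicon count, suffix feature, capitalization feature) are hypothetical and are not taken from the paper.

```python
def interpolate(estimates, weights):
    """Linearly interpolate several probability estimates of the same event.

    estimates: probabilities from different sources (hypothetical here:
               lexicon counts, a suffix feature, a capitalization feature).
    weights:   non-negative interpolation parameters that sum to 1.
    """
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, estimates))


# An unknown word has zero probability in the lexicon, but the feature-based
# estimates keep the interpolated probability above zero.
p_unknown = interpolate([0.0, 0.2, 0.5], [0.6, 0.3, 0.1])
```

In a setup like this, the weight vector is the quantity a search procedure such as the genetic algorithm mentioned in the abstract would tune, typically by scoring candidate weights against held-out tagging accuracy.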