Noise-robust LPC extraction from voiced speech is addressed with a missing-data approach. Harmonics in the voiced speech spectrum are detected using a general electromagnetic motion sensor (GEMS) that is immune to acoustic background noise. Non-harmonic frequencies are treated as missing data and severely suppressed, while the harmonic frequencies are left unprocessed because they are assumed to have high SNR. Objective tests using the log-likelihood ratio (LLR) show significant improvement over the unprocessed noisy case in severely noisy environments.
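The LLR measure named in this abstract compares LPC coefficients from a processed signal against those from the clean signal. A minimal sketch follows, assuming the commonly used definition LLR = log((a_p^T R_c a_p) / (a_c^T R_c a_c)), where R_c is the clean-signal autocorrelation matrix. The signals are synthetic, the whole signal is treated as one frame, and the LPC order of 4 is an illustrative choice; none of this is taken from the paper.

```python
import numpy as np

def autocorr(x, order):
    """Autocorrelation lags 0..order of one frame."""
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def toeplitz(r):
    """Symmetric Toeplitz matrix built from autocorrelation lags."""
    n = len(r)
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def lpc(x, order):
    """Autocorrelation-method LPC; returns error-filter coefficients [1, a1..ap]."""
    r = autocorr(x, order)
    pred = np.linalg.solve(toeplitz(r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -pred))

def llr(clean, processed, order=4):
    """Log-likelihood ratio of processed-signal LPC against clean-signal LPC."""
    a_c = lpc(clean, order)
    a_p = lpc(processed, order)
    r_c = toeplitz(autocorr(clean, order))
    return float(np.log((a_p @ r_c @ a_p) / (a_c @ r_c @ a_c)))

# Synthetic stand-in for a clean frame: a stable AR(2) process (assumption).
rng = np.random.default_rng(0)
drive = rng.standard_normal(2000)
clean = np.zeros(2000)
for n in range(2, 2000):
    clean[n] = 1.3 * clean[n - 1] - 0.8 * clean[n - 2] + drive[n]
noisy = clean + 2.0 * rng.standard_normal(2000)

llr_same = llr(clean, clean)    # identical spectra give zero distortion
llr_noisy = llr(clean, noisy)   # additive noise perturbs the LPC fit
print(llr_same, llr_noisy)
```

Lower values indicate LPC coefficients closer to the clean reference, so a noise-robust extractor should pull the LLR of the noisy case toward zero.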
Harmonic and noise diphone concatenation is a proven method for obtaining high-quality speech synthesis, but it cannot be used when the basis corpus does not contain all the diphones needed. We propose a method to complete an individual's corpus using examples from other corpora. Five vowels from different speakers are parametrised with a harmonic and noise model (HNM). We use multi-frame analysis (MFA) and smoothing kernels to estimate the harmonic power spectrum envelopes. Different kernels are compared for predicting the harmonic envelopes of vowels from training data. We use Euclidean distance to measure similarity between the real envelopes and the predicted ones. Synthesis of the interpolated vowels is then performed using the learned optimal parameters. Our results show that Gaussian kernels can achieve a 1.8 dB (34.4%) reduction of harmonic distortion compared to the mean harmonic envelope estimator. As far as we know, there is no other literature on phoneme prediction for realistic speech synthesis.
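The kernel-smoothing step in this abstract can be illustrated with a Nadaraya-Watson estimator, which is one standard way to apply a Gaussian kernel to scattered samples. The sketch below is an assumption about the general technique, not the paper's estimator: it pools harmonic amplitudes from three synthetic frames with different fundamental frequencies as a stand-in for multi-frame analysis, and `true_envelope_db`, the 150 Hz bandwidth, and the frequency grid are all hypothetical.

```python
import numpy as np

def gaussian_kernel_envelope(freqs_hz, amps_db, grid_hz, bandwidth_hz):
    """Nadaraya-Watson estimate: Gaussian-weighted mean of harmonic amplitudes."""
    z = (grid_hz[:, None] - freqs_hz[None, :]) / bandwidth_hz
    weights = np.exp(-0.5 * z ** 2)
    return (weights @ amps_db) / weights.sum(axis=1)

def true_envelope_db(f_hz):
    """Hypothetical smooth spectral envelope used to generate test data."""
    return -20.0 * np.log10(1.0 + f_hz / 1000.0)

# Harmonics of three frames with different f0, pooled into one sample set.
freqs = np.concatenate([np.arange(f0, 4000.0, f0) for f0 in (100.0, 110.0, 120.0)])
amps = true_envelope_db(freqs)

grid = np.linspace(200.0, 3800.0, 50)
envelope = gaussian_kernel_envelope(freqs, amps, grid, bandwidth_hz=150.0)

# Euclidean distance between the estimated and reference envelopes on the grid.
distance = float(np.linalg.norm(envelope - true_envelope_db(grid)))
print(distance)
```

The bandwidth controls the trade-off between smoothing and fidelity; in a setup like the one described, it would be one of the parameters learned from training data.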
Protein sequence comparison is the most powerful tool for the identification of novel protein structure and function. This type of inference is commonly based on the similar sequence-similar structure-similar function paradigm, and is derived by sequence similarity searching on databases of protein sequences. As entire genomes are being determined at a rapid rate, computational methods for comparing protein sequences will become more essential for probing the complexity of molecular machines. In this paper we introduce a pattern-comparison algorithm, based on the linear-predictive-coding (LPC) cepstral distortion measure, for comparison and identification of protein sequences. Experimental results on a real data set of functionally related and functionally unrelated protein sequences show the effectiveness of the proposed approach in terms of both accuracy and computational efficiency.
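An LPC cepstral distortion measure of the kind this abstract refers to is usually computed by converting LPC coefficients to cepstral coefficients with a standard recursion and then taking a distance between the two cepstra. The sketch below shows only that step, under two assumptions: the input is the error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the distance is a plain truncated Euclidean distance. How the paper maps amino acids to numbers before LPC analysis is not stated in the abstract and is not shown here.

```python
import numpy as np

def lpc_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z), with a = [1, a1..ap]."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += k * c[k] * a[n - k]
        c[n] = -acc / n - (a[n] if n <= p else 0.0)
    return c[1:]

def cepstral_distance(c1, c2):
    """Truncated Euclidean cepstral distance between two cepstra."""
    return float(np.sqrt(np.sum((np.asarray(c1) - np.asarray(c2)) ** 2)))

# Illustrative first-order and second-order models (hypothetical coefficients).
a_x = np.array([1.0, 0.5])
a_y = np.array([1.0, -0.9, 0.4])

c_x = lpc_cepstrum(a_x, 8)
c_y = lpc_cepstrum(a_y, 8)
print(cepstral_distance(c_x, c_y))
```

For the first-order case the recursion can be checked by hand: the cepstrum of 1/(1 + a z^-1) is -a, a^2/2, -a^3/3, and so on.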
This paper introduces a novel method for accurate pitch estimation and speech segmentation, named the multi-feature, autocorrelation (ACR) and wavelet technique (MAWT). MAWT uses feature extraction and ACR applied to linear predictive coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
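The autocorrelation-on-LPC-residual component mentioned in this abstract can be sketched on its own. The example below is not MAWT: it omits the multi-feature extraction and the wavelet refinement, and covers only the textbook step of inverse filtering a frame and picking the autocorrelation peak of the residual. The synthetic voiced frame, the 8 kHz sampling rate, the LPC order, and the 60 to 400 Hz search range are all assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC; returns error-filter coefficients [1, a1..ap]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    big_r = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    pred = np.linalg.solve(big_r, r[1:order + 1])
    return np.concatenate(([1.0], -pred))

def pitch_from_residual(x, fs, order=4, f_min=60.0, f_max=400.0):
    """Estimate f0 from the autocorrelation peak of the LPC residual."""
    a = lpc(x, order)
    residual = np.convolve(x, a)[:len(x)]  # inverse (error) filtering
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    acr = np.array([np.dot(residual[:len(residual) - k], residual[k:])
                    for k in range(lag_max + 1)])
    best_lag = lag_min + int(np.argmax(acr[lag_min:lag_max + 1]))
    return fs / best_lag

# Synthetic voiced frame: 100 Hz impulse train through a two-pole resonator.
fs = 8000
rng = np.random.default_rng(2)
excitation = 0.01 * rng.standard_normal(1600)
excitation[::80] += 1.0  # one pulse every 80 samples
voiced = np.zeros(1600)
for n in range(1600):
    voiced[n] = excitation[n]
    if n >= 1:
        voiced[n] += 1.3 * voiced[n - 1]
    if n >= 2:
        voiced[n] -= 0.8 * voiced[n - 2]

f0_estimate = pitch_from_residual(voiced, fs)
print(f0_estimate)
```

Working on the residual rather than the waveform removes most of the vocal-tract resonance, which otherwise can produce spurious autocorrelation peaks at formant periods.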
This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. Simultaneously recorded speech and SEMG signals are blocked into frames and transformed into features. Linear predictive coding (LPC) coefficients and short-time Fourier transform coefficients are chosen as the speech and SEMG features, respectively. A neural network is applied to convert SEMG features into speech features on a frame-by-frame basis. The converted speech features are used to reconstruct the original speech. Feature selection, conversion methodology and experimental results are discussed. The results show that phoneme-based feature extraction and frame-based feature conversion could be applied to SEMG-based continuous speech synthesis.
In this paper, we present our concept for a sequence of experiments with speech recognizers used in teaching speech recognition techniques. The experiments are performed with a combination of our own tools and the hidden Markov toolkit (HTK). The first experiment demonstrates speaker-dependent recognition based on the dynamic time warping algorithm. In the course of this experiment, all utterances from the students are recorded and used to build up a database. Both the recognizer and the tool used for viewing and editing the speech data are written in Java, making them platform independent and easy to extend. The recorded speech data is then used to train and test a speaker-independent recognizer.
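Speaker-dependent recognition by dynamic time warping, as in the first experiment described here, amounts to aligning an utterance against stored templates and choosing the closest one. The sketch below shows the classic unconstrained DTW recurrence in Python rather than the Java used by the authors. The word labels and the one-dimensional toy features are hypothetical; a real recognizer would compare per-frame feature vectors.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between frame i of a and frame j of b.
            d = float(np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1])))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Toy templates (hypothetical labels and features).
templates = {
    "yes": np.array([1.0, 2.0, 3.0]),
    "no": np.array([3.0, 1.0, 0.0]),
}
slow_yes = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])  # same word at half speed

print(recognize(slow_yes, templates))
```

Because the alignment may repeat frames, an utterance spoken at a different speed can still match its template with little or no accumulated cost.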
Monitoring and diagnosis of operating machines are very important for safe operation and maintenance in industrial settings. These machines are mostly rotating machines. In this paper, we propose a fault detection and diagnosis method using linear predictive coding (LPC) and residual signal energy. We applied our method to induction motors under various fault conditions and obtained good results.
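One plausible way to combine LPC with residual energy for fault detection is to fit an LPC model on a signal from a healthy machine and then measure how poorly that model predicts a new signal. The sketch below illustrates that idea only; the abstract does not give the authors' procedure, and the synthetic signals, the LPC order, and the threshold of 2.0 are assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC; returns error-filter coefficients [1, a1..ap]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    big_r = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    pred = np.linalg.solve(big_r, r[1:order + 1])
    return np.concatenate(([1.0], -pred))

def residual_energy(x, a):
    """Mean squared prediction error of signal x under error filter a."""
    residual = np.convolve(x, a)[:len(x)]
    return float(np.mean(residual ** 2))

def ar2_signal(n, rng):
    """Synthetic stand-in for a healthy machine signal (stable AR(2) process)."""
    drive = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(2, n):
        x[i] = 1.3 * x[i - 1] - 0.8 * x[i - 2] + drive[i]
    return x

rng = np.random.default_rng(1)
reference = ar2_signal(4000, rng)   # used to fit the healthy model
healthy = ar2_signal(4000, rng)     # new recording, same condition
# Hypothetical fault: an extra spectral line that the healthy model cannot predict.
faulty = ar2_signal(4000, rng) + np.sin(2.5 * np.arange(4000))

a_ref = lpc(reference, order=4)
baseline = residual_energy(reference, a_ref)
THRESHOLD = 2.0  # illustrative decision threshold on the energy ratio

healthy_ratio = residual_energy(healthy, a_ref) / baseline
faulty_ratio = residual_energy(faulty, a_ref) / baseline
print(healthy_ratio > THRESHOLD, faulty_ratio > THRESHOLD)
```

The ratio stays near 1 while the signal matches the healthy model and grows when new spectral components appear, which is what makes residual energy usable as a detection statistic.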
It is generally thought that breathiness can be added to voices by modifying the glottal source within a source-filter model. However, this does not work well when the original voice is very different from the desired breathy voice. In this experiment, a voice conversion algorithm is used to investigate the relationship between the glottal source and the vocal tract filter. The LPC residual from one voice is fed into the LPC filter of another voice. According to a source-filter theory of the voice, the synthesized voice should take on the glottal quality of the LPC source. This hypothesis is evaluated through a perceptual test with a linguistics expert. The results suggest that the vocal tract does have an influence on the perception of breathy voices. Given the narrow nature of this experiment, further testing is recommended to verify these results.
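Feeding the LPC residual of one voice into the LPC filter of another, as this abstract describes, can be sketched as an analysis step followed by all-pole synthesis. The example below uses two synthetic signals in place of real voices and treats each as a single frame; the LPC order and the signal models are assumptions, and a real system would work frame by frame with overlap.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC; returns error-filter coefficients [1, a1..ap]."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    big_r = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    pred = np.linalg.solve(big_r, r[1:order + 1])
    return np.concatenate(([1.0], -pred))

def analyze(x, order=4):
    """Split a signal into its LPC filter and the residual (source estimate)."""
    a = lpc(x, order)
    return a, np.convolve(x, a)[:len(x)]

def synthesize(residual, a):
    """Drive the all-pole filter 1/A(z) with a residual."""
    p = len(a) - 1
    y = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def ar2_signal(n, c1, c2, rng):
    """Synthetic stand-in for a voice: a stable AR(2) process."""
    drive = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(2, n):
        x[i] = c1 * x[i - 1] + c2 * x[i - 2] + drive[i]
    return x

rng = np.random.default_rng(3)
voice_a = ar2_signal(800, 1.3, -0.8, rng)
voice_b = ar2_signal(800, 0.5, -0.6, rng)

filter_a, source_a = analyze(voice_a)
filter_b, source_b = analyze(voice_b)

round_trip = synthesize(source_a, filter_a)  # should reproduce voice_a
hybrid = synthesize(source_a, filter_b)      # source of A through filter of B
```

Resynthesizing a residual through its own filter reconstructs the original signal, so any perceptual difference in the hybrid comes from swapping the filter alone.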
A single-chip solution for text-to-speech synthesis is presented in this paper. The proposed system is the first hardware solution for synthesizing unlimited-vocabulary speech in real time for the Turkish language. The integrated circuit converts incoming letters in ASCII format to speech by using clues from the textual context. The system has a language-dependent database, which contains pre-recorded human speech samples coded with the linear predictive coding (LPC) method. It has very low complexity (only 3503 technology-specific cells) compared to other board-level designs available on the market, resulting in low power consumption and flexibility for FPGA implementation or incorporation into larger chips as IP.