Harmonic and noise diphone concatenation is a proven method for high-quality speech synthesis, but it cannot be used when the base corpus does not contain all the diphones needed. We propose a method to complete an individual's corpus using examples from other corpora. Five vowels from different speakers are parametrised with a harmonic and noise model (HNM). We use multi-frame analysis (MFA) and smoothing kernels to estimate the harmonic power spectrum envelopes. Different kernels are compared for predicting the harmonic envelopes of vowels from training data. We use Euclidean distance to measure the similarity between the real envelopes and the predicted ones. Synthesis of the interpolated vowels is then performed using the learned optimal parameters. Our results show that Gaussian kernels can achieve a 1.8 dB (34.4%) reduction in harmonic distortion compared to the mean harmonic envelope estimator. As far as we know, there is no other literature on phoneme prediction for realistic speech synthesis.
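As a rough illustration of kernel smoothing applied to harmonic samples, the sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel and a Euclidean distance between sampled envelopes. The abstract does not give the estimator's exact form, so the function names, the dB amplitude scale, and the `bandwidth_hz` parameter are all assumptions made for this example.

```python
import math

def gaussian_kernel_envelope(freqs_hz, amps_db, query_hz, bandwidth_hz):
    """Kernel-weighted average of harmonic amplitudes around query_hz.

    freqs_hz / amps_db: harmonic samples pooled from several frames
    (hypothetical input layout, not taken from the paper).
    """
    weights = [math.exp(-0.5 * ((query_hz - f) / bandwidth_hz) ** 2)
               for f in freqs_hz]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from every sample for this bandwidth")
    return sum(w * a for w, a in zip(weights, amps_db)) / total

def euclidean_distance(env_a, env_b):
    """Euclidean distance between two envelopes sampled on the same grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(env_a, env_b)))
```

A smaller bandwidth follows individual harmonics more closely, while a larger one gives a smoother envelope; the choice of bandwidth is the kind of parameter that would be tuned on training data.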
Protein sequence comparison is the most powerful tool for the identification of novel protein structure and function. This type of inference is commonly based on the "similar sequence, similar structure, similar function" paradigm, and is derived by sequence similarity searching on databases of protein sequences. As entire genomes are being determined at a rapid rate, computational methods for comparing protein sequences will become more essential for probing the complexity of molecular machines. In this paper we introduce a pattern-comparison algorithm, based on the linear-predictive-coding (LPC) cepstral distortion measure, for the comparison and identification of protein sequences. Experimental results on a real data set of functionally related and functionally unrelated protein sequences show the effectiveness of the proposed approach in both accuracy and computational efficiency.
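An LPC cepstral distance can be sketched as follows: fit LPC coefficients with the autocorrelation method (Levinson-Durbin recursion), convert them to cepstral coefficients with the standard recursion, and take the Euclidean distance between the two cepstral vectors. The abstract does not say how amino-acid residues are mapped to numbers, so this example operates on arbitrary numeric sequences; the function names, model order, and number of cepstral coefficients are assumptions.

```python
import math

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a numeric sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not used
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def cepstral_distance(x, y, order=4, n_ceps=8):
    """Euclidean distance between the LPC cepstra of two numeric sequences."""
    cx = lpc_to_cepstrum(levinson_durbin(autocorr(x, order), order), n_ceps)
    cy = lpc_to_cepstrum(levinson_durbin(autocorr(y, order), order), n_ceps)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(cx, cy)))
```

Because each sequence is reduced to a short cepstral vector, comparing a query against a database costs one small vector distance per entry once the cepstra are computed.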
This paper introduces a novel method for accurate pitch estimation and speech segmentation, named the multi-feature, autocorrelation (ACR) and wavelet technique (MAWT). MAWT uses feature extraction and ACR applied to linear predictive coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
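The autocorrelation step on its own can be sketched as a simple peak search over candidate lags. This is not MAWT itself: the sketch works on a raw frame rather than an LPC residual and omits the multi-feature and wavelet refinement stages. The function name and the lag-range parameters are assumptions.

```python
import math

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) within [min_lag, max_lag] that maximizes
    the energy-normalized autocorrelation, or None for a silent frame."""
    n = len(frame)
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None
    best_lag, best_val = None, float("-inf")
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

# Example: a 100 Hz sinusoid sampled at 8 kHz has a period of 80 samples.
fs, f0 = 8000, 100
frame = [math.sin(2 * math.pi * f0 * i / fs) for i in range(400)]
period = estimate_pitch_period(frame, min_lag=20, max_lag=160)
```

Restricting the lag range (here 50 Hz to 400 Hz) helps avoid picking a multiple or a fraction of the true period.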
This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. Simultaneously recorded speech and SEMG signals are blocked into frames and transformed into features. Linear predictive coding (LPC) coefficients and short-time Fourier transform coefficients are chosen as the speech and SEMG features, respectively. A neural network is applied to convert SEMG features into speech features on a frame-by-frame basis. The converted speech features are used to reconstruct the original speech. Feature selection, conversion methodology and experimental results are discussed. The results show that phoneme-based feature extraction and frame-based feature conversion can be applied to SEMG-based continuous speech synthesis.
In this paper, we present our concept for a sequence of experiments with speech recognizers, used in teaching speech recognition techniques. The experiments are performed with a combination of our own tools and the Hidden Markov Model Toolkit (HTK). The first experiment demonstrates speaker-dependent recognition based on the dynamic time warping algorithm. In the course of this experiment, all utterances from the students are recorded and used to build up a database. Both the recognizer and the tool used for viewing and editing the speech data are written in Java, making them platform independent and easy to extend. The recorded speech data is then used to train and test a speaker-independent recognizer.
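Template matching with dynamic time warping can be sketched in a few lines. The authors' recognizer is written in Java and its details are not given here; the sketch below is in Python, uses one scalar feature per frame for brevity, and its function names and template layout are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of scalar features."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(templates, utterance):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))

# Hypothetical one-dimensional templates for two words.
templates = {"yes": [1, 3, 1], "no": [5, 5, 5]}
label = recognize(templates, [1, 1, 3, 3, 1])
```

A real recognizer would compare per-frame feature vectors (for example cepstral coefficients) with a vector distance in place of `abs`, but the alignment recursion stays the same.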
Monitoring and diagnosis of operating machines are very important for safe operation and maintenance in industrial settings. These machines are mostly rotating machines. In this paper, we propose a fault detection and diagnosis method using linear predictive coding (LPC) and residual signal energy. We applied our method to induction motors under various fault conditions and obtained good results.
A single-chip solution for text-to-speech synthesis is presented in this paper. The proposed system is the first hardware solution for synthesizing unlimited-vocabulary speech in real time for the Turkish language. The integrated circuit converts incoming letters in ASCII format to speech by using clues from the textual context. The system has a language-dependent database, which contains pre-recorded human speech samples coded with the linear predictive coding (LPC) method. It has very low complexity (only 3503 technology-specific cells) compared to other board-level designs available on the market, resulting in low power consumption and flexibility for FPGA implementation or incorporation into larger chips as IP.
It is generally thought that breathiness can be added to voices by modifying the glottal source within a source-filter model. However, this does not work well when the original voice is very different from the desired breathy voice. In this experiment, a voice conversion algorithm is used to investigate the relationship between the glottal source and the vocal tract filter. The LPC residual from one voice is fed into the LPC filter of another voice. According to the source-filter theory of the voice, the synthesized voice should take on the glottal quality of the LPC source. This hypothesis is evaluated through a perceptual test with a linguistics expert. The results suggest that the vocal tract does have an influence on the perception of breathy voices. Given the narrow scope of this experiment, further testing is recommended to verify these results.
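Feeding one voice's LPC residual into another voice's LPC filter can be sketched as an inverse filter followed by an all-pole synthesis filter. The sketch assumes the LPC coefficients have already been estimated per frame; the function names and the coefficient convention x[n] ~ sum_k a[k] * x[n-k] are choices made for this example.

```python
def lpc_residual(x, a):
    """Inverse-filter x with predictor coefficients a: e[n] = x[n] - sum_k a[k] x[n-k]."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e

def lpc_synthesize(e, a):
    """All-pole synthesis: y[n] = e[n] + sum_k a[k] y[n-k]."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y

def cross_synthesize(source_frame, a_source, a_target):
    """Excite the target voice's filter with the source voice's residual."""
    return lpc_synthesize(lpc_residual(source_frame, a_source), a_target)
```

Passing the same coefficients for source and target reconstructs the input frame, which is a convenient sanity check before mixing voices.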
We experimentally evaluated an active speech control scheme which reduces unnecessary speech radiated into the surrounding space. The intended application of this system, typically cellular phones, does not require speech to be radiated into the surrounding space, but only into the microphone. We previously proposed to reduce speech by generating phase-inverted predicted speech from a secondary loudspeaker. We used LPC recursively to predict samples ahead by the associated processing delay, which can be up to a few milliseconds. First, predicted samples of recorded speech were prepared offline. Then, both the original and the phase-inverted predicted samples were played out simultaneously from two loudspeakers. It was found that: 1) speech cancellation of 10 dB is possible, but is highly speaker dependent; 2) for maximum cancellation, the secondary loudspeaker should be oriented in the same direction as the primary source, i.e., the mouth.
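Recursive LPC prediction means feeding each predicted sample back into the predictor to reach further ahead. The sketch below assumes the coefficients are already known and uses a pure sinusoid, for which an order-2 predictor is exact; real speech would be predicted only approximately. The function names are assumptions.

```python
import math

def predict_ahead(history, a, steps):
    """Extend `history` by `steps` samples using x[n] ~ sum_k a[k] * x[n-1-k],
    reusing earlier predictions as inputs for later ones."""
    buf = list(history)
    for _ in range(steps):
        buf.append(sum(a[k] * buf[-1 - k] for k in range(len(a))))
    return buf[len(history):]

def anti_phase(samples):
    """Phase-inverted copy, as would be sent to a secondary loudspeaker."""
    return [-s for s in samples]

# A sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2] exactly.
w = 0.2
a = [2 * math.cos(w), -1.0]
history = [math.sin(w * i) for i in range(50)]
predicted = predict_ahead(history, a, steps=5)
actual = [math.sin(w * i) for i in range(50, 55)]
leftover = [x + y for x, y in zip(actual, anti_phase(predicted))]
```

For this idealized signal `leftover` is zero up to rounding; with speech, prediction error grows with the number of steps, which limits how much processing delay can be compensated.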