The coder proposed in this paper belongs to the class of segmental vocoders known as phonetic vocoders. Speaker recognisability is one of the main problems faced by vocoders at the lowest bit rates, given the need to reduce speaker-specific information. Hence, phonetic vocoders are well suited to speaker-dependent coding, where they can achieve bit rates as low as 250 bit/s. For speaker-independent coding, a speaker adaptation methodology is adopted, although transmitting the speaker-specific information raises the bit rate. To reduce this additional bit rate, a new method is proposed that exploits the intra-speaker correlation for the same phone.
This paper investigates the performance of an isolated word recognition (IWR) system in a noisy environment. Two approaches are demonstrated for overcoming the effect of noise on recognition accuracy: noise-immune features and reference model contamination. Performance is evaluated in a noisy environment at different signal-to-noise ratios (SNR) with several feature extraction techniques, including linear predictive coding (LPC), cepstrum analysis, weighted cepstrum analysis, and perceptual linear predictive coding (PLP). The features are compared on the basis of recognition accuracy. The results show that the PLP features exhibit the best noise immunity and recognition accuracy among the features studied.
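Evaluating a recogniser at a given SNR requires test (or reference) material with a controlled amount of noise. The following is a minimal sketch of that step only, assuming additive noise and an SNR defined as the ratio of average signal power to average noise power; the function names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB)."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Noise power needed for the target SNR: P_clean / 10^(SNR/10)
    target_noise_power = p_clean / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    return [x + gain * n for x, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    p_clean = sum(x * x for x in clean)
    p_err = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_err)

# Toy data (assumption): a 200 Hz tone sampled at 8 kHz plus white Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
noisy = mix_at_snr(clean, noise, 10.0)
```

Calling `measured_snr_db(clean, noisy)` on the result recovers the requested 10 dB up to floating-point error, which is a quick check that the scaling is right before any features are extracted.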
This paper proposes a text-dependent speaker identification system applied to the Thai language. Isolated digits 0-9 and their concatenations are used as the spoken text. Linear prediction coefficients (LPC) are extracted and formed into feature vectors representing each speech signal. Dynamic time warping (DTW) is used to measure distances between reference and test vectors. These distances, which indicate how close the unknown vectors are to the references, are combined with the K-nearest neighbor (KNN) decision technique to decide which speaker the unknown vectors belong to. The experimental results show that the best identification rate for a single digit is 95.83%, and that the highest rates for concatenated digits of top-3, top-5, and top-7 are 98.75%, 100%, and 99.20%, respectively.
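The DTW-plus-KNN decision described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the local distance is Euclidean, the DTW uses the basic three-way step pattern with no slope constraint, and the toy two-dimensional vectors stand in for real LPC feature vectors. The names `dtw_distance` and `knn_identify` are hypothetical.

```python
import math
from collections import Counter

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-dimension vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance (assumption)
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def knn_identify(unknown, references, k=3):
    """Majority vote among the k references nearest to `unknown` under DTW.

    `references` is a list of (speaker_label, feature_sequence) pairs.
    """
    ranked = sorted(references, key=lambda ref: dtw_distance(unknown, ref[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy references (assumption): two utterances each for speakers "A" and "B".
references = [
    ("A", [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]),
    ("A", [(0.0, 0.1), (1.0, 1.1), (2.0, 2.1)]),
    ("B", [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]),
    ("B", [(5.0, 5.2), (6.0, 6.1), (7.0, 7.3)]),
]
# A time-stretched version of speaker A's first utterance.
unknown = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
speaker = knn_identify(unknown, references, k=3)
```

Because DTW may align the repeated first frame of `unknown` with a single reference frame, the stretched utterance matches its source at zero cost even though the two sequences differ in length.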
Text-to-speech synthesis is of great interest and has many applications, and it has occupied researchers for decades. Two methods are usually used: synthesis by rule and synthesis by concatenation of pre-recorded sounds. These methods have disadvantages, however, such as the difficulty of adapting them to a new speaker or a new language. Recently, neural networks (NN) have been applied to unconventional problems for which a traditional solution seems impossible, and text-to-speech appears to be one of them. In this field, it has been shown that NNs do not work well when fed directly with speech samples. Work has therefore been done to explore and evaluate different parametric representations of speech based on linear predictive coding (LPC) for training, and LSP was found to produce the best results. However, these methods do not take the residual signal into account, and the speech produced was machine-like rather than natural. In this paper we propose driving the NN with codebook-excited linear prediction (CELP), which provides high-quality speech, to perform Arabic speech synthesis.
In this paper, a new very low bit rate speech coder operating at 1.2 kbit/s is proposed. Like the LPC vocoder, it requires only gain, pitch, and spectral information, but its quality is far superior. The synthesis method is a form of harmonic coding that uses sinusoids whose frequencies are multiples of the fundamental frequency, with the amplitudes of the sinusoids adaptively modulated using gammatone filters as a perceptual weighting filter. The phases of the sinusoids are also adjusted to maximize perceptual quality. To reduce the total bit rate to 1.2 kbit/s, a new segment coder for the spectral information (LSP coefficients) based on DP matching is also proposed. The quality of the synthesized speech improved by 0.45 in mean opinion score (MOS) compared with a simple LPC vocoder operating at the same rate, and it was comparable to that of the 2.4 kbit/s MELP coder.
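The core of harmonic synthesis, a sum of sinusoids at integer multiples of the fundamental frequency, can be sketched as below. This is only the basic building block under simplifying assumptions: amplitudes and phases are held constant over the frame, and the gammatone-based amplitude modulation and phase adjustment described in the abstract are not modelled. The function name and the parameter values are hypothetical.

```python
import math

def synthesize_harmonics(f0, amplitudes, phases, sample_rate, num_samples):
    """Sum of cosines at k * f0 for k = 1..len(amplitudes), one value per sample."""
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(
            a * math.cos(2 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        frame.append(sample)
    return frame

# Toy frame (assumption): 100 Hz fundamental, three harmonics, 20 ms at 8 kHz.
frame = synthesize_harmonics(
    f0=100.0,
    amplitudes=[1.0, 0.5, 0.25],
    phases=[0.0, 0.0, 0.0],
    sample_rate=8000,
    num_samples=160,
)
```

With a 100 Hz fundamental at 8 kHz the waveform repeats every 80 samples, so the 160-sample frame contains exactly two pitch periods.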
This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. While the traditional waveform-matching LPAS structure employed in many existing speech coders provides high quality for speech signals, it has significant performance limitations for signals such as background noise. The coder presented here employs a novel adaptive gain coding technique that combines energy matching with a traditional waveform-matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with 10th-order LPC, a high-resolution adaptive codebook, and a 4-pulse algebraic codebook. Performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while performance in background noise conditions is significantly improved.
This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized with a closed-loop approach. The V/UV mixing is modeled by a smooth function derived from the speech spectrum envelope on the basis of a flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed to transmit it. A 1.4 kbit/s harmonic coder is developed. The speech quality of the coder is comparable to that of other harmonic coders operating at higher rates.
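The abstract does not define its flatness measure. As an illustration only, the sketch below computes the common spectral flatness measure, the ratio of the geometric mean to the arithmetic mean of a power spectrum; whether the paper uses this exact definition is an assumption, and the mapping from flatness to a V/UV mixing function is not shown. The function name is hypothetical.

```python
import math

def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of strictly positive power values.

    The result lies in (0, 1]: close to 1 for a flat, noise-like spectrum and
    close to 0 for a peaky, strongly harmonic spectrum.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p) for p in power_spectrum) / n  # log of geometric mean
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / arithmetic_mean

# Toy spectra (assumption): one perfectly flat, one dominated by a single peak.
flat = spectral_flatness([1.0] * 8)
peaky = spectral_flatness([100.0] + [0.01] * 7)
```

A measure of this kind can be computed from the decoded spectral envelope alone, which is consistent with the abstract's point that no extra bits are needed for the V/UV information.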
In this paper, we propose a pitch synchronous addition method for LPC analysis that makes use of the periodicity of speech. It is shown that the method overcomes a difficulty of noise reduction by subtracting the noise part from the autocorrelation function of speech, namely keeping that noise reduction compatible with the stability of the resulting LPC filter. The relation between the pitch period of speech and the improvement in signal-to-noise ratio achieved by the method is investigated. The simulation results show the effectiveness of the proposed method, especially for high-pitched speech.
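The abstract gives no algorithmic detail, so the following is a sketch of one plausible reading rather than the paper's method: consecutive pitch periods are added sample by sample, so the periodic component adds coherently while uncorrelated noise does not. It assumes a known, constant, integer pitch period and white Gaussian noise; all names and values are hypothetical.

```python
import math
import random

def pitch_synchronous_average(signal, period, num_periods):
    """Average `num_periods` consecutive pitch periods sample by sample."""
    return [
        sum(signal[k * period + i] for k in range(num_periods)) / num_periods
        for i in range(period)
    ]

def snr_db(reference, observed):
    """SNR in dB, treating (observed - reference) as noise."""
    p_ref = sum(r * r for r in reference)
    p_err = sum((o - r) ** 2 for r, o in zip(reference, observed))
    return 10.0 * math.log10(p_ref / p_err)

# Toy data (assumption): a 40-sample period repeated 16 times plus white noise.
rng = random.Random(1)
period, num_periods = 40, 16
one_period = [math.sin(2 * math.pi * i / period) for i in range(period)]
clean = one_period * num_periods
noisy = [x + rng.gauss(0.0, 0.5) for x in clean]

snr_before = snr_db(clean, noisy)
averaged = pitch_synchronous_average(noisy, period, num_periods)
snr_after = snr_db(one_period, averaged)
gain_db = snr_after - snr_before
```

For uncorrelated noise the expected gain from averaging N periods is about 10 log10(N) dB, roughly 12 dB for N = 16. Under this reading, a fixed-length analysis frame contains more periods when the pitch is high, which would be consistent with the larger improvement reported for high-pitched speech.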
This paper describes an implementation of a mixed excitation linear prediction (MELP) vocoder. The subband division required by the MELP vocoder was performed with the lifting wavelet transform. A new method for generating an appropriate glottal waveform was devised. In addition, three kinds of fluctuation observed in the steady parts of voiced speech were incorporated to enhance the naturalness of the synthesized speech.