The line spectral pair (LSP) conveys vocal tract information for reconstruction of speech in an analysis-synthesis speech coding system. Quantization of LSP parameters is necessary to reduce bit-rate, but causes unavoidable degradation in quality. This paper proposes an adaptive quantization method to reduce bit-rate whilst maintaining reasonable speech quality. The method exploits the relationship between LSP position and speech features.
ISBN: 0780362535 (print)
A new algorithm is proposed to re-optimize the synthesis filter parameters and pulse amplitudes in the multi-pulse excitation linear prediction (LP) synthesizer. It is based on minimizing the perceptually weighted mean square error between the original and the reconstructed speech. Steepest gradient descent and conjugate gradient descent are employed, and the LP filter parameters and all the pulse amplitudes are iteratively re-optimized. Compared with the conventional multi-pulse excitation LPC algorithm, the new algorithm improves the segmental signal-to-noise ratio (SEGSNR) from about 20 dB to 23 dB. Informal listening also shows that the quality of the synthesized speech is greatly improved.
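The SEGSNR figure quoted above is conventionally the mean of per-frame signal-to-noise ratios in dB. The following is a minimal sketch of that textbook definition, not the authors' evaluation code: the function name `segmental_snr`, the default frame length of 160 samples, and the choice to skip frames with zero signal or zero error energy are all assumptions made here for illustration.

```python
import math

def segmental_snr(original, reconstructed, frame_len=160):
    """Mean of per-frame SNRs in dB (frame_len is an assumed value)."""
    n = min(len(original), len(reconstructed))
    snrs = []
    # Walk over complete, non-overlapping frames only.
    for start in range(0, n - frame_len + 1, frame_len):
        sig = 0.0
        err = 0.0
        for i in range(start, start + frame_len):
            sig += original[i] ** 2
            err += (original[i] - reconstructed[i]) ** 2
        # Assumption: frames with zero signal or zero error energy are skipped,
        # since their SNR in dB is undefined.
        if sig > 0.0 and err > 0.0:
            snrs.append(10.0 * math.log10(sig / err))
    if not snrs:
        raise ValueError("no frame with nonzero signal and error energy")
    return sum(snrs) / len(snrs)
```

Practical evaluations often also clamp each frame's SNR to a fixed range before averaging; that step is left out here because the abstract does not say whether it was used.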
This paper presents the development of a Thai text-dependent speaker identification system that applies two feature-feeding approaches. A well-known multilayer perceptron (MLP) network with the backpropagation learning algorithm is chosen because it has fast processing time and good performance on pattern recognition problems. However, an MLP has the limitation that the network must have a fixed number of input nodes. Therefore, linear interpolation time normalization is used to adjust the input speech signal to a fixed-size input vector. Furthermore, a windowing technique is developed to avoid the distortion caused by the time normalization process: a fixed-size window is slid across the preprocessed features with a fixed number of overlapping frames. The high identification rate observed in experiments confirms that the developed windowing is suitable for the proposed Thai text-dependent speaker identification system.
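The two feature-feeding approaches can be illustrated with a short sketch. It assumes the preprocessed features are a list of equal-length numeric vectors, one per frame; the function names `time_normalize` and `sliding_windows` and their parameters are hypothetical, and the abstract does not give the actual window width or overlap.

```python
def time_normalize(frames, target_len):
    """Resample a sequence of feature vectors to target_len frames
    by linear interpolation along the time axis."""
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)   # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append([(1.0 - frac) * a + frac * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out

def sliding_windows(frames, width, overlap):
    """Cut fixed-width windows with a fixed number of overlapping frames.
    Trailing frames that do not fill a whole window are dropped."""
    step = width - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than width")
    return [frames[s:s + width]
            for s in range(0, len(frames) - width + 1, step)]
```

For example, `sliding_windows(list(range(6)), 4, 2)` yields two windows, `[0, 1, 2, 3]` and `[2, 3, 4, 5]`. Either output has a fixed size and could be flattened into an MLP input vector; the windowed form leaves the time axis unstretched, which is the distortion the abstract says windowing avoids.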
A variable-rate speech coder utilizing speech classification is presented as a cost-effective solution for applications where low bit rate, good quality, and low complexity are the main objectives. The coder operates at an average rate of 2.4 kbps, and the speech quality is close to that produced by the ITU G.723.1 codec. It has been implemented with less than 10 MIPS, 2.5 K RAM, 3.3 K DROM, and 6 K PROM on a 16-bit fixed-point general-purpose DSP. The framework of the coder is based on multi-pulse and code-excited coding schemes. The rate variation and coding efficiency are achieved through multimode excitation modeling that depends on the nature of the input speech.
Many speech rate conversion methods have been proposed. Speech rate conversion involves two kinds of processes: contraction of a speech period and extension of a speech period. In particular, extending the speech period makes speech easier for elderly listeners to follow without changing its frequency content or the speaker's individuality. These methods require the extraction of an accurate pitch period from the voice, which is very complicated. This paper proposes a speech rate conversion method that does not require accurate pitch period extraction. According to the power-weighted LPC cepstrum distance and the percent error in the number of output samples, the speech constructed by the proposed method has better quality than that of other conventional methods.
To generate high-quality speech using the linear predictive coding (LPC) technique, a method for detecting the pitch contour is critical, since the human ear is sensitive to small pitch variations in speech. The auto-correlation method, though simple to implement with digital signal processors (DSPs), can result in perceptible unnaturalness. This paper describes a cross-correlation technique that can be used to obtain the pitch information more accurately than the auto-correlation method for certain speech samples. Experimental results illustrate the pitch contour detected using both techniques. In general, the cross-correlation method produces smaller errors than the auto-correlation method for pitch determination in an LPC scheme, while having the advantage of requiring less computation.
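To make the comparison concrete, the sketch below shows common textbook forms of the two lag searches. It is an illustration under stated assumptions, not the paper's implementation: the function names, the fixed comparison window `win`, and the energy normalization in the cross-correlation variant are choices made here, and the code makes no claim about the relative computational cost reported in the abstract.

```python
import math

def autocorr_pitch(x, min_lag, max_lag):
    """Lag (in samples) that maximizes the short-time autocorrelation
    of one frame. The number of summed terms shrinks as the lag grows."""
    n = len(x)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def crosscorr_pitch(x, win, min_lag, max_lag):
    """Lag that maximizes the normalized cross-correlation between
    x[0:win] and the shifted segment x[lag:lag + win]."""
    if len(x) < win + max_lag:
        raise ValueError("signal too short for the requested window and lag")
    ref = x[:win]
    e_ref = sum(v * v for v in ref)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        seg = x[lag:lag + win]
        e_seg = sum(v * v for v in seg)
        if e_ref == 0.0 or e_seg == 0.0:
            continue  # silent segment: correlation is undefined
        val = sum(a * b for a, b in zip(ref, seg)) / math.sqrt(e_ref * e_seg)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

Because the cross-correlation variant always compares two full windows and normalizes by their energies, its peak does not decay with increasing lag, which is one commonly cited reason it can track pitch more reliably than the plain autocorrelation.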
Twenty years of work on sinusoidal modeling of speech has led to very competitive principles for low-rate coding. In this study, we discuss a few issues in the design of a sinusoidal coding system. We stress that by a careful design of all blocks of the encoder and decoder, allowing for some additional complexity, it is possible to build a low-rate coder free of many of the artifacts associated with conventional vocoding systems. We focus this discussion on multi-band partial voicing and on parameter smoothing and interpolation.
This paper describes how to implement the G.723.1 recommendation in an IP telephony gateway and studies in detail the programming of the TMS320C6201 DSP and optimization methods for reducing the speech processing delay of the G.723.1 codec. By adopting these optimization and programming methods, we have implemented a high-speed speech codec that can process 18 voice channels concurrently with a single TMS320C6201 chip in the IP telephony gateway. Finally, the paper summarizes the performance of the resulting ITU-T G.723.1 speech codec.
In this paper we investigate how polarity inversion of speech signals affects human perception, and we apply this technique to data hiding. In most languages, glottal airflow during phonation is unidirectional, causing constant polarity of the speech waveform. On the other hand, the human auditory system cannot discriminate between speech signals with positive and negative polarity. Based on these facts, we developed an algorithm to hide data in speech signals. We assigned one bit to each syllable of speech, and inverted the polarity of the signal at every syllable according to the assigned bit. We performed a test using 20 sentences from the TIMIT corpus to determine both whether a human could distinguish between the original and polarity-inverted signals and whether we could automatically restore the embedded binary data. We found that we were able to successfully hide data and restore it automatically.
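The embedding step described above can be sketched in a few lines. The sketch assumes syllable boundaries are already available as `(start, end)` sample-index pairs and that a bit value of 1 means "invert"; the abstract specifies neither the segmentation method nor the bit convention, and the function name `embed_bits` is hypothetical. The automatic recovery of the bits, which requires estimating the polarity of each syllable, is not shown because the abstract does not describe how it is done.

```python
def embed_bits(signal, syllables, bits):
    """Return a copy of signal in which each syllable whose bit is 1
    has its polarity inverted (assumed convention: 1 = invert).

    syllables: list of (start, end) sample indices, end exclusive
               (assumed to come from a separate segmentation step).
    bits:      one 0/1 value per syllable.
    """
    if len(syllables) != len(bits):
        raise ValueError("need exactly one bit per syllable")
    out = list(signal)
    for (start, end), bit in zip(syllables, bits):
        if bit:
            for i in range(start, end):
                out[i] = -out[i]
    return out
```

For instance, `embed_bits([1, 2, 3, 4, 5, 6], [(0, 2), (2, 4), (4, 6)], [1, 0, 1])` negates the first and third segments and leaves the second unchanged.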
A new, computationally efficient technique for calculating the line spectrum frequencies (LSFs), applicable to any order of LPC analysis, is proposed. It is based on the quotient-difference (Q-D) root-finding algorithm, which enables simultaneous solution for all the LSFs. It is an iterative procedure that offers a tradeoff between accuracy and complexity, which is especially important for real-time applications. To improve convergence, a nonlinear mapping of the LSFs is also proposed. For low-accuracy applications, the method is even more effective than the fast-converging Newton-Raphson method, yet it is exceptionally simple, has a very regular structure, and requires only basic mathematical operations.
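For orientation, the sketch below shows what quantities an LSF solver computes. It is deliberately not the Q-D algorithm from the abstract: it uses a plain grid search with bisection, a common baseline, on the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z). The function name `lpc_to_lsf`, the grid size, and the iteration count are assumptions, and the input is assumed to be a minimum-phase A(z) given as `[1, a1, ..., ap]`.

```python
import math

def lpc_to_lsf(a, grid=512, iters=40):
    """LSFs in radians, in (0, pi), for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Baseline method: grid search plus bisection (NOT the Q-D algorithm)."""
    p = len(a) - 1
    ext = list(a) + [0.0]                   # A(z) padded to degree p + 1
    rev = ext[::-1]                         # coefficients of z^-(p+1) A(1/z)
    P = [x + y for x, y in zip(ext, rev)]   # symmetric polynomial
    Q = [x - y for x, y in zip(ext, rev)]   # antisymmetric polynomial
    half = (p + 1) / 2.0

    # Multiplying by e^{jw(p+1)/2} makes P real and Q purely imaginary
    # on the unit circle, so each reduces to a real function of w.
    def f_p(w):
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(P))

    def f_q(w):
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(Q))

    def roots(f):
        found = []
        # Interior grid points only, so the trivial roots at w = 0 and
        # w = pi are never bracketed.
        ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
        for lo, hi in zip(ws, ws[1:]):
            f_lo = f(lo)
            if f_lo * f(hi) < 0.0:          # sign change brackets a root
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0.0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(roots(f_p) + roots(f_q))
```

As a check, for the second-order filter `[1, -0.9, 0.5]` the two polynomials factor by hand and give LSFs at arccos(0.7) and arccos(0.2). A fixed grid can miss two roots that fall inside the same grid cell, which is one reason faster and more robust solvers are of interest.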