This work explores the use of time-frequency distributions (TFDs) to extract time-varying formant information. The technique uses TFDs of Cohen's class (see Prentice-Hall, Englewood Cliffs, NJ, 1995) and provides the profile of formant variation continuously in the time-frequency plane, which can be employed to improve formant tracking and formant bandwidth estimation. The performance of this method is compared with that of existing methods, which have their own pitfalls, using a modulated synthetic signal as input. It is shown that the proposed method gives better formant estimates and a clearer visual representation. The method is also used to analyze real human speech, and the results can be helpful for speech understanding and speech synthesis.
A new phase model for low-bit-rate sinusoidal coding of speech is presented. Short-time sinusoidal phases are approximated using a combination of linear prediction, spectral sampling, delay compensation, and phase correction techniques. The algorithm differs from phase compensation methods proposed for source-system LPC in that it has been tailored to the sinusoidal representation of speech. Performance analysis on a large speech database indicates considerable improvement in temporal and spectral signal matching as well as improved subjective quality of the reconstructed speech. The extra parameters used to represent the sine wave phases require only a small number of bits. The method can be applied to enhance phase matching in low-bit-rate sinusoidal coders in which the underlying sine wave amplitudes are extracted from an all-pole model.
Speech coding at very low bit rates is useful for purposes such as voice communication over computer networks. However, CELP coders have difficulty maintaining high quality at around 2.0 kbit/s. In this paper, a speech coding model called the 'normalized pitch waveform' and its quantization scheme are presented, aiming at effective compression coding of 'voiced' speech. Listening tests show that efficient, high-quality coding has been achieved at 2.0 kbit/s, less than half the bit rate of FS1016. Furthermore, this paper discusses the disadvantage of the normalized pitch waveform and presents an alternative method that uses a non-normalized pitch waveform. Encoding of a transitional 'mixed' state between the 'voiced' and 'unvoiced' states is discussed as a further improvement.
ISBN: 0818679190 (print)
The intraframe correlation properties of line spectrum pair (LSP) frequencies are used to develop an efficient encoding algorithm based on the Karhunen-Loeve (KL) transformation. The important nonuniform statistical characteristics of the LSP frequencies are investigated. Based on this nonuniform property, neural-network-based techniques for generating the transform vectors via system training are studied. Using a principal component analysis (PCA) network to decorrelate the LSP coefficients, we show that these new approaches yield distortion as good as or better than that of other methods for speech analysis-synthesis.
CELP coders commonly use line spectral pairs (LSP) to represent linear prediction parameters, giving stable filters and efficient coding. However, manipulation of LSPs can alter frequencies within the represented signals. This paper describes two computationally efficient LSP-based processing methods designed to enhance the intelligibility of speech degraded by acoustic interference.
This paper describes the design of a toll-quality 4-kbit/s speech coder based on phase-adaptive PSI-CELP. This adaptation method not only gives pitch periodicity to the random excitation but also synchronizes the basic point of the stored random vector with the pitch phase. We further improve the proposed coder by introducing a backward gain prediction scheme. In subjective evaluation experiments, there is no significant difference in quality between the ITU-T G.726 32-kbit/s coder and the proposed 4-kbit/s coder at normal and low input levels and in tandem connection for clean speech. In noisy environments, the MOS results of the ACR test likewise show no significant difference between the G.726 coder and the 4-kbit/s coder.
Authors: P. Prandom, M. Goodwin, M. Vetterli
Affiliations: LCAV, Ecole Polytech. Fed. de Lausanne, Switzerland; EECS, University of California, Berkeley, USA; LCAV, Ecole Polytechnique Fédérale de Lausanne, Switzerland
The idea of optimal joint time segmentation and resource allocation for signal modeling is explored with respect to arbitrary segmentations and arbitrary representation schemes. When the chosen signal modeling techniques can be quantified in terms of a cost function which is additive over distinct segments, a dynamic programming approach guarantees the global optimality of the scheme while keeping the computational requirements of the algorithm sufficiently low. Two immediate applications of the algorithm to LPC speech coding and to sinusoidal modeling of musical signals are presented.
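The dynamic programming idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm: the per-segment model (approximating a segment by its mean), the fixed per-segment `penalty` standing in for the resource cost, and the names `segment_cost` and `optimal_segmentation` are all assumptions made for the example. The only property taken from the abstract is that the total cost is additive over distinct segments, which is what makes the recursion globally optimal.

```python
def segment_cost(x, i, j):
    """Squared error of approximating x[i:j] by its mean.

    Hypothetical stand-in for a real model cost (e.g. an LPC or
    sinusoidal fit); any cost that is additive over segments works.
    """
    seg = x[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def optimal_segmentation(x, penalty):
    """Return (total_cost, segments) minimizing the additive cost.

    best[j] holds the minimum cost of covering x[:j]; it is built from
    best[i] for every earlier boundary i, so all segmentations are
    considered without enumerating them explicitly.
    """
    n = len(x)
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(x, i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Walk the stored boundaries backwards to recover the segments.
    segments = []
    j = n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    segments.reverse()
    return best[n], segments
```

For the piecewise-constant input `[0, 0, 0, 0, 5, 5, 5, 5, 1, 1, 1]` with `penalty=0.5`, the recursion selects the three segments `(0, 4)`, `(4, 8)`, `(8, 11)`. The sketch evaluates the segment cost from scratch for every boundary pair; a practical implementation would update costs incrementally.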
Line spectrum pair (LSP) representation of linear predictive coding (LPC) parameters is widely used in speech coding applications. An efficient method for LPC-to-LSP conversion is Kabal's method. In this method the LSPs are the roots of two polynomials, $P'_p(x)$ and $Q'_p(x)$, and are found by a zero-crossing search followed by successive bisections and interpolation. The precision of the resulting LSPs is higher than most applications require, but the number of bisections cannot be decreased without compromising the zero-crossing search. In this paper, it is shown that, for 10th-order LPC, five intervals can be calculated, each containing exactly one zero crossing of $P'_{10}(x)$ and one zero crossing of $Q'_{10}(x)$, so that the zero-crossing search is avoided. This allows a trade-off between LSP precision and computational complexity, resulting in considerable computational savings.
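For orientation, the baseline that the abstract above sets out to speed up (a zero-crossing search on a grid followed by successive bisections) can be sketched as follows. This is a simplified illustration under stated assumptions, not Kabal's method as published and not the paper's interval calculation: the polynomials are evaluated directly in the frequency variable rather than through a Chebyshev expansion in x = cos(ω), the final interpolation step is omitted, and the function names, the grid size, and the bisection count are hypothetical. The LPC filter is assumed to be A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with p even and all roots inside the unit circle.

```python
import math


def lsp_polynomials(a):
    """Build the deflated symmetric polynomials P' and Q' from a = [a1..ap]."""
    p = len(a)
    fwd = [1.0] + list(a) + [0.0]
    rev = fwd[::-1]
    big_p = [f + r for f, r in zip(fwd, rev)]  # symmetric, root at z = -1
    big_q = [f - r for f, r in zip(fwd, rev)]  # antisymmetric, root at z = +1
    p_defl, q_defl = [big_p[0]], [big_q[0]]
    for k in range(1, p + 1):
        p_defl.append(big_p[k] - p_defl[-1])   # divide by (1 + z^-1)
        q_defl.append(big_q[k] + q_defl[-1])   # divide by (1 - z^-1)
    return p_defl, q_defl


def on_circle(c, w):
    """Real-valued amplitude of a symmetric polynomial at z = exp(j*w)."""
    half = (len(c) - 1) / 2
    return sum(ck * math.cos(w * (half - k)) for k, ck in enumerate(c))


def find_roots(c, grid=200, bisections=30):
    """Grid search for sign changes on [0, pi], refined by bisection."""
    roots = []
    lo, f_lo = 0.0, on_circle(c, 0.0)
    for i in range(1, grid + 1):
        hi = math.pi * i / grid
        f_hi = on_circle(c, hi)
        if f_lo * f_hi < 0:
            a, b, fa = lo, hi, f_lo
            for _ in range(bisections):
                mid = 0.5 * (a + b)
                fm = on_circle(c, mid)
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return roots


def lpc_to_lsf(a):
    """Return the line spectral frequencies in radians, in ascending order."""
    p_defl, q_defl = lsp_polynomials(a)
    return sorted(find_roots(p_defl) + find_roots(q_defl))
```

The grid must be fine enough that no interval hides two crossings, which is the cost the abstract refers to: the search, not the bisection, sets the lower bound on the work. Knowing one bracketing interval per root in advance would let `find_roots` skip the grid loop and choose the number of bisections freely.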
An efficient codebook search method for the EIA/TIA IS-54 vector-sum excited linear predictive (VSELP) speech coder is described. The method uses a two-stage search procedure. In the first stage, a diagonal approximation of the correlation matrix of the filtered basis vectors is assumed, and a simple sign detection procedure is used to identify a codeword that is close to the optimum codeword. In the second stage, a refinement search is carried out over those codewords that have a Hamming distance of one from the codeword obtained in the first stage. The new search procedure has a complexity proportional only to the bit rate, which makes it much faster than the Gray code search employed in the IS-54 VSELP coder. Simulation results show that the SNR obtained using the proposed fast procedure is the same as that obtained in the standard VSELP coder.
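The two stages described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the IS-54 reference code or the authors' implementation: the selection criterion is taken to be the usual normalized correlation (θᵀc)² / (θᵀGθ), where c holds the correlations between the target and the filtered basis vectors, G is their correlation matrix, and θ is the vector of ±1 signs that defines a codeword. The function names are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def metric(signs, corr, gram):
    """Normalized correlation (theta^T c)^2 / (theta^T G theta)."""
    m = len(signs)
    num = sum(s * c for s, c in zip(signs, corr)) ** 2
    den = sum(signs[i] * signs[j] * gram[i][j]
              for i in range(m) for j in range(m))
    return num / den


def sign_detect(corr):
    """Stage 1: if G is treated as diagonal, the denominator is the same
    for every sign pattern, so the best codeword just matches the signs
    of the correlations."""
    return [1 if c >= 0 else -1 for c in corr]


def two_stage_search(target, basis):
    """Return (signs, metric_value) after sign detection and refinement."""
    corr = [dot(target, q) for q in basis]
    gram = [[dot(qi, qj) for qj in basis] for qi in basis]
    best_signs = sign_detect(corr)
    best = metric(best_signs, corr, gram)
    # Stage 2: try every codeword at Hamming distance one from stage 1.
    start = best_signs
    for m in range(len(start)):
        cand = start[:]
        cand[m] = -cand[m]
        value = metric(cand, corr, gram)
        if value > best:
            best, best_signs = value, cand
    return best_signs, best
```

With M basis vectors the procedure evaluates M + 1 candidates instead of all 2^M sign patterns. For clarity the sketch recomputes the full metric for each candidate; flipping one sign changes the numerator and denominator by a few terms only, so an efficient implementation would update them incrementally.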
Line spectral frequencies (LSFs) are the most popular parameters for spectrum quantization in speech coders using linear prediction. A new method for the quantization of the LSFs is proposed in this paper. This method is a scalar quantization scheme based on a nonlinear two-dimensional prediction in the index domain, and is hereafter referred to as predictive delta adaptive scalar quantization (PDASQ). It is shown that it can be implemented efficiently, with negligible computational overhead and memory requirements compared with simple scalar quantization. Although PDASQ needs a lower bit rate, its quantization distortion is of the same order as that of conventional scalar quantization. Satisfactory performance of the new method is verified through computer simulation experiments.