This paper presents an approach to speech vector quantization of sources exhibiting intervector dependency. We present the optimal decoder based on a collection of received indices. We also present the optimal encoder for such decoding. The optimal decoder can be implemented as a table look-up decoder; however, the size of the decoder codebook grows very fast with the number of indices in the collection. This leads us to introduce a method for storing an approximation to the set of optimal decoder vectors, based on a linear mapping of a block code vector quantization. In this approach, a heavily reduced set of parameters is employed to represent the codebook. Furthermore, we illustrate that the proposed scheme has an interpretation as nonlinear predictive quantization. Numerical results indicate high gain over memoryless coding and over memory quantization based on linear predictive coding. The results also show that the sub-optimal approach performs close to the optimal one.
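The table look-up idea and its growth in size can be illustrated with a minimal sketch. It assumes a mean-squared-error criterion, under which the best decoder output for a given tuple of received indices is the average of the training vectors that produced that tuple; the abstract does not state the criterion, and the function names `train_lookup_decoder` and `table_size` are hypothetical.

```python
from collections import defaultdict


def train_lookup_decoder(vectors, indices, memory):
    """Estimate one decoder vector per tuple of (memory + 1) recent indices.

    Assumption: squared-error distortion, so each table entry is the mean of
    the training vectors observed with that index tuple.
    """
    sums = {}
    counts = defaultdict(int)
    for t in range(memory, len(vectors)):
        key = tuple(indices[t - memory:t + 1])
        if key not in sums:
            sums[key] = [0.0] * len(vectors[t])
        sums[key] = [s + x for s, x in zip(sums[key], vectors[t])]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def table_size(codebook_size, memory):
    """Number of possible index tuples, i.e. worst-case decoder table entries."""
    return codebook_size ** (memory + 1)
```

With a hypothetical 256-entry codebook, `table_size(256, 0)` is 256 entries while `table_size(256, 2)` is already 16,777,216, which is the kind of growth that motivates storing an approximation with far fewer parameters.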
A new model for the spectral samples obtained in the multiband excitation (MBE) speech coder is introduced. Objective and subjective tests show that it compares favorably with the classical linear prediction (LP) model, especially for high-pitched speakers. Strategies for efficiently quantizing the model parameters, suitable for low bit rate implementations of the MBE coder, are also addressed.
We present a new speech coding algorithm based on an all-pole model of the vocal tract. Whereas current autoregressive (AR) modeling techniques (e.g. CELP, LPC-10) minimize a prediction error, which is considered to be the input to the all-pole model, our approach determines the closest signal (in the L2 norm) that exactly satisfies an all-pole model. Each frame is then encoded by storing the parameters of the complex damped exponentials deduced from the all-pole model and its initial conditions. Decoding is performed by adding the complex damped exponentials based on the transmitted parameters. The new algorithm is demonstrated on a speech signal. Its quality is compared with that of a standard coding algorithm at comparable compression ratios, using the segmental signal-to-noise ratio (SNR).
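The decoding step and the quality measure described above can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes each frame is transmitted as complex amplitudes and complex poles, and it uses a common definition of segmental SNR (the mean of per-segment SNR values in dB). The names `decode_frame` and `segmental_snr_db` and the `eps` guard are hypothetical.

```python
import math


def decode_frame(amplitudes, poles, n_samples):
    """Rebuild a frame as x[n] = sum_k c_k * z_k**n and keep the real part.

    A real signal needs poles and amplitudes in complex-conjugate pairs.
    """
    frame = []
    for n in range(n_samples):
        value = sum(c * z ** n for c, z in zip(amplitudes, poles))
        frame.append(value.real)
    return frame


def segmental_snr_db(reference, decoded, segment_len, eps=1e-12):
    """Average the per-segment SNR (in dB) over non-overlapping segments."""
    if len(reference) < segment_len:
        raise ValueError("signal shorter than one segment")
    snrs = []
    for start in range(0, len(reference) - segment_len + 1, segment_len):
        ref = reference[start:start + segment_len]
        dec = decoded[start:start + segment_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        # eps keeps the ratio finite for silent or perfectly decoded segments
        snrs.append(10.0 * math.log10((signal + eps) / (noise + eps)))
    return sum(snrs) / len(snrs)
```

For example, a conjugate pole pair with modulus 0.9 and angle pi/4, each with amplitude 0.5, decodes to the damped cosine 0.9**n * cos(n * pi / 4).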
A new approach to temporal decomposition (TD) of speech, called "spectral stability based event localizing temporal decomposition" and abbreviated S²BEL-TD, is presented. The original TD method proposed by Atal (1983) is known to suffer from high computational cost and from instability in the number and locations of events. In S²BEL-TD, event localization is based on a maximum spectral stability criterion, which overcomes the event instability problem of Atal's method. S²BEL-TD also avoids the computationally costly singular value decomposition routine used in Atal's method, resulting in a computationally simpler TD algorithm. Simulation results show that an average spectral distortion of about 1.5 dB can be achieved with LSF as the spectral parameter. We also show that the temporal pattern of the speech excitation parameters can be well described using the S²BEL-TD technique.
The duration of vowel steady-states (VSS) was examined acoustically in the speech production of 40 normal young adults. VSS was assessed according to formant frequency changes in sustained /i/ productions and consonant + /i/ + /d/ (/Cid/) productions. The duration of the VSS was measured for the first and second formants (F1 and F2) by incorporating a fixed rate-of-change criterion. Results indicated no significant differences in VSS duration according to gender or vowel context. VSS duration based on F1 was significantly longer than VSS duration based on F2. The duration of VSS was also found to be correlated with the overall vowel duration in /Cid/ contexts. Discussion focuses on the analysis and application of VSS in acoustic studies of normal and disordered speech production.
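One way a fixed rate-of-change criterion could be applied to a formant track is sketched below. The abstract does not give the criterion value or the exact procedure, so the threshold, the frame step, and the choice of "longest qualifying run" are all assumptions made for illustration; `steady_state_duration` is a hypothetical name.

```python
def steady_state_duration(formant_hz, frame_step_s, max_rate_hz_per_s):
    """Duration (s) of the longest run of frame intervals whose formant
    rate of change stays at or below the given limit."""
    longest = 0
    current = 0
    for prev, curr in zip(formant_hz, formant_hz[1:]):
        rate = abs(curr - prev) / frame_step_s
        if rate <= max_rate_hz_per_s:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the steady run is broken by a fast transition
    return longest * frame_step_s
```

Running the same function on the F1 and F2 tracks of one vowel would give the two durations that the study compares.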
The feasibility and performance of an embedded RPE (ERPE) scheme based on multistage coding are investigated. The coding efficiency of the second and subsequent stages depends on the spectral envelope difference between the original speech and the error signal at each stage, whereas re-use of LPC parameters derived from the original speech depends on the corresponding LPC spectral difference. Suitable measures of spectral difference are defined, and simulation shows that both decrease with the perceptual weighting factor. The ERPE system requires little extra coding complexity and can be simplified further by using a partial phase adaptation procedure with marginal loss of SNR performance. The simulated ERPE system shows graceful reduction of reconstructed speech quality for bit rates from 14.8 to 6.4 kb/s in 4.2 kb/s steps.
Linear prediction parameters within CELP coders are commonly represented by line spectral pairs (LSP), giving stable filters and efficient coding. However, LSP manipulation can also alter the frequencies of the represented signals. The authors use computationally efficient LSP manipulation to enhance the intelligibility of speech degraded by acoustic interference.
In this paper, an on-line signature verification scheme based on the linear prediction coding (LPC) cepstrum and neural networks is proposed. Cepstral coefficients derived from linear predictor coefficients of the writing trajectories are calculated as the features of the signatures. These coefficients are used as inputs to the neural networks. For each registered person, a number of single-output multilayer perceptrons (MLPs), one per word in the signature, are used to verify the input signature. If the sum of the output values of all MLPs is larger than the verification threshold, the input signature is regarded as genuine; otherwise, it is regarded as a forgery. Simulations show that this scheme can detect the genuineness of the input signatures from our test database with an error rate as low as 4%.
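The feature computation and the decision rule can be sketched in a few lines. The cepstrum part uses the standard LPC-to-cepstrum recursion for an all-pole model 1/A(z) with A(z) = 1 - sum_k a_k z^-k; the sign convention, the omission of the gain term, and the names `lpc_to_cepstrum` and `is_genuine` are assumptions, since the abstract does not specify them. The MLPs themselves are not modeled here; their outputs are taken as given numbers.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients to cepstral coefficients c_1..c_n_ceps.

    a[k - 1] holds a_k of A(z) = 1 - sum_k a_k z^-k (assumed convention).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0
    for n > p and the sum restricted to terms where a_{n-k} exists.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused: the gain term is omitted
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def is_genuine(mlp_outputs, threshold):
    """Accept the signature when the summed MLP outputs exceed the threshold."""
    return sum(mlp_outputs) > threshold
```

For a first-order predictor with a_1 = 0.5, the recursion gives 0.5, 0.125, and 0.5**3 / 3, matching the series expansion of -ln(1 - 0.5 z^-1).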
This paper introduces noncausal all-pole models that are capable of efficiently capturing both the magnitude and phase information of voiced speech. It is shown that noncausal all-pole filter models are better able to match both magnitude and phase information and are particularly appropriate for voiced speech due to the nature of the glottal excitation. By modeling speech in the frequency domain, the standard difficulties that occur when using noncausal all-pole filters are avoided. Several algorithms for determining the model parameters based on frequency-domain information and the masking effects of the ear are described. Our work suggests that high-quality voiced speech can be produced using a 14th-order noncausal all-pole model.
Low-delay techniques are proposed for coding 7 kHz speech using subband code-excited linear predictive coding (CELP). The use of separate and joint index codebooks is compared. Specifically, the joint-index-subband CELP (JISBC) algorithm is found to provide good quality with processing delay in the range 2.375-3.375 ms at corresponding bit rates of 16-8 kbit/s.