Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that we proposed at the 1976 ICASSP conference. In each method, the error in short-term spectral behavior between vocoded speech and the original is computed once every 10 ms. These errors are appropriately weighted and averaged over an utterance to produce a single objective score. Several short-term error measures, as well as time-weighting and averaging techniques, are investigated. We evaluate the objective methods by correlating the resulting objective scores with formal subjective speech quality judgments. The high correlations obtained indicate the usefulness of these methods.
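The scoring scheme described in the abstract above (a per-frame spectral error, then a weighted average over the utterance) can be illustrated with a minimal sketch. The abstract does not say which error measure or weighting was used, so the RMS log-spectral distance and the caller-supplied weights below are assumptions, and the function names are hypothetical.

```python
import math

def log_spectral_distance(ref_power, test_power, floor=1e-12):
    """RMS difference in dB between two power spectra of equal length.

    Assumption: this is one plausible short-term error measure; the
    abstract does not name the measures that were actually compared.
    """
    if len(ref_power) != len(test_power):
        raise ValueError("spectra must have the same number of bins")
    sq = 0.0
    for r, t in zip(ref_power, test_power):
        # Floor the powers so that log10 is always defined.
        d = 10.0 * math.log10(max(r, floor)) - 10.0 * math.log10(max(t, floor))
        sq += d * d
    return math.sqrt(sq / len(ref_power))

def utterance_score(frame_errors, frame_weights):
    """Weighted average of per-frame errors (one error per 10 ms frame).

    Assumption: the weights are supplied by the caller, for example
    frame energies; the abstract only says "appropriately weighted".
    """
    if len(frame_errors) != len(frame_weights):
        raise ValueError("need one weight per frame error")
    total_w = sum(frame_weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(frame_errors, frame_weights)) / total_w
```

With uniform weights the score reduces to a plain mean of the frame errors; non-uniform weights let louder or perceptually more important frames dominate.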
The results of a time domain segmentation algorithm are applied to a variable frame rate (VFR) transmission system. Three speech signal parameters (short-term absolute average of the speech signal, absolute average of the differenced signal, and the ratio of these two) are measured pitch-synchronously and interpolated to short frames of 5 to 10 ms duration. From these frames, a distance measure is evaluated, leading to a "segment length function" that gives, for each frame, the time distance until a parameter change threshold is exceeded. This function is directly converted into a local interval sequence for the frames of a speech transmission system. The procedure has been successfully applied to a transmission system which uses a pitch-synchronous time-domain Karhunen-Loève expansion (KLE) of the speech signal.
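As a rough illustration of the abstract above, the sketch below computes the three frame parameters and a segment length function. Several details are assumptions not given in the abstract: fixed-length frames are used instead of pitch-synchronous measurement, the ratio is taken as differenced average over plain average, the distance is Euclidean, and the function names are hypothetical.

```python
import math

def frame_parameters(frame):
    """Return (mean |x|, mean |x[n] - x[n-1]|, ratio) for one frame.

    Assumption: ratio = differenced average / plain average; the
    abstract does not state the direction of the ratio.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("a frame needs at least two samples")
    abs_avg = sum(abs(x) for x in frame) / n
    diff_avg = sum(abs(frame[i] - frame[i - 1]) for i in range(1, n)) / (n - 1)
    ratio = diff_avg / abs_avg if abs_avg > 0 else 0.0
    return (abs_avg, diff_avg, ratio)

def segment_length_function(params, threshold):
    """For each frame i, count the frames until the distance from frame i
    to a later frame first exceeds the threshold.

    Assumption: Euclidean distance between parameter tuples. Near the end
    of the utterance the count simply runs to the last frame.
    """
    lengths = []
    for i, p in enumerate(params):
        j = i + 1
        while j < len(params) and math.dist(p, params[j]) <= threshold:
            j += 1
        lengths.append(j - i)
    return lengths
```

A long run of similar frames yields large values (few updates needed), while a rapid transition yields values near 1, which is the behavior a variable frame rate scheme would exploit.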
A method for LPC analysis in a transformed domain (LPCTD) has been developed theoretically and studied experimentally in the Walsh-Hadamard domain (LPCWHD) for low-bit-rate coding of speech signals. Speech signals in the Walsh-Hadamard domain have been modelled by their largest variance coefficients and a few prediction coefficients which represent the remaining coefficients. Determination of the prediction coefficients has been based on the correlation between the spectral coefficients. Intelligible speech at bit-rates of 8 kb/s and 4 kb/s was achieved when 16 and 64 point Walsh-Hadamard transforms were used, respectively. At the latter bit-rate the quality was significantly improved when unvoiced sounds were coded separately by their largest variance coefficients. The main advantage of the LPCWHD system is its simplicity, which can lead to a far less complex implementation than that of vocoder systems.
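To make the transform step in the abstract above concrete, here is a minimal sketch of a fast Walsh-Hadamard transform and of ranking coefficients by variance across blocks. It is not the authors' LPCWHD coder: the Hadamard (natural) ordering, the lack of normalization, and the helper names are assumptions, and the prediction of the remaining coefficients is not shown.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in Hadamard ordering.

    Assumption: natural (Hadamard) ordering; the abstract does not say
    which ordering of the Walsh functions was used.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a

def largest_variance_indices(blocks, k):
    """Indices of the k transform coefficients with the largest variance
    across a list of equal-length sample blocks (hypothetical helper)."""
    coeffs = [fwht(b) for b in blocks]
    m, n = len(coeffs), len(coeffs[0])
    variances = []
    for j in range(n):
        column = [c[j] for c in coeffs]
        mean = sum(column) / m
        variances.append(sum((v - mean) ** 2 for v in column) / m)
    return sorted(range(n), key=lambda j: variances[j], reverse=True)[:k]
```

Applying `fwht` twice returns the input scaled by the block length, so the same routine serves as the inverse after dividing by that length.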
This paper presents a time domain technique for the generation of speech which offers significant advantages over current formant synthesis and linear predictive coder (LPC) techniques. A set of basis functions in conjunction with a time-compression (and expansion) operation is shown to span the parameter space of the vocal tract model. The relationship between these basis functions and the formant synthesis parameters is derived and graphically illustrated. The 'waveform synthesis' technique is particularly well suited for microprocessor implementation and, as shown in the paper, two D-A converters in conjunction with a standard microprocessor and associated ROM, RAM and I/O can be used to implement it.
The U.S. Government has developed a real-time 2400 bps linear predictive coded (LPC) voice algorithm which was designed to provide maximum intelligibility and quality within the time and accuracy limitations imposed by modern high-speed minicomputers. The resulting algorithm provides excellent intelligibility and quality when transmitted over an ideal channel. However, the speech is significantly degraded in an error environment. This paper describes several techniques for reducing the effect of channel bit errors on the synthesized speech. These techniques cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.
The purpose of this study was to assess the effects of LPC/CVSD tandem connections and to investigate ways of improving performance. In the case of the CVSD-to-LPC connection, the CVSD quantizing noise severely affects the estimate of LPC coefficients, thereby distorting the spectral representation. An averaging technique was shown to reduce the average spectral distance from the noiseless case; however, no statistically significant perceptual advantage was measured. In the LPC-to-CVSD connection, it was found that modest improvements can be measured both in signal-to-noise ratio and perceptually, by modifying the phase in LPC synthetic speech before CVSD coding.
Application of a Maximum A Posteriori (MAP) estimation procedure in estimating the LPC coefficients from speech waveforms degraded by additive random noise generally leads to solving a set of non-linear equations, which is computationally undesirable. However, an attempt to approximate the true MAP estimation procedure leads to two iterative methods that require solving only sets of linear equations. These two methods have been applied to real speech data degraded by additive white Gaussian noise, and in this paper some preliminary results are discussed.
A statistical correlation study between 18 objective quality measures and a data base of subjective quality measures from the Paired Acceptability Rating Method (PARM) was done for nine communication systems, including waveform coders, channel vocoders, linear predictive coders, and adaptive predictive coders. The results of this study show which of the candidate objective measures are most effective in predicting the subjective results. The measure which was found to be most effective over all systems was a gain weighted L2 spectral distance metric which had a correlation coefficient of -0.83. Supported by DCA/DCEC via the RADC Post Doctoral Program.
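The figure of merit quoted in the abstract above is a correlation coefficient between an objective measure and subjective scores. As a minimal sketch, assuming the ordinary Pearson coefficient (the abstract does not name the estimator) and a hypothetical function name, it can be computed as follows. A distance-type measure falls as perceived quality rises, which is why a good measure shows a strongly negative value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: the plain sample Pearson coefficient; the study's exact
    statistical procedure is not described in the abstract.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` would hold one objective distance per system or condition and `ys` the matching subjective rating.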
Recent work of Olive and Spickenagel has shown that pseudo-area parameters used for LPC synthesis can be linearly interpolated between dyad boundaries without producing excessive distortion in synthetic speech. This study investigates whether such interpolation can be done equally successfully on the power spectrum of the speech waveform. The spectrum is of special interest because speech can be synthesized in real time from spectral parameters on readily available programmable digital filters. Our results show that the distortion introduced by dyadic interpolation of the spectrum is perceptually significant, but it can be reduced considerably by using an additional point within the dyad boundaries for interpolation. The reasons for the good quality of speech synthesized from dyadically-interpolated area parameters were also investigated. It was found that formant frequency movements are reproduced fairly accurately after dyadic interpolation. Formant bandwidths, however, are not reproduced accurately, but the bandwidth errors are not as important subjectively.
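The interpolation idea in the abstract above can be sketched as piecewise-linear interpolation of a parameter vector through a small set of anchor frames. With two anchors (the dyad boundaries) it is plain linear interpolation; a third anchor corresponds to the additional point within the dyad. The anchor representation and function names are assumptions made for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between parameter vectors a and b, 0 <= t <= 1."""
    return [x + t * (y - x) for x, y in zip(a, b)]

def piecewise_linear(anchors):
    """Interpolate one parameter vector per frame through the anchors.

    anchors: list of (frame_index, parameter_vector) pairs with strictly
    increasing frame indices (hypothetical representation). Returns the
    vectors for every frame from the first to the last anchor inclusive.
    """
    if not anchors:
        raise ValueError("need at least one anchor")
    frames = []
    for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
        if i1 <= i0:
            raise ValueError("anchor frame indices must strictly increase")
        for i in range(i0, i1):
            frames.append(lerp(v0, v1, (i - i0) / (i1 - i0)))
    # The loop stops one frame short of the last anchor; add it explicitly.
    frames.append(list(anchors[-1][1]))
    return frames
```

For example, anchors at frames 0 and 4 give a straight ramp, while adding an anchor at frame 2 lets the track bend once inside the dyad.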