Background: Traditional time synchronous (TS) parametric speech coders [1-3] cannot currently produce speech of toll quality owing to the inaccurate modelling of perceptually important speech transitions and the lack of accurate speech parameter analysis. To improve their performance, pitch synchronous (PS) parametric speech coders such as the PS SB-LPC [4] and 1-MELP [5] have been developed, which operate on a pitch cycle waveform (PCW) basis. The differences between the TS and PS coder types are demonstrated in Fig. 1 when applied to the voicing classification of input speech. In Fig. 1 the TS coder may be incorrect in its classification (Fig. 1d) of the voicing content of the speech signal at points 3 and 4. At point 3 the segment is too short compared to the window length; at point 4 the TS method does not provide the necessary transition time accuracy. Because of its smaller analysis window, the PS method (Fig. 1c) should be able to capture the finer detail and, with an appropriate voicing classification scheme, classify the first and third speech segments as voiced. Many of the standard metrics used for analysis in TS coders, such as autocorrelation and AMDF, are based on periodicity; as these techniques require several cycles of speech, they cannot be used with great accuracy for voicing analysis of single cycles. Non-periodic metrics such as peakiness and zero crossing can be applied to single cycles. Peakiness is a measure of how the energy of a signal is spread. We decided to employ peakiness since it was found that its phase behaviour changes predictably when the signal is band limited. Also, as we utilise phase rather than periodicity, the technique is more robust to the effects of incorrect pitch detection and irregular pitch variation.
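To make the two non-periodic metrics concrete, the sketch below computes them on a single pitch cycle. The abstract does not give a formula for peakiness, so the definition used here (RMS value divided by mean absolute value, as commonly used in MELP-style coders) is an assumption, and the function names are hypothetical. It does not reproduce the phase-based analysis the abstract refers to.

```python
import math

def peakiness(cycle):
    """Assumed definition: RMS divided by mean absolute value of one cycle."""
    n = len(cycle)
    mean_abs = sum(abs(x) for x in cycle) / n
    if mean_abs == 0.0:
        return 0.0  # silent cycle: define peakiness as 0 to avoid division by zero
    rms = math.sqrt(sum(x * x for x in cycle) / n)
    return rms / mean_abs

def zero_crossings(cycle):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(cycle, cycle[1:]) if (a >= 0) != (b >= 0))

# Energy concentrated in one sample gives a high value; a sinusoid gives a low one.
impulse_like = [0.0] * 15 + [1.0]
sine_cycle = [math.sin(2 * math.pi * k / 64) for k in range(64)]
```

Under this definition a cycle whose energy is concentrated in a few samples scores well above a cycle whose energy is spread evenly, and neither metric needs more than one cycle of input.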
ISBN: (print) 0780374029
This paper presents a new Analysis-by-Synthesis (AbS) technique for joint optimization of the excitation and model parameters based on minimizing the closed-loop synthesis error instead of the linear prediction error. By minimizing the synthesis error, the analysis and synthesis stages become more compatible. Using a gradient search in the root domain, model parameters for a given excitation are optimized to minimize the error between the original and the synthesized speech. Since the optimization starts from the LPC solution, the synthesis error is guaranteed to be lower than that obtained using the LPC coefficients. For multipulse LPC, there is a 0.5-1 dB improvement in the segmental SNR for male and female speakers over sentences 4 to 6 seconds long. Listening tests and objective MOS scores confirm the improved speech quality. By adding an extra optimization step, the technique can be incorporated into LPC, multipulse LPC and CELP-type speech coders.
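For readers unfamiliar with the reported metric, the following is a minimal sketch of how segmental SNR is commonly computed: the per-frame SNR in dB, averaged over frames. The frame length of 160 samples and the choice to skip frames with zero signal or zero error energy are assumptions for illustration; the abstract does not state the settings the authors used.

```python
import math

def segmental_snr(original, synthesized, frame_len=160):
    """Mean per-frame SNR in dB over full frames (frame_len is an assumed value)."""
    snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        s = original[start:start + frame_len]
        y = synthesized[start:start + frame_len]
        sig = sum(a * a for a in s)                    # frame signal energy
        err = sum((a - b) ** 2 for a, b in zip(s, y))  # frame synthesis error energy
        if sig == 0.0 or err == 0.0:
            continue  # assumption: skip frames where the log ratio is undefined
        snrs.append(10.0 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")

# Two frames whose error amplitude is one tenth of the signal amplitude.
reference = [1.0] * 320
decoded = [0.9] * 320
snr_db = segmental_snr(reference, decoded)
```

Because each frame contributes equally to the average, low-energy frames weigh as much as loud ones, which is one reason this measure is often preferred over a single whole-utterance SNR for speech.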