This paper presents a technique using artificial neural networks (ANNs) for speaker identification that achieves a higher success rate than other techniques. The technique uses both power spectral densities (PSDs) and linear prediction coefficients (LPCs) as feature inputs to a self-organizing feature map to achieve better identification performance. Results for speaker identification with different methods are presented and compared.
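As an illustration of the LPC features mentioned above, the following is a minimal sketch of one common way to compute linear prediction coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. The abstract does not say how the LPCs were obtained, so the method, the function names, and the sign convention (predictor x_hat[n] = sum of a[k] * x[n - k]) are assumptions made for illustration only.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients a[1..order].

    Assumed convention: x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err  # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


# Toy input (hypothetical): a decaying exponential, i.e. a first-order
# autoregressive impulse response, so the first coefficient should be near 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
```

In a speaker identification front end, a coefficient vector of this kind would typically be computed per frame and combined with the other features before being passed to the classifier.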
ISBN: 1864354518 (print)
A high quality audio coding scheme based on a novel hybrid algorithm combining warped linear prediction (WLP) and subband coding is proposed. The proposed codec is capable of providing a high quality audio output at a low bit-rate. Subjective tests have shown that the proposed codec is able to provide a performance comparable to that of the MPEG audio layer II while operating at a lower bit-rate.
Current parametric speech coding schemes can achieve high communications quality speech at bit rates in the range of 2.4 to 1.5 kbits/sec. Most schemes sample and quantise, at regular intervals, the "tracks in time" generated by the parameters of the speech production model. As a result, reconstructed "parameter tracks" do not evolve "smoothly" with time. Furthermore, no advantage is taken of the "linguistic event" nature of speech. In this paper, model parameter "time tracks" are split into non-overlapping speech "event" related segments. These segment based evolutions of model parameters are then vector quantised to provide at the receiver a smooth and subjectively meaningful reconstruction. Thus the paper presents an application of this generic segmental speech model quantisation approach to a 1.5 kbits/sec prototype interpolation coding (PIC) system. Results indicate that the proposed methodology can almost halve the bit rate of this PIC system while preserving overall recovered speech quality.
In this paper, we present a novel background noise coding scheme for variable rate speech coders. Existing approaches to noise coding at very low bit rates (i.e. below 1 kbps) fail to faithfully reproduce background noise, resulting in a degradation of the overall perceptual quality. In our approach, classification of the noise type is used to select the type of excitation to be used at the receiver. To illustrate the benefits of our scheme, we have modified the noise coding mode of the CDMA enhanced variable rate codec (EVRC) to include the proposed class-dependent noise excitation model. Evaluation tests have shown that the proposed noise coding scheme improves the overall quality without an increase in bit rate.
Code-excited linear prediction coding with generalized pitch prediction (GPP-CELP) requires linear prediction filtering of the stochastic codebook output prior to addition of the adaptive codebook (ACE) component. The ACE component represents a sequence of past reconstructed samples passed through a low-pass filter to reflect the reduced pitch periodicity of the higher speech frequencies. The spectrum of the residual manifests broad peaks leading to significantly narrower distributions in the LPC parameter space. Additionally, the quantization error of the residual may be masked by the significantly greater energy of the ACE component. This work compares the quantization requirements for the information required to represent the time-varying LPC filter of the GPP-CELP coder with that of the classical CELP coder. With non-predictive coding of the LPC information, a bit-rate reduction from 20 bits/20 ms to 16 bits/20 ms appears feasible without introducing noticeable degradation due to quantization.
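To make the quoted figures concrete, the short sketch below converts the two per-frame bit allocations from the abstract into bit rates. Only the values 20 bits, 16 bits, and the 20 ms frame come from the abstract; the function and variable names are hypothetical.

```python
def bits_per_second(bits_per_frame, frame_ms):
    """Bit rate in bit/s for a fixed allocation per frame of frame_ms milliseconds."""
    return bits_per_frame * 1000.0 / frame_ms


rate_before = bits_per_second(20, 20)  # 20 bits every 20 ms
rate_after = bits_per_second(16, 20)   # 16 bits every 20 ms
saving = rate_before - rate_after
relative_saving = saving / rate_before

print(rate_before, rate_after, saving, relative_saving)
```

The LPC side information therefore drops from 1000 bit/s to 800 bit/s, a reduction of 200 bit/s or 20 percent of the LPC budget.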
This paper deals with multi-stage vector quantization of line spectrum pair (LSP) parameters in wideband speech coders and discusses commonly used spectral distortion measures and their relation to the perceptual quality of the speech coding.
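For readers unfamiliar with multi-stage vector quantization, here is a minimal sketch of the basic idea: each stage quantizes the residual left by the previous stage, and the decoder sums the selected codevectors. The greedy stage-by-stage search, the squared Euclidean distance, and the tiny two-dimensional codebooks are all assumptions for illustration; the abstract does not describe the search strategy or distortion measure actually used.

```python
def nearest(codebook, v):
    """Index of the codevector closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def msvq_encode(stages, v):
    """Greedy multi-stage search: quantize, subtract, pass the residual on."""
    residual = list(v)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(stages, indices):
    """Reconstruct the vector as the sum of the selected codevectors."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical codebooks: a coarse first stage and a fine second stage.
stages = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]],
]
indices = msvq_encode(stages, [1.1, 0.9])
reconstruction = msvq_decode(stages, indices)
```

Practical coders often keep several candidates per stage rather than a single greedy choice, and weight the distance to better reflect perceptual quality.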
The United States government has developed a new Federal Standard 2400 bps vocoding algorithm called MELP (mixed excitation linear prediction). This vocoder has a very acceptable voice quality under benign error-free channel conditions. However, when subjected to high error conditions as could be experienced in tactical vehicular operations, amelioration techniques may be employed which take advantage of the underlying inter-frame residual redundancy of the MELP parameters themselves. This paper describes experiments conducted on the MELP vocoding algorithm in conjunction with Viterbi convolutional error decoding, and enhanced with maximum a posteriori techniques which utilize these redundancy statistics. Both hard and soft Viterbi decoding implementations are investigated in addition to turbo codes.
Identification of digitally modulated radio signals is one of the essential problems in intelligent radio links and radio monitoring systems. FSK signals are widely used for this purpose. An estimated instantaneous frequency (IF) can be used as one of the identification parameters. LPC and DESA estimators for real signals are compared to frequency estimators obtained for complex signals. Autoregressive models are proposed as a method of precise estimation of time-varying frequency in a noisy environment. A hardware solution for radio signal acquisition in complex form is also presented.
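The idea of estimating instantaneous frequency by linear prediction can be sketched as follows. A noiseless real sinusoid satisfies x[n] + x[n-2] = 2*cos(w)*x[n-1], so a least-squares fit of that single coefficient over a short window yields the frequency w in radians per sample. This is a simplified illustration, not the estimators compared in the paper; the window length, the test signal, and the function names are hypothetical, and the DESA and complex-signal estimators are not shown.

```python
import math


def lp_frequency(x):
    """Estimate the frequency (rad/sample) of a real sinusoid in window x.

    Uses the second-order relation x[n] + x[n-2] = 2*cos(w)*x[n-1]
    and a least-squares fit for cos(w).
    """
    num = sum(x[n - 1] * (x[n] + x[n - 2]) for n in range(2, len(x)))
    den = 2.0 * sum(x[n - 1] ** 2 for n in range(2, len(x)))
    c = max(-1.0, min(1.0, num / den))  # guard acos against rounding
    return math.acos(c)


def sliding_frequency(x, window):
    """Frequency estimate for each non-overlapping window of the signal."""
    return [
        lp_frequency(x[i:i + window])
        for i in range(0, len(x) - window + 1, window)
    ]


# Hypothetical 2-FSK test signal: 64 samples at 0.3 rad/sample,
# then 64 samples at 0.8 rad/sample.
tones = [0.3] * 64 + [0.8] * 64
fsk = [math.cos(w * n) for n, w in enumerate(tones)]
estimates = sliding_frequency(fsk, 32)
```

With noise present the simple fit becomes biased, which is one reason more careful autoregressive modelling is of interest for this problem.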
The human voice is the most difficult musical instrument to simulate convincingly. Yet a great deal of progress has been made in voice coding, the parameterization and re-synthesis of a source signal according to an assumed voice model. Source-filter models of the human voice, particularly linear predictive coding (LPC), are the basis of most low bit rate (speech) coding techniques in use today. This paper introduces a technique for coding the singing voice using LPC and prior knowledge of the musical score to aid in the process of encoding, reducing the amount of data required to represent the voice. This approach advances the singing voice closer towards a structured audio model in which musical parameters such as pitch, duration, and phonemes are represented orthogonally to the synthesis technique and can thus be modified prior to re-synthesis.
This paper presents a method for obtaining numerical estimates of high rate vector quantization (VQ) performance suitable for sources for which the PDF is not analytically available. In the proposed method, the VQ point density is described from a Gaussian mixture model optimized for the data. Employing this method for LPC spectrum quantization, we obtain high rate expressions for both the average spectral distortion (SD) and the distribution function of the SD. We estimate the minimum bits required for a quantizer to obtain an average SD of 1 dB and the outlier statistics for that quantizer. We find that approximately 3 bits can be saved as compared to a 2-split LSF-based vector quantizer.
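The spectral distortion (SD) figure quoted above is commonly computed as the root-mean-square difference, in dB, between the log power spectra of the original and quantized LPC filters. The sketch below follows that common definition under stated assumptions: the all-pole model is 1/A(z) with A(z) = 1 - sum of a[k] z^-k, the average runs over the full band on a uniform frequency grid, and the coefficient values are hypothetical. The paper may use a different band or grid.

```python
import cmath
import math


def lpc_power_db(a, omega):
    """Log power spectrum 10*log10(1/|A(e^{jw})|^2) of an all-pole model.

    Assumed convention: A(z) = 1 - sum_k a[k] * z^-(k+1) for a 0-based list a.
    """
    A = 1.0 - sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return -20.0 * math.log10(abs(A))


def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS log-spectral difference in dB over a uniform grid on (0, pi)."""
    total = 0.0
    for i in range(n_points):
        omega = math.pi * (i + 0.5) / n_points
        d = lpc_power_db(a_ref, omega) - lpc_power_db(a_quant, omega)
        total += d * d
    return math.sqrt(total / n_points)


# Hypothetical first-order example: original versus slightly perturbed filter.
sd = spectral_distortion_db([0.9], [0.85])
```

Averaging this quantity over many frames gives the average SD, and counting frames above fixed thresholds gives the outlier statistics mentioned in the abstract.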