The audio quality, robustness and complexity issues of a novel mobile digital audio broadcast (DAB) scheme are addressed. The audio codec is based on a combination of subband coding (SBC) and multipulse excited linear predictive coding (MPLPC), where the bit allocation is dynamically adapted according to both the signal power in different subbands and a perceptual hearing model. Typically, a segmental signal-to-noise ratio (SEGSNR) in excess of 30 dB, associated with high-fidelity (hi-fi) subjective quality, was achieved for 2.67 bits/sample transmissions at a mono bit rate of 86 kbit/s. Four different source-matched forward error correction (FEC) schemes were investigated in order to explore the complexity, bit rate and robustness trade-offs. When using 4 bit/symbol 16-level star-constellation quadrature amplitude modulation (16-StQAM), the overall signalling rate became approximately 30 kBaud, accommodating two stereo DAB channels in a conventional 200 kHz analogue FM channel's bandwidth. The diversity-assisted DAB scheme required a channel signal-to-noise ratio (SNR) of about 25 dB for unimpaired audio quality over the worst-case Rayleigh fading mobile channel, when the mobile speed was 30 mph and the propagation frequency was 1.5 GHz. In the stationary Gaussian scenario, an SNR of about 20 dB was required.
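The SEGSNR figure quoted in this abstract is conventionally an average of per-frame SNR values in dB. The following is a minimal sketch of that measure, not the authors' evaluation code: the frame length, the clamping limits and the function name `segmental_snr_db` are assumptions made for illustration.

```python
import math


def segmental_snr_db(reference, decoded, frame_len=256,
                     floor_db=-10.0, ceil_db=80.0):
    """Average the per-frame SNR (in dB) over non-silent frames.

    frame_len, floor_db and ceil_db are illustrative assumptions;
    the abstract does not state the values used in the paper.
    """
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0:
            continue  # a silent frame carries no SNR information
        if noise == 0.0:
            snr = ceil_db  # perfect reconstruction: use the upper clamp
        else:
            snr = 10.0 * math.log10(signal / noise)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no non-silent frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)
```

As a rough consistency check on the quoted rates, 86 kbit/s divided by 2.67 bits/sample corresponds to a sampling rate of about 32 kHz; this figure is inferred from the arithmetic and is not stated in the abstract.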
Distance measures robust against noise disturbances are required for reliable recognition of noisy speech. The local signal-to-noise ratio (SNR) of degraded speech varies over a wide range, and the characteristics of speech with low SNR tend to be lost. Pattern matching, however, is usually performed uniformly, without taking the local SNR of each analysis frame into account. The behavior of representative LPC distance measures versus segmental SNR is investigated, which shows the necessity of accounting for the effect of the segmental SNR on the distance measure. A double autocorrelation analysis is proposed as a spectrum estimation method. A pattern matching method is also introduced in which the segmental SNR is taken into account as a weight. Isolated word recognition experiments were performed. The results show the effectiveness of the proposed method.
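The idea of using the segmental SNR as a weight in pattern matching can be sketched as a weighted average of per-frame distances. The abstract does not give the weighting function, so the clipped linear ramp below, its thresholds and the names `snr_weight` and `weighted_distance` are hypothetical choices for illustration only.

```python
def snr_weight(snr_db, low_db=0.0, high_db=20.0):
    """Map a frame SNR to a weight in [0, 1] with a clipped linear ramp.

    The ramp shape and the low_db/high_db thresholds are assumptions,
    not values taken from the paper.
    """
    if snr_db <= low_db:
        return 0.0
    if snr_db >= high_db:
        return 1.0
    return (snr_db - low_db) / (high_db - low_db)


def weighted_distance(frame_distances, frame_snrs_db):
    """Combine per-frame distances so that low-SNR frames count less."""
    weights = [snr_weight(s) for s in frame_snrs_db]
    total = sum(weights)
    if total == 0.0:
        # Every frame is unreliable: fall back to the uniform average.
        return sum(frame_distances) / len(frame_distances)
    return sum(w * d for w, d in zip(weights, frame_distances)) / total
```

With this form, a frame at or below the lower threshold contributes nothing to the accumulated distance, while frames above the upper threshold are treated exactly as in uniform matching.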
Studies the cepstral coefficients as a suitable representation of the linear prediction filter for spectral coding purposes. Spectral coding methods in predictive speech coders are usually evaluated using the spectral distance measure. The average spectral distance, combined with a measure of the percentage of spectra with high distortion, is used to predict the perceptual quality when quantizing the prediction filter. The authors show that the spectral distance is equivalent to a squared error in the cepstral domain. Methods for spectral quantization using vector quantization of cepstral coefficients are analyzed. Better results than for quantization of line spectrum frequencies are reported for both single-stage VQ at 11-14 bits and two-stage VQ at 18-22 bits. It is concluded that the cepstral coefficients are the right representation for LPC spectral coding purposes.
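The equivalence between spectral distance and squared cepstral error can be illustrated with the standard LPC-to-cepstrum recursion and Parseval's relation. This is a sketch under stated assumptions, not the authors' procedure: the filter is written as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the gain term is ignored, the cepstrum is truncated, and the function names are chosen here for illustration.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole filter 1/A(z).

    `a` holds [a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k
    (sign convention assumed for this sketch).
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        # c_n = -a_n - (1/n) * sum_k k * c_k * a_(n-k)
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance_db(c1, c2):
    """RMS log-spectral distance in dB estimated from truncated cepstra.

    Uses SD^2 = 2 * sum_n (c1_n - c2_n)^2 for natural-log power spectra
    with equal gain, then converts to dB with the factor 10 / ln(10).
    """
    squared_error = sum((x - y) ** 2 for x, y in zip(c1, c2))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_error)
```

Because the distance reduces to a Euclidean norm of the cepstral difference, a vector quantizer trained with squared error on cepstral vectors minimizes (the truncated form of) the spectral distance directly.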
This paper presents the M-LCELP (multi-mode learned code excited LPC) speech coder, which has been developed for the North American half-rate digital cellular systems. M-LCELP uses the following techniques to achieve high-quality synthetic speech at 4 kbps: (1) multimode and multi-codebook coding, (2) pitch lag differential coding with pitch tracking, (3) a two-stage joint design regular-pulse codebook with common phase structure in voiced frames, (4) an efficient vector quantization for LSP parameters, and (5) an adaptive MA type comb filter to suppress excitation signal inter-harmonic noise. The MOS subjective test shows that the quality of 4.075 kbps M-LCELP synthetic speech is high and is mostly equivalent to that of an 8 kbps North American full-rate VSELP coder.
A hybrid neural network is described. It consists of a Kohonen map and a perceptron. The hybrid is proposed primarily for speaker-independent, isolated word recognition; however, it may also be used for other classification problems. The novel idea in this system is the use of a Kohonen map as the feature extractor, which converts phonetic similarities of the speech frames into spatial adjacency in the map. This property simplifies the classification task. The system performance was evaluated for recognition of a limited number of Farsi words (numbers "zero" through "ten"). The overall performance of the recognizer was 93.82%.
A text-independent voice recognition experiment was conducted using an artificial neural network. The speech data were collected from three different speakers uttering thirteen different words. Each word was repeated ten times. The speech data were then pre-processed for signal conditioning. A total of 12 feature parameters were obtained from cepstral coefficients via linear predictive coding (LPC). These feature parameters then served as inputs to the neural network for speaker classification. A standard two-layer feedforward neural network was trained to identify different feature sets associated with the corresponding speakers. The network was tested on the remaining unseen words in text-independent mode. The results were very promising, with a voice recognition accuracy of more than 90%. The success rate could be increased by adding more utterances from each speaker.
Several techniques for speech coding at rates of 4 kb/s and lower require quantization of spectral magnitudes at a set of frequencies which are harmonics of the talker's fundamental (pitch) frequency (for example: multiband excitation coding, sinusoidal transform coding, and time-frequency interpolation). The number of harmonic magnitudes to be quantized depends on the fundamental frequency value and hence is variable, changing from frame to frame. The variable number of components to be quantized makes it difficult to use fixed-dimension vector quantization for harmonic magnitude encoding. In this paper, we introduce a quantization technique called non-square transform vector quantization (NSTVQ), which uses a fixed-dimension vector quantizer combined with a variable-size non-square transform that maps the variable-dimension harmonic magnitude vector into a fixed-dimension vector. The optimal reconstruction procedure for non-square transforms is derived and shown to be equivalent to an optimal least-squares estimation procedure. The proposed technique is evaluated experimentally as part of a new coding system called spectral excitation coding (SEC). The results are compared to an existing technique which estimates the spectral shape using all-pole modeling followed by vector quantization of the LSP parameters.
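The mapping from a variable-dimension harmonic magnitude vector to a fixed-dimension vector can be sketched with a truncated or zero-padded orthonormal DCT. This is only an illustration of the general idea: the abstract does not say which transform the authors use, so the DCT basis, the fixed dimension and all function names below are assumptions. With an orthonormal basis, applying the transpose is the least-squares reconstruction.

```python
import math


def dct_row(k, n):
    """Row k of the orthonormal DCT-II matrix of size n."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]


def forward_transform(x, m):
    """Map a length-N vector to a fixed length-m vector (non-square map).

    If N < m the output is zero-padded; if N > m only the first m
    DCT coefficients are kept.
    """
    n = len(x)
    y = [0.0] * m
    for k in range(min(m, n)):
        y[k] = sum(r * v for r, v in zip(dct_row(k, n), x))
    return y


def inverse_transform(y, n):
    """Least-squares reconstruction of a length-n vector from y."""
    x = [0.0] * n
    for k in range(min(len(y), n)):
        row = dct_row(k, n)
        for i in range(n):
            x[i] += y[k] * row[i]
    return x


def nearest_codeword(y, codebook):
    """Index of the codeword closest to y in squared error."""
    errors = [sum((a - b) ** 2 for a, b in zip(y, cw)) for cw in codebook]
    return errors.index(min(errors))
```

In this sketch a single fixed-dimension codebook serves every frame: the encoder transforms the harmonic magnitudes, searches the codebook, and the decoder applies `inverse_transform` with the frame's own harmonic count.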
Addresses the question of how to extract the nonlinearities in speech, with the prime purpose of facilitating coding of the residual signal in residual excited coders. The short-term prediction of speech in speech coders is extensively based on linear models, e.g. the linear predictive coding (LPC) technique, which is one of the most basic elements in modern speech coders. This technique does not allow extraction of nonlinear dependencies. If nonlinearities are absent from speech, the technique is sufficient, but if the speech contains nonlinearities, the technique is inadequate. The authors give evidence for nonlinearities in speech and propose nonlinear short-term predictors that can replace the LPC technique. The technique, called nonlinear predictive coding, is shown to be superior to the LPC technique. Two different nonlinear predictors are presented. The first is based on a second-order Volterra filter, and the second is based on a time delay neural network. The latter is shown to be the more suitable for speech coding applications.
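The general form of a second-order Volterra predictor can be sketched as a linear predictor plus products of pairs of past samples. The code below only evaluates such a predictor for given coefficients; how the authors fit the coefficients is not described in the abstract, and the names `volterra2_predict`, `h1` and `h2` are hypothetical.

```python
def volterra2_predict(past, h1, h2):
    """Predict s(n) from past samples with a second-order Volterra filter.

    past[i] is s(n - 1 - i); h1[i] are the linear kernel coefficients and
    h2[i][j] (used for j >= i only) are the quadratic kernel coefficients.
    The coefficient layout is an assumption made for this sketch.
    """
    order = len(h1)
    linear = sum(h1[i] * past[i] for i in range(order))
    quadratic = sum(
        h2[i][j] * past[i] * past[j]
        for i in range(order)
        for j in range(i, order)
    )
    return linear + quadratic


def prediction_residual(signal, h1, h2):
    """Residual e(n) = s(n) - prediction, for n >= predictor order."""
    order = len(h1)
    residual = []
    for n in range(order, len(signal)):
        past = [signal[n - 1 - i] for i in range(order)]
        residual.append(signal[n] - volterra2_predict(past, h1, h2))
    return residual
```

Setting every entry of `h2` to zero reduces the predictor to the conventional linear (LPC-style) short-term predictor, which makes the two approaches easy to compare on the same residual.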
ISBN (print): 078031865X
The paper proposes a nonlinear cepstral equalization method for speech recognition. The method is based on AR modeling of the clean speech power spectrum and the addition of noise power to the speech. A noise ratio is introduced to provide a mechanism for adapting the reference template. An iterative algorithm is proposed to find a near-optimal adaptation of the reference cepstral parameters. Experiments showed that the proposed method is superior to the projection approach in severely noisy environments.