Spectral dynamics have long attracted the attention of researchers in speech recognition. As part of the speech feature vector they have proven useful, and they are therefore included in almost every feature extraction algorithm for speech recognition. However, the usual cepstral dynamics do not directly reflect the dynamics of the speech spectrum, because they are extracted from cepstral parameters. In this paper we show that dynamic parameters obtained directly from the speech spectrum can outperform conventional dynamic cepstral parameters on noisy speech at low SNR. Results are reported on a compact subset of the Aurora task.
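The sketch below illustrates the distinction drawn above. It assumes the common regression formula for delta features, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). The abstract does not specify the paper's own spectral dynamic parameters. The same routine yields conventional cepstral dynamics when fed cepstral frames, and spectral dynamics when fed log-spectral frames. The names deltas and log_spectrum, the half-width N = 2, and the edge handling are all illustrative assumptions.

```python
import math


def deltas(frames, N=2):
    """Regression-based delta features over a +/-N frame window.

    frames: list of equal-length feature vectors, one per analysis frame.
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def log_spectrum(power_frame, floor=1e-10):
    """Log power per FFT bin or filterbank channel, floored to avoid log(0)."""
    return [math.log(max(p, floor)) for p in power_frame]
```

Consider two hypothetical inputs: power_frames (per-frame power spectra) and cepstra (per-frame cepstral vectors). Spectral dynamics would then be deltas([log_spectrum(p) for p in power_frames]), and conventional cepstral dynamics would be deltas(cepstra).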
ISBN (print): 0780381858
CELP (code-excited linear predictive) coding is one of the speech coding techniques included in the MPEG-4 standard. A distinctive feature of MPEG-4 CELP is its bitrate and bandwidth scalability, which serves users with different network bandwidths. However, subjective experiments show that the scalable mechanism of MPEG-4 CELP does not perform well at low core bitrates, e.g., 3.85 kbps. In this paper, we propose a method for generating enhancement information that is better suited to low core bitrates. Experimental results show that the proposed approach yields better coded quality in this situation.
This paper examines the effects of postfiltering techniques on the performance of the line spectrum frequency (LSF) parameters for low bit rate speech coders in tandem connections. The speech coder platform consists of a mixed multiband excitation (MMBE) linear predictive coding (LPC) algorithm. It encodes voiced frames at 1.75 kb/s and unvoiced frames at 0.4 kb/s. The analysis covers two postfilters, both run on the same MMBE platform: the well-known adaptive spectral enhancement (ASE) technique, and a recently reported scheme called spectral envelope restoration combined with noise reduction (SERNR). The paper presents the spectral distortion and outlier percentages of a switched predictive LSF vector quantiser while the speech coder operates in tandem connections. These results, together with subjective listening tests, show that the SERNR technique is clearly superior to the ASE postfilter in tandem connections.
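Spectral distortion (SD) and outlier percentages are the figures of merit reported above. A minimal sketch of the commonly used definitions follows. SD is taken as the RMS difference, in dB, between the original and quantized all-pole power spectra. Outliers are counted as frames with SD between 2 and 4 dB and frames with SD above 4 dB. The paper's exact evaluation band is not stated in the abstract, so this sketch assumes a uniform grid over the whole band [0, pi). It also assumes the LPC convention A(z) = 1 + sum a_k z^-k. The LSF-to-LPC conversion is omitted, and the function names are hypothetical.

```python
import cmath
import math


def allpole_power_db(a, w):
    """10*log10 of 1/|A(e^{jw})|^2, with A(z) = 1 + sum_k a[k-1] z^-k."""
    A = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return -10.0 * math.log10(abs(A) ** 2)


def spectral_distortion(a_ref, a_quant, n_points=256):
    """RMS difference (dB) between two all-pole power spectra on a uniform grid over [0, pi)."""
    total = 0.0
    for i in range(n_points):
        w = math.pi * i / n_points
        diff = allpole_power_db(a_ref, w) - allpole_power_db(a_quant, w)
        total += diff * diff
    return math.sqrt(total / n_points)


def outlier_stats(sd_values):
    """Average SD and percentages of frames with 2 < SD <= 4 dB and SD > 4 dB."""
    n = len(sd_values)
    avg = sum(sd_values) / n
    mid = 100.0 * sum(1 for s in sd_values if 2.0 < s <= 4.0) / n
    high = 100.0 * sum(1 for s in sd_values if s > 4.0) / n
    return avg, mid, high
```

In a tandem test, each frame's SD would be computed between the unquantized LPC and the LPC reconstructed after the last coding stage. The per-frame values are then summarized with outlier_stats.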
The digital waveguide mesh is a technique used to model room acoustics and musical instruments. The paper details a project that applies waveguide mesh acoustic modelling to the production of human-like vowel sounds. A 2D software mesh model is created that approximates the shape of the vocal tract in different vowel positions, and a glottal flow input is applied. The resulting signal has resonant frequencies (formants) similar to those of recorded speech. Recommendations are made for extending the model to include some of the more complex features of the mouth. This could lead to an acoustic model of the human vocal tract capable of producing more natural-sounding speech.
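A minimal sketch of a 2D rectilinear waveguide mesh follows, written in its well-known finite-difference equivalent form. Each 4-port junction updates as p(n+1) = (1/2) * (sum of the four neighbouring pressures at n) - p(n-1). The tract outline is assumed to be a boolean mask on a rectangular grid. Junctions outside the mask are held at zero pressure, which is a crude boundary and not the reflection model a real vocal tract mesh would need. The function simulate_mesh and its parameters are hypothetical.

```python
def simulate_mesh(mask, excitation, src, out, n_steps):
    """2D rectilinear digital waveguide mesh in finite-difference form.

    mask: 2D list of booleans; True marks junctions inside the tract outline.
    excitation: samples injected at junction `src` (e.g. a glottal flow waveform).
    src, out: (row, col) of the input and output junctions (both inside the mask).
    Junctions outside the mask stay at zero pressure (a crude boundary).
    Returns the pressure at `out` for each time step.
    """
    rows, cols = len(mask), len(mask[0])
    prev = [[0.0] * cols for _ in range(rows)]
    cur = [[0.0] * cols for _ in range(rows)]
    output = []
    for n in range(n_steps):
        nxt = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                if not mask[i][j]:
                    continue
                s = 0.0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        s += cur[ii][jj]
                # 4-port scattering junction: p(n+1) = (2/4) * sum(neighbours) - p(n-1)
                nxt[i][j] = 0.5 * s - prev[i][j]
        if n < len(excitation):
            nxt[src[0]][src[1]] += excitation[n]  # soft source at the glottis
        prev, cur = cur, nxt
        output.append(cur[out[0]][out[1]])
    return output
```

For a vowel, the mask would approximate the tract's side profile. In that setup, src sits at the glottis end and out sits at the lips, and the formants appear as peaks in the spectrum of the returned signal.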
ISBN (print): 0780376633
Parametric coders provide good communication quality at low bit rates. Efficient encoding of the variable-dimension harmonic spectral envelope is an essential task in parametric speech coders. In this paper, we propose an efficient vector quantization (VQ) scheme that takes perception into account to improve the performance of parametric speech coders. By reducing the vector dimension, the computational complexity of spectral envelope VQ (SEVQ) is lowered while speech quality is retained. Experimental results show that the proposed perceptual SEVQ method reduces the computational complexity of SEVQ by a factor of 9.
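The cost of a full-search VQ is roughly (codebook size) x (vector dimension) multiply-adds, which is why reducing the dimension cuts complexity. The sketch below illustrates this under stated assumptions, since the abstract does not describe how the paper reduces dimension. First, the variable-length harmonic envelope is linearly interpolated onto a fixed number of points. Second, the nearest codeword is found under a weighted squared error, where the weights stand in for an unspecified perceptual weighting. Both function names are hypothetical, and this sketch is not claimed to reproduce the factor of 9 reported above.

```python
def resample_envelope(harmonic_amps, dim):
    """Linearly interpolate a variable-length harmonic envelope onto `dim` points (dim >= 2)."""
    if dim < 2:
        raise ValueError("dim must be at least 2")
    L = len(harmonic_amps)
    if L == 1:
        return [float(harmonic_amps[0])] * dim
    out = []
    for m in range(dim):
        x = m * (L - 1) / (dim - 1)  # position on the harmonic-index axis
        k = min(int(x), L - 2)       # left neighbour, clamped so k + 1 stays valid
        frac = x - k
        out.append((1 - frac) * harmonic_amps[k] + frac * harmonic_amps[k + 1])
    return out


def weighted_vq_search(x, codebook, weights):
    """Full search: return (index, distortion) minimizing sum_i w_i * (x_i - c_i)^2."""
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

With a zero weight on a component, mismatches there cost nothing. This is how a perceptual weighting lets the search ignore spectral regions judged less audible.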
A novel linear prediction (LPC)-based automatic speaker identification (ASI) technique employing multiple representations of the LPC is presented. The proposed ASI system has two modes, namely the encoding mode and the speaker identification (SI) mode. During the encoding mode, also known as the training mode, the linear prediction coefficients (LPC) are extracted for each speaker as speech features. Multiple representation split vector quantization (MRSVQ) is employed to form representative codebooks for each representation, for each speaker. During the SI (running) mode, the ASI system identifies the speaker in the database whose codebooks best match the LPC extracted from the speech signal of the unknown speaker. The synthesized all-pole vocal tract transfer function is used as the vocal tract measure for ASI. With the normalized vocal tract transfer function error measure, the proposed technique consistently achieves higher ASI accuracy than vector quantization with existing LPC representations. This gain comes at the expense of a modest increase in computational complexity. Our ASI technique can be used in a stand-alone system or as part of an ASI environment.
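A minimal sketch of the SI mode described above follows, using split VQ codebooks. The feature vector is split into sub-vectors, and each sub-vector is quantized with its own codebook. Each speaker's score is the average distortion across the test frames, and the lowest score wins. This sketch uses plain squared Euclidean error in place of the paper's normalized vocal tract transfer function error measure. It also uses a single representation rather than the multiple representations of MRSVQ. All names are hypothetical.

```python
def split_vq_distortion(vec, split_codebooks):
    """Sum over sub-vectors of the squared error to the nearest codeword.

    split_codebooks[s] quantizes the s-th slice of `vec`; slice lengths are
    taken from the codeword dimensions, in order.
    """
    total, start = 0.0, 0
    for cb in split_codebooks:
        dim = len(cb[0])
        sub = vec[start:start + dim]
        total += min(sum((a - b) ** 2 for a, b in zip(sub, c)) for c in cb)
        start += dim
    return total


def identify_speaker(frames, speaker_models):
    """Return the speaker whose split codebooks give the lowest average distortion.

    speaker_models: dict mapping speaker name -> list of split codebooks.
    """
    def avg_distortion(model):
        return sum(split_vq_distortion(f, model) for f in frames) / len(frames)

    return min(speaker_models, key=lambda name: avg_distortion(speaker_models[name]))
```

To mirror the multiple-representation idea, one could keep one such model per LPC representation for each speaker and combine the per-representation scores. The abstract does not say how the paper combines them.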
The paper describes and analyzes the different problems of information encoding in digital communication and storage systems. It discusses a general and reasonably complete structural scheme of these systems, modelled as a cascade of six encoding methods. The main characteristics of delta modulation, Reed-Solomon codes, and transmission (line) codes are also discussed.
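Delta modulation, one of the methods listed above, is simple enough to show in full. Below is a minimal sketch of linear (fixed-step) delta modulation, which transmits one bit per sample. The encoder sends 1 if the input is at or above its running staircase approximation, and 0 otherwise. The decoder rebuilds the same staircase by integrating +/- step. The fixed step size is an assumption, and adaptive variants change it on the fly. The function names are hypothetical.

```python
def dm_encode(samples, step):
    """Linear delta modulation: one bit per sample."""
    approx, bits = 0.0, []
    for x in samples:
        bit = 1 if x >= approx else 0
        approx += step if bit else -step  # encoder tracks the decoder's staircase
        bits.append(bit)
    return bits


def dm_decode(bits, step):
    """Integrate the +/- step increments to rebuild the staircase approximation."""
    approx, out = 0.0, []
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out
```

If the input changes by more than step per sample, the staircase cannot keep up (slope overload). If step is large relative to slow signal changes, the output oscillates around the input (granular noise).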
The paper presents a new method for all-pole model estimation based on minimizing the weighted mean square error in the sampled spectral domain. Owing to the discrete nature of the proposed distance measure, emphasis can be placed on an arbitrary set of spectral samples, which can greatly improve model accuracy for periodic signals. Weighting can also be applied to improve the fit in certain spectral regions according to any desired fidelity criterion. An iterative algorithm for determining the optimal model is proposed, and an exceptionally fast convergence rate is demonstrated. The accuracy of the estimation algorithm is verified on a synthetic vowel over a broad range of pitch frequencies.
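The kind of criterion described above can be evaluated at an arbitrary set of spectral samples, such as the harmonics of a periodic signal at w_k = 2*pi*k*f0/fs. The sketch below assumes a squared error in the log-magnitude (dB) domain. The abstract does not state whether the paper uses power, magnitude, or log spectra, and this sketch does not reproduce the paper's iterative algorithm. For a log-domain error, the optimal model gain has a closed form: it is the weighted mean of the residuals, so the sketch computes it directly. Function names are hypothetical.

```python
import cmath
import math


def allpole_mag_db(a, w):
    """20*log10 |1/A(e^{jw})| for A(z) = 1 + sum_k a[k-1] z^-k (unit gain)."""
    A = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return -20.0 * math.log10(abs(A))


def weighted_spectral_error(a, freqs, target_db, weights):
    """Weighted MSE (dB^2) between target samples and the all-pole model at `freqs`.

    freqs are in radians per sample. The model gain (in dB) is chosen optimally:
    for a log-domain squared error it equals the weighted mean of the residuals.
    Returns (error, gain_db).
    """
    model = [allpole_mag_db(a, w) for w in freqs]
    resid = [t - m for t, m in zip(target_db, model)]
    wsum = sum(weights)
    gain_db = sum(wk * r for wk, r in zip(weights, resid)) / wsum
    err = sum(wk * (r - gain_db) ** 2 for wk, r in zip(weights, resid)) / wsum
    return err, gain_db
```

An estimator would adjust the coefficients a to drive this error down. Raising individual weights emphasizes the corresponding spectral regions, as the abstract describes.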
Several windows are designed using an optimization procedure, targeting linear prediction (LP) analysis within the ETSI adaptive multi-rate (AMR) coder. The goal is to explore whether the window lengths can be reduced so as to decrease computation. In addition, several windows are designed at positions shifted relative to the original, with the objective of lowering the coding delay. The alternative windows are shown to produce perceptual quality similar to that of the original, with the added benefits of reduced computational cost and lower coding delay.
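Why window length drives cost is clearest in the autocorrelation method of LP analysis. The frame is windowed, autocorrelation lags 0..p are computed, and the normal equations are solved with the Levinson-Durbin recursion. The autocorrelation step costs roughly L(p+1) multiply-adds for an L-sample window, so shortening the window reduces computation directly. The sketch below uses a symmetric Hamming window as a stand-in for the AMR windows, whose shapes are not reproduced here. It also omits the lag windowing and other conditioning steps applied in the actual coder. Order 10 matches AMR's 10th-order LP. The function names are hypothetical.

```python
import math


def hamming(L):
    """Symmetric Hamming window of length L (L >= 2); a stand-in for the AMR windows."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


def autocorr(x, order):
    """r[k] = sum_n x[n] x[n-k] for k = 0..order (about len(x)*(order+1) multiply-adds)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, err) with A(z) = 1 + sum a[i] z^-(i+1)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient for order i+1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        a = new_a
        err *= (1 - k * k)                  # prediction error shrinks at each order
    return a, err


def lp_analysis(frame, window, order=10):
    """Window the frame, then apply the autocorrelation method."""
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorr(windowed, order), order)
```

Moving the window's peak earlier reduces the look-ahead it needs, which is the mechanism behind the shifted, lower-delay windows described above.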
ISBN (print): 0780379802
Humans interact with others in several ways, such as speech, gesture, and eye contact. Among these, speech is the most effective means of communication, allowing people to exchange information readily without any other tool. Emotions color speech: they can make the meaning more complex and convey how something is said. A Mandarin speech-based emotion classification method is presented. Five basic human emotions are investigated: anger, boredom, happiness, neutral, and sadness. The extracted features include 16 linear predictive cepstral coefficients (LPCC) and 20 Mel-frequency cepstral coefficients (MFCC). The recognizer is based on two statistical pattern recognition techniques, the minimum-distance method and the nearest class mean method. Minimum-distance emotion recognition achieves an average accuracy of 79.1%, while nearest class mean emotion recognition achieves a higher accuracy of 89.1%.
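A minimal sketch of the two classifiers named above follows. The abstract does not define the minimum-distance method precisely, so it is read here as nearest training sample (1-nearest-neighbour). The nearest class mean method compares a test vector with the average feature vector of each emotion. Euclidean distance is assumed for both. Each feature vector is assumed to be the concatenated LPCC and MFCC values (36 dimensions in the setup above). The function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train_class_means(features, labels):
    """Average the feature vectors belonging to each emotion class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_class_mean(x, means):
    """Label of the class whose mean vector is closest to x."""
    return min(means, key=lambda y: euclidean(x, means[y]))


def minimum_distance(x, features, labels):
    """Label of the single closest training vector (1-nearest-neighbour)."""
    best = min(range(len(features)), key=lambda i: euclidean(x, features[i]))
    return labels[best]
```

The two rules can disagree. A test vector may lie close to one atypical training sample of a class while being nearer, on average, to another class's centre. This is one way their accuracies can differ.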