ISBN: (print) 0780381858
CELP (code-excited linear prediction), a speech coding technique, is part of the MPEG-4 standard. The distinctive feature of MPEG-4 CELP is its bitrate and bandwidth scalability, which serves users with different network bandwidths. However, subjective experiments show that the scalable mechanism of MPEG-4 CELP does not perform well at low core bitrates, e.g., 3.85 kbps. In this paper, we propose a method for generating enhancement information that is better suited to low core bitrates. Experimental results show that the proposed approach yields better coded quality in such a situation.
This paper examines the effects of postfiltering techniques on the performance of the line spectrum frequency (LSF) parameters for low bit rate speech coders in tandem connections. The speech coder platform consists of a mixed multiband excitation (MMBE) linear predictive coding (LPC) algorithm that encodes voiced frames at 1.75 kb/s and unvoiced frames at 0.4 kb/s. The analysis is performed for the well-known adaptive spectral enhancement (ASE) technique and for a recently reported scheme, the spectral envelope restoration combined with noise reduction (SERNR) postfilter, using the same MMBE platform. Spectral distortions and percentages of outliers of a switched predictive LSF vector quantiser are presented for the speech coder operating in tandem connections. These results, along with subjective listening tests, show that the SERNR technique is clearly superior to the ASE postfilter in tandem connection situations.
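The spectral distortion and outlier figures mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common log-spectral distortion between two all-pole (LPC) envelopes and the frequently used 2 dB and 4 dB outlier thresholds; the abstract states neither, and the function names are hypothetical.

```python
import cmath
import math

def lpc_spectrum_db(a, n_points=256):
    """All-pole envelope 10*log10(1/|A(e^jw)|^2) on n_points in [0, pi).

    `a` holds the coefficients of A(z) = a[0] + a[1] z^-1 + ...
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum

def spectral_distortion(a_ref, a_test, n_points=256):
    """RMS difference in dB between two all-pole envelopes."""
    ref = lpc_spectrum_db(a_ref, n_points)
    test = lpc_spectrum_db(a_test, n_points)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / n_points)

def outlier_percentages(sd_values, low=2.0, high=4.0):
    """Percent of frames with low < SD <= high, and with SD > high.

    The 2 dB / 4 dB thresholds are an assumption, not taken from the abstract.
    """
    n = len(sd_values)
    mid = sum(1 for sd in sd_values if low < sd <= high)
    above = sum(1 for sd in sd_values if sd > high)
    return 100.0 * mid / n, 100.0 * above / n
```

In a tandem experiment, `spectral_distortion` would be evaluated per frame between the unquantised and quantised envelopes, and the per-frame values passed to `outlier_percentages`.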
The digital waveguide mesh is a technique used in the modelling of room acoustics and musical instruments. The paper details a project that applies the theory of waveguide mesh acoustic modelling to the production of human-like vowel sounds. A 2D software mesh model is created that approximates the shape of the vocal tract in different vowel positions, and a glottal flow input is applied. The resulting signal has resonant frequencies, or formants, similar to those of recorded speech. Recommendations are made towards extending the model to include some of the more complex features of the mouth, potentially constructing an acoustical model of the human vocal tract capable of creating speech sounds of increased naturalness.
ISBN: (print) 0780376633
Parametric coders provide good communication quality at low bit rates. Efficient encoding of the variable-dimension harmonic spectral envelope is an essential task in parametric speech coders. In this paper, we propose an efficient vector quantization (VQ) scheme with perceptual considerations to improve the performance of parametric speech coders. Thanks to the reduction in dimension, the computational complexity of spectral envelope VQ (SEVQ) is reduced while speech quality is retained. Experimental results show that the proposed perceptual SEVQ method reduces the computational complexity of SEVQ by a factor of 9.
A novel linear prediction (LPC) based Automatic Speaker Identification (ASI) technique employing multiple representations of the LPC is presented. The proposed ASI system has two modes, namely the encoding mode and the Speaker Identification (SI) mode. During the encoding mode, otherwise known as the training mode, the linear prediction coefficients (LPC) are extracted for each speaker as speech features. Multiple Representation Split Vector Quantization (MRSVQ) is employed to form representative codebooks corresponding to each representation, for each speaker. During the SI (running) mode, the ASI system identifies the speaker in the database whose codebooks best match the LPC extracted from the speech signal of the unknown speaker. The synthesized all-pole vocal tract transfer function is used as a measure of the vocal tract for ASI. Employing the normalized vocal tract transfer function error measure, the proposed technique is consistently found to obtain enhanced ASI accuracy in comparison with vector quantization employing existing LPC representations, at the expense of a modest increase in computational complexity. Our ASI technique can be used in a stand-alone system or as part of an ASI environment.
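The codebook-matching step of the SI mode can be sketched as follows. This is only a generic VQ speaker-identification sketch under stated assumptions: it uses plain Euclidean distance on arbitrary feature vectors, not the MRSVQ codebooks or the normalized vocal tract transfer function error measure of the paper, and all names and data are hypothetical.

```python
from math import dist

def average_distortion(codebook, vectors):
    """Mean distance from each test vector to its nearest codeword."""
    return sum(min(dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify_speaker(codebooks, vectors):
    """Return the speaker whose codebook gives the lowest average distortion.

    codebooks: dict mapping speaker name -> list of codewords
    vectors:   feature vectors extracted from the unknown speaker
    """
    return min(codebooks, key=lambda s: average_distortion(codebooks[s], vectors))

# Toy codebooks (hypothetical data, two codewords per speaker).
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 4.0]],
}
unknown = [[0.9, 1.1], [0.1, -0.1]]
print(identify_speaker(codebooks, unknown))  # prints "speaker_a"
```

With several LPC representations, one such distortion per representation could be combined before taking the minimum; how the paper combines them is not stated in the abstract.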
The paper describes and analyzes various problems of information encoding in digital communication and storage systems. A common and sufficiently complete structural scheme of these systems, as a cascade connection of six encoding methods, is discussed. Some main characteristics of delta modulation, Reed-Solomon codes and transmission (line) codes are also discussed.
The paper presents a new method for all-pole model estimation based on minimization of the weighted mean square error in the sampled spectral domain. Owing to the discrete nature of the proposed distance measure, emphasis can be placed on an arbitrary set of spectral samples, which can greatly improve the model accuracy for periodic signals. Weighting can also be applied to improve the fit in certain spectral regions according to any desired fidelity criterion. An iterative algorithm for determining the optimal model is proposed, and an exceptionally fast convergence rate is demonstrated. The accuracy of the estimation algorithm is verified on a synthetic vowel over a broad range of pitch frequencies.
Several windows are designed following an optimization procedure, the target application being linear prediction (LP) analysis within the ETSI adaptive multi-rate (AMR) coder. The goal is to explore the possibility of reducing the lengths of the windows so as to decrease computation. In addition, several windows are designed that are placed at different shifted positions with respect to the original, with the objective of lowering the coding delay. It is shown that the alternative windows produce a perceptual quality similar to that of the original, with the added benefits of reduced computational cost and lowered coding delay.
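To make the role of the analysis window concrete, here is a minimal sketch of windowed LP analysis using the autocorrelation method and the Levinson-Durbin recursion. It assumes a plain Hamming window for illustration; the optimized windows of the paper and the actual AMR windows are not reproduced, and the function names are hypothetical.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n (illustrative choice only)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(x, order):
    """Autocorrelation lags 0..order; cost grows with the window length."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns (a, err), where err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err

def lp_analysis(frame, window, order):
    """Window the frame, then run the autocorrelation method."""
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

Because the autocorrelation sums run over the windowed samples, a shorter window directly reduces the number of multiply-adds per frame, which is the computational saving the abstract refers to.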
ISBN: (print) 0780379802
Humans interact with others in several ways, such as speech, gesture and eye contact. Among them, speech is the most effective means of communication, through which people can readily exchange information without the need for any other tool. Emotions color speech; they can make its meaning more complex and convey how something is said. A Mandarin speech based emotion classification method is presented. Five basic human emotions (anger, boredom, happiness, neutral and sadness) are investigated. The features extracted include 16 LPC (linear predictive cepstrum) coefficients and 20 MFCC (Mel-frequency cepstral coefficients) components, and the presented recognizer is based on two statistical pattern recognition techniques, the minimum-distance method and the nearest class mean method. For minimum-distance emotion recognition, an average accuracy of 79.1% is obtained. For nearest class mean emotion recognition, a higher accuracy of 89.1% is achieved.
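The nearest class mean rule named in this abstract can be sketched in a few lines. The sketch assumes Euclidean distance and uses two-dimensional toy vectors in place of the 36 cepstral features; the data and function names are hypothetical, and the paper's exact distance measure is not stated in the abstract.

```python
from math import dist

def class_means(samples):
    """Compute one mean vector per class.

    samples: dict mapping label -> list of equal-length feature vectors
    """
    means = {}
    for label, vectors in samples.items():
        n = len(vectors)
        means[label] = [sum(column) / n for column in zip(*vectors)]
    return means

def nearest_class_mean(means, x):
    """Assign x to the class whose mean is closest (Euclidean distance)."""
    return min(means, key=lambda label: dist(means[label], x))

# Toy training data (hypothetical, 2 features instead of 36).
train = {
    "anger": [[1.0, 1.0], [3.0, 1.0]],
    "sadness": [[-2.0, 0.0], [-4.0, 0.0]],
}
means = class_means(train)
print(nearest_class_mean(means, [1.5, 0.5]))  # prints "anger"
```

Each class is summarized by a single mean vector, so classification costs one distance computation per emotion regardless of the size of the training set.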
In our (knowledge-based) synthesis system [G. L. Jayavardhana Rama et al., 2002], we use single instances of basic-units, which are polyphones such as CV, VC, VCV, VCCV and VCCCV, where C stands for consonant and V for vowel. These basic-units are recorded in isolation from a speaker, not extracted from continuous speech or carrier-words. Modification of the pitch, amplitude and duration of basic-units is required in our speech synthesis system [G. L. Jayavardhana Rama et al., 2002] to ensure that the overall characteristics of the concatenated units match the true characteristics of the target word or sentence. Duration modification is carried out on the vowel parts of the basic-unit, leaving the consonant portion intact. Thus, we need to segment these polyphones into consonant and vowel parts. When the consonant in a basic-unit is a plosive or fricative, the energy-based method is good enough to separate the vowel and consonant parts. However, this method fails when there is co-articulation between the vowel and the consonant. We propose the use of oriented principal component analysis (OPCA) to segment the co-articulated units. The test feature vectors (LPC-cepstrum and mel-cepstrum) are projected onto the consonant and vowel subspaces. Each of these subspaces is represented by generalized eigenvectors obtained by applying OPCA to the training feature vectors. Our approach successfully segments co-articulated basic-units.