The authors consider signals originating from a sequence of sources. More specifically, they address the problems of segmenting such signals and relating each segment to its source, an issue with wide applications in many fields. The report describes a solution based on an ergodic hidden Markov model (HMM) in which each HMM state corresponds to a signal source. The sequence of signal sources can be determined by applying a decoding procedure (the Viterbi algorithm or the forward algorithm) to the observed sequence, and Baum-Welch training is used to estimate the HMM parameters from the training material. As an example of the multiple-signal-source classification problem, an experiment on unknown-speaker classification is performed. The results show a classification rate of 79% for 4 male speakers. They also indicate that the model is sensitive to the initial values of the ergodic HMM and that employing the long-distance LPC cepstrum is effective for signal preprocessing.
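The decoding step can be pictured as a log-domain Viterbi search over an ergodic HMM, where every state (source) may follow every other. This is a minimal sketch assuming discrete observation symbols and hand-set probabilities; the report itself works with LPC-cepstrum features and Baum-Welch-trained parameters, which are not reproduced here. The names `viterbi`, `start`, `trans`, and `emit` are hypothetical.

```python
import math


def _log(p):
    """Log-probability that maps zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start, trans, emit):
    """Most likely state (source) sequence for a discrete-observation HMM.

    start[i]    : P(first state = i)
    trans[i][j] : P(next state = j | current state = i); ergodic means all > 0
    emit[i][o]  : P(observation symbol o | state i)
    """
    n = len(start)
    # delta[i]: best log-probability of any state path ending in state i so far
    delta = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + _log(trans[best_i][j]) + _log(emit[j][o]))
        back.append(ptr)
        delta = new
    # Trace back from the best final state to recover the source sequence
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Two "sources" with sticky transitions: the decoded path segments the signal
print(viterbi([0, 0, 0, 1, 1, 1], [0.5, 0.5],
              [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]))
```

Each change of state in the decoded path marks a segment boundary, and the state index is the source label assigned to that segment.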
The performance of the recently adopted ITU-T G.729 8 kbps codec standard is evaluated over digital cellular channels characterized by Rayleigh fading. Two forward error correction (FEC) schemes are studied: a convolutional-code-based FEC and a Nordstrom-Robinson-code-based FEC. The effect of interleaving depth is investigated for both FEC schemes. Both flat-fading and co-channel-interference-limited channel models are used in our study.
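Interleaving depth matters because Rayleigh fading produces bursts of channel errors, which an FEC code handles poorly when they land in a single codeword. A row-column block interleaver spreads a burst across several codewords. This is a generic sketch, not the configuration used in the study; `rows` (the interleaving depth) and `cols` (the codeword length) are hypothetical parameters.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols array, read out column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): write column by column, read out row by row."""
    assert len(symbols) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = symbols[i]
            i += 1
    return out


rows, cols = 4, 6                          # depth 4: four codewords of 6 symbols each
tx = interleave(list(range(rows * cols)), rows, cols)
burst = set(range(8, 12))                  # 4 consecutive channel symbols hit by a fade
rx = ["X" if i in burst else s for i, s in enumerate(tx)]
rx = deinterleave(rx, rows, cols)
errors_per_codeword = [rx[r * cols:(r + 1) * cols].count("X") for r in range(rows)]
print(errors_per_codeword)                 # one error per codeword instead of four in one
```

Without interleaving, the same 4-symbol burst would fall entirely inside one codeword; a deeper interleaver tolerates longer fades at the cost of added delay, which is the trade-off an interleaving-depth study explores.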
ISBN (print): 0780336828
This paper introduces a new parametric formulation of the line spectrum pair (LSP) representation. Because of their robustness to quantization, LSPs are widely used as an alternative to the LPC parameters. In the conventional method, an explicit formula of fixed order is used to compute the LSP parameters, in which the vocal tract information is embedded. The LSP parameters are then quantized with an a priori assigned number of bits. To provide flexibility in bit allocation for the LSP parameters, this paper proposes a new explicit LSP representation in which the spectral envelope is represented by a reduced number of LSPs without causing major spectral distortion. This reduces the number of bits allocated to the LSP parameters and makes it possible to quantize the spectral envelope at a variable bit rate that depends on the characteristics of each speech frame. Simulation results are presented to show the validity of the proposed formulation.
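For context, the conventional fixed-order computation forms P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1) and takes the angles of their unit-circle roots as the line spectral frequencies. The sketch below locates those roots with a grid search plus bisection. It illustrates the conventional representation only, not the reduced-order formulation the paper proposes, and `lpc_to_lsf` and its `grid` parameter are hypothetical names.

```python
import math


def lpc_to_lsf(a, grid=2048):
    """Conventional LSFs (radians, ascending) of A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    a = [1, a1, ..., ap]. Assumes A(z) is minimum phase, so the roots of P and Q
    lie on the unit circle and interlace. Roots are found by scanning (0, pi)
    on a uniform grid and refining each sign change by bisection.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]                  # A(z) as a degree-(p+1) polynomial in z^-1
    rev = ext[::-1]                        # coefficients of z^-(p+1) A(z^-1)
    P = [x + y for x, y in zip(ext, rev)]  # symmetric: P_k == P_{p+1-k}
    Q = [x - y for x, y in zip(ext, rev)]  # antisymmetric: Q_k == -Q_{p+1-k}
    m = (p + 1) / 2.0

    # With the linear-phase factor e^{-j w m} removed, both are real functions of w
    def p_real(w):
        return sum(c * math.cos(w * (k - m)) for k, c in enumerate(P))

    def q_real(w):
        return sum(c * math.sin(w * (k - m)) for k, c in enumerate(Q))

    def roots(f):
        found = []
        # Grid excludes w = 0 and w = pi, where the trivial roots of P and Q sit
        ws = [math.pi * i / grid for i in range(1, grid)]
        for left, right in zip(ws, ws[1:]):
            f_left = f(left)
            if f_left * f(right) >= 0:
                continue                   # no sign change in this grid cell
            for _ in range(60):            # bisection refinement
                mid = 0.5 * (left + right)
                f_mid = f(mid)
                if f_left * f_mid <= 0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            found.append(0.5 * (left + right))
        return found

    return sorted(roots(p_real) + roots(q_real))


print(lpc_to_lsf([1.0, -0.9, 0.64]))       # two interlaced LSFs in (0, pi)
```

In the conventional scheme all p values would then be quantized with a fixed bit budget, which is the rigidity the proposed formulation targets.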
This paper presents a new strategy to encode the LPC spectral envelope of speech. The proposed scheme uses an interpolation-based differential vector coding of the LSF parameters in order to better track the temporal variations of the speech short-time spectral envelope. Two consecutive sets of LSF parameters are simultaneously encoded during each speech frame. Simulation results show major improvements over techniques that vector quantize a single set of LSF parameters per frame.
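One way to picture interpolation-based differential coding of two LSF sets per frame: code the end-of-frame vector relative to the previous decoded end-of-frame vector, then code the mid-frame vector as a residual from the linear interpolation of the two decoded end vectors. This is an assumption-laden sketch rather than the paper's scheme: a uniform scalar quantizer with a hypothetical `step` stands in for the vector quantizer, and `quantize` and `encode_frame` are illustrative names.

```python
def quantize(residual, step):
    """Uniform scalar quantizer (a stand-in for a trained VQ codebook)."""
    idx = [round(r / step) for r in residual]
    return idx, [i * step for i in idx]


def encode_frame(prev_end_hat, mid, end, step=0.01):
    """Encode the mid-frame and end-of-frame LSF vectors of one frame.

    prev_end_hat: decoded end-of-frame LSFs of the previous frame (also known
    to the decoder). Returns the transmitted indices and the decoder-side
    reconstructions of both vectors.
    """
    # 1) End-of-frame vector: differential coding against the previous decoded vector
    end_idx, end_res = quantize([e - p for e, p in zip(end, prev_end_hat)], step)
    end_hat = [p + r for p, r in zip(prev_end_hat, end_res)]
    # 2) Mid-frame vector: predicted by interpolating the two decoded end vectors
    pred = [0.5 * (p + e) for p, e in zip(prev_end_hat, end_hat)]
    mid_idx, mid_res = quantize([m - q for m, q in zip(mid, pred)], step)
    mid_hat = [q + r for q, r in zip(pred, mid_res)]
    return (end_idx, mid_idx), mid_hat, end_hat


prev = [0.30, 0.80, 1.40]
(idx_end, idx_mid), mid_hat, end_hat = encode_frame(prev, [0.321, 0.832, 1.428], [0.35, 0.85, 1.45])
print(idx_end, idx_mid)                    # mid-frame indices stay near zero
```

Because the interpolated prediction already tracks the slow drift of the spectral envelope, the mid-frame residual indices stay small, which is what makes the second set of LSFs cheap to transmit.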
ISBN (print): 0780335554
We describe a voice transformation method that changes the acoustic features of a source speaker to those of a target speaker. In the method, the acoustic features are divided into two parts: a linear part and a nonlinear part. The linear part is characterized by LPC cepstrum coefficients obtained from LP analysis. The nonlinear part, which represents the excitation signal, is modelled by a long-delay nonlinear predictor using a neural net. Conversion rules for the excitation signal are derived from the average pitch ratio and a mapping codebook, and those for the LPC cepstrum coefficients are based on orthogonal vector space conversion. In addition, spectral envelope compensation is proposed to correct spectral distortion. A listening test on the transformed speech shows that the proposed method can convert a speaker's individuality while maintaining high quality.
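The LPC cepstrum coefficients of the linear part follow from the LP coefficients by a standard recursion. Assuming the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and the all-pole model 1/A(z) with the gain term omitted, the cepstrum is c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p. This is a minimal sketch; `lpc_to_cepstrum` is a hypothetical name and the paper's analysis order and frame settings are not given.

```python
def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum [c1, ..., c_n_ceps] of 1/A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    a = [a1, ..., ap]. The gain term c0 is not computed here.
    """
    p = len(a)

    def coef(n):
        # a_n, taken as zero beyond the model order p
        return a[n - 1] if 1 <= n <= p else 0.0

    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -coef(n)
        for k in range(1, n):
            acc -= (k / n) * c[k] * coef(n - k)
        c[n] = acc
    return c[1:]


# 1/(1 - 0.5 z^-1) has cepstrum c_n = 0.5**n / n
print(lpc_to_cepstrum([-0.5], 3))
```

The recursion also runs past n = p, which is how more cepstral coefficients than LP coefficients can be obtained.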
This paper considers the application of the sample-selective LPC method in the standard 4.8 kb/s CELP coder (U.S. FED STD 1016) as a way of reducing LPC spectral degradation compared with standard LPC methods. A comparative experimental analysis is carried out using three spectral measures related to the RMS log spectral measure: the likelihood ratio, the cosh measure, and the cepstral distance. The presented experimental analysis justifies the use of the proposed sample-selective LPC method in the standard CELP speech coder.
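Two of the measures can be sketched from all-pole model spectra S(w) = g / |A(e^{jw})|^2 sampled on a uniform grid over [0, pi): the RMS log spectral distance in dB, and the cosh measure, the mean of cosh(V(w)) - 1 where V is the natural-log spectral difference (equivalently, the average of the two Itakura-Saito distances). This is a sketch under the assumption of a rectangle-rule average over a hypothetical `n_freq`-point grid; the likelihood ratio and cepstral distance are not reproduced.

```python
import cmath
import math


def lpc_spectrum(a, gain, n_freq=256):
    """All-pole power spectrum gain / |A(e^{jw})|^2 at w = pi*k/n_freq, a = [1, a1, ..., ap]."""
    spec = []
    for k in range(n_freq):
        w = math.pi * k / n_freq
        A = sum(ak * cmath.exp(-1j * w * i) for i, ak in enumerate(a))
        spec.append(gain / abs(A) ** 2)
    return spec


def rms_log_spectral_db(s1, s2):
    """Root-mean-square difference of the two spectra in dB."""
    d = [10 * math.log10(x) - 10 * math.log10(y) for x, y in zip(s1, s2)]
    return math.sqrt(sum(v * v for v in d) / len(d))


def cosh_measure(s1, s2):
    """Mean of cosh(V) - 1 with V = ln(s1 / s2); zero only for identical spectra."""
    v = [math.log(x / y) for x, y in zip(s1, s2)]
    return sum(math.cosh(x) - 1 for x in v) / len(v)


ref = lpc_spectrum([1.0, -0.9, 0.64], 1.0)
test = lpc_spectrum([1.0, -0.85, 0.6], 1.0)
print(rms_log_spectral_db(ref, test), cosh_measure(ref, test))
```

In a comparison like the paper's, `ref` would come from the unquantized frame and `test` from the coder's LPC analysis, with the distances averaged over many frames.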
A large number of parameters, including pitch, LPCC, ΔLPCC, PARCOR, MFCC, ΔMFCC, and the residual cepstrum (RCEP), were extracted from speech signals, and their effectiveness for text-independent speaker identification was evaluated. The usefulness of two signal processing techniques, preemphasis and cepstral weighting, was also studied. A VQ-based speaker recognition method with codebooks fine-tuned by the LVQ algorithm was used. Both LPCC and MFCC were shown to be effective representations: with a smaller number of parameters, the LPCC representation performs better, but it is surpassed by MFCC when the analysis order is larger. Because pitch is an independent parameter, it can be used jointly with other spectral features. In an evaluation experiment, the correct identification rate for 112 male speakers with test utterances shorter than one second reached 98.2%.
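The VQ-based identification step can be sketched as follows: each enrolled speaker has a codebook of feature vectors (for example LPCC or MFCC frames), and a test utterance is assigned to the speaker whose codebook yields the lowest average nearest-codeword distortion. A raised-sine lifter, w_n = 1 + (L/2) sin(pi n / L), is included as one assumed form of cepstral weighting. The paper's exact weighting, its LVQ fine-tuning, and its codebook sizes are not reproduced, and all names and codebooks here are hypothetical.

```python
import math


def lift(ceps, L=22):
    """Raised-sine weighting of [c1, ..., cN] (an assumed form of cepstral weighting).

    The same weighting must be applied to training and test frames.
    """
    return [c * (1 + (L / 2) * math.sin(math.pi * n / L)) for n, c in enumerate(ceps, start=1)]


def distortion(frames, codebook):
    """Average squared Euclidean distance from each frame to its nearest codeword."""
    total = 0.0
    for f in frames:
        total += min(sum((x - y) ** 2 for x, y in zip(f, cw)) for cw in codebook)
    return total / len(frames)


def identify(frames, codebooks):
    """Return the speaker label whose codebook quantizes the test frames best."""
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))


codebooks = {"spk_a": [[0.0, 0.0], [1.0, 0.0]],    # toy 2-D codebooks for illustration
             "spk_b": [[5.0, 5.0], [6.0, 5.0]]}
print(identify([[0.1, 0.0], [0.9, 0.1]], codebooks))
```

Averaging the distortion over all frames of the utterance is what lets a short test utterance still be scored against every enrolled speaker.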
An internal study has been carried out at ESTEC/WSP (Onboard Image and Signal Processing Section) on the compression of synthetic optical multispectral images that reflect the characteristics of future satellite imagers. The images produced by these instruments are composed of up to 15 bands in the 400-1050 nm range, with radiometric resolutions of 8-16 bpp and a spatial resolution of around 250-300 m. The aim of this study is to evaluate the effectiveness of a fast and simple onboard data compression process for reducing the amount of data to be transmitted to ground stations or stored on the satellites' onboard recorders.
Both linear predictive coding (LPC) and mel scale frequency cepstral coefficient (MFCC) analysis, the most common techniques for speech recognition signal processing, make the assumption that the speech signal is stationary for some analysis window and produce a representation based upon the "stationary" frequency content within the window. This work uses a technique based upon Cohen's (1989) class of generalized time frequency representations (TFR) to produce selected frequency representations that are not based upon an assumption of stationarity. This representation is used in a speech recognition system to produce improved accuracy. The proposed approach requires a kernel design to specify the attributes of the representations. The considerations used for analyzing speech signals and the resulting attributes are discussed. Comparisons with standard analysis techniques are presented. The significant computational requirements are also discussed.
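As a concrete member of Cohen's class, the pseudo Wigner-Ville distribution uses a lag window as its kernel: at each time n it Fourier-transforms the windowed instantaneous autocorrelation x[n+m] x*[n-m]. The sketch below computes one time slice. It stands in for, and is not, the kernel the paper designs for speech; `pwvd_slice`, the half-length `L`, and the Hann-shaped lag window are assumptions.

```python
import cmath
import math


def pwvd_slice(x, n, L=16, n_freq=64):
    """Pseudo Wigner-Ville distribution of x at time index n.

    Returns W[k] at normalized angular frequencies theta_k = pi*k/n_freq,
    k = 0..n_freq-1. The factor 2 in the exponent arises because the lag
    spans n+m and n-m. Samples outside x are treated as zero.
    """
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0.0

    # Symmetric, strictly positive lag window: the kernel of this Cohen-class member
    win = {m: 0.5 * (1 + math.cos(math.pi * m / (L + 1))) for m in range(-L, L + 1)}
    # Windowed instantaneous autocorrelation at time n
    r = {m: win[m] * sample(n + m) * sample(n - m).conjugate() for m in win}
    W = []
    for k in range(n_freq):
        theta = math.pi * k / n_freq
        # Conjugate-symmetric r gives a real result; .real drops rounding residue
        W.append(sum(r[m] * cmath.exp(-2j * theta * m) for m in r).real)
    return W


# A complex exponential at pi/4 rad/sample should peak at k = 16 of 64 bins
x = [cmath.exp(1j * math.pi / 4 * t) for t in range(64)]
W = pwvd_slice(x, 32)
print(max(range(len(W)), key=W.__getitem__))
```

No stationarity assumption is built in: each slice uses only the samples around time n, and the window length L trades frequency resolution against time localization, which is the kind of attribute a kernel design specifies.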