Both linear predictive coding (LPC) and mel scale frequency cepstral coefficient (MFCC) analysis, the most common techniques for speech recognition signal processing, make the assumption that the speech signal is stationary for some analysis window and produce a representation based upon the "stationary" frequency content within the window. This work uses a technique based upon Cohen's (1989) class of generalized time frequency representations (TFR) to produce selected frequency representations that are not based upon an assumption of stationarity. This representation is used in a speech recognition system to produce improved accuracy. The proposed approach requires a kernel design to specify the attributes of the representations. The considerations used for analyzing speech signals and the resulting attributes are discussed. Comparisons with standard analysis techniques are presented. The significant computational requirements are also discussed.
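The stationarity assumption mentioned above can be made concrete with a minimal sketch of conventional autocorrelation-method LPC, in which a single set of predictor coefficients is fitted to a whole analysis window. This is the standard Levinson-Durbin recursion, not the time-frequency method proposed in the abstract; the function names and the sign convention A(z) = 1 + a[1] z^-1 + ... are assumptions made for illustration.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one analysis window."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for one window.

    Returns (a, err) with a[0] == 1.0 and A(z) = sum_k a[k] z^-k,
    so the predictor is x[n] ~ -sum_{k>=1} a[k] x[n-k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_for_window(frame: List[float], order: int) -> Tuple[List[float], float]:
    """One coefficient set per window: the window is treated as stationary."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

Because the whole window contributes to one autocorrelation sequence, any spectral change inside the window is averaged into a single envelope, which is the limitation the abstract sets out to avoid.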
The effect of filtering the time trajectories of spectral envelopes on speech intelligibility was investigated. Since the LPC cepstrum forms the basis of many automatic speech recognition systems, the authors filtered time trajectories of the LPC cepstrum of speech sounds, and the modified speech was reconstructed after the filtering. For processing, they applied low-pass, high-pass and band-pass filters. The accuracy results from the perceptual experiments for Japanese syllables show that speech intelligibility is not severely impaired as long as the filtered spectral components have 1) a rate of change faster than 1 Hz when high-pass filtered, 2) a rate of change slower than 24 Hz when low-pass filtered, and 3) a rate of change between 1 and 16 Hz when band-pass filtered.
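As a rough illustration of filtering cepstral time trajectories, the sketch below low-pass filters each cepstral coefficient along the frame axis with a Hamming-windowed sinc FIR filter. The filter design, the 100 Hz frame rate used in the usage note, the tap count, and the edge handling are all assumptions for illustration; the abstract does not specify the filters the authors used.

```python
import math
from typing import List


def lowpass_fir(cutoff_hz: float, frame_rate_hz: float, num_taps: int = 31) -> List[float]:
    """Hamming-windowed sinc low-pass filter with unity gain at 0 Hz.

    num_taps is assumed odd so the filter has a single center tap.
    """
    fc = cutoff_hz / frame_rate_hz  # cutoff in cycles per frame
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def filter_trajectories(frames: List[List[float]], taps: List[float]) -> List[List[float]]:
    """Filter each coefficient's trajectory over time.

    frames[t][d] is cepstral coefficient d at frame t. The filter is
    applied centered on each frame (no delay); the first and last
    frames are repeated at the edges.
    """
    n_frames, dim, half = len(frames), len(frames[0]), len(taps) // 2
    out = [[0.0] * dim for _ in range(n_frames)]
    for d in range(dim):
        for t in range(n_frames):
            acc = 0.0
            for k, h in enumerate(taps):
                idx = min(max(t + k - half, 0), n_frames - 1)
                acc += h * frames[idx][d]
            out[t][d] = acc
    return out
```

For example, with cepstra computed at an assumed 100 frames per second, `filter_trajectories(cepstra, lowpass_fir(24.0, 100.0))` would suppress trajectory components that change faster than about 24 Hz.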
This paper presents a new strategy to encode the LPC spectral envelope of speech. The proposed scheme uses an interpolation-based differential vector coding of the LSF parameters in order to better track the temporal variations of the speech short-time spectral envelope. Two consecutive sets of LSF parameters are simultaneously encoded during each speech frame. Simulation results show major improvements over techniques that vector quantize a single set of LSF parameters per frame.
ISBN: 0780336828 (print)
This paper introduces a new parametric formulation for the line spectrum pair (LSP) representation. Owing to their robustness to quantization, LSPs are widely used as an alternative to the LPC parameters. In the conventional method, an explicit formula of fixed order is used to compute the LSP parameters, in which the vocal tract information is embedded. The LSP parameters are then quantized with an a priori assigned number of bits. To provide flexibility in bit allocation for the LSP parameters, this paper proposes a new explicit LSP representation in which the spectral envelope component is represented by a reduced number of LSPs without causing any major spectral distortion. This reduces the bit allocation for the LSP parameters and makes it possible to quantize the spectral envelope at a variable bit rate depending on the characteristics of the framed speech. Simulation results are presented to show the validity of the proposed formulation.
This paper considers the application of the sample-selective LPC method in the standard CELP coder (U.S.A. FED STD 1016, 4.8 kb/s), with the aim of reducing LPC spectral degradation relative to standard LPC methods. A comparative experimental analysis is carried out using three spectral measures related to the RMS log spectral measure: likelihood ratios, the cosh measure, and cepstral distance. The experimental analysis justifies the use of the proposed sample-selective LPC method in the standard CELP speech coder.
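For orientation, the RMS log spectral measure to which the three measures above are related can be sketched as follows for two all-pole (LPC) models. This is a minimal illustration that evaluates both spectra on a uniform frequency grid; the function names, the grid size, and the coefficient convention a = [1, a1, ..., ap] with spectrum gain / |A(e^jw)| are assumptions, and the code is not the evaluation procedure used in the paper.

```python
import cmath
import math
from typing import List


def lpc_log_spectrum_db(a: List[float], gain: float = 1.0, n_points: int = 256) -> List[float]:
    """Log magnitude (dB) of gain / A(e^{jw}) on a grid over [0, pi).

    Assumes a stable model, so A has no zeros on the unit circle.
    """
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        a_w = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        spectrum.append(20.0 * math.log10(gain / abs(a_w)))
    return spectrum


def rms_log_spectral_distance(a_ref: List[float], a_test: List[float],
                              gain_ref: float = 1.0, gain_test: float = 1.0,
                              n_points: int = 256) -> float:
    """RMS difference in dB between two LPC log spectra."""
    s_ref = lpc_log_spectrum_db(a_ref, gain_ref, n_points)
    s_test = lpc_log_spectrum_db(a_test, gain_test, n_points)
    mean_sq = sum((x - y) ** 2 for x, y in zip(s_ref, s_test)) / n_points
    return math.sqrt(mean_sq)
```

Two identical models give a distance of zero, and two models that differ only by a gain factor of 10 give a distance of 20 dB at every frequency.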
A large number of parameters, including pitch, LPCC, ΔLPCC, PARCOR, MFCC, ΔMFCC, and residual cepstrum (RCEP), were extracted from speech signals, and their effectiveness for text-independent speaker identification was evaluated. In addition, the usefulness of two signal processing techniques, preemphasis and cepstral weighting, was studied. A VQ-based speaker recognition method with codebooks fine-tuned by the LVQ algorithm was used. It was shown that both LPCC and MFCC are effective representations: for a smaller number of parameters the LPCC representation performs better, but it is surpassed by MFCC when the analysis order is larger. Pitch is an independent parameter, so it can be used jointly with other spectral features. In an evaluation experiment, the correct identification rate for 112 male speakers with test utterances of less than one second reached 98.2%.
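The decision rule of a VQ-based speaker identifier can be sketched in a few lines: each speaker is represented by a codebook, and a test utterance is assigned to the speaker whose codebook quantizes its feature vectors with the lowest average distortion. The sketch below assumes squared Euclidean distortion and pre-trained codebooks; the names are hypothetical, and codebook training and the LVQ fine-tuning step described in the abstract are not shown.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


def average_distortion(vectors: List[Vector], codebook: List[Vector]) -> float:
    """Mean distance from each feature vector to its nearest codeword."""
    total = sum(min(squared_distance(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)


def identify_speaker(vectors: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(vectors, codebooks[name]))
```

In such a scheme the feature vectors would be per-frame LPCC or MFCC vectors; because the score is an average over frames, it can be computed for utterances of any length, including the sub-second test utterances mentioned above.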
An internal study has been carried out at ESTEC/WSP (Onboard Image and Signal Processing Section) on the compression of synthetic optical multispectral images that reflect the characteristics of future satellites' imagers. The images produced by these instruments are composed of up to 15 bands in the range of 400-1050 nm, with radiometric resolutions of 8-16 bpp and a spatial resolution of around 250-300 m. The aim of this study is to evaluate the effectiveness of a fast and simple onboard data compression process to reduce the amount of data to be transmitted to ground stations or stored on the satellites' on-board recorders.
ISBN: 0780331923 (print)
This paper presents a high-quality speech coder based on the multi-band excitation (MBE) model operating at 2.4 kb/s and 1.2 kb/s. The coder has two main features. One is an accurate and reliable pitch estimation and voiced/unvoiced decision algorithm. The other is the efficient quantization of the variable-dimension spectral amplitude vector. Besides representing the spectral envelope information using an all-pole model, we also encode error vectors of the spectral amplitude vector at several important positions. Informal listening tests indicate that the speech quality of this coder at 2.4 kb/s is comparable or even superior to that of the INMARSAT-M IMBE 4.15 kb/s coder. The coder has been implemented in real time on a single TMS320C31 floating-point DSP.
The transmission of spectral information consumes a large portion of the total bit rate in medium-to-low bit rate speech coding. Conventional methods for coding LSP parameters generate redundant spectral information or spectral distortion because the LSP parameters are updated at a fixed rate that is independent of the order of the coefficients and of the phonetic context. We propose a multiple type frame segmentation (MTFS) method, which allows various types of two-dimensional segmentation of speech frames in order to reduce the transmission rate of the LSP parameters without increasing the spectral distortion. The intra-frame spectral distortion (IFSD) is defined to measure the spectral distortion of the reconstructed spectrum. The proposed method generates a less distorted spectrum with fewer bits than the conventional single type frame segmentation (STFS) method.