In this paper, a modification to the group vector quantization (GVQ) discriminative training algorithm is proposed to train VQ codebooks for closed-set speaker identification. The proposed algorithm, referred to as modified GVQ (MGVQ), shifts the decision surfaces between speakers smoothly toward the Bayes limits. This is achieved by varying the learning rate during training iterations. The proposed MGVQ algorithm achieves a higher speaker identification rate than the standard GVQ.
This paper describes a new method for extraction of click evoked otoacoustic emissions (CEOAE), where the stimulus artifact is eliminated by the use of linear predictive coding (LPC). In this method, the prediction coefficients are computed over the first samples of the click response, which is mainly formed by passive oscillations, and the unpredicted part of the remaining response is taken as the CEOAE signal. Preliminary tests were made with fifteen signals collected from normal hearing adults presenting stimulus artifacts in their responses. Results show the advantage of eliminating most of the stimulus artifact, while preserving a better signal-to-noise ratio than the standard nonlinear stimulus cancellation method.
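The LPC idea in the abstract above can be illustrated with a minimal sketch: fit a linear predictor on the first samples of a response, then keep whatever the predictor cannot explain in the rest. Everything concrete here is an assumption made for illustration and does not come from the paper: the synthetic damped-cosine "artifact", the small late "emission", the predictor order of 2, the 40-sample fitting window, and the least-squares (covariance-method) fit.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_lpc(segment, order):
    """Least-squares predictor: x[n] is approximated by sum_k a[k] * x[n-1-k]."""
    R = [[0.0] * order for _ in range(order)]
    r = [0.0] * order
    for n in range(order, len(segment)):
        past = [segment[n - 1 - k] for k in range(order)]
        for i in range(order):
            r[i] += past[i] * segment[n]
            for j in range(order):
                R[i][j] += past[i] * past[j]
    return solve(R, r)

def prediction_residual(signal, coeffs, start):
    """Unpredicted part of signal[start:], given predictor coefficients."""
    order = len(coeffs)
    return [
        signal[n] - sum(coeffs[k] * signal[n - 1 - k] for k in range(order))
        for n in range(start, len(signal))
    ]

# Synthetic response (hypothetical): a passive damped oscillation plus a
# small component that only starts at sample ONSET.
FIT_LEN, ONSET, LENGTH = 40, 80, 200
signal = [
    0.9 ** n * math.cos(0.5 * n)
    + (0.01 * math.sin(0.3 * (n - ONSET)) if n >= ONSET else 0.0)
    for n in range(LENGTH)
]

coeffs = fit_lpc(signal[:FIT_LEN], order=2)
residual = prediction_residual(signal, coeffs, start=FIT_LEN)
```

In this toy case the damped oscillation is predicted almost exactly, so the residual stays near zero until the late component begins. Note that the residual is a filtered version of that component rather than the component itself.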
This paper presents the recognition of Cantonese speech commands using a proposed neural fuzzy network with rule switches. By introducing a switch to each rule, the optimal number of rules can be learned. An improved genetic algorithm (GA) is proposed to train the parameters of the membership functions and the optimal rule set for the proposed neural fuzzy network. An application example of Cantonese command recognition in electronic books will be given to illustrate the merits of the proposed approach.
The quality of low bit-rate speech coders is reduced at transitions where speech spectral characteristics vary significantly, because the usual speech parameter interpolation assumptions fail to model such variations correctly. This paper presents a joint quantisation-interpolation algorithm for coding LPC parameters in pitch synchronous speech coders to model the rapidly evolving parameters. In this technique, a number of sets of pitch synchronous LPC parameters, corresponding to a frame of speech, are jointly coded by coding two reference sets of LSFs and an interpolation trajectory. Coding an interpolation function allows the parameters to vary within the set. The proposed joint quantisation-interpolation coding of the pitch synchronous LSFs is evaluated by comparison with time synchronous extraction and linear interpolation. It is also compared with linear interpolation between sets of pitch synchronous LSFs. Comparison results show that the joint quantisation-interpolation method reduces the average spectral distortion when compared to fixed interpolation. The proposed quantiser was incorporated into the PS-SBLPC coder and informal listening tests were carried out. The synthesised speech was found to be of better quality when joint quantisation-interpolation is used.
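As a rough illustration of coding two reference LSF sets plus an interpolation trajectory, the sketch below picks, from a small set of candidate trajectories, the one that best reconstructs the intermediate LSF sets. The candidate trajectories, the squared-error criterion, and the toy LSF values are all assumptions for illustration. The abstract does not specify how the trajectory is represented or selected.

```python
def interpolate(lsf_a, lsf_b, weights):
    """Rebuild one LSF set per weight w as (1 - w) * lsf_a + w * lsf_b."""
    return [[(1.0 - w) * a + w * b for a, b in zip(lsf_a, lsf_b)] for w in weights]

def squared_error(sets_x, sets_y):
    """Total squared difference between two sequences of LSF sets."""
    return sum(
        (x - y) ** 2
        for set_x, set_y in zip(sets_x, sets_y)
        for x, y in zip(set_x, set_y)
    )

def choose_trajectory(lsf_sets, trajectories):
    """Return the index of the best candidate trajectory and all errors.

    The first and last sets in `lsf_sets` act as the two reference sets.
    """
    lsf_a, lsf_b = lsf_sets[0], lsf_sets[-1]
    errors = [
        squared_error(interpolate(lsf_a, lsf_b, t), lsf_sets) for t in trajectories
    ]
    best = min(range(len(trajectories)), key=errors.__getitem__)
    return best, errors

# Toy frame (hypothetical values, in radians): the spectrum changes abruptly
# between the third and fourth pitch-synchronous sets.
lsf_a = [0.3, 0.8, 1.5]
lsf_b = [0.5, 1.1, 1.9]
lsf_sets = [lsf_a, lsf_a, lsf_a, lsf_b, lsf_b]

# Hypothetical trajectory codebook: linear, late step, early step.
trajectories = [
    [0.0, 0.25, 0.5, 0.75, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0, 1.0],
]

best, errors = choose_trajectory(lsf_sets, trajectories)
```

Here the late-step trajectory reproduces the abrupt transition exactly, while fixed linear interpolation smears it across the frame, which is the kind of mismatch the abstract attributes to transitions.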
This paper reports the first full process integration of NOR-type nano-crystal memory (NCM) with a 4.6F² cell (size: 0.0777 μm²), which is achieved by landing plug polysilicon contact (LPC) and direct tungsten (W) bitline (BL). Four robust threshold voltage (VT) states for 2-bit-per-cell operation are verified. Characteristics comparable to those of NCM with a conventional silicide BL contact are also obtained, and NCM reliability is significantly improved by a proper fluorination effect while process compatibility and controllability are maintained, which is the only alternative for volume manufacture of high-density NCM.
ISBN: 0780376633 (print)
This paper presents an algorithm for encoding a speech signal at 2.3 kbit/s based on a uniform harmonic modeling of the excitation signal. The algorithm uses robust pitch detection and efficient voicing analysis to split the LPC excitation into two bands. The lower band is related to the voiced parts of speech, while the upper band represents unvoiced speech. A fixed phase spectrum from a voiced segment generated by a male speaker is added to the uniform harmonic modeling of the excitation signal. This fixed phase effectively reduces buzz and produces soft, natural-sounding speech. A short-term post-filter is used at the decoder to enhance the quality of the synthesized speech. Subjective testing in Chinese showed that the 2.3 kbit/s HE-LPC coder performs better than the federal standard 2.4 kbit/s MELP coder.
The use of address vector quantisation (VQ) in the compression of linear predictive coded (LPC) and line spectral pairs (LSP) speech parameters in a speaker dependent system is examined. Four speakers are investigated: two male and two female. The speech waveform is coded to LPC and LSP parameters using LPC techniques and is vector quantised using an unsupervised neural network, a Kohonen self organising feature map (KSOFM), to create a codebook of 128 entries. Address VQ is applied to the codebook and the data are examined for recurring sequences to exploit redundancy. Preliminary results indicate that approximately 46% additional compression is achievable using this method. As address VQ is a lossless compression scheme, this reduction is achieved without any further reduction in speech quality.
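The general idea of exploiting recurring sequences of codebook addresses losslessly can be sketched as follows. This is a simplified pair-substitution scheme written for illustration, not the address VQ procedure used in the paper. The index stream, the limit on the number of new addresses, and the greedy left-to-right encoder are all assumptions. Only the codebook size of 128 is taken from the abstract.

```python
from collections import Counter

CODEBOOK_SIZE = 128  # codebook size stated in the abstract

def build_address_book(indices, max_entries):
    """Give a new address to each adjacent index pair seen more than once."""
    pair_counts = Counter(zip(indices, indices[1:]))
    frequent = [p for p, c in pair_counts.most_common(max_entries) if c > 1]
    return {pair: CODEBOOK_SIZE + i for i, pair in enumerate(frequent)}

def encode(indices, book):
    """Greedy left-to-right replacement of known pairs by their address."""
    out, i = [], 0
    while i < len(indices):
        pair = tuple(indices[i:i + 2])
        if len(pair) == 2 and pair in book:
            out.append(book[pair])
            i += 2
        else:
            out.append(indices[i])
            i += 1
    return out

def decode(symbols, book):
    """Expand pair addresses back into the original codebook indices."""
    inverse = {addr: pair for pair, addr in book.items()}
    out = []
    for s in symbols:
        out.extend(inverse.get(s, (s,)))
    return out

# Hypothetical stream of VQ codebook indices with recurring sequences.
indices = [5, 9, 5, 9, 17, 5, 9, 42, 17, 5, 9]
book = build_address_book(indices, max_entries=4)
encoded = encode(indices, book)
decoded = decode(encoded, book)
```

The round trip recovers the index stream exactly, so no further distortion is introduced. The symbol count drops, but the alphabet grows, so any real bit-rate saving depends on how the enlarged address set is coded.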
In this paper, we investigate the application of maximum entropy discrimination (MED) feature selection in speech recognition problems. We compare the MED algorithm with a classical wrapper feature selection algorithm and we propose a hybrid wrapper/MED algorithm. We experiment with the three approaches on a phoneme recognition task on the TIMIT database. Results show that the MED algorithm achieves error rates comparable with those of the wrapper algorithm while requiring less computation. Furthermore, the use of a probabilistic framework shows that the MED algorithm gives very good results even with a very limited amount of data.
In many speech coding systems, the LPC coefficients are transformed to line spectrum pair (LSP) parameters, which are a very effective representation for quantization of the LPC information. The LSP representation consumes a large part of the total bit rate of the coder. Typically, the LSPs are highly correlated from one frame to the next, and a considerable reduction in bit rate can be achieved by exploiting this interframe correlation. However, interframe LSP coding can cause error propagation when frame erasures occur. In this paper, we compare the erasure performance of two intraframe quantization schemes, namely a two-stage VQ-lattice vector quantizer and a split vector quantizer based on the ITU G.723.1 standard coder. Our results show that, for different loss rates, a 24 bits/frame split vector quantizer achieves better average spectral distortion than the 24 bits/frame predictive split vector quantizer based on ITU G.723.1.
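A split vector quantizer of the kind mentioned above divides the LSP vector into sub-vectors and quantizes each one independently with its own codebook, using only the current frame (intraframe), so a lost frame does not corrupt later ones. The sketch below uses a toy 4-dimensional vector, a 2+2 split, and two-entry codebooks. These values are hypothetical and are not the G.723.1 codebooks or split sizes.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def split_vq_encode(vec, split_sizes, codebooks):
    """Quantize each sub-vector independently; return one index per split."""
    indices, start = [], 0
    for size, codebook in zip(split_sizes, codebooks):
        indices.append(nearest(codebook, vec[start:start + size]))
        start += size
    return indices

def split_vq_decode(indices, codebooks):
    """Concatenate the selected codebook entries back into a full vector."""
    out = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out

# Toy example (hypothetical values).
split_sizes = [2, 2]
codebooks = [
    [[0.3, 0.6], [0.5, 0.9]],   # codebook for the first sub-vector
    [[1.0, 1.8], [1.5, 2.2]],   # codebook for the second sub-vector
]
lsp = [0.31, 0.62, 1.4, 2.1]

indices = split_vq_encode(lsp, split_sizes, codebooks)
reconstructed = split_vq_decode(indices, codebooks)
```

A predictive variant would quantize the difference between the current vector and a prediction from previous frames, which lowers the bit rate but lets an erased frame affect subsequent ones.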
Mel-frequency cepstral coefficients (MFCC) have been shown to be very useful in speech recognition tasks and are the preferred features in state-of-the-art speech recognition systems. The authors present features derived from filter bank outputs whose performance is comparable to that of MFCCs for connected digit recognition using a hidden Markov model (HMM) based speech recognition system. The feature extraction method we present is easily implementable in floating gate analog VLSI circuitry, which makes it a viable option for low power speech recognition tasks.