Mexican Spanish has received little attention so far, despite being one of the most widely spoken Spanish dialects in the world and therefore of considerable potential interest. It has some particular characteristics that differentiate it from the Spanish spoken in Spain, which has been the most studied dialect in the past. We present our study of the properties of phones in Mexican Spanish and of the acoustic modeling required to develop an utterance verification system for Mexican Spanish. Two different approaches to modeling the alternative hypothesis in the subword-level utterance verification system are also presented and compared.
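Subword-level utterance verification is commonly framed as a likelihood ratio test between the hypothesised subword model (H0) and an alternative hypothesis (H1). The sketch below assumes the log-likelihoods have already been computed, for example by HMMs. It shows two common textbook choices for H1: the likelihood averaged over a set of alternative models, and the best-scoring alternative model alone. These are illustrative assumptions, not necessarily the two approaches compared in the paper. The function names, threshold and scores are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def verification_score(target_ll, competitor_lls, n_frames, method="anti"):
    """Frame-normalised log-likelihood ratio for one hypothesised subword.

    target_ll:      log P(O | H0) under the hypothesised subword model
    competitor_lls: log-likelihoods under the alternative-hypothesis models
    method:         "anti" averages the likelihood over all alternative models,
                    "max" uses the best-scoring alternative model only
    """
    if method == "anti":
        alt_ll = log_mean_exp(competitor_lls)
    elif method == "max":
        alt_ll = max(competitor_lls)
    else:
        raise ValueError("unknown method: " + method)
    return (target_ll - alt_ll) / n_frames


def accept(score, threshold=0.0):
    """Accept the subword hypothesis when the normalised LLR clears the threshold."""
    return score >= threshold


# Hypothetical log-likelihoods for a 20-frame segment.
s_max = verification_score(-120.0, [-130.0, -125.0, -140.0], 20, method="max")
s_anti = verification_score(-120.0, [-130.0, -125.0, -140.0], 20, method="anti")
```

A mean is never larger than a maximum, so the "anti" score is always at least the "max" score. The "max" variant is therefore the stricter test at a fixed threshold.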
The NLMS algorithm has low computational cost and performs optimally when excited by Gaussian noise, but it performs poorly with coloured signals such as speech. This paper proposes an acausally-conditioned NLMS (AC-NLMS) method for coloured signals. The method adjusts the correlation matrix governing adaptation behaviour to an LMS approximation of the matrix obtained for Gaussian noise, which permits near-optimal NLMS performance. The low computational complexity of NLMS is preserved. The technique has potential applications in acoustic echo and noise cancellation.
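For reference, the standard NLMS update that AC-NLMS builds on is w ← w + μ·e·x / (ε + ‖x‖²), where x holds the most recent inputs. The sketch below implements only this baseline. It does not reproduce the acausal conditioning proposed in the paper. The step size μ, the regulariser ε and the test filter `h` are illustrative assumptions. With white Gaussian excitation and a noiseless desired signal, the weights converge to the unknown filter.

```python
import random


def nlms(x, d, order, mu=0.5, eps=1e-8):
    """Standard NLMS adaptive FIR filter (baseline only, not AC-NLMS).

    Returns (final weights, list of a-priori errors).
    """
    w = [0.0] * order
    buf = [0.0] * order                      # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        norm = eps + sum(bi * bi for bi in buf)
        # Step size normalised by the instantaneous input power.
        w = [wi + (mu / norm) * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# System identification with white Gaussian excitation (hypothetical test filter).
random.seed(0)
h = [0.8, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
w, err = nlms(x, d, order=3)
```

Feeding coloured input, such as low-pass filtered noise, into the same loop typically slows convergence because the input correlation matrix has a larger eigenvalue spread. That is the weakness the AC-NLMS conditioning targets.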
A new method for modifying the pitch of units of recorded female speech is described. It was developed to overcome limitations of an otherwise promising technique called residual-excited linear prediction (RELP). In the new method, the stored speech unit is processed with a suitably shaped, time-varying prefilter. The filtered signal is modified according to the required change in the fundamental frequency, and the result is then passed through the inverse of the prefilter. The magnitude response of the inverse filter was chosen to be significantly less peaky than the response typically obtained with LPC. This choice was based on observations of the spectra of multiple recordings of the same speech unit at different pitch frequencies. Speech modified with this method was found to be superior in quality to speech modified with RELP, and it was also less sensitive than RELP to variations in pitch marking.
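The RELP baseline rests on LPC inverse filtering. The analysis filter A(z) produces a residual, and the all-pole filter 1/A(z) resynthesises speech from it. The sketch below computes the LPC coefficients with the autocorrelation method and Levinson-Durbin recursion, then checks that the two filters cancel when the residual is left unmodified. The pitch modification of the residual and the paper's time-varying, less peaky prefilter are not shown. The synthetic signal and the LPC order are illustrative assumptions.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] (autocorrelation method of LPC)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; returns (a, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Analysis filter A(z): residual e[n] = x[n] + sum_k a[k] x[n-k]."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0) for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    p = len(a) - 1
    y = []
    for n, en in enumerate(e):
        y.append(en - sum(a[k] * y[n - k] for k in range(1, p + 1) if n - k >= 0))
    return y


# Synthetic stand-in for a voiced speech unit (not real speech).
random.seed(0)
x = [math.sin(0.3 * n) + 0.5 * math.sin(0.9 * n) + 0.05 * random.gauss(0.0, 1.0) for n in range(400)]
a, pred_err = levinson_durbin(autocorr(x, 10), 10)
residual = inverse_filter(x, a)
rebuilt = synthesis_filter(residual, a)
```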
ISBN: (print) 0780336763
Current speech and speaker recognition systems rely largely on the voiced parts of an utterance, although a great amount of the information used in speech perception is contained in nonstationary consonants and transitions. How to model and characterize the dynamic spectral features that describe these transitions remains an open question. This paper investigates the modeling and detection of spectral transitions based on time-frequency analysis. Linear and nonlinear models of the transitions are proposed, using linear and quadratic frequency-modulation signals respectively. Two strategies for detecting the spectral transitions are then presented: the Radon-Wigner transform (RWT) and the Radon-ambiguity transform (RAT). Both simulated data and real speech data from the TIMIT database are used to test the detection procedure.
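The RWT integrates the Wigner-Ville distribution along straight lines in the time-frequency plane. For a line ω = ω0 + βt, a standard continuous-time identity equates this integral, up to a normalising constant, to the squared magnitude of the signal after dechirping by that line. The sketch below uses the discrete analogue of that identity, so it never computes the Wigner distribution explicitly. It then finds a linear FM component by grid search. Complex (analytic) signals, frequencies in rad/sample and the search grids are assumptions. The RAT, which works on the ambiguity function, is not shown.

```python
import cmath


def linear_fm(n_samples, omega0, beta):
    """Complex linear FM: instantaneous frequency omega0 + beta*n (rad/sample)."""
    return [cmath.exp(1j * (omega0 * n + 0.5 * beta * n * n)) for n in range(n_samples)]


def quadratic_fm(n_samples, omega0, beta, gamma):
    """Complex quadratic FM: instantaneous frequency omega0 + beta*n + gamma*n**2."""
    return [cmath.exp(1j * (omega0 * n + 0.5 * beta * n * n + gamma * n ** 3 / 3.0))
            for n in range(n_samples)]


def radon_wigner(x, omega0, beta):
    """Energy of x along the line omega = omega0 + beta*n, via dechirping."""
    s = sum(xn * cmath.exp(-1j * (omega0 * n + 0.5 * beta * n * n)) for n, xn in enumerate(x))
    return abs(s) ** 2


def detect_chirp(x, omegas, betas):
    """Grid search for the (omega0, beta) line that maximises the RWT."""
    return max(((w, b) for w in omegas for b in betas), key=lambda p: radon_wigner(x, *p))


# A synthetic transition modelled as a linear FM component.
omega_grid = [0.1 * k for k in range(11)]
beta_grid = [0.001 * k for k in range(9)]
best = detect_chirp(linear_fm(128, 0.5, 0.004), omega_grid, beta_grid)
```

At the matching line every term of the sum has unit phase, so the peak value equals the squared signal length (128² here). Mismatched lines add up with rotating phases and score much lower.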
A new and simple method is reported for extracting the source series resistance and the mobility reduction coefficient associated with the transverse gate field. The method is based on modeling the MOSFET transconductance in the saturation region. The proposed procedure is validated on partially depleted SIMOX MOSFETs.
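One common textbook way to separate the two effects is sketched below. It is not necessarily the transconductance-based procedure of the paper. It assumes the saturation current model I_D = (β/2)·V_ov² / (1 + θ_eff·V_ov), with V_ov = V_GS − V_T, where the source resistance enters to first order as θ_eff ≈ θ + β·R_s. Rearranging gives V_ov²/I_D = 2/β + (2θ_eff/β)·V_ov. A straight-line fit therefore yields β and θ_eff for each device. A second fit of θ_eff against β across devices with different gate lengths then separates θ (intercept) from R_s (slope). A known threshold voltage and the synthetic "true" values are assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def extract_beta_theta_eff(vgs, ids, vt):
    """Fit Vov^2/Id = 2/beta + (2*theta_eff/beta)*Vov for one device."""
    vov = [v - vt for v in vgs]
    y = [v * v / i for v, i in zip(vov, ids)]
    a, b = linear_fit(vov, y)
    return 2.0 / a, b / a                    # beta, theta_eff


def split_theta_rs(betas, theta_effs):
    """theta_eff = theta + beta*Rs across devices: intercept theta, slope Rs."""
    return linear_fit(betas, theta_effs)


VT, THETA, RS = 0.5, 0.1, 50.0               # hypothetical true values (V, 1/V, ohm)
vgs = [1.0 + 0.1 * k for k in range(11)]     # gate sweep in saturation (V)
betas, theta_effs = [], []
for beta_true in (1e-3, 2e-3, 4e-3):         # devices of different gate lengths (A/V^2)
    th_eff = THETA + beta_true * RS
    ids = [0.5 * beta_true * (v - VT) ** 2 / (1 + th_eff * (v - VT)) for v in vgs]
    beta_fit, th_fit = extract_beta_theta_eff(vgs, ids, VT)
    betas.append(beta_fit)
    theta_effs.append(th_fit)
theta_fit, rs_fit = split_theta_rs(betas, theta_effs)
```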
This paper proposes a speech and audio coder that operates at 1 bit/sample, i.e., 8 kbit/s for 8 kHz sampling or 16 kbit/s for 16 kHz sampling. The basic structure is inherited from Twin VQ (transform-domain weighted interleave vector quantization), a high-quality audio coding scheme. A periodic-component extraction scheme is newly added to the quantization of the MDCT coefficients. This scheme is found to be effective in reducing distortion and improving robustness against channel errors. At 8 kbit/s, the quality for music signals is better than that of G.729 at the same bit rate, but the quality for clean speech is worse. At 16 kbit/s, the quality is comparable to or better than that of G.722 at 48 kbit/s.
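Twin VQ quantises MDCT coefficients after interleaving them into subvectors, so that each subvector spans the whole spectrum. The sketch below shows a direct O(N²) MDCT of one 2N-sample frame with a sine window. It also shows the interleaving step and the per-frame bit budget implied by 1 bit/sample. The spectral weighting, the VQ codebooks and the paper's periodic-component extraction are not shown. The window, frame sizes and number of subvectors are assumptions.

```python
import math


def mdct(frame):
    """MDCT of one 2N-sample frame with a sine window; returns N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    win = [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]
    return [
        sum(win[n] * frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def interleave(coeffs, n_subvectors):
    """Subvector j holds coefficients j, j+M, j+2M, ... (M = n_subvectors)."""
    return [coeffs[j::n_subvectors] for j in range(n_subvectors)]


def bits_per_frame(bit_rate, sample_rate, hop):
    """Bits available per frame; at 1 bit/sample this equals the hop size."""
    return bit_rate * hop // sample_rate


coeffs = mdct([math.sin(0.2 * n) for n in range(32)])   # 16 coefficients
subvectors = interleave(coeffs, 4)                        # 4 subvectors of length 4
```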
We present a new kind of speech coding. Coding is usually obtained by linear prediction (LPC, or derived parameters such as LAR and LPCC) or by spectral analysis (FFT, cepstrum or MECC). We propose to use a three-layer neural network to learn phonemes extracted from the DARPA-TIMIT database. The network is designed to predict the next signal value from the N previous ones. During the training stage, the first weight layer is shared by all phonemes, while the second weight layer is different for each phoneme. In the generalization stage, the first weight layer is initialized with the weights obtained in the training phase and then kept fixed. When a phoneme from the test database is coded, the output weight layer is trained to predict that phoneme's values. The resulting neural predictive coding (NPC) is this second weight layer. We show that a normalized coding can easily be obtained by using a nonlinear function of the weights instead of the weights themselves. Results are compared with those of other temporal speech coding methods. A study of NPC by discriminant analysis and an application of an MLP to phoneme recognition are also presented.
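The coding stage described above can be sketched as follows. A frozen first layer maps the N previous samples to hidden units. Only the output weights are trained to predict the next sample, and those output weights serve as the code for the segment. In the sketch, the first layer is random rather than trained across phonemes. The output unit is linear and is trained by plain LMS. A sinusoid stands in for a TIMIT phoneme. All of these choices are assumptions, not the paper's exact setup.

```python
import math
import random


def make_windows(signal, n_prev):
    """(previous n_prev samples, next sample) pairs for one-step prediction."""
    return [(signal[i - n_prev:i], signal[i]) for i in range(n_prev, len(signal))]


def hidden(w1, window):
    """Shared first layer: one tanh unit per row of w1."""
    return [math.tanh(sum(w * x for w, x in zip(row, window))) for row in w1]


def predict(w1, w2, window):
    """Linear output unit on top of the hidden layer."""
    return sum(a * h for a, h in zip(w2, hidden(w1, window)))


def mse(w1, w2, pairs):
    return sum((predict(w1, w2, xs) - y) ** 2 for xs, y in pairs) / len(pairs)


def npc_code(w1, pairs, epochs=50, lr=0.05):
    """Train only the output weights (first layer frozen); they form the code."""
    w2 = [0.0] * len(w1)
    for _ in range(epochs):
        for xs, y in pairs:
            h = hidden(w1, xs)
            e = sum(a * b for a, b in zip(w2, h)) - y
            w2 = [a - lr * e * b for a, b in zip(w2, h)]
    return w2


random.seed(1)
N_PREV, N_HIDDEN = 8, 4
# Hypothetical stand-in for the shared first layer learned during training.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_PREV)] for _ in range(N_HIDDEN)]
segment = [0.8 * math.sin(0.3 * n) for n in range(200)]   # stand-in for one phoneme
pairs = make_windows(segment, N_PREV)
code = npc_code(w1, pairs)                                 # 4-value NPC vector
```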
This paper presents a 1.2 kb/s mixed-excitation LPC vocoder based on the multiband excitation (MBE) model. The vocoder extracts the pitch with a robust and efficient tracking algorithm. The cut-off frequency, a fundamental parameter in a mixed-excitation system, is obtained from the voiced/unvoiced (V/UV) decisions of the MBE analysis. To reduce the bit rate, the coder interpolates between neighboring frames, and a fast and reliable linear interpolation algorithm is proposed for this purpose. Informal listening tests indicate that, for both clean and telephone speech, the synthesized speech sounds natural and intelligible. Its quality is better than that of the 2.4 kb/s LPC-10e standard.
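Two of the steps above can be illustrated with a short sketch. The first derives a mixed-excitation cut-off frequency from per-band MBE V/UV decisions. The second fills untransmitted frames by linear interpolation of parameter vectors. The cut-off rule used here (the upper edge of the lowest contiguous run of voiced bands) and the equal band widths are assumptions. The paper's own fast interpolation algorithm and its frame layout are not specified, so this is not a reconstruction of them.

```python
def cutoff_frequency(voiced_bands, band_width_hz):
    """Upper edge of the lowest contiguous run of voiced bands (assumed rule)."""
    count = 0
    for voiced in voiced_bands:
        if not voiced:
            break
        count += 1
    return count * band_width_hz


def interpolate_frames(prev_params, next_params, n_missing):
    """Linearly interpolate parameter vectors for frames that are not transmitted."""
    frames = []
    for m in range(1, n_missing + 1):
        t = m / (n_missing + 1)
        frames.append([(1 - t) * p + t * q for p, q in zip(prev_params, next_params)])
    return frames


# Hypothetical parameters: [pitch (Hz), some spectral parameter].
mid = interpolate_frames([100.0, 0.2], [120.0, 0.4], 1)
```

Transmitting only every other frame and interpolating the one in between roughly halves the parameter rate. The cost is some loss of temporal detail at fast transitions.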
A novel speech separation structure is presented that simulates the cocktail party effect using a modified iterative Wiener filter and a multilayer perceptron neural network. The neural network acts as a speaker recognition system that controls the iterative Wiener filter. It is a modified perceptron with one hidden layer, fed with features extracted by LPC cepstral analysis. The proposed technique has been used successfully for speech separation when the interference is competing speech or broadband noise.
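The core of an iterative Wiener filter is the per-frequency gain H = P_s / (P_s + P_n). After each pass, the target power spectrum is re-estimated from the filter output. The classic formulation re-estimates an all-pole speech model on each pass. The sketch below reduces that to per-bin power spectra, starting from a spectral-subtraction estimate. It is a simplified illustration, not the paper's modified filter. The speaker-recognition control is also omitted, so the interference spectrum is simply assumed to be known.

```python
def wiener_gain(signal_psd, noise_psd, floor=1e-12):
    """Per-bin Wiener gain H = Ps / (Ps + Pn)."""
    return [ps / (ps + pn + floor) for ps, pn in zip(signal_psd, noise_psd)]


def iterative_wiener(noisy_psd, noise_psd, n_iter=3):
    """Re-estimate the target PSD from the previous output, then recompute the gain."""
    # Initial target estimate by power spectral subtraction.
    speech_psd = [max(py - pn, 0.0) for py, pn in zip(noisy_psd, noise_psd)]
    gain = wiener_gain(speech_psd, noise_psd)
    for _ in range(n_iter):
        speech_psd = [g * g * py for g, py in zip(gain, noisy_psd)]   # |H|^2 * Py
        gain = wiener_gain(speech_psd, noise_psd)
    return gain


# Hypothetical three-bin example: high, medium and low local SNR.
gains = iterative_wiener([10.0, 2.0, 1.1], [1.0, 1.0, 1.0])
```

In this example, the high-SNR bin keeps a gain near 0.89 while the low-SNR bins are pushed towards zero over successive passes. Over-suppression of this kind is one reason practical variants limit or constrain the iterations.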
ISBN: (print) 0780340531
In this paper, we propose a neural approach to speaker-independent word recognition based on the dynamic time warping (DTW) and fuzzy ARTMAP algorithms. DTW has two drawbacks: (1) it is space- and time-consuming for a large set of training patterns, and (2) it gives equal importance to every frame of a pattern. To obtain better performance, the training patterns need to be prefiltered by human experts. Our approach attempts to address these shortcomings of DTW. We use a modified fuzzy ARTMAP as the framework of our approach, and the architecture is a four-layer sequential neural network. Our training and recall algorithms are similar to those of fuzzy ARTMAP, except that our neural approach is sequential. Experiments on recognizing the letters of the English alphabet have been performed. The recognition rates obtained by our approach and by DTW are 87% and 80% respectively, while our approach uses two to three times less memory than DTW. Furthermore, no prefiltering of the training patterns is required.
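For comparison, the DTW baseline can be sketched as a nearest-template classifier. It shows both drawbacks named above: every training template must be stored and matched, and every frame contributes equally to the path cost. Frames are scalars here for brevity, and the step pattern is the basic symmetric one. Real systems use feature vectors, which can be supported by passing a vector distance as `dist`. The fuzzy ARTMAP architecture proposed in the paper is not shown.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW with steps (i-1, j), (i, j-1), (i-1, j-1); O(len(a)*len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])      # every frame pair weighted equally
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(test, templates):
    """Label of the stored template with the smallest DTW distance."""
    return min(templates, key=lambda label: min(dtw_distance(test, t) for t in templates[label]))


# Hypothetical templates: one stored pattern per word label.
label = classify([1, 2, 2, 3, 3], {"a": [[1, 2, 3]], "b": [[3, 2, 1]]})
```

The time-stretched test pattern matches template "a" at zero cost, because DTW absorbs the repeated frames. The price is a full cost table for every stored template.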