We present a method that uses artificial neural networks for acoustic-to-articulatory mapping. An assembly of Kohonen (1982) neural nets is used: in the first stage, a network maps cepstral values, and each of its neurons contains a subnet that, in a second stage, maps the articulatory space. The method allows both the acoustic-to-articulatory mapping, ensuring smoothly varying vocal tract shapes, and the study of the nonuniqueness problem.
Presents a new method for speech training of deaf and hard-of-hearing people, using a computer equipped with an A/D sound card. The idea is to produce visual feedback of the deaf person's voice, while also giving information about what must be done to produce the required speech sound correctly. The visual representation attempts to reproduce the configuration of the speaker's vocal tract in real time, during vowel articulation. Since changes in this configuration reflect changes of the spectral envelope of the speech signal, we need to solve the inverse problem: obtaining the vocal tract shape from the speech signal. To solve this problem, signal processing techniques such as linear predictive coding were used. We hope that this representation can be more effective than other representations in use today, such as spectra and spectrograms, due to its closer association with the speech production processes.
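The inverse problem mentioned above is commonly approached by converting autocorrelation values into reflection coefficients and then into the relative areas of a lossless-tube model. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the unit starting area, and the sign convention in the area ratio are assumptions; conventions differ between textbooks.

```python
def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, ks, err): predictor polynomial A(z) = 1 + a[1] z^-1 + ...,
    the reflection coefficients, and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, ks, err


def tube_areas(ks, first_area=1.0):
    """Relative tube-section areas from reflection coefficients.

    Assumed convention: A[i+1] = A[i] * (1 - k) / (1 + k). Some texts use
    the opposite sign, which reverses the direction of the area change.
    """
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


if __name__ == "__main__":
    a, ks, err = levinson([1.0, 0.5, 0.2], 2)
    print(ks, tube_areas(ks))
```

Only relative areas are recovered this way; an absolute scale has to be fixed separately, which is why the sketch starts from a hypothetical unit area.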
In linear predictive coders, the output of the LP analysis filter is used to represent the glottal excitation signal. For high-pitched voices during nasal sounds or nasalized vowels, the speech signal takes on a sinusoidal shape. The corresponding residual signal has a very low energy and the pitch pulses are weak or absent, resulting in poor pitch tracking. These segments of speech are also characterized by large frame-to-frame variations of the LP coefficients. In this paper we propose a composite formant prediction error criterion leading to a clear track of residual pulses even for the sinusoid-like speech, while enhancing the smoothness of the filter parameter evolution.
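For reference, the conventional residual that the abstract starts from is the output of the LP analysis (inverse) filter. The sketch below illustrates only that standard step, not the proposed composite criterion. The function name and the coefficient convention A(z) = 1 + a[1] z^-1 + ... are assumptions made for illustration.

```python
def lp_residual(x, a):
    """Inverse-filter x with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Samples before the start of x are treated as zero.
    """
    p = len(a) - 1
    res = []
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, p + 1):
            if n - j >= 0:
                acc += a[j] * x[n - j]
        res.append(acc)
    return res


if __name__ == "__main__":
    # A decaying first-order signal is predicted exactly by a[1] = -0.9,
    # so its residual is a single pulse at n = 0.
    x = [0.9 ** n for n in range(6)]
    print(lp_residual(x, [1.0, -0.9]))
```

A signal that is almost perfectly predictable leaves very little residual energy after this step, which is the difficulty the abstract describes for sinusoid-like speech.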
We study enhancing the quality of coded speech by nonlinearly filtering the linear-prediction residual to make it more periodic. By attenuating the low-magnitude samples, more of the residual energy can be concentrated in the regularly spaced major excitation pulses. Wideband additive noise in the speech signal is attenuated because its residual energy is spread in time, rather than being concentrated in narrow temporal regions.
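The abstract does not state which nonlinearity is used, so the sketch below shows one hypothetical choice: each residual sample is scaled by its magnitude relative to the frame peak, raised to an assumed exponent `gamma`. Both function names and both parameters are illustrative assumptions.

```python
def sharpen_residual(res, gamma=1.0):
    """Attenuate low-magnitude residual samples (hypothetical nonlinearity).

    Each sample is multiplied by (|r| / peak) ** gamma, so samples near the
    frame peak pass almost unchanged and small samples are suppressed.
    """
    peak = max((abs(v) for v in res), default=0.0)
    if peak == 0.0:
        return list(res)
    return [v * (abs(v) / peak) ** gamma for v in res]


def peak_energy_share(res, threshold=0.5):
    """Fraction of energy in samples with magnitude >= threshold * peak."""
    peak = max((abs(v) for v in res), default=0.0)
    total = sum(v * v for v in res)
    if total == 0.0:
        return 0.0
    strong = sum(v * v for v in res if abs(v) >= threshold * peak)
    return strong / total


if __name__ == "__main__":
    frame = [1.0, 0.1, -0.1, 0.0, -1.0]
    shaped = sharpen_residual(frame)
    print(peak_energy_share(frame), peak_energy_share(shaped))
```

Energy that is spread thinly over many small samples, as with wideband noise, is reduced far more than energy concentrated in a few large pulses, which matches the reasoning in the abstract.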
ISBN: 0780343719 (print)
This paper presents a full-duplex, real-time implementation of the ITU-T G.723.1 speech coder using the SSP1820, Samsung's DSP chip, which is based on a 16-bit fixed-point Oak core. Optimization methods are proposed to reduce the total cycle time consumed in the real-time implementation. The multi-pulse maximum likelihood quantization (MP-MLQ) excitation search block, which is the most computation-intensive block in the coder, is restructured to reduce the algorithmic redundancy. In addition, efficient filtering methods and memory management are utilized for further optimization. Bit-exact verification with the ITU test vectors and performance evaluation are also discussed.
This paper presents a physics-based model for the MOS transistor, suitable for circuit design and simulation and valid from weak to strong inversion. Each static or dynamic characteristic is accurately described by a single-piece function of two saturation currents.
ISBN: 0780339053 (print)
With the progress of high-speed network and multimedia technologies, related experimental software is being researched and developed. Multiuser games are one kind of network-based software. The authors describe the implementation of a multiuser, multimedia game engine which enables several users to play the same game simultaneously in a multimedia environment. The implemented game engine provides a script language which can be used to describe the game scenario and facilitates the development of a new game. It also supports the separation of multimedia data necessary for clients from the server, which can reduce the network load and maintain the multimedia data effectively.
We present a spectral transformation technique for musical tones. It can be used to modify the brightness or timbre of a musical tone. We perform spectral transformation by modifying the line spectral frequencies or LPC roots of the original spectral envelope and filtering the tone with a spectral transformation filter in the time domain. One application is pitch modification, where frequency scaling is used to modify the fundamental frequency of a tone and spectral transformation is used to restore the original brightness and timbre. We have applied this technique to restore the original brightness of musical tones frequency-scaled by one octave up or down, using tones produced by a variety of musical instruments, such as piano, trumpet, violin, flute and bassoon. In all experiments conducted, the pitch-modified tones resemble the originals in brightness and timbre. This technique allows us to control timbral attributes of musical tones, such as brightness, friction or other modulation, by manipulating the spectral parameters.
In an effort to improve the recognition performance of talker-independent speech systems, many adaptive methods have been proposed. The methods generally seek to exploit the higher recognition performance of talker-dependent systems and extend it to talker-independent systems. This is achieved by placing talkers into several categories in some form, usually by gender or vocal-tract size. We investigate a similar idea, but categorize each utterance independently. An utterance is processed using several spectral compressions, and the compression with the maximum likelihood is then used to train a better model. For testing, the spectral compression with the maximum likelihood is used to decode the utterance. While the spectral compressions divided the utterances well, this did not translate into a significant improvement in performance, and the increase in computational cost was significant.
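The per-utterance selection step can be sketched as follows: apply each candidate compression factor to a spectrum, score the result against a model, and keep the factor with the highest likelihood. This is a toy illustration under stated assumptions, not the system in the paper. The linear-interpolation warp, the diagonal-Gaussian scoring model, and all function names are hypothetical stand-ins for the recognizer's actual front end and acoustic model.

```python
import math


def compress_spectrum(spec, alpha):
    """Resample spec so output bin i reads input position alpha * i.

    Linear interpolation; positions past the last bin are clamped.
    """
    n = len(spec)
    out = []
    for i in range(n):
        pos = min(alpha * i, n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * spec[lo] + frac * spec[hi])
    return out


def log_likelihood(vec, mean, var):
    """Log-likelihood of vec under an assumed diagonal Gaussian model."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(vec, mean, var)
    )


def best_compression(spec, factors, mean, var):
    """Return the factor whose compressed spectrum scores highest."""
    scored = [
        (log_likelihood(compress_spectrum(spec, a), mean, var), a)
        for a in factors
    ]
    return max(scored)[1]


if __name__ == "__main__":
    spec = [float(i) for i in range(8)]
    print(best_compression(spec, [0.9, 1.0, 1.1], spec, [1.0] * 8))
```

Every candidate factor requires its own scoring pass, so the cost grows linearly with the number of compressions tried, consistent with the computational increase the abstract reports.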
This paper presents a lattice low-delay code-excited linear prediction speech coder (LLD-CELP) based on the analysis-by-synthesis configuration. The coder achieves a one-way coding delay of less than 2 ms by making both the LPC predictor and the excitation gain backward-adaptive, and by using a small excitation vector size of four samples. The introduction of a lattice filter as a short-term predictor and a perceptual weighting filter has significant advantages, such as fast tracking of speech signal nonstationarities, simple stability verification, and uniform distribution of the computation load. The introduction of a backward-adaptive prediction gain embedded in the excitation codebook achieves lower complexity at the same quality and the same bit rate.
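The "simple stability verification" advantage of the lattice form can be illustrated briefly: the corresponding all-pole synthesis filter is stable when every reflection coefficient has magnitude below one, so no root finding is needed. The sketch below shows that check together with a basic lattice analysis filter. It is a generic textbook structure, not the coder described above, and the function names and sign convention are assumptions.

```python
def is_stable(ks):
    """Stability test for the lattice form: all |k| strictly below 1."""
    return all(abs(k) < 1.0 for k in ks)


def lattice_analysis(x, ks):
    """Prediction error of x through a lattice analysis filter.

    Assumed stage recursion:
        f_i[n] = f_{i-1}[n] + k_i * b_{i-1}[n-1]
        b_i[n] = k_i * f_{i-1}[n] + b_{i-1}[n-1]
    with f_0[n] = b_0[n] = x[n] and zero initial state.
    """
    delayed_b = [0.0] * len(ks)  # holds b_{i-1}[n-1] for each stage
    out = []
    for sample in x:
        f = sample
        b = sample
        for i, k in enumerate(ks):
            prev = delayed_b[i]
            delayed_b[i] = b  # save b_{i-1}[n] for the next sample
            f_next = f + k * prev
            b = k * f + prev
            f = f_next
        out.append(f)
    return out


if __name__ == "__main__":
    x = [0.9 ** n for n in range(6)]
    print(is_stable([-0.9]), lattice_analysis(x, [-0.9]))
```

Each stage performs the same small amount of work per sample, which is one way to read the "uniform distribution of the computation load" mentioned in the abstract.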