This paper presents a new shaped fixed codebook (FCB) search technique for code-excited linear predictive (CELP) coding. State-of-the-art CELP coders operate at rates above 4.0 kbps, because building a good FCB contribution becomes harder as the bit budget shrinks. The shaped FCB search is proposed to ease this problem and to achieve a better FCB contribution with a reduced bit budget. The shaped FCB search is integrated into a 4 kbps CELP coder, and the reported subjective performance results show that this coder is significantly better than the IS-127 half-rate coder at 4 kbps.
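The abstract does not spell out how the shaped search works. For orientation only, the sketch below shows the conventional (unshaped) analysis-by-synthesis FCB search criterion used in CELP coders: each candidate codevector c_k is filtered by the impulse response h of the weighted synthesis filter to give y_k, and the index maximizing (x·y_k)²/(y_k·y_k) is kept, where x is the target signal; the optimal gain is then (x·y_k)/(y_k·y_k). The function names and the tiny three-entry codebook are hypothetical, and this is not the paper's shaped search.

```python
def filter_codevector(c, h):
    """Zero-state convolution of codevector c with impulse response h, truncated to len(c)."""
    n = len(c)
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(i + 1):
            if i - j < len(h):
                acc += c[j] * h[i - j]
        y[i] = acc
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def search_fcb(target, codebook, h):
    """Exhaustive search: maximize (x.y)^2 / (y.y); return (best index, optimal gain)."""
    best_index, best_score, best_gain = -1, float("-inf"), 0.0
    for k, c in enumerate(codebook):
        y = filter_codevector(c, h)
        energy = dot(y, y)
        if energy <= 0.0:
            continue  # an all-zero filtered vector cannot contribute
        corr = dot(target, y)
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score, best_gain = k, score, corr / energy
    return best_index, best_gain


# Example (hypothetical data): the target is exactly 2x the filtered second codevector.
h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
target = [2.0 * v for v in filter_codevector(codebook[1], h)]
index, gain = search_fcb(target, codebook, h)
print(index, gain)  # 1 2.0
```

Practical coders typically use structured (for example algebraic) codebooks with pruned searches rather than this exhaustive loop; the criterion being maximized is the same.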
Summary form only given. The article gives an overview of text-to-speech (TTS) technology and describes several issues of potential interest to speech coding experts. After motivating the use of TTS technology, it describes the general architecture of a text-to-speech system, with particular emphasis on the speech synthesis component. Both formant synthesis and concatenative synthesis are presented; they offer different degrees of flexibility and quality. Several well-known speech coding techniques (including LPC vocoders, waveform interpolation, harmonic coding, and layered coding) have been used in speech synthesis, and the article explains how they have been applied and what their advantages and limitations are in that setting. The main goal is to increase cooperation between the speech coding and TTS communities and, in particular, to motivate the need for speech coding algorithms that meet the requirements of next-generation speech synthesis technology.
ISBN: 0780365429 (print)
This paper presents an efficient low-delay CELP speech coder based on the structure given by Chen et al. (1992). The proposed coder operates at 8 kb/s and has an arithmetic complexity 20% lower than that of the Chen et al. coder, at the cost of an acceptable increase in delay. Tests show that the proposed coder provides good-quality speech.
Distributed speech recognition services (DSRSs) provide an anytime, anywhere, any-device speech recognition environment that is intelligent enough to interact with users in a more natural manner. Among other potential services, the primary goal is to let users dictate commands and/or documents. The system coordinates the efforts of applications running in a distributed environment. For example, a user can dictate a document using a local word processor together with a DSRS's remotely located speech engine. DSRSs encourage individual programs to cooperate, combining their efforts to fulfill a user's request.
A novel audio/speech coding algorithm, hybrid audio coding (HAC), is described. New features of the algorithm include window switching with a generalized MDCT, an improved quantization scheme for the MDCT coefficients, and waveform normalization in the time domain. HAC provides good quality at bit rates of 8 to 16 kbps, and the algorithm is shown to be effective for both audio and speech signals.
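HAC's generalized MDCT and its window switching are not detailed in the abstract. As background only, here is a plain MDCT with a fixed sine window and 50% overlap, showing the time-domain aliasing cancellation that MDCT-based coders rely on: after windowed IMDCT and overlap-add, every sample covered by two frames is reconstructed. This is the textbook transform, not HAC's generalized variant; the 2/N IMDCT scaling is one common convention, chosen here so that a window satisfying the Princen-Bradley condition reconstructs at unit gain. Function names are illustrative.

```python
import math
import random


def sine_window(length):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1 and is symmetric.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, half):
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(frame):
    """2N (already windowed) samples -> N coefficients."""
    half = len(frame) // 2
    return [sum(frame[n] * _basis(n, k, half) for n in range(2 * half)) for k in range(half)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]


def analysis_synthesis(signal, half):
    """50%-overlapped sine-windowed MDCT -> IMDCT -> synthesis window -> overlap-add."""
    w = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = [signal[start + n] * w[n] for n in range(2 * half)]
        recon = imdct(mdct(frame))
        for n in range(2 * half):
            out[start + n] += recon[n] * w[n]
    return out


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(32)]
y = analysis_synthesis(x, 8)
# Samples 8..23 lie in two overlapping frames, so their aliasing terms cancel.
print(max(abs(a - b) for a, b in zip(x[8:24], y[8:24])))
```

The first and last N samples are covered by only one frame, so they still contain aliasing; a real coder handles the signal edges with extra frames or padding.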
In this paper, a digital processing method is described for modifying tone contrast, defined as the difference in frequency between the peaks and valleys of pitch curves in natural utterances. Speech signals with modified tones were presented to hearing-impaired Chinese listeners, who were asked to identify words from among four alternative Mandarin words. Compared with unmodified speech, speech with enhanced tone contrast yielded moderate gains in percentage-correct word identification, whereas reducing tone contrast generally lowered the percentage correct. These findings support the assertion that a hearing aid with tone modification is effective for hearing-impaired Chinese listeners.
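The abstract defines tone contrast but does not give the processing method itself. A minimal sketch, assuming the F0 (pitch) contour is already available as a list of frequencies in Hz, measures contrast as peak minus valley and scales it by a hypothetical factor `alpha` about the midpoint between peak and valley: `alpha > 1` enhances contrast and `0 <= alpha < 1` reduces it. Resynthesizing speech with the modified contour is not shown, and the scaling rule is an illustrative assumption rather than the paper's method.

```python
def tone_contrast(f0):
    """Peak-to-valley frequency difference of a pitch contour (Hz)."""
    return max(f0) - min(f0)


def scale_tone_contrast(f0, alpha):
    """Hypothetical scheme: scale excursions about the peak/valley midpoint by alpha."""
    mid = (max(f0) + min(f0)) / 2.0
    return [mid + alpha * (f - mid) for f in f0]


contour = [200.0, 250.0, 300.0, 250.0]
enhanced = scale_tone_contrast(contour, 1.5)
print(enhanced)                 # [175.0, 250.0, 325.0, 250.0]
print(tone_contrast(enhanced))  # 150.0
```

For `alpha >= 0` the new contrast is exactly `alpha` times the old one, and the midpoint stays fixed, so the speaker's overall pitch register is unchanged.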
In this paper, we introduce an auto-regressive moving average (ARMA) lattice model for speech modeling. The speech characteristics are modeled and expressed as lattice reflection coefficients for classification. A self-organizing map (SOM) is used to build codebooks for classifying and recognizing the lattice reflection coefficients. Experimental results on an isolated-word speech database of 10 words/names indicate that the ARMA lattice model achieves better recognition performance than the conventional auto-regressive (AR) model.
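The ARMA lattice estimator is not given in the abstract. For comparison, the reflection coefficients of the conventional AR baseline can be computed from a frame's autocorrelation with the Levinson-Durbin recursion, sketched below. The sign convention used (prediction-error filter A(z) = 1 + Σ a_i z^{-i}, with k_i equal to the last coefficient at order i) is one of two in common use; some texts flip the sign of k_i. Function names are illustrative, and the MA part of the paper's ARMA model is not covered.

```python
def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (reflection coefficients k_1..k_p, A(z) coefficients, final prediction error)."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error
        reflection.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # order-update of the predictor
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return reflection, a, error


# Autocorrelation of an AR(1) process with coefficient 0.5 (normalized so r[0] = 1):
# expect k1 = -0.5 and k2 = 0 under this sign convention.
ks, a_coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(ks, err)
```

Each |k_i| < 1 guarantees a stable synthesis filter, which is one reason lattice (reflection-coefficient) parameters are convenient features for codebook-based classification.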
We present a new binomial sine pulse (BSP) excitation signal for linear prediction-based speech codecs. The BSP excitation signal is a sine wave whose amplitude is modulated by a binomial signal. The binomial signal describes the various trends of the excitation signal within a pitch period, and the pulsatance of the BSP excitation signal coincides with the vibration frequency of the vocal folds. In the experiments, processing is performed frame by frame, and the same excitation signal is placed at every pitch excitation instant within a frame. Speech codecs based on this BSP excitation have low complexity and low delay. Experimental results show that such a codec can provide highly intelligible synthesized speech below 3 kbps.
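The abstract does not define the binomial signal precisely, so the following is only a guess at the structure it describes: a one-period template in which a sine carrier at the pitch frequency (one cycle per pitch period of `period` samples) is weighted by binomial coefficients C(period - 1, n), normalized to a peak of 1. The second function places the same template at every pitch epoch in a frame, as the abstract states. The envelope choice, function names, 8 kHz sampling rate, and epoch positions are all hypothetical.

```python
import math


def bsp_template(period):
    """Hypothetical BSP pulse: normalized binomial envelope x one sine cycle per pitch period."""
    peak = math.comb(period - 1, (period - 1) // 2)  # largest binomial coefficient
    return [
        (math.comb(period - 1, n) / peak) * math.sin(2.0 * math.pi * n / period)
        for n in range(period)
    ]


def place_excitation(frame_len, epochs, template):
    """Add the same template at every pitch epoch; samples past the frame end are dropped."""
    out = [0.0] * frame_len
    for epoch in epochs:
        for i, value in enumerate(template):
            if 0 <= epoch + i < frame_len:
                out[epoch + i] += value
    return out


# Assumed 8 kHz sampling: a 40-sample period is a 200 Hz pitch, 160 samples a 20 ms frame.
pulse = bsp_template(40)
excitation = place_excitation(160, [0, 40, 80, 120], pulse)
```

Because one template is reused for the whole frame, only its shape parameters and the pitch epochs need to be coded, which is consistent with the low complexity and low bit rate the abstract reports.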
We present an MPEG slice-layer model for VBR-encoded video using linear predictive coding (LPC) and generalized periodic Markov chains. Each slice position within an MPEG frame is modeled using an LPC autoregressive function. The selection of the particular LPC function is governed by a generalized periodic Markov chain; one chain is defined for each of the I, P, and B frame types. The model is modular: for sequences that contain no B frames, the corresponding Markov chain can simply be omitted. We show that the model matches the pseudo-periodic autocorrelation function quite well. We also present simulation results for an asynchronous transfer mode (ATM) video transmitter using a FIFO queue and measure the average cell delay; these results agree well with those obtained using actual traces as sources.
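The abstract leaves the model's parameters and exact structure unspecified. One possible reading, sketched below under loudly hypothetical assumptions: each frame type (I, P, B) has its own Markov chain whose states are AR(1) slice-size models `(mean, ar_coeff, noise_std)` in bytes, and the chain steps once per slice position within a frame. The periodic structure of the paper's chains is not reproduced; this is a simple homogeneous chain per frame type. Omitting B frames only requires leaving out the `"B"` chain, mirroring the modularity the abstract describes. All names and numbers are illustrative.

```python
import random


def simulate_gop(gop_pattern, slices_per_frame, chains, seed=0):
    """Return [(frame_type, [slice sizes])] for each frame in gop_pattern.

    chains[frame_type] = (models, transition, start_state), where models[i] is a
    hypothetical AR(1) slice-size model (mean, ar_coeff, noise_std) and
    transition[i][j] is the probability of moving from model i to model j
    between consecutive slice positions.
    """
    rng = random.Random(seed)
    trace = []
    for frame_type in gop_pattern:
        models, transition, state = chains[frame_type]
        prev = models[state][0]  # start each frame at the initial model's mean
        sizes = []
        for _ in range(slices_per_frame):
            mean, coeff, std = models[state]
            value = mean + coeff * (prev - mean) + rng.gauss(0.0, std)
            value = max(value, 0.0)  # slice sizes cannot be negative
            sizes.append(value)
            prev = value
            state = rng.choices(range(len(models)), weights=transition[state])[0]
        trace.append((frame_type, sizes))
    return trace


# Hypothetical two-state chains; an I/P-only pattern needs no "B" chain.
chains = {
    "I": ([(1200.0, 0.6, 80.0), (2000.0, 0.6, 120.0)], [[0.8, 0.2], [0.3, 0.7]], 0),
    "P": ([(300.0, 0.5, 40.0), (600.0, 0.5, 60.0)], [[0.9, 0.1], [0.4, 0.6]], 0),
}
trace = simulate_gop("IPPPIPPP", 15, chains, seed=42)
```

A trace like this could feed a FIFO-queue simulation of the kind the abstract describes, with slice sizes packetized into ATM cells.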