The authors describe a noisy-channel finite-state vector quantizer (FSVQ) for quantizing speech line spectrum pair (LSP) parameters. First, they show that under noiseless channel conditions the ordinary FSVQ of J. Foster et al. (1985), when used to encode the LSP parameters, achieves an average spectral distortion of 1 dB at 24 b/frame; each LSP vector is split into three subvectors, and an eight-state FSVQ is used for each subvector. However, when channel noise is present, the performance of the ordinary FSVQ degrades drastically. The authors therefore describe a modified FSVQ system designed with the channel noise taken into account. Simulations show that the new system is robust to channel noise and outperforms the channel-optimized VQ and the channel-matched multistage VQ of N. Phamdo et al. (1990), saving 1.5 to 4 b/vector depending on the channel noise level.
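To make the FSVQ mechanism concrete, here is a minimal sketch of ordinary FSVQ encoding and decoding, assuming squared-error nearest-neighbour search, toy codebooks, and a hand-written next-state table (none of these come from the paper, and the authors' channel-noise modification is not shown). In state `s` only the state codebook is searched, and the chosen index plus the current state select the next state. Because the decoder must reproduce that state sequence from the received indices, a single corrupted index can desynchronize it, which is one reason the ordinary FSVQ is fragile on noisy channels.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fsvq_encode(vectors: List[Vector],
                codebooks: List[List[Vector]],
                next_state: List[List[int]],
                initial_state: int = 0) -> List[int]:
    """Encode a vector sequence with a finite-state VQ.

    In state s only codebooks[s] is searched; the chosen index i and the
    current state give the next state next_state[s][i].
    """
    state = initial_state
    indices = []
    for v in vectors:
        book = codebooks[state]
        i = min(range(len(book)), key=lambda k: sq_dist(v, book[k]))
        indices.append(i)
        state = next_state[state][i]
    return indices


def fsvq_decode(indices: List[int],
                codebooks: List[List[Vector]],
                next_state: List[List[int]],
                initial_state: int = 0) -> List[List[float]]:
    """Decode by tracking the same state sequence from the received indices."""
    state = initial_state
    out = []
    for i in indices:
        out.append(list(codebooks[state][i]))
        state = next_state[state][i]
    return out
```

With two states whose next state simply equals the last chosen index, `fsvq_encode([[0.9, 1.1], [1.8, 2.1], [0.1, 0.0]], ...)` on the codebooks used in the checks yields the indices `[1, 1, 0]`.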
A telephone speech recognition system for isolated words, with performance sufficient for practical use, is reported here. A word spotting technique is applied to keep the system relatively immune to noise. Word spotting is performed by a new time normalization algorithm based on a linear time-distortion pattern matching method named COLM (Continuous Linear Matching). The method involves only simple operations such as table lookups and additions. The entire system is implemented on a small board containing a single digital signal processor. An experiment yielded an average recognition rate of 96.4%, using 10 Japanese numerals pronounced by 240 male and female speakers over telephone lines.
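The abstract does not describe COLM itself. The sketch below only illustrates the general idea it names, linear time normalization combined with word spotting: a candidate input segment is linearly stretched onto the template's time axis and scored, and every candidate start and length in the input is tried. The city-block frame distance, the brute-force search, and the hypothetical `min_len`/`max_len` bounds are assumptions for illustration, not the authors' algorithm.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    """City-block distance: only subtractions, absolute values and additions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linear_match(segment: List[Frame], template: List[Frame]) -> float:
    """Average distance after linearly stretching `segment` onto the template's time axis."""
    n, m = len(segment), len(template)
    total = 0.0
    for j in range(m):
        # Linear time normalization: template frame j maps to segment frame ~ j*(n-1)/(m-1).
        i = round(j * (n - 1) / (m - 1)) if m > 1 else 0
        total += frame_dist(segment[i], template[j])
    return total / m


def spot_word(utterance: List[Frame], template: List[Frame],
              min_len: int, max_len: int) -> Tuple[float, int, int]:
    """Word spotting: try every (start, length) pair; return (score, start, end)."""
    best = (float("inf"), -1, -1)
    for length in range(min_len, max_len + 1):
        for start in range(0, len(utterance) - length + 1):
            score = linear_match(utterance[start:start + length], template)
            if score < best[0]:
                best = (score, start, start + length)
    return best
```

The exhaustive search costs one linear match per (start, length) pair; the abstract's point is that COLM reduces this kind of matching to table lookups and additions so it fits on a single DSP.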
We want to perform attenuation correction for the 3D attenuated ray transform with a parallel geometry. We suppose that the attenuation function is available but not registered with the data. To register the attenuation function with the data, we use the sum over each slice of the 2D data consistency conditions of the attenuated Radon transform. We then correct for the attenuation using the Novikov formula. Numerical experiments indicate the feasibility of the approach, and we propose a scheme, including diffusion correction, for registering CT to SPECT in order to improve SPECT imaging.
Current speech, audio, and video coding and transmission systems are either analogue or digital, with a strong shift from analogue to digital systems over the last decades. We have combined digital and analogue schemes in order to save transmission bandwidth and complexity and to improve the achievable quality at any given channel signal-to-noise ratio. The combination is achieved by transmitting pseudo-analogue samples of the unquantized residual signal of a linear predictive digital filter, a scheme called mixed pseudo analogue-digital (MAD) transmission. In this paper, a new modulation scheme is introduced and evaluated that uses QPSK for the digital information and an Archimedes spiral for the time-discrete, pseudo-analogue residual signal.
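As an illustration of how an Archimedes spiral can carry a time-discrete analogue sample, here is a minimal sketch assuming a sample range of [-1, 1], a hypothetical `turns` parameter, a spiral `r = theta / (2*pi*turns)` scaled to unit peak amplitude, and nearest-point decoding by a fine grid search. The paper's actual spiral parameters, the QPSK part, and its receiver are not given in the abstract.

```python
import math
from typing import Tuple


def spiral_encode(x: float, turns: float = 2.0) -> Tuple[float, float]:
    """Map a sample x in [-1, 1] to an (I, Q) point on an Archimedes spiral.

    Hypothetical mapping: theta grows linearly with x, and the radius
    r = theta / (2*pi*turns) reaches 1 at the outer end of the spiral.
    """
    theta = (x + 1.0) * math.pi * turns      # theta in [0, 2*pi*turns]
    r = theta / (2.0 * math.pi * turns)      # r in [0, 1]
    return r * math.cos(theta), r * math.sin(theta)


def spiral_decode(i: float, q: float, turns: float = 2.0, steps: int = 20000) -> float:
    """Return the sample whose spiral point is nearest to the received (I, Q) point."""
    best_x, best_d = -1.0, float("inf")
    for k in range(steps + 1):
        x = -1.0 + 2.0 * k / steps
        si, sq = spiral_encode(x, turns)
        d = (si - i) ** 2 + (sq - q) ** 2
        if d < best_d:
            best_x, best_d = x, d
    return best_x
```

Stretching the sample range along a long curve means small channel noise moves the decoded value only slightly, while noise large enough to cross to a neighbouring arm causes a large error; the number of turns trades these two effects against each other.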
In this paper, we present a Gaussian mixture model-based block quantiser for coding line spectral frequencies that jointly codes multiple frames and uses mean squared error as the quantiser selection criterion. The efficiency gained from jointly coding multiple frames permits the use of the mean squared error (MSE) criterion rather than the computationally expensive spectral distortion. The proposed coder improves both distortion performance and complexity, achieving transparency at 23 bits per frame when coding two frames jointly, or 21 bits per frame when coding three frames.
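The MSE-based quantiser selection can be sketched as follows: the input (for multi-frame coding, the concatenation of several LSF frames) is quantised by every cluster's block quantiser, and the cluster giving the lowest MSE is chosen. This is a simplified sketch, not the paper's coder: it assumes diagonal covariances, so each cluster's Karhunen-Loeve transform is the identity, plus a fixed 3-sigma uniform scalar quantiser and a caller-supplied bit allocation.

```python
import math
from typing import List, Sequence, Tuple


def uniform_quantize(z: float, bits: int, clip: float = 3.0) -> float:
    """Midrise uniform quantiser on [-clip, clip] for a unit-variance value."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    k = math.floor((z + clip) / step)
    k = min(max(k, 0), levels - 1)           # clamp to the outermost cells
    return -clip + (k + 0.5) * step


def gmm_block_quantize(x: Sequence[float],
                       means: List[Sequence[float]],
                       stds: List[Sequence[float]],
                       bits: Sequence[int]) -> Tuple[int, List[float], float]:
    """Quantise x with every cluster's block quantiser; keep the minimum-MSE one.

    Assumes diagonal covariances (identity KLT). Returns
    (cluster_index, reconstruction, mse).
    """
    best = None
    for c, (mu, sd) in enumerate(zip(means, stds)):
        # Normalise each dimension by the cluster's mean/std, quantise, denormalise.
        rec = [m + s * uniform_quantize((xi - m) / s, b)
               for xi, m, s, b in zip(x, mu, sd, bits)]
        mse = sum((xi - ri) ** 2 for xi, ri in zip(x, rec)) / len(x)
        if best is None or mse < best[2]:
            best = (c, rec, mse)
    return best
```

Only the cluster index and the chosen cluster's scalar codes would need to be transmitted; computing MSE for each candidate is far cheaper than computing spectral distortion, which is the trade-off the abstract describes.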
A speaker-independent isolated-word speech recognition system based on computer-generated phonemes (CGPs) is developed. A CGP is a feature vector generated to represent a region of speech. The CGP creation algorithm looks for stable sounds in the incoming word by means of a similarity measure. When a stable sound is detected, a CGP is created to represent it. In addition to the CGPs created for stable vocal tract sounds, a representative CGP is created when an unvoiced fricative occurs at the beginning or end of a word. Using a heavily constrained dynamic time warping algorithm, the CGPs of the incoming word are then compared against reference templates, which consist of previously created strings of CGPs. The reference template closest in distance to the incoming test word is chosen as the estimate of the test word.
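The template-matching step can be sketched with dynamic time warping restricted to a Sakoe-Chiba band `|i - j| <= band`. The abstract only says the DTW is "heavily constrained", so the band constraint, the Euclidean frame distance, and the dictionary of labelled templates are assumptions here. A small band keeps the warp close to the diagonal; sequences whose lengths differ by more than `band` get an infinite distance.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]


def dtw_distance(a: List[Vec], b: List[Vec], band: int) -> float:
    """DTW distance between two CGP strings under a Sakoe-Chiba band |i - j| <= band."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(test: List[Vec], templates: Dict[str, List[Vec]], band: int = 1) -> str:
    """Return the label of the reference template closest to the test word."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label], band))
```

Because CGPs summarize whole stable regions, the strings being compared are short, which is what makes such a tight warping constraint workable.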
Representation of the speech signal by a set of discrete elements that respect its acoustic and perceptual structures is considered. The signal is pre-analyzed frame by frame, and the spectral envelope obtained for each frame is segmented into regions that each contain a single peak. The signal is then filtered in each region, and the elementary waveforms are located in the time domain. The problem of grouping waveforms across adjacent channels is thus circumvented. The resulting representation is satisfactory, as is the signal reconstruction, except for some modeling problems that remain in the lowest part of the spectrum.
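The abstract does not say how the envelope is split into single-peak regions. One simple possibility, sketched below as an assumption, is to find the local maxima of the sampled envelope and place each region boundary at the lowest bin between two consecutive peaks. The subsequent per-region filtering and waveform detection are not shown.

```python
from typing import List, Sequence, Tuple


def single_peak_regions(envelope: Sequence[float]) -> List[Tuple[int, int]]:
    """Split a sampled spectral envelope into [start, end) bin ranges with one peak each.

    Peaks are interior local maxima; each boundary is the lowest bin
    between two consecutive peaks.
    """
    n = len(envelope)
    peaks = [k for k in range(1, n - 1)
             if envelope[k - 1] < envelope[k] >= envelope[k + 1]]
    bounds = [0]
    for p, q in zip(peaks, peaks[1:]):
        valley = min(range(p, q + 1), key=lambda k: envelope[k])
        bounds.append(valley)
    bounds.append(n)
    return list(zip(bounds, bounds[1:]))
```

For an envelope with formant-like peaks at bins 2 and 6 and a valley at bin 4, this returns the two regions `(0, 4)` and `(4, 9)`.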
A framework is described for developing a compact and accurate representation of the LPC (linear predictive coding) excitation. In this representation, the excitation waveform is expressed as a linear combination of the eigenvectors of the autocorrelation matrix of the LPC filter's impulse response. This representation allows a systematic study of how changes in the filter excitation affect the speech output. The author presents results on the precision required in the excitation to avoid perceptible distortion in the output speech signal, together with an estimate of the minimum number of bits necessary for accurate reproduction of the excitation.
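A minimal sketch of the eigenvector representation, under stated assumptions: the all-pole filter is `1/A(z)` with `A(z) = 1 + a1 z^-1 + ... + ap z^-p`, its impulse response is truncated to one frame, and the matrix is taken as `R = H^T H`, where `H` is the lower-triangular convolution matrix of that response. Since the output is `s = H e`, the output energy is `e^T R e = sum_i lambda_i c_i^2` in the eigenvector coordinates `c_i`, so eigenvalues indicate how strongly each excitation component reaches the output. The cyclic Jacobi eigensolver and the "keep the k largest eigenvalues" rule are illustrative choices, not the author's procedure.

```python
import math
from typing import List, Sequence, Tuple


def lpc_impulse_response(a: Sequence[float], n: int) -> List[float]:
    """First n samples of the impulse response of 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k."""
    h = [0.0] * n
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if t - k >= 0:
                acc -= ak * h[t - k]
        h[t] = acc
    return h


def autocorr_matrix(h: Sequence[float]) -> List[List[float]]:
    """R[i][j] = sum_t h[t-i] h[t-j] over one frame, i.e. R = H^T H."""
    n = len(h)
    return [[sum(h[t - i] * h[t - j] for t in range(max(i, j), n)) for j in range(n)]
            for i in range(n)]


def jacobi_eig(A: List[List[float]], sweeps: int = 50,
               tol: float = 1e-12) -> Tuple[List[float], List[List[float]]]:
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix (eigenvectors = columns of V)."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes A[p][q] in J^T A J.
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q: A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def excitation_approx(e: Sequence[float], a: Sequence[float], k: int) -> List[float]:
    """Project excitation e onto the k eigenvectors of R with the largest eigenvalues."""
    n = len(e)
    vals, V = jacobi_eig(autocorr_matrix(lpc_impulse_response(a, n)))
    keep = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    approx = [0.0] * n
    for i in keep:
        coef = sum(V[t][i] * e[t] for t in range(n))  # coordinate along eigenvector i
        for t in range(n):
            approx[t] += coef * V[t][i]
    return approx
```

With `k = len(e)` the projection reproduces the excitation exactly (the eigenvectors form an orthonormal basis); smaller `k` discards the components that contribute least to the output energy.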
The paper presents an artificial vision system that analyzes the image of a car captured by a camera, locates the registration plate, and recognizes its registration number. It describes the practical problems encountered in implementing this application and the proposed solutions. The system has been designed using a modular approach: sub-modules can be upgraded and/or substituted independently, making the system potentially suitable for a wide variety of vision applications. The system's performance has been evaluated in real situations.
In many speech analysis/synthesis schemes, the excitation source for voiced speech is a train of impulses. However, the speech quality obtained with a dynamically varying source, e.g., a parametric source model or multipulse excitation, has been found to be better than that obtained with impulse excitation. The authors describe a pitch-synchronous glottal autoregressive moving average (ARMA) analysis/synthesis scheme in which a parametric voice source model is used to jointly estimate the source and vocal tract parameters from the speech signal. This method is compared with closed-phase linear predictive coding (LPC), in which covariance analysis must be carried out within the closed-glottis interval, and with robust LPC, in which the analysis frame is insensitive to glottal closure. The proposed scheme is shown to be superior to these two methods in formant/bandwidth tracking capability and in resynthesis efficiency.
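For reference, the closed-phase LPC baseline mentioned above can be sketched as covariance-method LPC restricted to a chosen interval. This is the comparison method, not the authors' glottal ARMA scheme, and the closed-glottis interval `[start, end)` is assumed to be known in advance. The rationale is that when the glottis is closed the excitation is close to zero, so an all-pole model fits the samples in that interval almost exactly.

```python
from typing import List, Sequence


def covariance_lpc(x: Sequence[float], start: int, end: int, order: int) -> List[float]:
    """Covariance-method LPC over x[start:end] (e.g. a closed-glottis interval).

    Solves sum_k a_k phi(i, k) = -phi(i, 0) for i = 1..order, where
    phi(i, k) = sum_{n=start}^{end-1} x[n-i] x[n-k].
    Returns [a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    if start < order:
        raise ValueError("need `order` samples of history before `start`")

    def phi(i: int, k: int) -> float:
        return sum(x[n - i] * x[n - k] for n in range(start, end))

    p = order
    # Augmented normal-equation matrix [Phi | -phi(i, 0)].
    M = [[phi(i, k) for k in range(1, p + 1)] + [-phi(i, 0)] for i in range(1, p + 1)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        a[r] = (M[r][p] - sum(M[r][c] * a[c] for c in range(r + 1, p))) / M[r][r]
    return a
```

On a synthetic second-order all-pole signal driven by a single impulse at `n = 0`, analysing any interval after the impulse recovers the filter coefficients exactly, which mimics the ideal zero-excitation closed phase.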