In embedded multirate speech coding, it is desired to have a transmitter/receiver system that operates efficiently over a wide range of channel transmission rates. We are currently investigating a system based on adaptive transform coding of speech, in which we code and transmit the system parameters and the discrete cosine transform (DCT) coefficients of the fullband linear prediction residual waveform. The multirate property of the system is achieved by allowing the channel to discard some of the bits generated by the high data rate transmitter. Stripping off bits results in an absence of DCT components, which the receiver regenerates by a spectral duplication method. An inverse DCT at the receiver yields the time domain residual waveform to be used as input to the linear prediction synthesis filter. The lowest data rate achievable by the system is about 2.5 kb/s, in which case the system reduces to a narrowband LPC pitch-excited vocoder.
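The abstract does not spell out its spectral duplication method. The following is a minimal sketch under the assumption that the receiver tiles the retained low-frequency DCT coefficients upward to fill the bins whose bits were stripped, then applies an inverse DCT. The function names `dct`, `idct`, and `regenerate_missing` are hypothetical, and the unnormalized DCT-II convention is a choice made here for illustration.

```python
import math

def dct(x):
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * (t + 0.5) * k / n)
                            for k in range(1, n))) * 2 / n
            for t in range(n)]

def regenerate_missing(kept, total):
    """Assumed duplication rule: repeat the kept baseband periodically
    until `total` coefficients are available."""
    out = list(kept)
    for k in range(len(kept), total):
        out.append(out[k - len(kept)])
    return out

# Receiver side: only the first 3 of 8 residual coefficients survived the channel.
received = [1.0, 2.0, 3.0]
filled = regenerate_missing(received, 8)
residual = idct(filled)  # time-domain excitation for the LPC synthesis filter
```

Copying the baseband keeps the excitation spectrum roughly flat across the full band, which is what an LPC synthesis filter expects from a residual; a real system would likely also control the gain of the duplicated bands.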
The pitch contour plays an important role in marking syntactic boundaries in natural language. In order to improve the continuous speech recognition system we have developed, we have carried out experiments on how the pitch contour reflects word boundaries in an artificial language, using continuously read material of a programming language. In an analysis of 173 sentences from 5 male speakers, including 1655 word boundaries, 67.1% of the total word boundaries were correctly detected, with a false alarm rate of 28.8%. The preceding and the following boundaries of keywords were found at correct rates of 94.2% and 85.0%, respectively. The application of boundary information to the recognition of continuous speech is also discussed in this paper.
Inverse filter parameters may be determined from estimates of the autocorrelation or covariance of a sampled signal by methods which admit corruption of partial results and generate subsequently optimal values for the remaining parameters. Experimental results are presented to indicate the extent to which such techniques can compensate for the effects of irregular or very gross parameter quantization. Three-way comparisons are made between the spectral responses of the inverse filters designed with the globally optimal parameters, corrupted versions of those parameters and parameters determined with the corruption introduced within the estimation algorithm.
Linear predictive coding of speech has traditionally used a least mean square (LMS) criterion for determining the optimal filter parameters. Auto-regressive (AR) models using LMS are in widespread use for low bit rate speech compression. These models are particularly susceptible to moderate amounts of added noise and distortion. This paper proposes a new model which does not use the least-mean-square criterion but rather uses the "peakiness" of the deconvolved waveform to determine the filter parameters. Since voiced output speech is reproduced with a maximally "peaked" waveform, viz., an impulse, this criterion is naturally suited to the characteristics of the synthesis process. Preliminary analysis also shows the feasibility of implementing the proposed algorithm in microprocessor-based hardware.
This paper presents results of research on the enhancement of speaker-independent wordspotting in conversational, telephone-bandwidth speech from a variety of talkers. The research involved the comparison of five LPC-based parameter sets (filter coefficients, autocorrelation coefficients, cepstral coefficients, vocal tract area functions, and pseudoformants) in three levels of additive white noise, and the evaluation of two methods of noise reduction. The techniques were studied with a dynamic-programming-based wordspotting system. The performance of the system was measured as the percentage of keywords spotted with no false alarms. The results show that pseudoformants, without the use of speaker normalization or noise reduction techniques, perform as well as or better than the other parameter sets with these techniques.
This paper presents a unique approach for the implementation of the speech synthesis portion of an LPC vocoder. The implementation uses eight LSI-11 microprocessors operating synchronously around a time-division-multiplexed multiport memory. The recursive digital filter portion of the LPC synthesis is implemented without the use of any high-speed external arithmetic elements and in a way which requires no synchronization overhead. This particular technique illustrates a general procedure which is applicable to the implementation of a large class of digital signal processing algorithms on multiprocessor machines.
In previous papers we discussed a speech synthesis-by-rule scheme in which segments obtained from natural speech were linearly concatenated. These segments included the consonants and the transitions from consonants to vowels, vowels to vowels, and vowels to consonants. Each synthesis parameter was defined by a few sets of LPC area parameters, and in the concatenative process, straight-line interpolation was used to obtain the complete set of area parameters. Informal listening and some formal intelligibility testing revealed that this simplified description of the synthesis segments was not sufficient to produce speech of satisfactory quality. Consequently, it was decided to improve the definition of the concatenative units. This paper will discuss in detail the scheme for synthesis employing these concatenative units.
This paper explores the intermediate solutions between fixed prediction and forward adaptive prediction in ADPCM, which consist of using a finite number, L, of preselected linear predictors of order M. The design problem of selecting the optimum set of predictors with respect to the overall prediction gain is formulated, and an iterative procedure is described to obtain the solutions. The relative prediction-gain improvement is computed for a 3 s speech sample and for several values of L, M, and block size, showing that 1/2 of the adaptive over fixed-prediction improvement in dB is reached with only L=4, and 2/3 with L=8. The design problem solved by minimizing the Itakura distance is shown to yield essentially identical performance. A linear discriminant property in the autocorrelation space is pointed out. Based on that property, a pattern classification approach is proposed as a hardware-efficient coding algorithm.
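The per-block selection step among L preselected predictors can be sketched as follows. This is an illustration only: it picks the predictor with the smallest residual energy by exhaustive search, which is not the paper's iterative design procedure nor its proposed pattern classification approach. The names `residual_energy` and `select_predictor`, and the example predictor bank, are hypothetical.

```python
def residual_energy(block, history, coeffs):
    """Energy of the prediction error of `block` under one order-M predictor.

    `history` should hold at least M samples preceding the block;
    coeffs[k] weights the sample k+1 steps in the past.
    """
    m = len(coeffs)
    samples = list(history[-m:]) + list(block)
    energy = 0.0
    for n in range(m, len(samples)):
        pred = sum(coeffs[k] * samples[n - 1 - k] for k in range(m))
        energy += (samples[n] - pred) ** 2
    return energy

def select_predictor(block, history, bank):
    """Return (index, energy) of the bank entry with the least residual energy."""
    energies = [residual_energy(block, history, c) for c in bank]
    index = min(range(len(bank)), key=energies.__getitem__)
    return index, energies[index]

# Hypothetical bank of L = 3 first-order predictors (M = 1).
bank = [[0.0], [0.9], [-0.9]]
block = [0.9 ** n for n in range(1, 9)]  # decaying exponential
index, energy = select_predictor(block, [1.0], bank)
```

Only the index needs to be sent as side information, so the overhead is log2(L) bits per block rather than a full set of quantized coefficients.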
The purpose of this paper is to describe a scheme for low-bit-rate transmission of speech. The scheme consists of a cascade of a codec with a post-processor. The codec is based on a modified version of the LPC Vocoder Driven Adaptive Transform coding algorithm. The post-processor performs a short-time Fourier analysis/synthesis at the receiver output and, by exploiting the known structure of the quantization noise introduced by the codec, is capable of speech enhancement. The performance of this scheme will be demonstrated at 9.6 kb/s and at 7.2 kb/s.
For many sounds, the successive frames of a speech signal do not differ significantly, and they can be represented by the same set of parameters (PARCOR coefficients). This paper examines the possibility of transmitting only some of the new parameters and replacing the rest by the parameters already transmitted. Several frames are considered, and all the possible decision sequences (paths) are examined by using a dynamic programming approach. The path which minimizes a preselected cost function is chosen for transmission, resulting in a reduced average data rate.
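The path search described above can be sketched with a small dynamic program. The sketch makes simplifying assumptions that are not in the abstract: each frame is either sent whole or replaced by the most recently sent frame (the paper considers replacing only some of the parameters), the cost is a fixed `bit_cost` per sent frame plus squared error for repeated frames, and the name `choose_transmissions` is hypothetical.

```python
def choose_transmissions(frames, bit_cost):
    """Return (total_cost, decisions); decisions[t] is True if frame t is sent."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    if not frames:
        return 0.0, []
    # best[j]: cheapest (cost, decisions) so far whose most recently sent frame is j.
    best = {0: (bit_cost, [True])}  # the first frame must always be sent
    for t in range(1, len(frames)):
        nxt = {}
        # Option A: repeat the last sent frame j and pay the distortion.
        for j, (cost, dec) in best.items():
            nxt[j] = (cost + dist(frames[t], frames[j]), dec + [False])
        # Option B: send frame t; only the cheapest predecessor path matters.
        cost, dec = min(best.values(), key=lambda cd: cd[0])
        nxt[t] = (cost + bit_cost, dec + [True])
        best = nxt
    return min(best.values(), key=lambda cd: cd[0])

# Two-coefficient PARCOR-like frames: three similar frames, then a change.
frames = [[0.5, 0.1], [0.5, 0.1], [0.5, 0.1], [-0.3, 0.4], [-0.3, 0.4]]
cost, decisions = choose_transmissions(frames, bit_cost=0.1)
```

Raising `bit_cost` trades distortion for a lower average data rate, since fewer frames are then worth sending.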