This paper presents ARDMA, a novel digital data modulation and demodulation algorithm based on the principles of autoregressive (AR) modeling of speech production. In the first step, the characteristics of a sustained voiced speech signal are analyzed using autoregressive modeling, and two sets of linear prediction (LPC) coefficients are obtained and converted to line spectral frequencies (LSF). The input binary data stream drives the selection of LSF coefficients, which are then applied as the coefficients of the synthesis filter that generates the modulation signal. This filter is excited by a specially designed excitation signal that reproduces the basic characteristics of a typical excitation signal of the human vocal tract. The result is a speech-like modulation signal, which is then sent through the voice channel of the GSM system. The demodulator analyzes the incoming modulation signal using autoregressive modeling, determines the most likely LSF vector that modulated each symbol, and converts it to the corresponding string of binary data. The performance of the proposed modulation scheme was compared with that of conventional frequency shift keying (FSK). ARDMA outperforms FSK at higher bit rates for all three GSM speech coders compared. (c) 2008 Elsevier Inc. All rights reserved.
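The following minimal Python sketch makes the modulation loop concrete. It rests on heavy simplifying assumptions and is not the authors' ARDMA implementation. It sends one bit per symbol. In place of LSF vectors derived from real voiced speech, it uses two hypothetical second-order LPC coefficient sets (`CODEBOOK`), so the LPC-to-LSF conversion is skipped. It also replaces the specially designed excitation with a plain impulse train. To demodulate, it re-estimates each symbol's LPC coefficients with the Levinson-Durbin recursion and picks the nearest codebook entry. The GSM voice channel is not modeled, and `FS`, `SYMBOL_LEN`, and `PITCH` are illustrative values.

```python
import math

FS = 8000          # sampling rate in Hz (assumption: narrowband speech channel)
SYMBOL_LEN = 160   # samples per symbol, 20 ms at 8 kHz (hypothetical)
PITCH = 80         # impulse-train period, i.e. a 100 Hz "pitch" (hypothetical)


def resonator(freq_hz, fs=FS, r=0.9):
    """Second-order LPC set [a1, a2] with a pole pair at freq_hz, radius r."""
    theta = 2 * math.pi * freq_hz / fs
    return [-2 * r * math.cos(theta), r * r]


# Hypothetical codebook: one LPC set per bit value, standing in for the two LSF sets.
CODEBOOK = {0: resonator(500), 1: resonator(1500)}


def excitation(n, period=PITCH):
    """Crude voiced excitation: a unit impulse train (not the paper's design)."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize(a, e):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_k a_k * y[n-k]."""
    y = []
    for n, x in enumerate(e):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def autocorr(x, p):
    """Autocorrelation lags 0..p of the frame x (no window applied)."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(p + 1)]


def levinson(r, p):
    """Levinson-Durbin recursion; returns [a1..ap] for A(z) = 1 + sum a_k z^-k."""
    a, err = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j] for j in range(p + 1)]
        a[i] = k
        err *= 1 - k * k
    return a[1:]


def modulate(bits):
    """Each bit selects an LPC set that shapes one symbol of excitation."""
    out = []
    for b in bits:
        out += synthesize(CODEBOOK[b], excitation(SYMBOL_LEN))
    return out


def demodulate(signal, p=2):
    """Re-estimate LPC per symbol and pick the nearest codebook entry."""
    bits = []
    for start in range(0, len(signal), SYMBOL_LEN):
        a_hat = levinson(autocorr(signal[start:start + SYMBOL_LEN], p), p)
        bits.append(min(CODEBOOK, key=lambda b: sum(
            (u - v) ** 2 for u, v in zip(a_hat, CODEBOOK[b]))))
    return bits
```

Over this ideal, noiseless channel, `demodulate(modulate([0, 1, 1, 0]))` returns `[0, 1, 1, 0]`. A closer model would measure the distance between LSF vectors rather than raw LPC coefficients and would pass the signal through a speech codec.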
This paper describes a speaker-independent isolated-word recognition system that uses multiple templates for each word in the vocabulary. The word templates are obtained by statistical clustering analysis of a large database containing 100 replications of each word (i.e., one by each of 100 talkers). The recognition system, which accepts telephone-quality speech input, is based on LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and a K-nearest-neighbor (KNN) decision rule. Results are presented for several test data sets. They show error rates comparable to, or better than, those obtained with speaker-trained isolated-word recognition systems.
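The recognition pipeline matches templates by dynamic time warping, then applies a K-nearest-neighbor decision over several templates per word. It can be sketched in Python as below. This is an illustrative sketch, not the system described. It uses unconstrained DTW without the slope constraints of a practical recognizer, and a Euclidean frame distance (`euclidean`) in place of the Itakura LPC distance. Its KNN rule scores each word by the mean of its `k` smallest template distances, which is one common formulation. All function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Frame distance; a stand-in for the Itakura LPC distance used in the paper."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw(ref, test, dist=euclidean):
    """Unconstrained DTW: minimum cumulative frame distance over all monotonic
    alignments of a reference template to the unknown word."""
    n, m = len(ref), len(test)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = dist(ref[i - 1], test[j - 1]) + step
    return D[n][m]


def knn_classify(test, templates, k=2, dist=euclidean):
    """KNN rule: score each word by the mean of its k smallest template
    distances and return the word with the lowest score."""
    scores = {}
    for word, refs in templates.items():
        ds = sorted(dtw(ref, test, dist) for ref in refs)
        kk = min(k, len(ds))
        scores[word] = sum(ds[:kk]) / kk
    return min(scores, key=scores.get)
```

Passing a different `dist` function, such as an Itakura-style LPC distance, changes only the frame comparison and leaves the DTW and KNN logic untouched.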
This paper discusses covariance analysis, a least-squares approach, as a means of accurately performing glottal inverse filtering on the acoustic speech waveform. The best results are obtained by placing the analysis window within a stable closed-glottis interval. Based on a linear model of speech production, it is shown that both the moment of glottal closure and the moment of glottal opening can be determined from the normalized total squared error, given proper choices of analysis window length and filter order. Results from real speech are presented to illustrate the technique.
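A minimal Python sketch of covariance-method LPC and the normalized total squared error follows. It assumes the error is normalized by the signal energy over the same analysis interval, since the abstract does not give the exact normalization. `synthesize` generates a hypothetical test signal: a two-pole filter excited by impulses at samples 0 and 40, mimicking two glottal excitation instants. Windows lying entirely in the free-decay (closed-glottis-like) interval give a near-zero error; a window containing an excitation instant does not. `ntse_track` slides the window one sample at a time to expose these transitions.

```python
def synthesize(a, e):
    """Test signal: all-pole filter 1/A(z), y[n] = e[n] - sum_k a_k y[n-k]."""
    y = []
    for n, x in enumerate(e):
        y.append(x - sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0))
    return y


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def covariance_lpc(s, p):
    """Covariance-method LPC on frame s (no windowing): minimizes
    sum_{n=p}^{N-1} (s[n] + sum_k a_k s[n-k])^2.
    Returns ([a1..ap], error normalized by the frame energy; assumed nonzero)."""
    N = len(s)

    def phi(i, k):
        return sum(s[n - i] * s[n - k] for n in range(p, N))

    A = [[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)]
    b = [-phi(i, 0) for i in range(1, p + 1)]
    a = solve(A, b)
    energy = phi(0, 0)
    # Minimum error from the normal equations: phi(0,0) + sum_k a_k phi(0,k).
    err = energy + sum(a[k - 1] * phi(0, k) for k in range(1, p + 1))
    return a, err / energy


def ntse_track(s, win, p):
    """Normalized total squared error as the window slides one sample at a time."""
    return [covariance_lpc(s[i:i + win], p)[1] for i in range(len(s) - win + 1)]
```

Dips of the tracked error toward zero mark candidate closed-phase intervals. Their edges are where the abstract's closure and opening instants would be read off, though with real speech the error never reaches exactly zero.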
Several distance measures have been proposed for comparing sets of LPC coefficients. The most popular one has been the "log likelihood ratio" proposed by Itakura [1]. In this paper we discuss this measure (strictly speaking, a somewhat generalized version of it) from both a theoretical and a practical point of view. We derive its statistical properties both when the reference vector is known and when it is estimated from the data. We also show how these properties are affected by windowing, additive noise, and preemphasis. We present results of extensive simulations in support of the theoretical predictions. Finally, we argue that de Souza's [2] recent criticism of this measure is unjustified.
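The Python sketch below computes the classic form of the Itakura log likelihood ratio, d = log(a_R V_T a_R^T / a_T V_T a_T^T). Here V_T is the autocorrelation matrix of the test frame, a_T is that frame's own LPC vector, and a_R is the reference LPC vector. The sketch implements only this standard form, not the generalized version analyzed in the paper, and the helper names are hypothetical. Because a_T minimizes the quadratic form a V_T a^T over vectors with leading coefficient 1, the measure is non-negative. It is zero when the reference equals the test frame's own LPC vector.

```python
import math


def autocorr(x, p):
    """Autocorrelation lags 0..p of the frame x."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(p + 1)]


def levinson(r, p):
    """Levinson-Durbin recursion; returns [1, a1, ..., ap] for A(z) = 1 + sum a_k z^-k."""
    a, err = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j] for j in range(p + 1)]
        a[i] = k
        err *= 1 - k * k
    return a


def quad_form(a, r):
    """a R a^T, where R is the symmetric Toeplitz matrix built from lags r."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(n) for j in range(n))


def itakura_llr(a_ref, x_test, p):
    """Log likelihood ratio of reference LPC vector a_ref = [1, a1, ..., ap]
    against the test frame x_test, using the test frame's autocorrelation."""
    r = autocorr(x_test, p)
    a_test = levinson(r, p)
    return math.log(quad_form(a_ref, r) / quad_form(a_test, r))
```

The measure is not symmetric. Swapping the roles of reference and test frames generally gives a different value, which matters when it is used inside a template matcher.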
A communication system was built and tested for operation in the land mobile VHF band (150-174 MHz) with a channel separation of only 6 kHz. The audio source was digitally encoded at 2.4 kbit/s using linear predictive coding (LPC). The speech data stream was transmitted by frequency shift keying (FSK), which allowed the use of class-C transmitters and discriminator detection in the receiver. Baseband filtering of the NRZ data produced a narrow transmitter spectrum. The receiver had a 3 dB bandwidth of 2.4 kHz, which allowed data transmission with minimal intersymbol interference and frequency offset degradation. A 58 percent eye opening was measured. Bit error rate (BER) performance was measured with simulated Rayleigh fading at fading rates typical of 150 MHz operation. Additional tests covered capture, ignition noise susceptibility, adjacent channel protection, degradation from frequency offset, and the effect of bit errors on speech quality. A field test compared the speech quality of the digital radio with that of a conventional 5 kHz deviation FM mobile radio.
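A complex-baseband Python sketch can illustrate the FSK transmission and discriminator detection described above. Only the 2.4 kbit/s bit rate comes from the text. The sample rate `FS`, the deviation `DEVIATION_HZ`, and the moving-average `baseband_filter` are assumptions; the filter stands in for the unspecified NRZ baseband filter. The discriminator is modeled as the phase change between consecutive samples rather than as an analog limiter-discriminator. No fading, noise, or RF stages are simulated.

```python
import cmath
import math

BIT_RATE = 2400           # from the text: 2.4 kbit/s LPC speech data
FS = 19200                # complex-baseband sample rate (assumption)
SPB = FS // BIT_RATE      # samples per bit (8)
DEVIATION_HZ = 600.0      # peak frequency deviation (assumption)


def nrz(bits):
    """Map bits to a +/-1 NRZ waveform, SPB samples per bit."""
    return [1.0 if b else -1.0 for b in bits for _ in range(SPB)]


def baseband_filter(x, taps=SPB // 2):
    """Moving average: a stand-in for the unspecified NRZ baseband filter."""
    out = []
    for n in range(len(x)):
        seg = x[max(0, n - taps + 1): n + 1]
        out.append(sum(seg) / len(seg))
    return out


def fsk_modulate(bits):
    """Continuous-phase FSK at complex baseband: the phase integrates the data."""
    phase, z = 0.0, []
    for m in baseband_filter(nrz(bits)):
        phase += 2 * math.pi * DEVIATION_HZ * m / FS
        z.append(cmath.exp(1j * phase))
    return z


def discriminator_demod(z):
    """Frequency discriminator: phase change between consecutive samples,
    summed over the settled second half of each bit, then sliced."""
    inst = [0.0] + [cmath.phase(z[n] * z[n - 1].conjugate()) for n in range(1, len(z))]
    bits = []
    for start in range(0, len(z), SPB):
        settled = inst[start + SPB // 2: start + SPB]
        bits.append(1 if sum(settled) > 0 else 0)
    return bits
```

Every output sample has magnitude 1, so the signal has a constant envelope. That property lets FSK tolerate the nonlinear amplification of a class-C transmitter.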
Two prominent frequency components, designated f1 and f2, have been identified in the visual evoked response to the transient presentation of sinusoidal luminance gratings in the range of 0.5-8 c/deg. The components occur at temporal frequencies below the alpha band, with the f1 frequency roughly half the f2 frequency. The f1 component is largest at low spatial frequencies, and f2 becomes progressively dominant as spatial frequency increases. The frequency and amplitude of f1 and f2 change substantially over the time course of the response. This was studied by calculating the temporal frequency spectrum of the transient evoked potential over successive short-time epochs spanning the response. Using this technique, the response is shown to consist of narrow-band frequency peaks, or "formants", emerging at different times after stimulus onset. These formants occur at frequencies other than those of the spontaneous EEG and change in frequency and amplitude over the time course of the response. Two spectrum analysis techniques were employed: the discrete Fourier transform (DFT) and linear predictive coding (LPC). Frequency components were successfully identified in single-trial responses using the LPC technique.
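The short-time LPC spectral analysis can be sketched as follows. Each non-overlapping epoch is fitted with an all-pole model by the autocorrelation method and the Levinson-Durbin recursion. The local maxima of the resulting spectrum are then reported as peaks. This Python sketch assumes non-overlapping epochs and a fixed model order `p`; the abstract does not specify epoch length, overlap, or order. `resonator_response` is a hypothetical synthetic signal used only for illustration, not EEG data.

```python
import cmath
import math


def autocorr(x, p):
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(p + 1)]


def levinson(r, p):
    """Levinson-Durbin; returns ([1, a1..ap], prediction error power)."""
    a, err = [1.0] + [0.0] * p, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j] for j in range(p + 1)]
        a[i] = k
        err *= 1 - k * k
    return a, err


def lpc_spectrum(a, gain, fs, nfreq=1000):
    """All-pole spectrum gain / |A(e^{jw})|^2 at nfreq frequencies in [0, fs/2)."""
    freqs, power = [], []
    for i in range(nfreq):
        f = i * fs / (2 * nfreq)
        w = 2 * math.pi * f / fs
        A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        freqs.append(f)
        power.append(gain / abs(A) ** 2)
    return freqs, power


def spectral_peaks(a, gain, fs):
    """Frequencies of interior local maxima of the LPC spectrum."""
    freqs, power = lpc_spectrum(a, gain, fs)
    return [freqs[i] for i in range(1, len(power) - 1)
            if power[i] > power[i - 1] and power[i] > power[i + 1]]


def epoch_peaks(x, fs, epoch_len, p):
    """LPC spectral peaks for successive non-overlapping short-time epochs."""
    out = []
    for start in range(0, len(x) - epoch_len + 1, epoch_len):
        a, err = levinson(autocorr(x[start:start + epoch_len], p), p)
        out.append(spectral_peaks(a, err, fs))
    return out


def resonator_response(freq, fs, n, r=0.95):
    """Impulse response of a two-pole resonator (hypothetical test signal)."""
    a1, a2 = -2 * r * math.cos(2 * math.pi * freq / fs), r * r
    y = []
    for i in range(n):
        v = 1.0 if i == 0 else 0.0
        if i >= 1:
            v -= a1 * y[i - 1]
        if i >= 2:
            v -= a2 * y[i - 2]
        y.append(v)
    return y
```

As an example, concatenate a 10 Hz and a 30 Hz resonator response at a hypothetical 200 Hz sampling rate and call `epoch_peaks(x, 200.0, 200, 2)`. It yields one peak near each frequency, which shows how a component's frequency can be tracked from epoch to epoch.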
This paper describes the design of a baseband LPC coder that transmits speech over 9.6 kbit/s (kilobit/second) synchronous channels with random bit errors of up to 1 percent. We present the results of our investigation of several aspects of the baseband LPC coder, with the goal of maximizing the quality of the transmitted speech. The most important of these aspects are the bandwidth of the baseband, the coding of the baseband residual, high-frequency regeneration, and the error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. The optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder show that it produces good speech quality in the absence of channel bit errors and introduces only slight degradation at channel bit error rates of up to 1 percent.
A multipulse method used in speech coding is applied to the seismic deconvolution problem. Its advantage is that the source wavelet and the reflectivity series representing the layered Earth structure are estimated simultaneously. The results of this investigation show that the method is promising for all-pole models of the input wavelet. An application of the method to real seismic data is shown.
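The core idea is to represent a trace as a sparse train of weighted, shifted copies of an all-pole wavelet. A greedy pulse search in the style of multipulse LPC speech coders can sketch it. This Python sketch assumes the all-pole wavelet coefficients are already known, whereas the paper estimates wavelet and reflectivity simultaneously. It also places pulses one at a time without re-optimizing earlier gains. The function names and the synthetic reflectivity in the usage example are hypothetical.

```python
def impulse_response(a, n):
    """First n samples of the impulse response of the all-pole wavelet 1/A(z)."""
    h = []
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc -= ak * h[i - k]
        h.append(acc)
    return h


def convolve_spikes(spikes, h, n):
    """Synthetic trace: sum of wavelets h placed at (position, gain) spikes."""
    out = [0.0] * n
    for pos, g in spikes:
        for i in range(pos, n):
            out[i] += g * h[i - pos]
    return out


def multipulse(trace, h, num_pulses):
    """Greedy multipulse search: repeatedly place the single pulse that most
    reduces the squared error between the trace and the pulse-excited wavelet.
    h must have at least len(trace) samples."""
    n = len(trace)
    residual = list(trace)
    # Energy of the wavelet shifted to start at m, truncated at the trace end.
    energy = [sum(h[i] ** 2 for i in range(n - m)) for m in range(n)]
    pulses = []
    for _ in range(num_pulses):
        best_m, best_score, best_c = 0, -1.0, 0.0
        for m in range(n):
            c = sum(residual[i] * h[i - m] for i in range(m, n))
            score = c * c / energy[m]  # error reduction if a pulse is placed at m
            if score > best_score:
                best_m, best_score, best_c = m, score, c
        gain = best_c / energy[best_m]
        for i in range(best_m, n):
            residual[i] -= gain * h[i - best_m]
        pulses.append((best_m, gain))
    return sorted(pulses), residual
```

As an example, take a wavelet `h = impulse_response([-1.0, 0.5], 128)` and spikes at samples 10, 50, and 90. Then `multipulse(convolve_spikes(spikes, h, 128), h, 3)` recovers the spike positions and, approximately, their amplitudes. Closely spaced reflectors would call for re-optimizing all gains jointly after each new pulse.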
A noncausal autoregressive moving average (ARMA) source-filter model of voiced speech is proposed. Although the human speech-production mechanism is obviously causal, a noncausal model allows the simple source-filter approach to incorporate different parameters for the open-glottis and closed-glottis portions of the pitch period, without explicitly determining the open- and closed-glottis regions. The noncausal impulse response is obtained by standard cepstral deconvolution. Separate ARMA models are then found for the causal and anticausal portions of the impulse response. Initial experiments show that two twelfth-order ARMA models produce very close approximations to the speech waveform and spectrum.
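The cepstral split into causal and anticausal parts rests on the standard complex-cepstrum relations below. This textbook sketch assumes H(e^{jω}) has no zeros or poles on the unit circle. It also assumes arg H is the unwrapped phase with any linear-phase term removed; u[n] is the unit step. The n ≥ 0 part of the complex cepstrum corresponds to a causal, minimum-phase factor, and the n < 0 part to an anticausal, maximum-phase factor. Each factor could then be fitted with its own ARMA model, as the abstract describes.

```latex
\begin{aligned}
\hat{h}[n] &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl(\ln\lvert H(e^{j\omega})\rvert + j\,\arg H(e^{j\omega})\Bigr)e^{j\omega n}\,d\omega,\\
\hat{h}_{c}[n] &= \hat{h}[n]\,u[n], \qquad \hat{h}_{a}[n] = \hat{h}[n]\,u[-n-1],\\
H(e^{j\omega}) &= H_{c}(e^{j\omega})\,H_{a}(e^{j\omega}), \quad
H_{c}(e^{j\omega}) = \exp\Bigl(\sum_{n\ge 0}\hat{h}[n]\,e^{-j\omega n}\Bigr), \quad
H_{a}(e^{j\omega}) = \exp\Bigl(\sum_{n<0}\hat{h}[n]\,e^{-j\omega n}\Bigr).
\end{aligned}
```

Because the log spectrum splits additively, the impulse response factors as a convolution h = h_c * h_a of its causal and anticausal parts.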