Current speech, audio, and video coding and transmission systems are either analogue or digital, with a strong shift from analogue to digital systems over recent decades. We combine digital and analogue schemes for three reasons: to save transmission bandwidth, to reduce complexity, and to improve the achievable quality at any given channel signal-to-noise ratio. The combination is called mixed pseudo analogue-digital (MAD) transmission. It transmits pseudo-analogue samples of the unquantized residual signal of a linear-predictive digital filter. This paper introduces and evaluates a new modulation scheme that uses QPSK for the digital information and an Archimedes spiral for the time-discrete, pseudo-analogue residual signal.
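The abstract names the two mappings, QPSK for the digital bits and an Archimedes spiral for the pseudo-analogue residual samples, but gives none of their parameters. The Python sketch below illustrates each mapping separately, under stated assumptions. The following are all hypothetical:

- the Gray-coded QPSK labelling;
- the two-arm spiral layout;
- the number of turns;
- the helper names (`qpsk_modulate`, `spiral_map`, `spiral_demap`).

How the paper combines the two mappings into one transmitted signal is not reproduced.

```python
import math

# Gray-coded QPSK constellation with unit energy; this bit labelling is an
# assumed convention, not taken from the paper.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(1, -1) / math.sqrt(2),
}


def qpsk_modulate(bits):
    """Map an even-length sequence of 0/1 bits to QPSK symbols."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def spiral_map(x, turns=2.0):
    """Map a residual sample x in [-1, 1] onto a two-arm Archimedes spiral.

    The radius grows linearly with the angle (r = a * theta). Positive samples
    use one arm and negative samples the point-reflected arm; `turns` and this
    two-arm layout are hypothetical choices for the sketch.
    """
    x = max(-1.0, min(1.0, x))              # clip to the assumed input range
    theta = abs(x) * turns * 2 * math.pi    # angle grows linearly with |x|
    a = 1.0 / (turns * 2 * math.pi)         # scale so that |x| = 1 gives r = 1
    point = a * theta * complex(math.cos(theta), math.sin(theta))
    return point if x >= 0 else -point


def spiral_demap(y, turns=2.0, steps=4000):
    """Estimate x from a received point y by brute-force nearest-point search."""
    best_x, best_d = 0.0, float("inf")
    for k in range(-steps, steps + 1):
        x = k / steps
        d = abs(spiral_map(x, turns) - y)
        if d < best_d:
            best_x, best_d = x, d
    return best_x
```

Nearby sample values map to nearby points on the spiral. A channel perturbation smaller than half the spacing between arms therefore typically causes only a small error in the decoded sample. It does not cause the abrupt errors of a quantized digital code.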
ISBN: 9781424442133 (print)
Speech processing has been an active research area for several decades, with applications ranging from communications to automatic reading machines. Many speech recognition techniques exist, based on statistical methods as well as on neural networks. The present work investigates the feasibility of two neural-network approaches to this problem.
Vocoders compress speech by estimating model parameters over an analysis window at a given transmission rate, assuming that speech is stationary within that window. This paper explores the limits of this assumption for the spectral envelope parameters, represented as line spectral frequency (LSF) parameters. All LSF parameters are shown to vary considerably over time, regardless of the LSF vector extraction and transmission rates. Oversampling the LSF tracks shows that they contain high-frequency variations above the frequency corresponding to the LSF vector transmission rate. The paper proposes an anti-aliasing filter whose cut-off frequency is matched to the chosen LSF vector transmission rate, to alleviate possible spectral overlap in the LSF parameter spectra. Experiments confirm that the proposed method outperforms the classic LSF extraction method with respect to quantisation, typically saving 10 to 15% of the bits.
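A minimal sketch of the anti-aliasing idea follows. It assumes an LSF track extracted at an oversampled rate that is an integer multiple of the transmission rate. The track is low-pass filtered at half the transmission rate and then decimated. The Hamming-windowed FIR design, the 31-tap length and the function names are assumptions made for illustration. The abstract does not specify the filter the authors used.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs_hz, num_taps=31):
    """Windowed-sinc (Hamming) low-pass FIR filter with unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric filter")
    fc = cutoff_hz / fs_hz                       # cut-off in cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()                           # a constant track passes unchanged


def decimate_lsf_track(track, fs_in_hz, fs_out_hz, num_taps=31):
    """Low-pass filter an oversampled LSF track, then keep every m-th value.

    track     : one LSF parameter over time, sampled at fs_in_hz vectors/s
    fs_out_hz : LSF vector transmission rate; fs_in_hz must be a multiple of it
    """
    m = int(round(fs_in_hz / fs_out_hz))
    if abs(m * fs_out_hz - fs_in_hz) > 1e-9 * fs_in_hz:
        raise ValueError("fs_in_hz must be an integer multiple of fs_out_hz")
    # Cut off at the Nyquist frequency of the transmission rate.
    h = lowpass_fir(fs_out_hz / 2, fs_in_hz, num_taps)
    # mode="same" keeps the track length; the first and last num_taps // 2
    # values are affected by implicit zero padding at the edges.
    smoothed = np.convolve(np.asarray(track, dtype=float), h, mode="same")
    return smoothed[::m]
```

Plain decimation (`track[::m]`) would fold track variations above `fs_out_hz / 2` back into the band. Filtering first removes them before they can alias.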
We propose a hyperspectral image compressor called BH. BH treats its input image as partitioned into square blocks, each lying entirely within one band, and compresses one block at a time in three steps. First, it predicts the block from the corresponding block in the previous band. Next, it selects a predesigned code based on the prediction errors. Finally, it encodes the predictor coefficient and the errors. Besides giving good compression rates and running fast, BH provides random access to spatial locations in the image. We hypothesize that BH works well because it accommodates the rapidly changing image brightness that often occurs in hyperspectral images. We also propose an intra-band compressor called LM. LM performs worse than BH, but its performance helps explain BH's performance.
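The three BH steps can be sketched as follows. This is not the authors' codec, and it rests on three assumptions:

- "the predictor coefficient" is taken to be a single least-squares gain;
- the gain is quantized to 8 fractional bits, so that encoder and decoder form the same integer prediction;
- a Rice code, with its parameter chosen per block, stands in for BH's predesigned codes.

All function names are hypothetical.

```python
import numpy as np


def predict_block(cur, prev, frac_bits=8):
    """Step 1: predict a block from the co-located block in the previous band.

    Returns the fixed-point predictor coefficient q and the integer residuals.
    """
    cur = np.asarray(cur, dtype=np.int64)
    prev = np.asarray(prev, dtype=np.int64)
    denom = float(np.dot(prev.ravel(), prev.ravel()))
    alpha = np.dot(cur.ravel(), prev.ravel()) / denom if denom else 0.0
    q = int(round(alpha * (1 << frac_bits)))              # quantised gain
    pred = (prev * q + (1 << (frac_bits - 1))) >> frac_bits  # rounded prediction
    return q, cur - pred


def reconstruct(q, prev, residual, frac_bits=8):
    """Decoder side: rebuild the block exactly from q, prev and the residuals."""
    prev = np.asarray(prev, dtype=np.int64)
    pred = (prev * q + (1 << (frac_bits - 1))) >> frac_bits
    return pred + residual


def zigzag(e):
    """Map signed residuals 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    e = np.asarray(e, dtype=np.int64)
    return np.where(e >= 0, 2 * e, -2 * e - 1)


def rice_bits(u, k):
    """Total length in bits of Rice codes with parameter k for values u."""
    return int(np.sum((u >> k) + 1 + k))


def select_code(residual, max_k=15):
    """Step 2: pick the code (here a Rice parameter) giving the fewest bits."""
    u = zigzag(residual).ravel()
    costs = [rice_bits(u, k) for k in range(max_k + 1)]
    k = int(np.argmin(costs))
    return k, costs[k]
```

Step 3 would then write `q`, `k` and the Rice-coded residuals to the bitstream. Each block depends only on its co-located block in the previous band. This is consistent with the random spatial access the abstract mentions.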
ISBN: 0780374886 (print)
A new method for detecting the start and end of each word in a continuously spoken Chinese sentence is discussed. We define a new recognition feature called periodic gradual change (PGC). A continuous speech sentence can be segmented into individual words by combining PGC with other features. These include the zero-crossing rate (ZCR), instantaneous swing (the E characteristic) and linear predictive coding (LPC) parameters. The new method improves the recognition rate of continuous speech segmentation.
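The abstract does not define the PGC feature, so it is not reproduced here. As a rough sketch of two of the conventional features PGC is combined with, the code below computes short-time energy and zero-crossing rate per frame. It then marks word-like regions with simple thresholds. Frame sizes, thresholds and function names are assumptions.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (rows); trailing samples are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.asarray(x, dtype=float)[idx]


def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    positive = frames >= 0
    return np.mean(positive[:, 1:] != positive[:, :-1], axis=1)


def segment_words(x, fs, frame_ms=20, hop_ms=10,
                  energy_ratio=0.1, zcr_thresh=0.25, floor_ratio=0.01):
    """Return (start, end) sample ranges of speech-like regions.

    A frame is active if its energy is high, or if it is weaker but
    noise-like (high ZCR), as unvoiced consonants tend to be.
    All thresholds are hypothetical defaults.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    e_max = energy.max()
    active = (energy > energy_ratio * e_max) | (
        (zcr > zcr_thresh) & (energy > floor_ratio * e_max))
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        segments.append((start * hop, (len(active) - 1) * hop + frame_len))
    return segments
```

Fixed thresholds like these tend to split or merge words in continuous speech, where neighbouring words run together. A dedicated boundary feature such as PGC is intended to address that problem.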
This work presents a new method for estimating the time-varying AR model of speech, in which the time-varying parameters are modeled as stationary processes. Both the time-varying parameters and their corresponding stationary processes are described by a common Gauss-Markov model. The state vector of this model can be estimated with the extended Kalman filter (EKF) algorithm. The proposed algorithm differs from earlier EKF-based methods. Simulation studies were carried out on both voiced and unvoiced speech. They show that the proposed method achieves a lower mean-square prediction error than the LPC method.
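The abstract does not give the paper's EKF formulation in enough detail to reproduce. That formulation drives the time-varying parameters by stationary processes within a joint Gauss-Markov state. The sketch below uses a simpler, related technique instead. It models the AR coefficients as a random walk and tracks them with a linear Kalman filter, treating each new speech sample as a linear observation of the coefficient vector. The noise variances `q` and `r`, the initial covariance `p0` and the function name are assumptions.

```python
import numpy as np


def kalman_ar_track(x, order=2, q=1e-8, r=1.0, p0=10.0):
    """Track time-varying AR coefficients with a linear Kalman filter.

    Assumed model (simpler than the paper's EKF):
        a[n] = a[n-1] + w[n],          w ~ N(0, q I)   random-walk state
        x[n] = h[n] . a[n] + v[n],     v ~ N(0, r)     AR observation
    with h[n] = [x[n-1], ..., x[n-order]].
    Returns an array of shape (len(x), order) of filtered estimates.
    """
    x = np.asarray(x, dtype=float)
    a = np.zeros(order)
    P = p0 * np.eye(order)
    Q = q * np.eye(order)
    est = np.zeros((len(x), order))
    for n in range(order, len(x)):
        h = x[n - order:n][::-1]        # most recent sample first
        P = P + Q                        # predict: coefficients may drift
        s = h @ P @ h + r                # innovation variance
        k = P @ h / s                    # Kalman gain
        a = a + k * (x[n] - h @ a)       # correct with the prediction error
        P = P - np.outer(k, h @ P)       # covariance update
        est[n] = a
    return est
```

Consider a synthetic stationary AR(2) signal, x[n] = 1.2 x[n-1] - 0.5 x[n-2] + e[n]. On such a signal the final estimate should approach (1.2, -0.5). A larger `q` lets the estimate follow faster coefficient changes, at the cost of noisier estimates.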
Several researchers have proposed training with fuzzy membership inputs and class-membership desired outputs. Instead, we used fuzzy membership inputs with conventional binary desired outputs. This can reduce mistaken training, shorten training time, and improve recognition ability. The system was tested on recognizing the ten Thai numerals from zero to nine. In speaker-independent tests it achieved an error rate of 9.2%, compared with 14% for conventional neural network systems. The system using class-membership desired outputs had a somewhat higher error rate because of mistaken training.
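The sketch below makes the contrast concrete. It builds fuzzy-membership inputs from triangular membership functions and trains against conventional one-hot (binary) desired outputs. The triangular partition, the learning rate and all function names are assumptions for illustration. So is the single-layer softmax classifier, which stands in for the paper's neural network.

```python
import numpy as np


def triangular_memberships(value, centers):
    """Memberships of a scalar in overlapping triangular fuzzy sets.

    Adjacent triangles overlap so memberships sum to 1 (assumed partition).
    """
    centers = np.asarray(centers, dtype=float)
    mu = np.zeros(len(centers))
    v = np.clip(value, centers[0], centers[-1])
    j = np.searchsorted(centers, v)      # first center >= v
    if j == 0:
        mu[0] = 1.0
    else:
        left, right = centers[j - 1], centers[j]
        w = (v - left) / (right - left)
        mu[j - 1], mu[j] = 1.0 - w, w
    return mu


def fuzzify(features, centers):
    """Network input: concatenated memberships of every feature."""
    return np.concatenate([triangular_memberships(f, centers) for f in features])


def binary_target(label, n_classes=10):
    """Conventional one-hot desired output (e.g. digits 0 to 9)."""
    t = np.zeros(n_classes)
    t[label] = 1.0
    return t


def train_softmax(X, labels, n_classes=10, lr=0.5, epochs=200):
    """Gradient descent on cross-entropy with binary (one-hot) targets."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    T = np.stack([binary_target(l, n_classes) for l in labels])
    for _ in range(epochs):
        z = X @ W + b
        z -= z.max(axis=1, keepdims=True)        # numerical stability
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - T) / len(X)                     # gradient w.r.t. logits
        W -= lr * X.T @ g
        b -= lr * g.sum(axis=0)
    return W, b
```

Under the class-membership alternative, `binary_target` would be replaced by a soft vector. The only change needed is in how `T` is built; the inputs stay fuzzy in both cases.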
In 1996 the ITU-T issued Recommendation G.729, which specifies a high-quality, low-delay speech coder at 8 kb/s. This paper discusses the conjugate-structure algebraic code-excited linear prediction (CS-ACELP) algorithm and analyzes its central aspects in detail. Topics covered include the special codebook structure, efficient codebook search strategies and approaches to improving speech quality.
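The sketch below illustrates the special codebook structure by encoding the G.729 fixed-codebook layout. Each 40-sample subframe holds four signed pulses. Three pulses each use an 8-position track and the fourth uses a 16-position track, for 13 position bits plus 4 sign bits. The search function is deliberately simplified. It assumes no synthesis filter, so each pulse can be chosen independently within its own track. The actual G.729 search instead works on the filtered target and uses a focused search over nested loops. Function names are hypothetical, and the index order within each track is not meant to match the bitstream.

```python
import numpy as np

SUBFRAME = 40  # 5 ms at 8 kHz

# Pulse position tracks of the G.729 algebraic (fixed) codebook.
TRACKS = [
    list(range(0, 40, 5)),                            # pulse 0: 8 positions
    list(range(1, 40, 5)),                            # pulse 1: 8 positions
    list(range(2, 40, 5)),                            # pulse 2: 8 positions
    list(range(3, 40, 5)) + list(range(4, 40, 5)),    # pulse 3: 16 positions
]


def bits_needed():
    """Position bits (log2 of each track size) plus one sign bit per pulse."""
    pos_bits = sum((len(t) - 1).bit_length() for t in TRACKS)
    return pos_bits + len(TRACKS)


def codevector(position_idx, signs):
    """Build the sparse excitation from one position index and sign per track."""
    c = np.zeros(SUBFRAME)
    for track, idx, s in zip(TRACKS, position_idx, signs):
        c[track[idx]] += s
    return c


def search_identity(target):
    """Maximise C^2/E assuming an identity (unfiltered) synthesis.

    With no filter and disjoint tracks, E is always 4, so C is maximised by
    placing each pulse where |target| is largest in its track, with its sign.
    """
    target = np.asarray(target, dtype=float)
    idx, signs = [], []
    for track in TRACKS:
        best = int(np.argmax(np.abs(target[track])))
        idx.append(best)
        signs.append(1 if target[track[best]] >= 0 else -1)
    return idx, signs
```

Storing only four signed positions is what makes the codebook "algebraic": no codevectors are stored. A real search must also account for the impulse response of the weighted synthesis filter. Pulses in different tracks then interact, which is why G.729 needs a more elaborate search than this track-by-track choice.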
A series of experiments was performed to select a set of acoustic measurements for use as input to an expert system for stop consonant recognition. In the experiments, a trained human spectrogram reader made six-way (/b, d, g, p, t, k/) classifications of syllable-initial stops using four data representations: DFT spectrograms, LPC spectrograms, LPC spectral slices, and tables of numerical measurements. Correct identification rates for the four data sets were 79%, 81%, 72% and 76%, respectively. The numerical measurements achieved relatively high performance. Together with other considerations for selecting input representations for expert systems, this result suggests that the numerical tables are the most appropriate of the four input forms.
This paper introduces a coding method called glottal linear prediction, based on a more precise model of speech production. In this model, speech production is separated into three processes: the glottal excitation, the vocal tract, and the lip radiation effect. The paper emphasizes applying this idea to coding telephone-line PCM (pulse-code modulation) speech that contains different types of utterances. The only filter that must be transmitted from the coder to the decoder is the one modeling the vocal tract. This filter is usually of lower order than the LPC (linear predictive coding) filters used in conventional linear-predictive analysis. The excitation signal is coded either by modeling the estimated glottal wave with Lagrange interpolation or by generating noise. Preliminary results indicate that more accurate modeling of the speech-production mechanism will lead to improved quality in speech coding.
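The abstract says the glottal wave estimate is modeled with Lagrange interpolation. Below is a minimal sketch of that operation. It assumes the coder transmits a few knot positions and amplitudes per pitch period, and the decoder rebuilds the pulse by evaluating the interpolating polynomial. Knot count, knot placement and the function names are assumptions.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through points (xs[i], ys[i]) at x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0  # L_i(x): equals 1 at xs[i] and 0 at every other knot
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def glottal_pulse_from_knots(knot_pos, knot_vals, length):
    """Rebuild one period of `length` samples from the transmitted knots."""
    return [lagrange_interpolate(knot_pos, knot_vals, n) for n in range(length)]
```

A single high-order Lagrange polynomial can oscillate between widely spaced knots. A practical coder would therefore likely keep the knot count small or interpolate piecewise. The abstract does not specify which choice the paper makes.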