The code-excited linear predictive (CELP) technique has the potential to produce high-quality synthetic speech at bit rates as low as 4.8 kb/s. Most of the complexity of CELP coders comes from the search used to select an optimal excitation sequence from a codebook of stochastic vectors. This paper describes three fast search methods. The key idea is to inverse filter the actual speech through the formant and pitch filters to produce a residual error sequence (RES). The residual error is then used to identify a neighborhood, or subset, of codes for further processing. The first method, called Dynamic Nearest Neighborhood (DNN), attempts to dynamically construct a neighborhood of the 6 codes with maximum correlation with the residual error. The second method, called Nearest Fixed Neighborhood (NFN), clusters the codebook into a fixed number of cells and then searches only the codes in the cell nearest to the RES. These two methods reduce the search computation by a factor of 8 to 20. The third method combines the advantages of the first two and reduces the number of operations by a factor of 40 to 50. The performance of these techniques and some of their ramifications are also addressed.
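The DNN idea can be illustrated with a minimal Python sketch. It rests on several assumptions:

- Only a zero-state formant filter A(z) is used. The pitch filter is omitted.
- The target is plain speech rather than perceptually weighted speech.
- Preselection uses the absolute normalized correlation between the residual and each code.

All function names are hypothetical. Only the neighborhood size m = 6 comes from the abstract.

```python
import math


def inverse_filter(s, a):
    """Formant inverse filter A(z) = 1 + sum_i a[i] z^-(i+1), zero initial state."""
    return [s[n] + sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(s))]


def synthesis_filter(e, a):
    """All-pole synthesis filter 1/A(z), zero initial state."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[i] * y[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0))
    return y


def match_score(target, y):
    """Analysis-by-synthesis criterion (x.y)^2 / (y.y), i.e. error reduction at optimal gain."""
    num = sum(t * v for t, v in zip(target, y))
    den = sum(v * v for v in y)
    return num * num / den if den > 0 else 0.0


def full_search(target, codebook, a, candidates=None):
    """Filter every candidate code through 1/A(z) and keep the best-scoring index."""
    idx = range(len(codebook)) if candidates is None else candidates
    return max(idx, key=lambda k: match_score(target, synthesis_filter(codebook[k], a)))


def dnn_search(target, codebook, a, m=6):
    """Preselect the m codes most correlated with the residual, then search only those."""
    res = inverse_filter(target, a)

    def norm_corr(k):
        c = codebook[k]
        energy = math.sqrt(sum(v * v for v in c)) or 1.0
        return abs(sum(r * v for r, v in zip(res, c))) / energy

    shortlist = sorted(range(len(codebook)), key=norm_corr, reverse=True)[:m]
    return full_search(target, codebook, a, shortlist)
```

With a 128-entry codebook, `dnn_search` filters and scores only 6 codes, while `full_search` filters and scores all 128. The cheap residual-domain correlations are what replace most of the filtering.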
This paper uses semi-variogram estimators and spectral distortion measures for image texture retrieval. On the complete Brodatz database, most of the reported high retrieval rates rely on multiple features and combinations of multiple algorithms, whereas classification using a single feature remains a challenge for retrieving diverse texture images. The semi-variogram, which is theoretically sound and is the cornerstone of spatial statistics, captures characteristics that lie between true randomness and complete determinism. It can therefore serve as a useful tool for both the structural and the statistical analysis of texture images. Meanwhile, spectral distortion measures derived from the theory of linear predictive coding provide a mathematically rigorous model for signal-based similarity matching and have proven useful in many practical pattern classification systems. Experimental results from testing the proposed approach on the complete Brodatz database and the University of Illinois at Urbana-Champaign texture database suggest that it is effective as a single-feature-based dissimilarity measure for real-time texture retrieval.
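As a concrete reference for the semi-variogram, here is a minimal Python sketch of the classical (Matheron) estimator applied to a grayscale image along one lag direction. The estimator is

gamma(h) = (1 / (2 N(h))) * sum over pixel pairs at lag h of (z(x) - z(x + h))^2.

The function name and the `direction` parameter are hypothetical. The abstract does not say which directions or lags the paper uses, nor how the semi-variogram curve is then fed to the LPC-based distortion measure.

```python
def semivariogram(img, max_lag, direction=(0, 1)):
    """Classical estimator gamma(h) = sum (z(x) - z(x+h))^2 / (2 N(h)) for h = 1..max_lag.

    img is a list of equal-length rows; direction (di, dj) is the unit step of the lag.
    """
    di, dj = direction
    rows, cols = len(img), len(img[0])
    gammas = []
    for h in range(1, max_lag + 1):
        total, count = 0.0, 0
        for i in range(rows):
            for j in range(cols):
                i2, j2 = i + h * di, j + h * dj
                if 0 <= i2 < rows and 0 <= j2 < cols:  # only pairs inside the image
                    d = img[i][j] - img[i2][j2]
                    total += d * d
                    count += 1
        gammas.append(total / (2 * count) if count else 0.0)
    return gammas
```

For an image of vertical stripes alternating between 0 and 1, the horizontal semi-variogram is [0.5, 0.0] at lags 1 and 2, and the vertical one is all zeros. This illustrates how the curve encodes structural periodicity.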
Special fast procedures for the code-excited linear predictive coding (CELP) algorithm have been developed to make implementation on modest hardware possible. The advantages and disadvantages of the various fast procedures are discussed. A general formalism for the algorithm is developed, followed by a discussion of the individual procedures, grouped according to their features. For each procedure, its computational complexity, storage requirement, and numerical accuracy are discussed. Many of the fast procedures are designed to search a particular type of codebook (most codebooks are stochastic in character, while a few are deterministic). Other fast procedures can be used with arbitrary codebooks and are therefore also applicable to trained codebooks. Some of the fast procedures designed for stochastic codebooks can also be used to compute the closed-loop pitch parameters, since that computation can be interpreted as a search through a time-dependent codebook.
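Backward filtering is one well-known speed-up that works for arbitrary codebooks, including trained ones. The correlation term of the CELP criterion satisfies (Hc)·x = c·(Hᵀx), so the target x can be filtered once per subframe instead of filtering every codevector c. The abstract does not say whether this procedure is among those surveyed.

The minimal Python sketch below assumes H is the lower-triangular Toeplitz matrix built from a truncated impulse response `h`. All names are hypothetical. The energy term ||Hc||² is still computed by direct filtering here; other fast procedures target that term.

```python
def filter_code(c, h):
    """y = H c, with H lower-triangular Toeplitz built from truncated impulse response h."""
    n_len = len(c)
    return [sum(h[n - m] * c[m] for m in range(n + 1) if n - m < len(h)) for n in range(n_len)]


def backward_filter(x, h):
    """d = H^T x, computed once per subframe: d[m] = sum_{n >= m} h[n - m] x[n]."""
    n_len = len(x)
    return [sum(h[n - m] * x[n] for n in range(m, n_len) if n - m < len(h))
            for m in range(n_len)]


def search_with_backward_filtering(x, codebook, h):
    """Return the index maximizing (x . Hc)^2 / ||Hc||^2."""
    d = backward_filter(x, h)
    best_k, best_score = -1, float("-inf")
    for k, c in enumerate(codebook):
        corr = sum(ci * di for ci, di in zip(c, d))  # equals (H c) . x without filtering c
        y = filter_code(c, h)  # energy term: still filtered directly in this sketch
        energy = sum(v * v for v in y)
        score = corr * corr / energy if energy > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```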
This paper describes the design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications. PSI-CELP is based on CELP but has more adaptive excitation structures. In voiced frames, rather than using conventional random excitation vectors, PSI-CELP gives even the random excitation vectors pitch periodicity by repeating stored random vectors, in addition to using an adaptive codebook. In silent, unvoiced, and transient frames, the coder stops using the adaptive codebook and switches to fixed random codebooks. The PSI-CELP coder also implements several novel structures and techniques: an FIR-type perceptual weighting filter using unquantized LPC parameters, a random codebook with a conjugate structure trained to be robust against channel errors, codebook search with delayed decision, gain quantization with sloped amplitude, and moving-average prediction coding of LSP parameters. The coder is implemented on DSP chips. Its coded speech quality at 3.6 kb/s (with 2.0 kb/s of redundancy) is comparable to that of the Japanese full-rate VSELP coder at 6.7 kb/s (with 4.5 kb/s of redundancy). The basic structure of this PSI-CELP coder has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.
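The frame-dependent excitation structure described above can be sketched in a few lines of Python. In voiced frames, both the adaptive-codebook vector and a stored random vector are made periodic with the pitch lag by repeating their first `lag` samples. In other frames, only a fixed random vector is used.

This is a simplified sketch under several assumptions:

- There is a single random vector, whereas the paper uses a conjugate-structure random codebook.
- The gains and the voicing decision are given.
- The past excitation holds at least `lag` samples.

All names are hypothetical.

```python
def adaptive_vector(past_excitation, lag, n):
    """Adaptive-codebook vector: the last `lag` excitation samples, repeated to length n."""
    seg = past_excitation[len(past_excitation) - lag:]
    return [seg[i % lag] for i in range(n)]


def pitch_synchronize(random_vector, lag, n):
    """Repeat the first `lag` samples of a stored random vector to give it pitch periodicity."""
    if lag >= n:
        return list(random_vector[:n])
    return [random_vector[i % lag] for i in range(n)]


def build_excitation(voiced, past_excitation, random_vector, lag, gain_a, gain_r, n):
    """Voiced: adaptive + pitch-synchronous random vector; otherwise fixed random vector only."""
    if voiced:
        v_a = adaptive_vector(past_excitation, lag, n)
        v_r = pitch_synchronize(random_vector, lag, n)
        return [gain_a * p + gain_r * q for p, q in zip(v_a, v_r)]
    return [gain_r * q for q in random_vector[:n]]
```

For example, `pitch_synchronize([1, 2, 3, 4, 5, 6], 4, 6)` yields `[1, 2, 3, 4, 1, 2]`. When the lag is at least the subframe length, the stored vector is used unchanged.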
This paper discusses a speech-and-speaker (SAS) identification system based on spoken Arabic digit recognition. The speech signals of the Arabic digits from zero to ten are processed graphically (the signal is treated as an object image for further processing). Identification and classification are performed with Burg's estimation model and the Toeplitz-matrix minimal-eigenvalue algorithm as the main tools for signal-image description and feature extraction. At the classification stage, both conventional and neural-network-based methods are used. In the presented experiments, the success rate of the speaker-identification system for individually uttered words reached about 98.8% in some cases. The miss rate of about 1.2% was due almost entirely to false acceptance (13 misses in 1100 tested voice samples). These promising results led to the design of a security system for SAS identification. Its average overall success rate was 97.45% for recognizing one uttered word and identifying its speaker, and 92.5% for recognizing a three-digit password (three individual words). The latter is a high success rate because, in the compound case, all three uttered words must be recognized consecutively, in addition to and after identifying their speaker; hence, the probability of making an error is inherently higher. The authors' major contribution is a system that recognizes both the uttered words and their speaker through an innovative graphical algorithm for extracting features from the voice signal. This Toeplitz-based algorithm reduces the computation from operations on an n × n matrix that contains n² different elements to operations on a matrix of Toeplitz form that contains only n mutually different elements.
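The Toeplitz minimal-eigenvalue feature idea can be sketched as follows. A symmetric Toeplitz matrix is fully determined by its first row c (n values instead of n² entries). The feature sequence is the minimal eigenvalue of each leading k × k submatrix, and by Cauchy interlacing this sequence is nonincreasing.

The abstract does not say how c is obtained from the signal image. Here it is simply a hypothetical input. The eigenvalue routine is the classical cyclic Jacobi method, chosen only to keep the Python sketch dependency-free, and all names are hypothetical.

```python
import math


def toeplitz(c):
    """Symmetric Toeplitz matrix T[i][j] = c[|i - j|]: n^2 entries, only n distinct values."""
    n = len(c)
    return [[c[abs(i - j)] for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen to zero a[p][q] (Numerical Recipes convention).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def minimal_eigenvalue_features(c):
    """Minimal eigenvalue of each leading k x k Toeplitz submatrix, k = 1..n."""
    t = toeplitz(c)
    return [jacobi_eigenvalues([row[:k] for row in t[:k]])[0] for k in range(1, len(c) + 1)]
```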
Linear predictive coding (LPC) analysis of speech assumes a stationary model, whereas parts of speech such as stop consonants are highly nonstationary. An asymptotic analysis is made of the stability of the LPC model obtained from a simplified model of a nonstationary waveform. This model is used to predict the occurrence of unstable LPC models in the analysis of a stop consonant.
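Since the abstract concerns when an LPC model becomes unstable, a minimal sketch of the standard stability test may help. The step-down (backward Levinson) recursion converts the predictor polynomial into reflection coefficients. The synthesis filter 1/A(z) is stable exactly when every reflection coefficient has magnitude below 1.

The Python sketch assumes the convention A(z) = 1 + Σ a_j z^-j. The function names are hypothetical.

```python
def reflection_coefficients(a):
    """Step-down recursion: A(z) = 1 + sum_j a[j-1] z^-j  ->  [k_1, ..., k_p].

    Returns None as soon as some |k_i| >= 1, i.e. when 1/A(z) is not stable.
    """
    coeffs = list(a)
    ks = []
    for i in range(len(coeffs), 0, -1):
        k = coeffs[i - 1]  # k_i is the last coefficient of the order-i polynomial
        if abs(k) >= 1.0:
            return None
        ks.append(k)
        denom = 1.0 - k * k
        # Order-(i-1) coefficients: a_j = (a_j - k_i * a_{i-j}) / (1 - k_i^2)
        coeffs = [(coeffs[j] - k * coeffs[i - 2 - j]) / denom for j in range(i - 1)]
    return ks[::-1]


def is_stable(a):
    return reflection_coefficients(a) is not None
```

For instance, A(z) = 1 - z^-1 + 0.25 z^-2 (a double zero at z = 0.5) gives reflection coefficients [-0.8, 0.25] and is stable. A(z) = 1 - 2.1 z^-1 + 0.2 z^-2 has a zero at z = 2 and is flagged as unstable.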
This correspondence presents a new two-stage adaptive vector quantizer for LSF parameters in LPC speech coding. The first codebook is adapted by a partition-delete operation, whereas the codevectors of the second codebook remain unchanged. Objective and subjective evaluations show that the proposed scheme offers transparent quantization at 22 bits/frame.
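The basic two-stage structure can be sketched in Python:

- The first stage picks the nearest codevector to the LSF vector.
- The second stage quantizes the remaining error.
- The decoder adds the two selected codevectors.

The partition-delete adaptation of the first codebook is not modeled. The abstract does not say how the 22 bits are split between the stages. The codebooks and all names are hypothetical, and the 2-dimensional vectors stand in for real LSF vectors, which typically have 10 components.

```python
import math


def nearest(vec, codebook):
    """Index of the codevector with minimum squared Euclidean distance to vec."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - w) ** 2 for v, w in zip(vec, codebook[k])))


def two_stage_encode(lsf, cb1, cb2):
    i = nearest(lsf, cb1)  # stage 1: coarse quantization
    residual = [v - w for v, w in zip(lsf, cb1[i])]
    j = nearest(residual, cb2)  # stage 2: quantize what stage 1 missed
    return i, j


def two_stage_decode(i, j, cb1, cb2):
    return [v + w for v, w in zip(cb1[i], cb2[j])]


def bits_per_frame(cb1, cb2):
    """Index bits needed for both stages."""
    return math.ceil(math.log2(len(cb1))) + math.ceil(math.log2(len(cb2)))
```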
A newly developed PARCOR coefficient quantisation scheme, referred to as trellis in-loop PARCOR coefficient quantisation, is presented. In the in-loop quantisation scheme, each PARCOR coefficient is determined and quantised inside the analysis computation loop before the next higher-order coefficient is computed. Therefore, the quantisation error is taken into account at each stage of the lattice filter analysis. When combined with a trellis structure and appropriate search algorithms, this scheme can achieve lower distortion than conventional scalar quantisation. Simulation results show that by incorporating a trellis structure into the in-loop quantisation scheme and using the (M, L)-search algorithm with M = 4 and L = 8, an average gain of six bits per frame over traditional methods can be achieved.
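The in-loop principle can be sketched in Python. Each reflection coefficient is estimated from the current forward and backward prediction errors and quantised. The quantised value is then used to update the lattice, so higher-order stages see, and can partly compensate for, earlier quantisation error.

This sketch assumes three things:

- A Burg-style harmonic-mean estimate for each coefficient.
- Prewindowed data.
- A uniform scalar quantiser.

The paper's trellis (M, L)-search is not implemented. All names are hypothetical.

```python
def quantize_uniform(k, step):
    """Uniform scalar quantiser, clipped so that |k_hat| < 1 keeps the lattice stable."""
    limit = 1.0 - step
    return max(-limit, min(limit, round(k / step) * step))


def in_loop_parcor(x, order, step=1.0 / 32):
    """Compute and quantise PARCOR coefficients stage by stage, updating with quantised values."""
    f = [float(v) for v in x]  # forward prediction error, stage 0
    b = [float(v) for v in x]  # backward prediction error, stage 0
    ks = []
    for _ in range(order):
        b_delayed = [0.0] + b[:-1]  # b_{m-1}[n-1], prewindowed
        num = sum(fn * bn for fn, bn in zip(f, b_delayed))
        den = sum(fn * fn + bn * bn for fn, bn in zip(f, b_delayed))
        k = -2.0 * num / den if den > 0 else 0.0  # Burg estimate, |k| <= 1
        k_hat = quantize_uniform(k, step)
        ks.append(k_hat)
        # Lattice update uses the QUANTISED coefficient (the "in-loop" step).
        f, b = ([fn + k_hat * bn for fn, bn in zip(f, b_delayed)],
                [bn + k_hat * fn for fn, bn in zip(f, b_delayed)])
    return ks
```

For the decaying sequence x[n] = 0.9^n, the first estimate is -0.9, which quantises to -29/32 = -0.90625 with step 1/32. The second stage then works on the error left by that quantised value rather than by the unquantised one.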
Parallel, self-organizing, hierarchical neural networks (PSHNNs) are multistage networks in which the stages operate in parallel rather than in series during testing. Each stage can be any particular type of network. Previous PSHNNs assume quantized (for example, binary) outputs. A new type of PSHNN is discussed in which the outputs are allowed to be continuous-valued. The performance of the resulting networks is tested on the problem of predicting speech signal samples from past samples. Three types of networks are described, in which the stages are learned by the delta rule, sequential least squares, and the backpropagation (BP) algorithm, respectively. In all cases studied, the new networks achieve better performance than linear prediction. A revised BP algorithm for learning input nonlinearities is also discussed. When the BP algorithm is used, better performance is achieved if a single BP network is replaced by a PSHNN of equal complexity in which each stage is a BP network of smaller complexity than the single BP network.
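The delta-rule variant can be illustrated with a two-stage Python sketch for predicting a sample from its past samples. It assumes the following:

- Stage 1 is a linear unit trained with the delta rule (LMS), i.e. an adaptive linear predictor.
- Stage 2 is a linear unit on tanh-transformed inputs, trained on the error left by stage 1.
- At test time both stages run independently on the same input, and their outputs are summed.

The residual-training scheme, the tanh nonlinearity, and all names are assumptions, not the paper's exact algorithm. The sequential least-squares and BP variants are not shown.

```python
import math


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def make_frames(signal, order):
    """Input vectors [s[n-1], ..., s[n-order]] and targets s[n]."""
    inputs = [signal[n - order:n][::-1] for n in range(order, len(signal))]
    return inputs, signal[order:]


def lms_train(inputs, targets, mu, epochs):
    """Delta rule (LMS) for a single linear unit."""
    w = [0.0] * len(inputs[0])
    for _ in range(epochs):
        for xv, t in zip(inputs, targets):
            e = t - dot(w, xv)
            w = [wi + mu * e * xi for wi, xi in zip(w, xv)]
    return w


def nonlinear_inputs(xv):
    return [math.tanh(v) for v in xv]  # hypothetical input nonlinearity for stage 2


def train_two_stage(signal, order, mu1=0.05, mu2=0.01, epochs=10):
    inputs, targets = make_frames(signal, order)
    w1 = lms_train(inputs, targets, mu1, epochs)
    # Stage 2 learns the prediction error that stage 1 leaves behind (assumed scheme).
    errors = [t - dot(w1, xv) for xv, t in zip(inputs, targets)]
    w2 = lms_train([nonlinear_inputs(xv) for xv in inputs], errors, mu2, epochs)
    return w1, w2


def predict(w1, w2, xv):
    # Both stages see the same input and can run in parallel; outputs are summed.
    return dot(w1, xv) + dot(w2, nonlinear_inputs(xv))
```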