Semi-variogram estimators and distortion measures of signal spectra are used in this paper for image texture retrieval. On the complete Brodatz database, most of the high retrieval rates reported rely on multiple features and combinations of multiple algorithms, while classification using a single feature remains a challenge for retrieving diverse texture images. The semi-variogram, which is theoretically sound and a cornerstone of spatial statistics, has characteristics shared between true randomness and complete determinism and can therefore serve as a tool for both the structural and statistical analysis of texture images. Meanwhile, spectral distortion measures derived from the theory of linear predictive coding provide a mathematically rigorous model for signal-based similarity matching and have proven useful in many practical pattern classification systems. Experimental results obtained by testing the proposed approach on the complete Brodatz database and the University of Illinois at Urbana-Champaign texture database suggest its effectiveness as a single-feature-based dissimilarity measure for real-time texture retrieval.
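As a rough illustration of the kind of feature this abstract refers to, the sketch below computes an empirical semi-variogram of a small grey-level patch. It assumes the classical Matheron estimator, gamma(h) = sum of squared differences at lag h divided by 2N(h), evaluated along rows only; the abstract does not say which estimator or directions the paper uses, and the function name `semivariogram` and the sample patch are hypothetical.

```python
def semivariogram(image, max_lag):
    """Empirical (Matheron) semi-variogram along rows for lags 1..max_lag.

    image is a list of equal-length rows of grey levels. This is an
    assumed, simplified estimator, not necessarily the paper's.
    """
    gammas = []
    for h in range(1, max_lag + 1):
        total = 0.0
        count = 0
        for row in image:
            # Every pair of pixels in the same row separated by h columns.
            for x in range(len(row) - h):
                diff = row[x] - row[x + h]
                total += diff * diff
                count += 1
        gammas.append(total / (2.0 * count) if count else float("nan"))
    return gammas


# Hypothetical 4x4 patch with a period-2 vertical stripe pattern.
patch = [[0, 1, 0, 1] for _ in range(4)]
gamma = semivariogram(patch, 3)
```

For a periodic pattern like this one, the semi-variogram drops back to zero at lags equal to the period, which is how it can capture structural regularity as well as statistical variation.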
This paper discusses a speech-and-speaker (SAS) identification system based on spoken Arabic digit recognition. The speech signals of the Arabic digits from zero to ten are processed graphically (the signal is treated as an object image for further processing). Identification and classification are performed with Burg's estimation model and the algorithm of Toeplitz matrix minimal eigenvalues as the main tools for signal-image description and feature extraction. At the classification stage, both conventional and neural-network-based methods are used. The success rate of the speaker-identifying system obtained in the presented experiments for individually uttered words reached about 98.8% in some cases. The miss rate of about 1.2% was due almost entirely to false acceptance (13 miss cases in 1100 tested voices). These promising results led to the design of a security system for SAS identification. The average overall success rate was then 97.45% in recognizing one uttered word and identifying its speaker, and 92.5% in recognizing a three-digit password (three individual words). The latter is a high success rate because, for compound cases, all three uttered words must be recognized consecutively in addition to identifying their speaker; hence, the probability of making an error is higher. The authors' major contribution to this task is a system that recognizes both the uttered words and their speaker through an innovative graphical algorithm for feature extraction from the voice signal. This Toeplitz-based algorithm reduces the amount of computation from operations on an n × n matrix that contains n² different elements to operations on a matrix (of Toeplitz form) that contains only n distinct elements.
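The storage claim in the last sentence can be illustrated with a minimal sketch. It assumes a symmetric Toeplitz matrix, T[i][j] = c[|i - j|], which is the form that has exactly n distinct entries (a general Toeplitz matrix has up to 2n - 1). The helper name `toeplitz_from_sequence` and the sample sequence are hypothetical, and the minimal-eigenvalue step of the paper is not shown.

```python
def toeplitz_from_sequence(c):
    """Build the symmetric Toeplitz matrix T[i][j] = c[|i - j|]."""
    n = len(c)
    return [[c[abs(i - j)] for j in range(n)] for i in range(n)]


# Hypothetical sequence of n = 3 values extracted from a signal image.
coeffs = [4.0, 2.0, 1.0]
T = toeplitz_from_sequence(coeffs)

# All n * n entries are drawn from the n values in coeffs.
distinct = {value for row in T for value in row}
```

Because every diagonal is constant, the whole matrix is determined by its first row, so only n numbers need to be stored or estimated.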
Linear predictive coding (LPC) analysis of speech is made using a stationary model, while parts of speech such as stop consonants are highly nonstationary. An asymptotic analysis is made of the stability of the LPC model obtained from a simplified model of a nonstationary waveform. This model is used to predict the occurrence of unstable LPC models in the analysis of a stop consonant.
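For readers unfamiliar with what stability of an LPC model means in practice, the sketch below runs the standard Levinson-Durbin recursion on a correlation sequence and applies the usual test that every reflection coefficient has magnitude below one. The abstract does not state which LPC formulation the paper analyses, so this is only a generic illustration; the function names and the sample sequences are assumptions.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    r: correlation values r[0..order].
    Returns (a, k, err) where a[0..order] are the coefficients of
    A(z) = 1 + a[1] z^-1 + ..., k holds the reflection coefficients,
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        km = -acc / err
        k.append(km)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + km * a[m - i]
        new_a[m] = km
        a = new_a
        err *= 1.0 - km * km
    return a, k, err


def is_stable(k):
    """The synthesis filter 1/A(z) is stable iff every |k_m| < 1."""
    return all(abs(km) < 1.0 for km in k)


# Correlation of a first-order process x[n] = 0.5 x[n-1] + e[n].
a_ok, k_ok, _ = levinson_durbin([1.0, 0.5, 0.25], 2)

# A sequence that is not a valid autocorrelation yields |k_1| > 1.
_, k_bad, _ = levinson_durbin([1.0, 1.2], 1)
```

With a valid autocorrelation sequence the test always passes; it is correlation estimates that violate this property, as can happen for strongly nonstationary frames under some analysis methods, that produce reflection coefficients outside the unit interval.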
This correspondence presents a new two-stage adaptive vector quantizer of LSF parameters in LPC speech coding. The first codebook is adapted by a partition-delete operation, whereas the code-vectors of the second codebook remain unchanged. The objective and subjective evaluations show that the proposed scheme offers transparent quantization with 22 b/frame.
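A plain two-stage vector quantizer, without the adaptation described above, can be sketched as follows: the first stage picks the nearest code-vector and the second stage quantizes the remaining residual. The partition-delete adaptation of the first codebook is not implemented here, and the function names and toy codebooks are hypothetical.

```python
def nearest(x, codebook):
    """Index of the code-vector closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])),
    )


def two_stage_encode(x, cb1, cb2):
    """Quantize x with cb1, then quantize the residual with cb2."""
    i1 = nearest(x, cb1)
    residual = [a - b for a, b in zip(x, cb1[i1])]
    i2 = nearest(residual, cb2)
    return i1, i2


def two_stage_decode(i1, i2, cb1, cb2):
    """Reconstruction is the sum of the two selected code-vectors."""
    return [a + b for a, b in zip(cb1[i1], cb2[i2])]


# Toy 2-D codebooks; real LSF codebooks would be trained on speech data.
cb1 = [[0.0, 0.0], [1.0, 1.0]]
cb2 = [[-0.1, 0.0], [0.1, 0.0], [0.0, 0.0]]
indices = two_stage_encode([1.1, 1.0], cb1, cb2)
reconstruction = two_stage_decode(indices[0], indices[1], cb1, cb2)
```

Splitting the bit budget across two small codebooks keeps the search cost far below that of a single codebook with the same total number of bits.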
Parallel, self-organizing, hierarchical neural networks (PSHNNs) are multistage networks in which stages operate in parallel rather than in series during testing. Each stage can be any particular type of network. Previous PSHNNs assume quantized, say, binary outputs. A new type of PSHNN is discussed in which the outputs are allowed to be continuous-valued. The performance of the resulting networks is tested on the problem of predicting speech signal samples from past samples. Three types of networks, in which the stages are learned by the delta rule, sequential least-squares, and the backpropagation (BP) algorithm, respectively, are described. In all cases studied, the new networks achieve better performance than linear prediction. A revised BP algorithm is discussed for learning input nonlinearities. When the BP algorithm is to be used, better performance is achieved when a single BP network is replaced by a PSHNN of equal complexity in which each stage is a BP network of smaller complexity than the single BP network.
A newly developed PARCOR coefficient quantisation scheme is presented, referred to as trellis in-loop PARCOR coefficient quantisation. In the in-loop quantisation scheme, each PARCOR coefficient is determined and quantised inside the analysis computation loop before the next higher-order one. Therefore, the quantisation error is taken into account at each stage of lattice filter analysis. This scheme, when combined with a trellis structure and appropriate search algorithms, can achieve lower distortion than conventional scalar quantisation. Simulation results show that by incorporating a trellis structure into the in-loop quantisation scheme and using the (M, L)-search algorithm with M = 4 and L = 8, an average gain of six bits per frame over the traditional methods can be achieved.
This work aims to present a combined version of reduced candidate mechanism (RCM) and iteration-free pulse replacement (IFPR) as a novel and efficient way to enhance the performance of algebraic codebook search in an algebraic code-excited linear-prediction speech coder. As the first step, individual pulse contribution in each track is given by RCM, and the value of N is then specified. Subsequently, the replacement of a pulse is performed through the search over the sorted top N pulses by IFPR, and those of 2-4 pulses are carried out by a standard IFPR. Implemented on a G.729A speech codec, this proposal requires as few as 20 searches, a search load tantamount to 6.25% of G.729A, 31.25% of the global pulse replacement method (iteration = 2), 41.67% of IFPR, but still provides a comparable speech quality in any case. The aim of significant search performance improvement is hence achieved in this work.
The direct use of vector quantization (VQ) to encode LPC parameters in a communication system suffers from the following two limitations: 1) complexity of implementation for large vector dimensions and codebook sizes and 2) sensitivity to errors in the received indices due to noise in the communication channel. In the past, these issues have been simultaneously addressed by designing channel matched multistage vector quantizers (CM-MSVQ). A sub-optimal sequential design procedure has been used to train the codebooks of the CM-MSVQ. In this paper, a novel channel-optimized multistage vector quantization (CO-MSVQ) codec is presented, in which the stage codebooks are jointly designed. The proposed codec uses a source and channel-dependent distortion measure to encode line spectral frequencies derived from segments of a speech signal. Extensive simulation results are provided to demonstrate the consistent reduction in both the mean and the variance of the spectral distortion obtained using the proposed codec relative to the conventional sequentially designed CM-MSVQ. Furthermore, the perceptual quality of the reconstructed speech using the proposed codec was found to be better than that obtained using the sequentially designed CM-MSVQ.
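One common way to make a distortion measure depend on both source and channel is to minimise the expected distortion after index errors. The sketch below shows that encoding rule for a single stage, assuming a memoryless binary symmetric channel with bit error probability `p`. The abstract does not give the paper's actual measure, channel model, or joint multistage design, so everything here (function names, channel model, toy codebook) is an assumption for illustration only.

```python
def bsc_transition(i, j, bits, p):
    """P(receive index j | send index i) over a binary symmetric channel."""
    flipped = bin(i ^ j).count("1")
    return (p ** flipped) * ((1.0 - p) ** (bits - flipped))


def sq_dist(x, c):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def channel_aware_encode(x, codebook, bits, p):
    """Choose the index with the lowest expected distortion at the decoder."""
    best_index, best_cost = None, float("inf")
    for i in range(len(codebook)):
        cost = sum(
            bsc_transition(i, j, bits, p) * sq_dist(x, codebook[j])
            for j in range(len(codebook))
        )
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy 2-bit codebook; with p = 0 the rule reduces to nearest-neighbour search.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
noiseless_choice = channel_aware_encode([0.9, 0.1], codebook, 2, 0.0)
row_sum = sum(bsc_transition(0, j, 2, 0.1) for j in range(4))
```

As the error probability grows, the rule favours indices whose likely corruptions still decode to nearby code-vectors, trading a little source distortion for robustness.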
A linear predictive coding (LPC) excitation signal composed of a set of orthogonal functions called zinc functions is introduced. These functions are shown to form a complete orthogonal set and have properties that are well suited for modeling the LPC residual signal. A benchmark comparison between Fourier series and zinc function modeling shows that the zinc function model for the residual is superior in the mean-squared-error sense. The zinc basis functions are used in two low-bit-rate speech coding systems targeted at the 4.8-9.6-kb/s range. The first is a zinc excited LPC (ZELPC) system, where the voiced excitation is modeled using the zinc functions while the unvoiced excitation is represented by the usual white noise source. The second system is a zinc multipulse LPC (ZMPLPC) system, where the LPC excitation is constructed using the zinc basis functions instead of the usual ideal impulses. Results show that, given a fixed segmental signal-to-noise ratio with similar computational complexity, the ZMPLPC system is more efficient than a conventional multipulse LPC (MPLPC) system. Subjective listening tests also indicate a preference for the ZMPLPC system.