Subband-autocorrelation (SBCOR) analysis is a noise-robust acoustic analysis based on filter bank and autocorrelation analysis. It aims to extract the periodicities associated with the inverse of the center frequency in each subband. In this paper, it is derived that SBCOR is equivalent to lateral inhibitive weighting (LIW) processing of the power spectrum, and it is shown with a DTW word recognizer that the LIW is significantly effective for noise-robust acoustic analysis. An interpretation of the LIW is also described. A technique for flattening the noise spectral envelope with an LPC inverse filter is applied to speech degraded with noise, and DTW word recognition is performed. The idea behind this inverse filtering technique is to weaken the strong periodic components included in the noise. The experimental results using a 32nd-order LPC inverse filter show that the recognition performance of SBCOR (or LIW) is improved for computer room noise.
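The spectral flattening step described above can be illustrated with a minimal sketch, assuming the standard autocorrelation method of LPC solved by the Levinson-Durbin recursion. The function names `lpc_coefficients` and `inverse_filter` are hypothetical and are not taken from the paper; the abstract specifies only the filter order (32), not the estimation procedure.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns a, with a[0] = 1, so that the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    (Assumed procedure; the abstract does not state how the filter is fitted.)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the Levinson-Durbin recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Apply the FIR inverse filter A(z) to x, keeping the original length."""
    return np.convolve(np.asarray(x, dtype=float), a)[:len(x)]
```

Under these assumptions, `a` would be estimated from a noise-only segment with `order=32` and then applied to the noisy speech with `inverse_filter`, so that strong spectral peaks of the noise are attenuated before the SBCOR analysis.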
Pattern recognition and audio processing are important aspects of the control and behavior of mobile robots. A mobile robot's actions depend on the recognition of visual and audio stimuli in order to reflect intelligent behavior. This work presents two recognition systems, developed using morphological operations and linear predictive coding (LPC) with Backpropagation Neural Networks (BNN), to process visual and audio data respectively. The objective is to design a person tracking system for a mobile robot with text-dependent pitch recognition and a visual pattern recognition mechanism. The BNN wakes the robot from its idle state, while the visual stimulus is used to track the person who gave the command.
For mobile communication systems, computational complexity and memory requirements are serious problems in real-time digital signal processing of speech signals. In this article we propose a new structuralization algorithm for splitting the vector quantizer codebook of LSF coefficients. A fast search procedure, based on the structure of the codebook and a description tree, reduces the total number of comparisons needed to search the codebook. Our approach eliminates from the search procedure the codevectors with minimal probability of belonging to the solution, and yields a fast codebook search algorithm with a significant decrease in complexity.
The authors introduce a spectral analysis technique for LPC systems based on a high-order lattice filter having only a few nonzero reflection coefficients. An efficient algorithm based on dynamic programming techniques has been developed to find the nonzero coefficients and their corresponding delays for the thinned lattice filter. Simulation results show that the thinned filter approach can achieve more accurate spectrum fitting to speech signals.
This paper describes a discrete utterance recognition technique which applies formal language theory to a symbol string derived from the speech input. Analysis is performed to obtain a representation of the input utterance in terms of acoustically consistent labeled regions. Syntactic pattern recognition is then used to parse this representation of the input word using stored context-free grammars for the allowed vocabulary. Preliminary results are reported.
An isolated word recognizer has been evaluated using a large database of telephone-band digit utterances recorded by 100 talkers. Three reference template configurations of the recognizer have been studied, one talker independent and two talker dependent. The talker-dependent configurations are a 1-template-per-word system obtained by robust training and a 5-template-per-word system. Performance has been studied in terms of the distribution of error rates across the three template configurations. The effects on performance of varying a decision rule parameter, applying a rejection threshold, and normalizing test and reference utterance lengths have also been investigated. Overall average error rates obtained are 2.17% for the talker-independent system and 2.77% and 0.77% for the 1-template and 5-template talker-dependent systems, respectively.
A flexible analysis-synthesis system with signal dependent features is described and used to realize some desired voice characteristics in synthesized speech. The intelligibility of synthetic speech appears to depend on the ability to reproduce dynamic sounds such as stops, whereas the quality of voice is mainly determined by the true reproduction of voiced segments. We describe our work in converting the speech of one speaker to sound like that of another. A number of factors are important for maintaining the quality of the voice during this conversion process. These factors are derived from both the speech and electroglottograph signals.
Linear predictive coding (LPC), together with transformations of it, is currently the most popular way of analysing speech signals. Major limitations of this frame-based technique are that each frame is analysed in isolation from the rest and that the excitation source is assumed to be a white noise process. In order to reduce computation time, an all-pole model is usually employed. In the present project an adaptive algorithm is proposed for speech signal analysis. The algorithm is based on the recursive least squares method with a variable forgetting factor. A pole-zero model is used to estimate the anti-formants present in certain sounds (i.e. nasals and nasalized vowels). This method offers better detection of poles and zeros in stationary environments and faster tracking of pole and zero frequencies in nonstationary signals than other sequential methods. An effective input estimation algorithm eliminates the influence of pitch on the parameter estimates by assuming the input to be a white noise process or a pulse sequence.
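As a rough illustration of the sequential estimation idea, the sketch below implements the textbook exponentially weighted recursive least squares update for an all-pole (AR) predictor. It is a simplification and not the proposed method: the forgetting factor `lam` is held constant rather than varied, no zeros are modelled, and no input estimation is performed. The name `rls_ar` and the parameters `lam` and `delta` are hypothetical.

```python
import numpy as np

def rls_ar(x, order, lam=0.99, delta=100.0):
    """Track AR predictor weights sample by sample with RLS.

    Model (assumed): x[n] ~ w[0]*x[n-1] + ... + w[order-1]*x[n-order].
    lam is a constant forgetting factor (the abstract uses a variable one);
    delta scales the initial inverse correlation matrix.
    Returns an array with one weight vector per input sample.
    """
    x = np.asarray(x, dtype=float)
    w = np.zeros(order)
    P = delta * np.eye(order)
    history = np.zeros((len(x), order))
    for n in range(len(x)):
        # Regressor of past samples, zero-padded before the signal starts
        phi = np.array([x[n - k] if n - k >= 0 else 0.0
                        for k in range(1, order + 1)])
        e = x[n] - w @ phi                  # a priori prediction error
        P_phi = P @ phi
        g = P_phi / (lam + phi @ P_phi)     # gain vector
        w = w + g * e
        P = (P - np.outer(g, P_phi)) / lam  # inverse correlation update
        history[n] = w
    return history
```

A smaller `lam` shortens the effective memory and tracks changes faster at the cost of noisier estimates, which is the trade-off a variable forgetting factor is meant to manage.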
Our objective is to study collaborative situations in which introducing a new agent into the system increases the performance of the group. This work is part of the road traffic simulation model ARCHISIM, in which a model of driver behavior has already been developed and validated. Our idea is to reuse this structure to define a collaborative behavior. For this purpose, we add a coordination layer to the basic driver agent behavior. The use of a simple reactive coordination strategy called "situated coordination" allows the emergence of a coherent group of agents that coordinate their activities. Experiments carried out to measure the performance of the group each time a new agent is introduced show satisfactory results. We also demonstrate that the agents succeed in coordinating themselves even when the degree of constraint imposed by the simulated environment is very high.
To reduce computational complexity, an approximated correlation matrix of the vocal impulse response is proposed for algebraic-code-excited linear prediction (ACELP) coders. By exploiting statistical characteristics, we only need to calculate a small portion of the correlation coefficients before the ACELP search procedure. If we further incorporate a pulse position prediction algorithm, we can reduce both the arithmetic complexity of pre-computing the autocorrelation matrix and the number of pulse position combinations, with imperceptible degradation in speech quality. The proposed scheme can be applied to all ACELP coders, such as ITU G.729 and G.723.1.
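For context, the quantity being approximated can be sketched as follows. The block computes the exact correlation matrix of an impulse response `h`, assuming the usual ACELP definition phi(i, j) = sum over n >= max(i, j) of h(n - i) h(n - j). It shows only the full computation that serves as the baseline; the approximation and the pulse position prediction are not reproduced here because the abstract does not specify them. The function name `correlation_matrix` is hypothetical.

```python
import numpy as np

def correlation_matrix(h):
    """Exact correlation matrix Phi = H^T H of the impulse response h.

    H is the lower-triangular Toeplitz convolution matrix with
    H[n, k] = h[n - k] for n >= k, so
    Phi[i, j] = sum_{n >= max(i, j)} h[n - i] * h[n - j].
    (Assumed definition; shown only as the baseline computation.)
    """
    h = np.asarray(h, dtype=float)
    n = len(h)
    H = np.zeros((n, n))
    for k in range(n):
        # Column k holds h delayed by k samples and truncated to the subframe
        H[k:, k] = h[:n - k]
    return H.T @ H
```

For a subframe of N samples the full matrix has N(N + 1)/2 distinct entries because it is symmetric, which is the pre-computation cost the approximation aims to cut.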