ISBN (print): 0780388747
Spectral cues are important for speech intelligibility, but most popular multi-band compression algorithms for hearing aids distort them by splitting the signal into multiple bands that are processed by different amplifiers. The paper presents a morphology-based method that preserves the primary spectral cues at a low computational cost. Speech testing results demonstrate its effectiveness.
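To illustrate why independent per-band processing can distort spectral cues, here is a minimal sketch of a conventional static compression curve applied band by band. It is not the morphology-based method of the paper; the function names, the threshold, and the compression ratio are assumptions chosen for illustration.

```python
def compressed_level_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Static compression curve for one band (all levels in dB).

    Levels at or below the threshold pass unchanged; above it, each
    extra dB of input yields only 1/ratio dB of output.
    threshold_db and ratio are illustrative values, not from the paper.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def band_gains_db(band_levels_db, threshold_db=-40.0, ratio=2.0):
    """Gain (in dB) that each band receives when compressed independently."""
    return [
        compressed_level_db(level, threshold_db, ratio) - level
        for level in band_levels_db
    ]


# A spectral peak at -20 dB next to a valley at -50 dB (30 dB contrast).
gains = band_gains_db([-20.0, -50.0])
```

In this example the peak band is attenuated by 10 dB while the valley band is left alone, so the peak-to-valley contrast shrinks from 30 dB to 20 dB. That flattening is the kind of spectral-cue distortion the abstract refers to.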
The paper presents a speaker verification system evaluated on the YOHO database coded to the ITU-T G.729 standard. A set of bitstream-based features, consisting of 16 LPC cepstral coefficients and MFCCs derived from the quantized line spectral pairs, together with residual information in the form of pitch, was used to construct the speaker models, and their robustness was studied under white-noise conditions. Results suggest that, using a cohort model, MFCCs are more robust to noise than LPC cepstral coefficients, and that adding pitch to the feature vector improves verification performance by 16% to 29%, depending on the noise condition.
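As background on the LPC cepstral features mentioned above, the following sketch shows the standard recursion that converts LPC coefficients into cepstral coefficients. It assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k) and ignores the gain term; the abstract does not say how the coefficients were obtained from the G.729 bitstream, so the function name and sign convention here are assumptions.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    a[k-1] holds a_k for the model H(z) = 1 / (1 - sum_k a_k z^-k).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p and only terms with n-k <= p kept.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 3)
```

For a 10th-order predictor such as the one used in G.729, calling `lpc_to_cepstrum(a, 16)` would yield 16 coefficients, with the recursion extending past the predictor order.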
Glottal inverse filtering is a process where the effects of the vocal tract are cancelled from the speech signal in order to estimate the voice source. Traditionally, inverse filtering methods have involved a high level of manual tuning of parameters, such as the vocal tract model order. We present objective heuristics for the measurement of the quality of the resulting glottal flow estimate. In addition, we propose an automatic method for determining the order of the vocal tract all-pole model in inverse filtering based on phase-plane analysis and estimation of the glottal flow kurtosis.
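The kurtosis measure mentioned above can be sketched as follows. This computes the ordinary (non-excess) sample kurtosis of a waveform; how the paper combines it with phase-plane analysis to select the model order is not stated in the abstract, so the function below is only an assumed building block.

```python
def kurtosis(samples):
    """Sample kurtosis m4 / m2**2 (a Gaussian signal gives about 3).

    Uses biased (divide-by-N) central moments. Raises ValueError for a
    constant signal, whose variance is zero.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for a constant signal")
    return m4 / (m2 * m2)


# A two-level square wave has the minimum possible kurtosis of 1.
k = kurtosis([-1.0, 1.0, -1.0, 1.0])
```

One hypothetical use would be to compute this value for the glottal flow estimate obtained at each candidate model order and compare the results, but the selection rule itself would have to come from the paper.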
This paper evaluates the vector difference (VD) feature set, based on the fractional cosine and sine transforms, for text-independent speaker recognition, focusing on the high-order fractional domains. The experiments vary the vector dimension and the power of the output parameters of the fractional cosine and sine transforms separately. The recognition results show that when the order of the primary fractional domain is fixed at 1 and that of the secondary fractional domain at 0.98, the correct recognition rate of the VD feature matches that of the earlier MFCC feature.
Voice activity detection and comfort noise generation (VAD-CNG) algorithms are widely employed in packet voice communication systems to reduce transmission bandwidth. This paper investigates efficient implementations of a modified version of a well-established fixed-point, data-dependent VAD-CNG algorithm of Nortel Network on a TMS320C5402DSK DSP board. Several optimizations that reduce the implementation complexity of the algorithm are introduced. Experimental results show that the proposed optimizations reduce the implementation complexity by over 80%, making it possible to incorporate such a VAD-CNG algorithm into a practical real-time voice communication system. A real-time audio codec system was built in the laboratory to demonstrate a real-time implementation of the algorithm.
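The bandwidth saving from VAD can be illustrated with a deliberately simple frame-energy detector. This is not the Nortel algorithm discussed in the paper, whose details the abstract does not give; the frame length, the threshold, and all function names are assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def simple_vad(samples, frame_len=80, threshold=1e-3):
    """Mark a frame as active speech when its energy exceeds a threshold.

    frame_len=80 corresponds to 10 ms at 8 kHz; both defaults are
    illustrative values only.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


def inactive_fraction(decisions):
    """Fraction of frames that need not be sent as coded speech."""
    return decisions.count(False) / len(decisions)


# One silent frame followed by one loud frame.
decisions = simple_vad([0.0] * 80 + [0.5] * 80)
```

In a packet voice system, frames flagged inactive would be replaced by occasional noise-description updates that drive the comfort noise generator at the receiver, which is where the bandwidth reduction comes from.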
Robots are conveniently controlled by a human operator with spoken commands, since voice is a natural communication medium for humans. In order to successfully carry out a command, a robot needs to know which of the possibly many people gave the command and where this person is located. In this paper, we present a particle-filter based algorithm for localization of multiple speakers, in an environment where there is only one person speaking at a time. The algorithm incorporates person-specific voice features (vowel formant frequencies) in order to distinguish between the speakers. The voice features are supported by azimuth angle measurements obtained by a pair of microphones. We test our approach using the microphone system of the Philips iCat interface robot.
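An azimuth measurement from a microphone pair is commonly derived from the time difference of arrival (TDOA) between the two microphones. The sketch below uses the usual far-field relation; the abstract does not say how the azimuth was actually computed on the iCat, so the formula, the default speed of sound, and the function name are assumptions.

```python
import math


def azimuth_from_tdoa(tdoa_s, mic_distance_m, speed_of_sound=343.0):
    """Far-field azimuth (radians) from the delay between two microphones.

    Uses sin(theta) = c * tau / d, where theta is measured from the
    broadside direction of the pair. The ratio is clamped to [-1, 1]
    so that noisy delay estimates do not raise a domain error.
    """
    ratio = speed_of_sound * tdoa_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Zero delay means the source lies straight ahead of the pair.
theta = azimuth_from_tdoa(0.0, 0.2)
```

A single pair cannot distinguish front from back, which is one reason such measurements are typically fused with other evidence, here the speaker-specific formant features, inside a tracking filter.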
Many coding techniques have been proposed for the compression of speech signals. Speech signals are continuous signals shaped by the movement of the vocal tract. Linear predictive coefficients (LPC) and line spectral pairs (LSP) coefficients have been used to model the position of the vocal tract. Each speaker's vocal tract uses a unique, finite set of positions to generate their voice. This paper compares the performance of ordered and disordered codebooks, which are employed to measure and classify the repetition of the speech signal coefficients. An analysis of the performance of both types of codebook, in terms of the quality of the synthesised speech, is also provided. Both ordered and disordered codebooks can reduce the transmission bit rate by around 20% in a speech coding system.
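The basic idea of coding a coefficient vector through a codebook can be sketched as a nearest-codeword search: only the index of the closest entry is transmitted. This generic sketch does not model the ordered versus disordered distinction studied in the paper, and the function names and toy codebook are assumptions.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codeword closest to vector (squared error)."""
    best_index = 0
    best_dist = float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bits_per_index(codebook):
    """Bits needed to transmit one codebook index with a fixed-length code."""
    return max(1, math.ceil(math.log2(len(codebook))))


# Toy 2-dimensional codebook with three entries (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
index = quantize([0.9, 1.1], codebook)
```

The decoder holds the same codebook and reconstructs the vector as `codebook[index]`, so the bit cost per frame depends only on the codebook size rather than on the precision of the original coefficients.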
This paper proposes an environmental sound recognition system that uses LPC-cepstral coefficients for characterization and a backpropagation artificial neural network as the verification method. LPC-cepstral data depend entirely on the sound source from which they are computed. The system is evaluated using a database containing files of four different sound sources under a variety of recording conditions. Two neural networks are trained with the magnitude of the discrete Fourier transform of the LPC-cepstral matrices. The overall verification rate was 96.66%. The verification rate can be improved if the number of feature vectors (coefficients) is increased in the neural network training phase. The basic idea is to apply techniques found in speech recognition systems to an environmental sound recognition system.