Emotion detection has gained increasing attention and become an active research area. The problem is typically addressed by improving the feature set, using different numbers of feature groups, and by employing different classifiers in order to achieve a satisfactory recognition rate. In this study, speech-related features are employed to evaluate the performance of different classifiers on the emotion detection problem.
The Standard for the Exchange of Earthquake Data (SEED) is a commonly used file format for recording seismographic data. The data description language (DDL) is part of the specification of the SEED format; it is used to describe the way in which the data has been encoded in the SEED file. All SEED files contain a piece of DDL code that describes how to read the data contained in the file. In this project, we explore the best possible way of compressing seismographic data losslessly under the constraint that it must be describable in DDL. Since DDL is not a Turing-complete language, it is impossible to implement many standard compression algorithms with it. However, we show it is possible to implement a modified Tunstall code under DDL that produces files that are on average 14% smaller than the traditionally used Steim2 compression technique. We also compare our results with a standard linear predictive coding scheme (which is not implementable in DDL) and show that our technique is on average 18% worse on the same set of files.
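For readers unfamiliar with Tunstall coding, the following is a minimal sketch of the textbook (unmodified) construction for a memoryless source. It is not the modified code described in the abstract, and it is not expressed in DDL. The function names `build_tunstall` and `encode`, and the example probabilities, are assumptions chosen for illustration only.

```python
import heapq


def build_tunstall(probs, codeword_bits):
    """Build a textbook Tunstall dictionary for a memoryless source.

    probs: dict mapping each source symbol to its probability.
    codeword_bits: length in bits of every (fixed-length) codeword.
    Returns a dict mapping tuples of source symbols to integer codewords.
    """
    max_words = 2 ** codeword_bits
    n = len(probs)
    if n > max_words:
        raise ValueError("codeword length too short for the alphabet")
    # Max-heap of leaves, stored with negated probabilities.
    heap = [(-p, (s,)) for s, p in probs.items()]
    heapq.heapify(heap)
    # Expanding a leaf replaces it with n children: a net gain of n - 1 words.
    while len(heap) + n - 1 <= max_words:
        neg_p, word = heapq.heappop(heap)  # most probable leaf
        for s, p in probs.items():
            heapq.heappush(heap, (neg_p * p, word + (s,)))
    words = sorted(word for _, word in heap)
    return {word: index for index, word in enumerate(words)}


def encode(symbols, dictionary):
    """Parse symbols into dictionary words.

    Returns (codewords, leftover), where leftover is the trailing tuple of
    symbols that did not complete a dictionary word.
    """
    codewords, current = [], ()
    for s in symbols:
        current += (s,)
        if current in dictionary:
            codewords.append(dictionary[current])
            current = ()
    return codewords, current


if __name__ == "__main__":
    table = build_tunstall({"a": 0.7, "b": 0.3}, codeword_bits=2)
    print(table)
    print(encode(list("aaabab"), table))
```

Because every codeword has the same length, decoding is a plain table lookup, which is one reason a variable-to-fixed code of this kind is a natural candidate when the decoder must be described in a restricted language.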
A novel algorithm for voice conversion is proposed in this paper. The mapping function between the spectral vectors of the source and target speakers is calculated by canonical correlation analysis (CCA) estimation based on Gaussian mixture models. Since the spectral envelope feature retains most of the second-order statistical information contained in speech after linear prediction (LPC) analysis, the CCA method is more suitable for spectral conversion than minimum mean square error (MMSE) estimation, because CCA explicitly considers the variance of each component of the spectral vectors during the conversion procedure. Both subjective and objective evaluations are conducted. The experimental results demonstrate that the proposed scheme achieves better performance than the previous method, which uses the MMSE estimation criterion.
This paper presents a novel agent-based design for Arabic speech recognition. We define Arabic speech recognition as a multi-agent system in which each agent has a specific goal and deals with that goal only. Once all the small tasks are accomplished, the overall task is accomplished as well. A number of agents are required in order to recast Arabic speech recognition in this way, namely the feature extraction agent and the pattern classification agent. These agents are detailed in this paper.
A better-performing product code vector quantization (VQ) method is proposed for coding the line spectrum frequency (LSF) parameters; the method is referred to as sequential split vector quantization (SeSVQ). The split sub-vectors of the full LSF vector are quantized in sequence, each using a conditional distribution derived from the previously quantized sub-vectors. Unlike the traditional split vector quantization (SVQ) method, SeSVQ exploits the inter-sub-vector correlation and thus provides improved rate-distortion performance, but at the expense of higher memory requirements. We compare the quantization performance of SeSVQ with that of the traditional SVQ and transform-domain split VQ (TrSVQ) methods. Compared to SVQ, SeSVQ saves 1 bit and nearly 3 bits for telephone-band and wide-band speech coding applications, respectively.
Linear predictive coding (LPC) has been used to compress and encode speech signals for digital transmission at a low bit rate. LPC determines an FIR system that predicts a speech sample from past samples by minimizing the squared error between the actual sample and its estimate. The coefficients of the FIR system are encoded and sent. At the receiving end, the inverse system, called the AR model, is excited by a random signal to reproduce the encoded speech. The use of LPC can be extended to speech recognition, since the FIR coefficients condense the information in a speech segment of typically 10 ms to 30 ms. The PARCOR parameters associated with LPC, which represent a vocal tract model based on a lattice filter structure, are considered here for speech recognition. The use of the FIR coefficients and of the frequency response of the AR model was previously investigated [1]. This paper reports a method for detecting a limited number of phonemes in a continuous stream of speech. The system under development slides a 16 ms time window over the signal and calculates the PARCOR parameters continuously, feeding them to a classifier. The classifier is supervised and therefore requires training; it uses the maximum likelihood decision rule. Training uses the TIMIT speech database, which contains recordings of 630 speakers of 8 major dialects of American English. Classification results are listed for some typical vowel and consonant phonemes segmented from continuous speech. The correct classification rates for vowels and consonants are 65.22% and 93.51%, respectively. Overall, these results indicate that the PARCOR parameters have the potential to characterize phonemes.
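As an illustration of how PARCOR parameters can be obtained for one analysis window, the sketch below uses the standard Levinson-Durbin recursion on the frame's autocorrelation. The abstract does not say which algorithm the authors use, so this is an assumption, as are the function names `autocorrelation` and `levinson_durbin`. Sign conventions for PARCOR (reflection) coefficients differ between texts; here the predictor is taken as x̂[n] = Σ a[k]·x[n−k].

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag (no normalization)."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    r: autocorrelation values r[0..order].
    Returns (lpc, parcor, error):
      lpc    - predictor coefficients a[1..order]
      parcor - reflection (PARCOR) coefficients k[1..order]
      error  - final prediction error energy
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    parcor = []
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("prediction error is not positive (degenerate frame)")
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / error
        parcor.append(k_m)
        updated = a[:]
        updated[m] = k_m
        for k in range(1, m):
            updated[k] = a[k] - k_m * a[m - k]
        a = updated
        error *= 1.0 - k_m * k_m
    return a[1:], parcor, error


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order AR process with coefficient 0.5.
    lpc, parcor, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(lpc, parcor, error)
```

In a sliding-window setup such as the one described, these two functions would be applied to each 16 ms frame (256 samples at TIMIT's 16 kHz sampling rate), and the resulting `parcor` list would form the feature vector passed to the classifier.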
An innovative voice morphing scheme is proposed to make the speech of a source speaker sound as if it were uttered by a target speaker. The morphing technique can hide a person's identity, age, and gender while chatting or performing other online activities that involve transforming speech streams, which helps protect privacy on the internet. Speaker individuality transformation is achieved by altering the spectral envelope and by estimating the excitation signal, which is done by modifying the fundamental pitch frequency in syllable units of the residual signal of the source speech, based on the linear prediction coding (LPC) model. The main advantage of this scheme lies in its consideration of the dynamic characteristics of the pitch frequency rather than only its average level, which improves the performance of the whole conversion system compared with general approaches such as discrete pitch frequency mapping. Moreover, for the alignment of line spectral frequency (LSF) vectors, an advanced technique based on isolated syllables is introduced in place of the general dynamic time warping (DTW) algorithm. The experimental results show that the system is capable of effectively transforming speaker identity while the converted speech maintains high quality.
We describe a coding scheme based on audio and speech quantization with an adaptive quantizer derived from the autoregressive model under high-rate assumptions. The main advantage of this scheme compared to state-of-the-art training-based coders is its flexibility. The scheme can adapt in real time to any particular rate and has a computational complexity independent of the rate. Experiments indicate that, compared with a non-scalable conventional fixed-rate code-excited linear predictive (CELP) coding scheme, our real-time scalable coder with scalar quantization performs at least as well in the constrained-entropy case and has nearly identical performance in the constrained-resolution case.
ISBN: (print) 9780662478300; 0662478304
This paper studies the feasibility of an information analysis and processing technology that fuses speech and image in a real-time monitoring system. It places particular emphasis on speech analysis and fuses the two technologies by means of a scoring strategy. It also improves MFCC feature extraction and proposes a quick MFCC algorithm. The proposed algorithm can meet the requirements of a real-time system while retaining high precision. To demonstrate this, the paper compares the algorithm with LPC and FFT. The experiments indicate that the EER of LPC is 13.9% and the EER of FFT is 11.1%, whereas the EER of the quick MFCC is only 4.2%. Compared with the traditional MFCC algorithm, the quick MFCC algorithm greatly reduces the run time while maintaining the recognition accuracy of the system. Finally, the fusion recognition rate is about 97.8%, which is a good result for a real-time monitoring system.
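The equal error rate (EER) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from two lists of match scores. The abstract does not describe how its EER values were computed, so the threshold sweep, the function name `equal_error_rate`, and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor trials that are accepted.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that are rejected.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]   # hypothetical same-speaker scores
    impostor = [0.6, 0.3, 0.2, 0.1]  # hypothetical different-speaker scores
    print(equal_error_rate(genuine, impostor))
```

With few trials the two error curves may never cross exactly, which is why the sketch reports the average of the two rates at the closest threshold rather than interpolating.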
In this article we present an automatic speech recognition approach that initially identifies four spoken words, using the energy of the signal obtained through linear predictive coding (LPC) and neural networks as the recognition and classification technique for the speech parameters. Speech commands were identified with a back-propagation multilayer perceptron (MLP). The characteristics of the voice parameters were processed in the time domain, and the neural network was trained to classify and identify the speech commands. The system has been tested on a Development Starter Kit (DSK), a 1 GHz digital signal processing board based on the Texas Instruments TMS320C6416T.