ISBN: 9781538623619 (print)
Automatic speech recognition (ASR) is a promising area that is receiving special attention today. This paper proposes a speech recognition system for Kannada vowels. The proposed framework consists of a preprocessing unit and a classification unit. The preprocessing unit segments the speech signal into frames and extracts features using linear predictive coding (LPC). In the classification unit, the Kannada vowels are assigned to their classes using Euclidean distance. The recognition accuracy obtained is about 40%.
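As a rough illustration of the pipeline this abstract describes (LPC features followed by Euclidean-distance matching), here is a minimal Python sketch. It is not the authors' implementation: the single-resonance toy frames, the LPC order of 2, the 8 kHz sample rate, the nearest-template decision rule, and all function and label names are assumptions made for illustration.

```python
import math


def autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).

    Returns a[1..order] such that x[n] is predicted by sum(a[k] * x[n-k]).
    """
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def euclidean(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def classify(features, templates):
    """Return the label of the template closest to the feature vector."""
    return min(templates, key=lambda label: euclidean(features, templates[label]))


def resonator_frame(freq_hz, fs=8000.0, n=400, radius=0.95):
    """Toy stand-in for a vowel frame: impulse response of one two-pole resonance."""
    a1 = 2.0 * radius * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = -radius * radius
    y = []
    for i in range(n):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y.append((1.0 if i == 0 else 0.0) + a1 * y1 + a2 * y2)
    return y


# Hypothetical reference templates, one per toy "vowel" class.
templates = {
    "low": lpc(resonator_frame(300.0), 2),
    "high": lpc(resonator_frame(800.0), 2),
}

# A frame with a resonance near 300 Hz should match the "low" template.
print(classify(lpc(resonator_frame(320.0), 2), templates))
```

A real system would use a higher LPC order, windowed frames of recorded speech, and templates averaged over many training utterances; the sketch only shows how the two units fit together.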
ISBN: 0780378504 (print)
The Digital Waveguide Mesh is a technique used in the modelling of room acoustics and musical instruments. This paper details a project that applies the theory of waveguide mesh acoustic modelling to the production of human-like vowel sounds. A 2D software mesh model is created that approximates the shape of the vocal tract in different vowel positions, and a glottal flow input is applied. The resulting signal has resonant frequencies (formants) similar to those of recorded speech. Recommendations are made towards extending the model to include some of the more complex features of the mouth, potentially constructing an acoustical model of the human vocal tract capable of creating speech sounds of increased naturalness.
ISBN: 9781479908332 (print)
A low-delay audio coding scheme with good perceptual audio quality at a desired limited bit rate is presented. The proposed scheme is based on differential pulse code modulation (DPCM) and block companded (BC) quantization. Prediction is realized as an FIR filter in lattice structure. The DPCM operates in a feedback manner, so no transmission of prediction filter coefficients is needed. The incorporation of BC quantization in the DPCM relies on a prediction error recalculation scheme. Using BC quantization in the DPCM allows the coder to follow the prediction error signal accurately, which improves the perceptual audio quality significantly compared to a plain DPCM with an adaptive quantizer. Because of the short fixed block length of the BC quantizer, the algorithmic delay is below half a millisecond and the overhead is less than half a bit per sample. Therefore, a real-time bidirectional audio application is achievable.
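The feedback principle mentioned in this abstract can be sketched with a plain DPCM loop in Python. This is a simplified illustration, not the proposed coder: it assumes a fixed first-order predictor and a uniform quantizer in place of the adaptive lattice predictor and block companded quantizer, and the `step` and `coeff` values and function names are hypothetical.

```python
import math


def dpcm_encode(samples, step=0.05, coeff=0.9):
    """Quantize the prediction error; predict from the reconstructed signal."""
    indices = []
    recon_prev = 0.0
    for x in samples:
        pred = coeff * recon_prev          # prediction uses decoder-visible data only
        q = round((x - pred) / step)       # uniform quantizer index of the error
        indices.append(q)
        recon_prev = pred + q * step       # same reconstruction the decoder computes
    return indices


def dpcm_decode(indices, step=0.05, coeff=0.9):
    """Mirror of the encoder loop; needs only the quantizer indices."""
    out = []
    recon_prev = 0.0
    for q in indices:
        recon_prev = coeff * recon_prev + q * step
        out.append(recon_prev)
    return out


signal = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(200)]
decoded = dpcm_decode(dpcm_encode(signal))
print(max(abs(a - b) for a, b in zip(signal, decoded)))
```

Because the encoder predicts from its own reconstruction rather than from the original samples, encoder and decoder stay in step and quantization error does not accumulate; in this sketch the per-sample error is bounded by half the quantizer step.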
The duration of vowel steady-states (VSS) was examined acoustically in the speech production of 40 normal young adults. VSS was assessed according to formant frequency changes in sustained /i/ productions and consonant + /i/ + /d/ (/Cid/) productions. The duration of the VSS was measured for the first and second formants (F1 and F2) by incorporating a fixed rate-of-change criterion. Results indicated no significant differences in VSS duration according to gender or vowel context. VSS duration based on F1 was significantly longer than VSS duration based on F2. The duration of VSS was also found to be correlated with the overall vowel duration in /Cid/ contexts. Discussion focuses on the analysis and application of VSS in acoustic studies of normal and disordered speech production.
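One way a fixed rate-of-change criterion could be applied to a formant track is sketched below in Python. The abstract does not state the threshold, the frame interval, or the exact measurement procedure, so the longest-run rule, the 1000 Hz/s limit, the 10 ms frame step, and the sample F2 values are all assumptions made for illustration.

```python
def steady_state_duration(formant_hz, frame_s, max_rate_hz_per_s):
    """Duration (s) of the longest run of frame-to-frame steps whose
    formant rate of change stays at or below the threshold."""
    best = run = 0
    for prev, cur in zip(formant_hz, formant_hz[1:]):
        rate = abs(cur - prev) / frame_s
        run = run + 1 if rate <= max_rate_hz_per_s else 0
        best = max(best, run)
    return best * frame_s


# Hypothetical F2 track (Hz) sampled every 10 ms: glide, plateau, glide.
f2_track = [2000, 2100, 2200, 2202, 2201, 2200, 2203, 2100]
print(steady_state_duration(f2_track, frame_s=0.01, max_rate_hz_per_s=1000.0))
```

Running the same function on the F1 and F2 tracks of one vowel gives two durations that can be compared directly, which is the kind of comparison the abstract reports.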
ISBN: 9781424424085 (print)
In this paper, new feature extraction methods that use reduced-order linear predictive coding (LPC) coefficients for speech recognition are proposed. The coefficients are derived from speech frames decomposed using the Discrete Wavelet Transform (DWT). The literature assumes that a speech frame of 10 msec to 30 msec is stationary; in practice, however, different parts of the speech signal may convey different amounts of information, so the frame may not be perfectly stationary. LPC coefficients derived from a subband decomposition of the speech frame provide a better representation than modeling the frame directly. Experiments show that the proposed approaches provide features that are both effective (better recognition rate) and efficient (reduced feature vector dimension). The speech recognition system is implemented using a continuous Hidden Markov Model (HMM). The proposed algorithms are evaluated on the NIST TI-46 isolated-word database.
ISBN: 9781479979615 (print)
Tree-based context clustering reduces the size of the acoustic models in Hidden Markov Model (HMM) speech synthesis systems and eliminates problems arising from unseen sound units. Speech units in speech synthesis systems are often represented by LPC or MCEP features, whose characteristics promote speech reconstruction rather than discrimination among different sound units. In this paper, MFCC features, which have been used successfully in speech recognition, were selected as features for generating context clustering trees applied to LPC/MCEP-based speech synthesis. On average, the collective size of the acoustic models was 29% smaller than that of typical cases, while spectral features generated by a speech synthesis system using each type of clustering tree did not deviate significantly from features extracted from actual spoken utterances. Applying the MFCC-based clustering tree did not significantly affect the resulting pitch and duration models of the system. We concluded that an MFCC-based clustering tree can reduce the overall size of acoustic models while maintaining synthetic sound quality.
ISBN: 9781424423538 (print)
In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually found sequentially, which is suboptimal. In this paper we propose an analysis model that finds the two predictors jointly by adding a regularization term to the minimization process to impose sparsity constraints on a high-order predictor. The result is a linear predictor that can be easily factorized into the short-term and long-term predictors. This estimation method is then incorporated into an Algebraic Code Excited Linear Prediction scheme and is shown to perform better than traditional cascade methods and other joint optimization methods, offering lower distortion and higher perceptual speech quality.
ISBN: 9781424404810 (print)
We introduce an efficient algorithm for real-time compression of temporally consistent dynamic 3D meshes. The algorithm uses mesh connectivity to determine the order of compression of vertex locations within a frame. Compression is performed frame by frame, using only the last decoded frame and the partly decoded current frame for prediction. Following the predictive coding paradigm, local temporal and local spatial dependencies between vertex locations are exploited. In this framework we present a novel angle-preserving predictor and evaluate its performance against other state-of-the-art predictors. The proposed algorithm is shown to improve on the current state of the art for compression of temporally consistent dynamic 3D meshes by up to 25%.
Several linear prediction vocoder modifications and an evaluation of their effects on intelligibility are presented. Diagnostic rhyme test (DRT) comparisons among 1) the fixed filter order, fixed analysis frame rate v...
Author: Merouane, Bouzid
USTHB, Elect Fac, Speech Commun & Signal Proc Lab, Algiers 16111, Algeria
ISBN: 9781424444564 (print)
In this paper, an optimized trellis coded vector quantization (OTCVQ) system designed for efficient and robust coding of LSF spectral parameters is presented. The aim of this system, initially called the "LSF-OTCVQ Encoder", is to achieve low bit rate transparent quantization of the FS1016 LSF parameters. Once the effectiveness of the LSF-OTCVQ encoder had been demonstrated for ideal transmission over a noiseless channel, we turned to improving its robustness for real transmission over a noisy channel. To implicitly protect the transmission indices of the LSF-OTCVQ encoder incorporated in the FS1016, we used joint source-channel coding carried out by channel optimized vector quantization.