Line Spectrum Pairs (LSP) representation of linear predictive coding coefficients is widely used in speech coding, speech recognition, and other domains due to its desirable interpolation and quantization *** methods proposed for calculating LSP parameters have been complicated by high computation *** paper proposed an effective and efficient algorithm, APF, using the Aitken iterative method and polynomial synthesis *** parameters were estimated by first obtaining a root of an Nth-order nonlinear equation with the Aitken iterative method, then reducing the degree by polynomial synthesis division, and finally solving a quartic equation using Ferrari's *** analysis and experimental results show that the proposed algorithm has not only high precision but also low computational complexity.
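As a rough illustration of the setting (not the APF algorithm itself, whose details the abstract does not give), the sketch below builds the standard LSP polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), then lowers their degree by synthetic division. The function names `lsp_polynomials` and `deflate` and the second-order coefficients are assumptions made for the example. For even order p, P has a known root at -1 and Q at +1, so those are the roots divided out here; the Aitken iteration and Ferrari steps are not shown.

```python
import math

def lsp_polynomials(a):
    """Return coefficient lists of P and Q for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    `a` is [1, a1, ..., ap]. P is symmetric and Q is antisymmetric.
    """
    ext = list(a) + [0.0]          # A(z) padded to degree p + 1
    rev = ext[::-1]                # coefficients of z^-(p+1) A(1/z)
    p_poly = [x + y for x, y in zip(ext, rev)]
    q_poly = [x - y for x, y in zip(ext, rev)]
    return p_poly, q_poly

def deflate(coeffs, root):
    """Divide a polynomial by (x - root) using synthetic division."""
    quotient, acc = [], 0.0
    for c in coeffs:
        acc = acc * root + c
        quotient.append(acc)
    remainder = quotient.pop()     # last value is the remainder
    return quotient, remainder

# Hypothetical stable second-order predictor (p = 2, an even order).
a = [1.0, -1.2, 0.5]
p_poly, q_poly = lsp_polynomials(a)
p_reduced, p_rem = deflate(p_poly, -1.0)   # remove the known root at -1
q_reduced, q_rem = deflate(q_poly, 1.0)    # remove the known root at +1

# Each reduced polynomial is x^2 + b x + 1 with roots exp(+-j w), so w = acos(-b / 2).
lsf = sorted(math.acos(-poly[1] / 2.0) for poly in (p_reduced, q_reduced))
print(lsf)
```

For higher orders the reduced polynomials no longer factor this easily, which is where an iterative root finder followed by repeated deflation, as the abstract outlines, would come in.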
During speech production, significant excitation of the vocal tract occurs at instants such as glottal closure and the onset of events like bursts. Hence the energy of the excitation signal is high around these instants, which play an important role in the perception of speech. This property is exploited in this work to derive a weight function. The linear prediction (LP) residual of the spectral-subtracted speech signal is weighted by this function to enhance the significant excitation of speech. The enhanced speech is obtained by exciting the time-varying all-pole filter with the modified residual, and its musical noise is found to be reduced significantly.
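To make the LP residual concrete, here is a minimal sketch that estimates LP coefficients with the autocorrelation method and the Levinson-Durbin recursion, then inverse-filters the signal to get the residual. The helper names and the decaying test signal are assumptions for illustration, and `weights` is only a placeholder: the abstract does not specify how its weight function is derived.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns [1, a1, ..., ap] and the error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                       # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err

def lp_residual(x, a):
    """Residual e[n] = x[n] + a1 x[n-1] + ... + ap x[n-p] (zero initial state)."""
    order = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(x))]

def weight_residual(residual, weights):
    """Scale each residual sample by a weight (the weight design is hypothetical)."""
    return [w * e for w, e in zip(weights, residual)]

# Hypothetical signal: impulse response of a one-pole filter with pole at 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, energy = levinson_durbin(autocorr(signal, 2), 2)
residual = lp_residual(signal, coeffs)
weighted = weight_residual(residual, [1.0] * len(residual))
```

For this signal the residual is concentrated at the first sample, which plays the role of the excitation instant. A weight function of the kind described would emphasize such high-energy regions of the residual before resynthesis.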
This paper describes continuous speech recognition experiments on a Romanian-language speech database using hidden Markov models (HMM). We compare the recognition rates obtained in our ASR system with front-ends based on features extracted by perceptual variants of cepstral analysis and linear prediction, and by simple linear prediction. The best results, obtained with 36 mel-frequency cepstral coefficients (MFCC), are used as the basis for ranking the front-ends based on LPC. The second rank goes to 5 perceptual linear prediction (PLP) coefficients, whose performance is very promising and clearly better than the last-ranked performance of the simple linear prediction coefficients (LPC). We reorganized the database as follows: one database for male speakers, one for female speakers, and one for both male and female speakers.
The dynamic structure linear predictive coder (DLPC) is proposed. The analysis filter structure is dynamic, i.e., the filter order is varied according to the analysis gain. It is adjusted so that the achieved gain is less than the reference gain value. The main advantage is that the DLPC provides a low transmission bit rate. In addition, subjective and objective measurements show that the quality of the speech synthesized by the DLPC is comparable to that of conventional LPC coding.
Traditional voice activity detection algorithms are mostly threshold-based or statistical-model-based. These methods lack the ability to react quickly to changes in the environment. This paper describes an incremental SVM (support vector machine) method for speech activity detection. The proposed incremental procedure makes the detector adaptive to changes in the environment, and the special construction of the incremental training data set effectively reduces computational cost. Experimental results demonstrate its higher endpoint detection accuracy. Further work will focus on reducing computational cost and introducing multi-class SVM classifiers.
Streaming a lecture video via the Internet is important for e-learning. We have developed a system that generates a lecture video using virtual camerawork based on the shooting techniques of broadcast cameramen. However, viewing a full-length video takes students a long time. In this paper, we propose a method for generating a time-shrunk lecture video using event detection. We detect two kinds of events: speech periods and chalkboard writing periods. A speech period is detected by voice activity detection with the LPC cepstrum and classified as speech or non-speech using the Mahalanobis distance. To detect chalkboard writing periods, we use a graph cuts technique to precisely segment a region of interest such as the instructor. By deleting content-free periods, i.e., periods without speech or writing events, and fast-forwarding writing periods, our method can generate a time-shrunk lecture video automatically. The resulting video is about 20% to 30% shorter than the original, which is almost the same as the result of manual editing by a human operator.
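The speech/non-speech decision can be sketched as a nearest-class rule under the Mahalanobis distance. In this minimal example the two-dimensional feature vectors, the training samples, and the function names are all hypothetical stand-ins; the actual system uses LPC cepstrum features, and the abstract does not describe its class models or decision threshold.

```python
import math

def mean_and_cov(samples):
    """Sample mean vector and unbiased covariance matrix."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^-1 (x - mean)), computed without an explicit inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return math.sqrt(sum(d * v for d, v in zip(diff, solve(cov, diff))))

def classify(x, speech_model, noise_model):
    """Label a frame by whichever class model is closer."""
    closer_to_speech = mahalanobis(x, *speech_model) < mahalanobis(x, *noise_model)
    return "speech" if closer_to_speech else "non-speech"

# Hypothetical training frames for each class.
speech_model = mean_and_cov([(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 1.0)])
noise_model = mean_and_cov([(0.0, 0.0), (0.5, 0.2), (-0.5, 0.1), (0.1, -0.3)])
print(classify((2.8, 1.1), speech_model, noise_model))
```

Unlike a plain Euclidean distance, the Mahalanobis distance scales each direction by the class covariance, so features with large natural spread do not dominate the decision.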
This paper proposes a music signal synthesis scheme based on sinusoid modeling and sliding-window ESPRIT. Despite widely used audio coding standards, effectively synthesizing music using sinusoid models, which are more suitable for harmonic-rich music signals, remains an open issue. In the proposed scheme, music signals are modeled as a sum of damped sinusoids in noise. A sliding-window ESPRIT algorithm is applied. A continuity constraint is then imposed to track the time trajectories of sinusoids in the music and to remove spurious spectral peaks, in order to adapt to the changing number of sinusoids in dynamic music. Simulations have been performed on several music signals with a range of complexities, including music recorded from banjo and flute and music with mixed instruments. The listening results and spectrograms strongly indicate that the proposed method is very robust for music synthesis with good quality.
A novel modeling method for the glottal source is proposed for improving the naturalness and quality of synthetic speech. This paper uses the high correlation between vocal tract parameters and the glottal source to model the glottal source. Vocal tract parameters (LSF) are clustered into several classes. Within each class, the LSF vector closest to the centroid and its corresponding glottal wave derivative are selected as a code vector representing a distinct phonetic class of voiced speech. At the voice conversion or synthesis stage, the relevant glottal source can be found by virtue of finding the closest-matching vocal tract parameters. Experimental results show that this vocal-tract-related glottal source model significantly outperforms the Rosenberg model and the LF model. Compared with the Rosenberg model and the LF model, the correlation coefficients between the vocal-tract-related glottal source and the original glottal source increase by 27% and 30.13%, and the spectral distance between synthetic and original speech decreases by 50.5% and 51.48%, respectively.
This paper proposes a new QR-decomposition-based recursive frequency estimation algorithm for multiple sinusoids based on the linear prediction (LP) approach. It extends the batch processing algorithm of So et al. to process the input samples recursively at a much lower arithmetic complexity, supporting on-line applications. Furthermore, a weighted least M-estimate (WLM) algorithm is developed to improve robustness to impulsive noise. Simulation results show that the robust recursive frequency estimator performs better than conventional LS estimation in impulsive noise environments.
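The LP idea behind such estimators can be shown with a much simpler case than the one the paper treats. A noise-free sinusoid with frequency w satisfies x[n] + x[n-2] = 2 cos(w) x[n-1], so cos(w) can be fitted by least squares and the sums updated sample by sample. The sketch below does this for a single sinusoid with an assumed forgetting factor; it does not use QR decomposition, it has no M-estimate weighting, and the function name is hypothetical.

```python
import math

def lp_frequency_recursive(samples, forgetting=1.0):
    """Running least-squares estimate of a single sinusoid's frequency (rad/sample).

    Uses the LP relation x[n] + x[n-2] = 2 cos(w) x[n-1]; `forgetting` <= 1
    discounts old samples so the estimate can follow slow changes.
    """
    num, den = 0.0, 0.0
    estimates = []
    for n in range(2, len(samples)):
        mid = samples[n - 1]
        num = forgetting * num + mid * (samples[n] + samples[n - 2])
        den = forgetting * den + mid * mid
        if den > 0.0:
            cos_w = max(-1.0, min(1.0, num / (2.0 * den)))  # keep acos in range
            estimates.append(math.acos(cos_w))
    return estimates

# Hypothetical noise-free test tone at 0.3 rad/sample.
tone = [math.cos(0.3 * n + 0.1) for n in range(50)]
estimates = lp_frequency_recursive(tone)
print(estimates[-1])
```

Each update costs a handful of multiplications regardless of how many samples have been seen, which is the property that makes recursive formulations attractive for on-line use. A plain least-squares fit like this one is sensitive to outliers, which is the weakness that robust weighting is meant to address.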
The United States Army Aviation community requested that MITRE examine the possibility of transmitting beyond line of sight (BLOS) voice over the preexisting blue force tracking (BFT) infrastructure for properly equipped Apache A model helicopters. BFT is a low-bandwidth, satellite-based command and control system originally intended to provide situational awareness for the platforms and command centers possessing this technology. By slightly enhancing the current BFT transceiver configuration on the helicopter and at the ground units while integrating a low data rate audio processing unit, MITRE succeeded in transmitting intelligible audio in pseudo real time using less than 1300 bps.