ISBN (print): 9783037852958
This paper presents a low-bit-rate speech coder based on predictive lattice vector quantization (PLVQ) and time-scale modification (TSM). The coding model of the proposed vocoder is built on the mixed excitation linear prediction (MELP) model, and the bit-rate reduction is achieved by exploiting the PLVQ and TSM techniques. PLVQ is used to encode the speech line spectrum pair (LSP) parameters; it has lower implementation complexity than multi-stage vector quantization (MSVQ) and, moreover, requires no memory for codebook storage. On our speech database, PLVQ saves up to 4 bits/frame compared with unstructured-codebook MSVQ. TSM changes the speed of the speech signal while preserving its perceptual characteristics. By adding TSM as pre- and post-processing stages, speech coding at about 1.1 kbps is readily achieved without modifying the vocoder structure.
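The PLVQ idea (predict the current LSP vector, then quantize the prediction error to the nearest point of a regular lattice, so no codebook has to be stored) can be sketched as follows. This is a minimal sketch under assumptions not stated in the abstract: a fixed first-order predictor with coefficient `alpha`, a mean-removed LSP vector, a uniformly scaled D_n lattice with step `step`, and hypothetical function names. Mapping lattice points to fixed-length bit indices (for example by restricting the search to a finite region) is also omitted.

```python
import numpy as np

def nearest_dn(x: np.ndarray) -> np.ndarray:
    """Nearest point of the D_n lattice (integer vectors with an even sum),
    using the Conway-Sloane rule: round every coordinate, and if the sum is
    odd, re-round the worst-rounded coordinate in the other direction."""
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    i = int(np.argmax(np.abs(x - f)))
    g = f.copy()
    g[i] += 1.0 if x[i] > f[i] else -1.0
    return g

def plvq_encode(frames, mean, alpha=0.7, step=0.02):
    """Predictive lattice VQ of LSP vectors (one row per frame), with a
    hypothetical first-order predictor on the mean-removed reconstruction."""
    prev = np.zeros(frames.shape[1])
    codes, recon = [], []
    for x in frames:
        pred = alpha * prev              # prediction from the last decoded frame
        resid = (x - mean) - pred        # prediction error to be quantized
        c = nearest_dn(resid / step)     # lattice point: no stored codebook needed
        q = pred + step * c              # what the decoder will reconstruct
        codes.append(c)
        recon.append(q + mean)
        prev = q
    return np.array(codes), np.array(recon)

def plvq_decode(codes, mean, alpha=0.7, step=0.02):
    """Decoder mirror of plvq_encode: only the lattice points are needed."""
    prev = np.zeros(codes.shape[1])
    out = []
    for c in codes:
        q = alpha * prev + step * c
        out.append(q + mean)
        prev = q
    return np.array(out)

rng = np.random.default_rng(1)
frames = np.sort(rng.uniform(0.0, np.pi, (50, 10)), axis=1)  # stand-in LSP vectors
mean = frames.mean(axis=0)
codes, recon = plvq_encode(frames, mean)
print(np.allclose(plvq_decode(codes, mean), recon))  # True
```

Because the decoder recomputes the same prediction from its own past output, encoder and decoder stay in step without any shared table; with this rounding rule each reconstructed coordinate lies within `step` of the input.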
This paper examines in detail the design issues and performance characteristics of linear predictive coding (LPC) split matrix quantization (SMQ). This efficient LPC quantization method, recently proposed by the authors [1], can be viewed as an extension of the conventional split vector quantization (SVQ) process. SMQ removes existing interframe and intraframe line spectral frequency (LSF) redundancy by applying VQ principles to the trajectories of the LSF coefficients, which evolve smoothly over time. Using a 20 ms LPC analysis frame, "transparent" quantization is achieved at 900 b/s, whereas "high-quality" LSF quantization is easily obtained at 650 b/s. Furthermore, the SMQ methodology offers valuable flexibility in how the LPC coefficients are quantized and leads to several schemes with varying computational complexity and storage requirements.
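A minimal sketch of the SMQ idea described above: stack the LSF vectors of several consecutive frames into a matrix, split the LSF indices into groups, and vector-quantize each resulting submatrix (a short piece of each LSF trajectory) with its own codebook. The abstract does not give the split, the number of frames per matrix, the codebook sizes or the distortion measure, so the squared-error distortion, the three-way split of 10 LSFs, the two-frame matrices, the random placeholder codebooks and the function names are all hypothetical.

```python
import numpy as np

def smq_encode(block, splits, codebooks):
    """Split matrix quantization of an (n_frames x p) block of LSF vectors.
    Each split is a list of LSF indices; the n_frames x len(split) submatrix
    is flattened and matched against its own codebook (rows = codewords)."""
    n_frames = block.shape[0]
    indices, recon = [], np.empty_like(block)
    for cols, cb in zip(splits, codebooks):
        sub = block[:, cols].reshape(-1)                      # trajectory piece as one vector
        k = int(np.argmin(np.sum((cb - sub) ** 2, axis=1)))  # nearest codeword (squared error)
        indices.append(k)
        recon[:, cols] = cb[k].reshape(n_frames, len(cols))
    return indices, recon

def smq_bit_rate(codebook_sizes, n_frames, frame_s=0.02):
    """Bits per second when one index per codebook is sent every n_frames frames."""
    bits = sum(np.log2(size) for size in codebook_sizes)
    return bits / (n_frames * frame_s)

rng = np.random.default_rng(0)
n_frames, splits = 2, [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
# Random placeholder codebooks; real ones would be trained on LSF data.
codebooks = [rng.uniform(0.0, np.pi, (16, n_frames * len(c))) for c in splits]
block = np.sort(rng.uniform(0.0, np.pi, (n_frames, 10)), axis=1)
indices, recon = smq_encode(block, splits, codebooks)
print(indices, smq_bit_rate([4096] * 3, n_frames))
```

The rate helper only does the bookkeeping: with the hypothetical settings above, three 12-bit codebooks (4096 codewords each) spend 36 bits per 40 ms matrix, i.e. 900 b/s or 18 bits per 20 ms frame, which matches the average rate quoted for "transparent" quantization without implying that this is the authors' configuration.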
This correspondence presents a new strategy for encoding the LP short-time spectral envelope (STSE) of speech. A better reconstruction of the STSE is achieved by modifying the usual trade-off between the transmission rate of the LP parameters and the performance of the quantization algorithm. Differential coding based on bidirectional prediction and hybrid vector quantization is used to compensate for the resulting increase in transmission rate. Simulation results demonstrate the effectiveness of this coding strategy.
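The abstract does not detail the predictor or the hybrid VQ, so the following is only a sketch of bidirectional differential coding under stated assumptions: even-indexed frames are coded directly, each odd-indexed frame is predicted as the mean of its two decoded neighbours, and only the prediction residual is coded. A uniform scalar quantizer stands in for the paper's hybrid vector quantizer; the step sizes and function names are hypothetical.

```python
import numpy as np

def uq(v, step):
    """Uniform scalar quantizer (a stand-in for the paper's hybrid VQ)."""
    return step * np.round(v / step)

def bidirectional_code(frames, step_anchor=0.01, step_resid=0.02):
    """Even-indexed frames are quantized directly; each odd-indexed frame is
    predicted from the decoded frames on both sides and only the prediction
    residual is quantized."""
    n = len(frames)
    recon = np.zeros_like(frames)
    recon[0::2] = uq(frames[0::2], step_anchor)
    for t in range(1, n, 2):
        right = recon[t + 1] if t + 1 < n else recon[t - 1]  # no future frame at the end
        pred = 0.5 * (recon[t - 1] + right)                   # bidirectional prediction
        recon[t] = pred + uq(frames[t] - pred, step_resid)
    return recon

rng = np.random.default_rng(2)
# Smoothly drifting stand-in LSF tracks: 21 frames, 10 coefficients.
frames = np.cumsum(0.01 * rng.standard_normal((21, 10)), axis=0) + np.linspace(0.3, 2.8, 10)
recon = bidirectional_code(frames)
print(np.max(np.abs(recon - frames)))
```

Because the prediction uses the following frame, the decoder must wait for it, which adds one frame of delay; in exchange, the residuals of smoothly evolving parameters are small and can be coded with fewer bits than the frames themselves.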
ISBN (print): 9781728176055
Good speech quality has been achieved using waveform-matching and parametric reconstruction coders. Recently developed very-low-bit-rate generative codecs can reconstruct high-quality wideband speech with bit streams of less than 3 kb/s. These codecs use a DNN with parametric input to synthesise high-quality speech output. Existing objective speech quality models (e.g., POLQA, ViSQOL) do not accurately predict the quality of speech coded by these generative models; they underestimate quality because of signal differences that are not highlighted in subjective listening tests. We present WARP-Q, a full-reference objective speech quality metric that uses the dynamic time warping cost of MFCC speech representations. It is robust to small perceptual signal changes. Evaluation using waveform-matching, parametric and generative neural-vocoder-based codecs, as well as channel and environmental noise, shows that WARP-Q achieves better correlation and better codec quality ranking for novel codecs than traditional metrics, while remaining versatile for general quality assessment scenarios.
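The core measurement WARP-Q builds on, a dynamic time warping cost between reference and degraded MFCC sequences, can be sketched as follows. This is plain DTW with Euclidean frame distances and the usual three step directions, normalized by n + m; MFCC extraction is assumed to happen elsewhere (random arrays stand in for the features), the function name is hypothetical, and the published metric includes further processing that is not reproduced here.

```python
import numpy as np

def dtw_cost(ref: np.ndarray, deg: np.ndarray) -> float:
    """Accumulated DTW cost between two feature sequences (frames x coeffs),
    with Euclidean frame distance and steps (1,0), (0,1), (1,1)."""
    n, m = len(ref), len(deg)
    dist = np.linalg.norm(ref[:, None, :] - deg[None, :, :], axis=2)  # n x m local costs
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))  # normalize so different lengths are comparable

rng = np.random.default_rng(0)
ref = rng.standard_normal((40, 13))                                    # stand-in: 40 frames of 13 MFCCs
print(dtw_cost(ref, ref))                                              # 0.0
print(dtw_cost(ref, np.repeat(ref, 2, axis=0)))                        # 0.0
print(dtw_cost(ref, ref + 0.5 * rng.standard_normal(ref.shape)) > 0)  # True
```

The second call shows why an alignment-based cost tolerates timing differences: a copy in which every frame is repeated (the same content at half speed) still costs 0.0, whereas additive noise raises the cost.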
ISBN (print): 9798400716171
Short-wave communication suffers from unstable channel conditions, but it is still widely used in the military and diplomatic fields because of its security, high resistance to destruction and full-coverage characteristics. Therefore, to adapt to transmission over short-wave channels, further research on low-rate voice coding algorithms for low-reliability channels is needed. In this paper, we propose a coding algorithm that converts the linear prediction coefficients into line spectral pair (LSP) frequency parameters. Building on the traditional 2.4 kbps LPC10 algorithm, we reduce the coding rate to 1.7 kbps and score the synthesized speech with the PESQ algorithm. The improved algorithm achieves a PESQ score of 2.1870, an increase of 0.9315 over the LPC10 algorithm, indicating a significant improvement in voice quality.
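The conversion from linear prediction coefficients to LSP frequencies that the algorithm relies on can be sketched as follows. The sketch assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion, a 10th-order predictor (the order used by LPC10), and root finding with `np.roots`; the abstract does not specify the analysis window or root-finding method, and the function names are hypothetical.

```python
import numpy as np

def lpc_autocorr(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC coefficients [1, a1, ..., ap] of A(z) = 1 + sum a_k z^-k,
    from a Hamming-windowed frame via the Levinson-Durbin recursion."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_to_lsf(a: np.ndarray) -> np.ndarray:
    """Line spectral frequencies (radians, ascending) of A(z)."""
    ext = np.append(a, 0.0)   # A(z) padded to degree p+1
    p_poly = ext + ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = ext - ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    w = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    eps = 1e-6
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and z = -1.
    return np.sort(w[(w > eps) & (w < np.pi - eps)])

rng = np.random.default_rng(0)
n = np.arange(240)
frame = np.sin(0.3 * n) + 0.5 * np.sin(1.1 * n) + 0.1 * rng.standard_normal(240)
lsf = lpc_to_lsf(lpc_autocorr(frame, 10))
print(np.round(lsf, 3))
```

For a stable (minimum-phase) A(z), the roots of P(z) and Q(z) lie on the unit circle and interlace, so the p frequencies come out strictly increasing in (0, π); this ordering is one reason LSPs are easier to quantize robustly than the raw prediction coefficients.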
ISBN (print): 9781467369985
We investigate a vocoder based on artificial neural networks that uses a phonological speech representation. Speech decomposition is based on phonological encoders, realised as neural network classifiers trained for a particular language. Speech reconstruction uses a deep neural network (DNN) to map phonological feature posteriors to speech parameters (line spectra and glottal signal parameters), followed by LPC resynthesis. This DNN is trained on a target voice without transcriptions, in a semi-supervised manner. Both the encoder and the decoder are neural networks, so vocoding is achieved with a simple, fast forward pass. An experiment on French vocoding with a target male voice trained on a 21-hour audiobook is presented. An application of the phonological vocoder to low-bit-rate speech coding is also shown, in which the transmitted phonological posteriors are pruned and quantized. With scalar quantization the vocoder operates at 1 kbps, with potential for lower bit rates.
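The pruning and scalar quantization of the transmitted posteriors can be sketched as follows. The abstract does not give the pruning rule, the number of phonological features, the frame rate or the quantizer resolution, so the fixed `threshold`, the uniform `bits`-bit quantizer over [0, 1] and the function name are hypothetical.

```python
import numpy as np

def prune_and_quantize(post, threshold=0.1, bits=2):
    """Zero out posteriors below `threshold`, then uniformly quantize the
    remaining values in [0, 1] to 2**bits levels. Returns the integer codes
    to transmit and the values the decoder would reconstruct."""
    levels = 2 ** bits - 1
    pruned = np.where(post >= threshold, post, 0.0)  # pruning step
    codes = np.round(pruned * levels).astype(int)    # integers 0..levels
    return codes, codes / levels

rng = np.random.default_rng(3)
post = rng.uniform(0.0, 1.0, (100, 12))  # stand-in: 100 frames x 12 feature posteriors
codes, deq = prune_and_quantize(post)
print(codes[:2])
```

Pruning turns weak posteriors into exact zeros, which is presumably what makes further rate reduction possible (for example by entropy-coding runs of zero codes); the per-value error after quantization is at most half a quantizer step.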