This paper introduces a new lattice quantization scheme, multiple-scale lattice vector quantization (MSLVQ), based on truncation of the $D_{10}^{+}$ lattice. The codebook is composed of several copies of the truncated lattice, each scaled by a different scaling factor. A fast nearest neighbor search is introduced. We compare predictive MSLVQ for quantization of line spectrum frequency (LSF) coefficients with the quantization technique used in the G.729 codec and show that our method achieves lower spectral distortion. The MSLVQ scheme achieves transparent quality at 21 bits/frame.
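The multiple-scale idea can be illustrated with a minimal sketch. It assumes the textbook rounding-based nearest-point rule for $D_n$ (integer vectors with even coordinate sum) and treats $D_n^{+}$ as the union of $D_n$ and $D_n$ shifted by $(1/2, \ldots, 1/2)$. The function names and the brute-force loop over scales are illustrative assumptions; the paper's truncation rule and its fast search are not reproduced here.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_dn(x):
    """Nearest point of D_n: round every coordinate, then repair the parity."""
    f = [math.floor(v + 0.5) for v in x]
    if sum(f) % 2 == 0:
        return f
    # Odd sum: re-round the coordinate with the largest rounding error the other way.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f

def nearest_dn_plus(x):
    """Nearest point of D_n^+ = D_n united with D_n + (1/2, ..., 1/2)."""
    a = nearest_dn(x)
    b = [v + 0.5 for v in nearest_dn([u - 0.5 for u in x])]
    return a if sqdist(x, a) <= sqdist(x, b) else b

def quantize_multiscale(x, scales):
    """Try every scaled copy of the (untruncated) lattice and keep the best.

    Returns (scale index, reconstructed vector, squared error).
    """
    best = None
    for idx, s in enumerate(scales):
        y = [s * v for v in nearest_dn_plus([u / s for u in x])]
        d = sqdist(x, y)
        if best is None or d < best[2]:
            best = (idx, y, d)
    return best
```

In a real coder the scale index and the lattice point index would both be transmitted, so the number of scales trades bit rate against distortion.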
The line spectral frequencies (LSFs) extracted from successive analysis orders are interlaced with each other. This intermodel interlacing property gives a new relationship between the closeness of LSFs and their spectral sensitivities, which enables us to propose a weighting function for LSF distortion measurement. By applying the proposed weighting function to an LSF quantizer, we can achieve better performance than when using the conventional heuristic functions. Moreover, the complexity of the proposed weighting function is much lower than that of the optimal weighting function, while their performances are almost the same.
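For context, a weighted LSF distortion can be sketched as follows. The weights shown are an inverse-distance heuristic of the kind commonly used as a baseline, in which closely spaced LSFs receive larger weights. This is an assumption for illustration only and is not the weighting function proposed in the paper; the function names are hypothetical.

```python
import math

def inverse_distance_weights(lsf):
    """Heuristic weights for strictly increasing LSFs in (0, pi) radians.

    Each weight grows as the LSF gets closer to either of its neighbours
    (0 and pi act as the outer neighbours).
    """
    ext = [0.0] + list(lsf) + [math.pi]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]

def weighted_lsf_distance(lsf, lsf_q, weights):
    """Weighted squared error between an LSF vector and its quantized version."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, lsf, lsf_q))
```

A quantizer would evaluate `weighted_lsf_distance` against each codebook entry and keep the minimum, so errors on closely spaced LSFs are penalised more heavily.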
This paper describes several methods of interpreting line spectral pair (LSP) parameters such as those generated by CELP and LPC speech coders. Analysis methods are proposed to assist in automatic classification of the phonetic content of an encoded speech signal. In particular, the advantages of analysing the lower bitrate parameters available at a speech coder output, rather than the higher bitrate input parameters, are discussed. Comparisons are made with existing parametric measures in terms of both performance and implementation complexity, with a view to maximising classification performance whilst minimising complexity.
An improved method for assessing the oxide thickness, applicable to advanced CMOS technologies, is proposed. To this end, a combination of Maserjian's technique (Maserjian et al., Solid State Electron. vol. 17, pp. 335-9, 1974) and Vincent's method (Vincent et al., Proc. IEEE Microelectronic Test Structures vol. 10, pp. 105-10, 1997) is used to circumvent the unknown parameter that is inherent to both extraction procedures and depends on the carrier statistics employed. The new method has been successfully applied to various technologies with gate oxide thickness ranging from 7 nm down to 1.8 nm.
The MPEG-4 parametric speech coding algorithm, harmonic vector excitation coding (HVXC), is described. New features of the coder include a quantizer scheme capable of generating scalable bit-streams at 2.0 and 4.0 kbps, where 2.0 kbps decoding is possible using a subset of the 4.0 kbps bit-stream. Time-scale modification of speech is also possible without changing pitch or phonemes, for fast and slow playback modes. Listening tests show that the proposed coding method at 2.0 kbps provides significantly better quality than FS1016 CELP at 4.8 kbps. In October 1998, the HVXC coder was adopted in the Final Draft International Standard (FDIS) of MPEG-4.
The coder proposed in this paper falls in the class of segmental vocoders known as phonetic vocoders. Speaker recognisability is one of the main problems faced by vocoders at the lowest bit rates, given the need to reduce speaker-specific information. Hence, phonetic vocoders are well suited to speaker-dependent coding, and can achieve bit rates as low as 250 bit/s. For speaker-independent coding a speaker adaptation methodology is adopted, although this raises the bit rate because the speaker-specific information must be transmitted. In order to reduce the corresponding bit rate further, a new method is proposed that exploits the intra-speaker correlation for the same phone.
Text-to-speech synthesis is of great interest and has many applications. For this reason, it has occupied many researchers for decades. Two methods are usually used: synthesis by rule and synthesis by concatenation of pre-recorded sounds. These methods have disadvantages, however, such as the difficulty of adapting them to a new speaker or a new language. Recently, neural networks (NN) have been applied to nonconventional problems where a traditional solution seems impossible, and text-to-speech appears to be one of these problems. In this field, it has been shown that NN do not work well when they are fed speech samples directly. Studies have therefore explored and evaluated different parametric forms of speech based on linear predictive coding (LPC) for training, and found that line spectral pair (LSP) parameters produced the best results. However, these methods do not take the residual signal into account, and the speech produced was machine-like rather than natural. In this paper we propose to drive the NN with code-excited linear prediction (CELP), which provides high-quality speech, to perform Arabic speech synthesis.
This paper presents a harmonic+noise speech coder which uses an efficient spectral quantization technique and a novel voiced/unvoiced (V/UV) mixing model. The harmonic magnitudes are coded at 23 bits/frame using the magnitude response of a linear predictive coding (LPC) system. The difference between the harmonic magnitudes and the sampled magnitude response is minimized by a closed-loop approach. The V/UV mixing is modeled by a smooth function which is derived from the speech spectrum envelope based on the flatness measure. The V/UV mixing model allows noise to be added in the harmonic portion of the speech spectrum so that buzziness is reduced. Because the V/UV mixing information is determined from the spectral parameters available in the decoder, no bits are needed to transmit it. A 1.4 kbps harmonic coder is developed. The speech quality of the coder is comparable to that of other harmonic coders operating at higher rates.
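As a point of reference, a spectral flatness measure can be sketched as below. This assumes the standard definition (geometric mean divided by arithmetic mean of the power spectrum); the abstract does not say which variant the coder uses or how the value is mapped to the V/UV mixing function, so neither is shown. The function name is hypothetical.

```python
import math

def spectral_flatness(power):
    """Geometric mean over arithmetic mean of strictly positive power values.

    Close to 1 for a flat (noise-like) spectrum, close to 0 for a peaky one.
    """
    n = len(power)
    log_mean = sum(math.log(p) for p in power) / n
    arith_mean = sum(power) / n
    return math.exp(log_mean) / arith_mean
```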
This paper describes an 8 kbit/s ACELP speech coder with high performance for both speech and non-speech signals such as background noise. While the traditional waveform matching LPAS structure employed in many existing speech coders provides high quality for speech signals, it has significant performance limitations for signals such as background noise. The coder presented here employs a novel adaptive gain coding technique that uses energy matching in combination with a traditional waveform matching criterion, providing high quality for both speech and background noise. The coder has a basic structure similar to that of the 7.4 kbit/s D-AMPS EFR coder, with a 10th-order LPC, a high-resolution adaptive codebook and a 4-pulse algebraic codebook. The performance for speech signals is equivalent to or better than that of state-of-the-art 8 kbit/s coders, while for background noise conditions the performance is significantly improved.
In this paper, we propose a pitch synchronous addition method for LPC analysis that makes use of the periodicity of speech. It is shown that the method overcomes the difficulty of reconciling noise reduction with the stability of the LPC filter, which arises when the noise part is subtracted from the autocorrelation function of speech. The relation between the pitch period of speech and the improvement in signal-to-noise ratio achieved by the method is investigated. The simulation results show the effectiveness of the proposed method, especially for high-pitched speech.
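The principle of pitch synchronous addition can be sketched as follows, assuming a constant, integer pitch period in samples and additive noise that is uncorrelated from period to period. Under those assumptions, averaging N periods keeps the periodic component and reduces noise power by a factor of N, an ideal gain of 10 log10(N) dB. The function names are illustrative, and the paper's actual procedure and its link to LPC analysis may differ.

```python
import math

def pitch_synchronous_average(frame, period):
    """Average the whole pitch periods contained in a frame, sample by sample."""
    n_periods = len(frame) // period
    if n_periods < 1:
        raise ValueError("frame is shorter than one pitch period")
    return [sum(frame[k * period + i] for k in range(n_periods)) / n_periods
            for i in range(period)]

def ideal_snr_gain_db(n_periods):
    """Ideal SNR improvement from averaging n_periods with uncorrelated noise."""
    return 10.0 * math.log10(n_periods)
```

For a fixed frame length, a shorter pitch period yields more periods to average and therefore a larger ideal gain, which is consistent with the reported benefit for high-pitched speech.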