Several techniques for speech coding at rates of 4 kb/s and lower require quantization of spectral magnitudes at a set of frequencies that are harmonics of the talker's fundamental frequency (pitch) (for example, multiband excitation coding, sinusoidal transform coding, and time-frequency interpolation). The number of harmonic magnitudes to be quantized depends on the fundamental frequency and is therefore variable, changing from frame to frame. This variable number of components makes it difficult to use fixed-dimension vector quantization for harmonic magnitude encoding. In this paper, we introduce a quantization technique called non-square transform vector quantization (NSTVQ), which combines a fixed-dimension vector quantizer with a variable-size non-square transform that maps the variable-dimension harmonic magnitude vector into a fixed-dimension vector. The optimal reconstruction procedure for non-square transforms is derived and shown to be equivalent to an optimal least-squares estimation procedure. The proposed technique is evaluated experimentally as part of a new coding system called spectral excitation coding (SEC). The results are compared with an existing technique that estimates the spectral shape using all-pole modeling followed by vector quantization of the LSP parameters.
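The forward mapping and its least-squares inverse can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' transform: the non-square transform is assumed (hypothetically) to be the first M DCT-II basis functions sampled at the N harmonic positions, and the fixed-dimension VQ that would quantize the M-dimensional vector is omitted. The names `nonsquare_dct`, `forward` and `reconstruct` are illustrative.

```python
import numpy as np


def nonsquare_dct(m: int, n: int) -> np.ndarray:
    """Hypothetical M x N transform: DCT-II basis functions k = 0..M-1 sampled on N points."""
    k = np.arange(m)[:, None]
    idx = np.arange(n)[None, :]
    t = np.sqrt(2.0 / n) * np.cos(np.pi * k * (idx + 0.5) / n)
    t[0, :] = np.sqrt(1.0 / n)  # orthonormal scaling for the DC row
    return t


def forward(x: np.ndarray, m: int) -> np.ndarray:
    """Map a variable-length harmonic magnitude vector to a fixed length m."""
    return nonsquare_dct(m, len(x)) @ x


def reconstruct(y: np.ndarray, n: int) -> np.ndarray:
    """Least-squares estimate of the n harmonic magnitudes from the fixed-length vector y.

    When n > len(y) the system is underdetermined and lstsq returns the minimum-norm solution.
    """
    t = nonsquare_dct(len(y), n)
    x_hat, *_ = np.linalg.lstsq(t, y, rcond=None)
    return x_hat


M = 30  # assumed fixed VQ dimension
rng = np.random.default_rng(0)
for n_harm in (12, 30, 55):  # the number of harmonics changes from frame to frame
    x = rng.uniform(0.0, 1.0, n_harm)
    y = forward(x, M)  # a fixed-dimension VQ would quantize y here
    x_hat = reconstruct(y, n_harm)
    print(n_harm, float(np.max(np.abs(x - x_hat))))
```

Without quantization, frames with N ≤ M harmonics are recovered up to rounding error, while frames with N > M keep only their projection onto the M basis functions; either way, the quantizer only ever sees length-M vectors.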
The response time of a speaker identification system depends mainly on the number of enrolled speakers, so reducing the computational cost of evaluating large speaker databases is the key problem. To address it, a "bag of codes" algorithm is proposed, which builds speaker models by estimating the probability distribution of each code in the speech data. Experiments show that the new configuration has substantially lower complexity than commonly used methods with comparable identification accuracy, overcoming one bottleneck in the development of speaker identification research.
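A minimal sketch of the idea in Python, under assumptions the abstract does not spell out: a single shared codebook (for example, one trained with k-means on acoustic feature vectors) quantizes every frame, each speaker model is the smoothed histogram of codeword indices, and identification picks the model under which the test codes are most likely. The function names and the add-one smoothing are our own choices, not the paper's.

```python
import numpy as np


def encode(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) for every feature frame."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def train_model(features: np.ndarray, codebook: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Speaker model: smoothed probability of each code in the speaker's training data."""
    counts = np.bincount(encode(features, codebook), minlength=len(codebook)).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * len(codebook))


def identify(features: np.ndarray, codebook: np.ndarray, models: dict) -> str:
    """Return the speaker whose code distribution gives the test codes the highest log-likelihood."""
    hist = np.bincount(encode(features, codebook), minlength=len(codebook))
    scores = {spk: float(hist @ np.log(p)) for spk, p in models.items()}
    return max(scores, key=scores.get)
```

In this sketch the nearest-codeword search runs once per test frame regardless of how many speakers are enrolled; scoring each speaker is then a dot product of two codebook-length vectors, which is where a complexity saving over per-speaker likelihood evaluation (for example, one GMM per speaker) would come from.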
ISBN: 0780371232 (print)
We introduce a novel method for generating new types of time-varying convolutional codes using finite-field wavelets. These codes have a new type of trellis, which we call a bipartite (or k-partite) trellis. An important feature of k-partite trellises is their reduced decoding complexity.
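To make "time-varying" concrete, here is a rate-1/2 binary convolutional encoder whose generator pair alternates between two sections, so its trellis alternates between two section types. This only illustrates a periodically time-varying code over GF(2); the generator taps are arbitrary placeholders, and the finite-field wavelet construction the paper uses to choose them is not reproduced.

```python
from typing import List, Sequence, Tuple

Generators = Tuple[Sequence[int], Sequence[int]]  # two GF(2) tap vectors per trellis section


def encode_time_varying(bits: Sequence[int], sections: List[Generators]) -> List[int]:
    """Rate-1/2 convolutional encoder whose generator pair cycles through `sections`.

    With two sections the trellis alternates between two section types (a periodic,
    2-partite structure). The encoder starts in the all-zero state.
    """
    memory = max(len(g) for sec in sections for g in sec) - 1
    state = [0] * memory  # previous inputs, most recent first
    out = []
    for t, b in enumerate(bits):
        window = [b] + state
        for g in sections[t % len(sections)]:
            # GF(2) inner product of the taps with the current input and past inputs
            out.append(sum(gi & wi for gi, wi in zip(g, window)) % 2)
        state = window[:memory]
    return out


sections = [([1, 1, 1], [1, 0, 1]), ([1, 1, 0], [0, 1, 1])]  # placeholder taps
print(encode_time_varying([1, 0, 1, 1, 0, 0], sections))
```

With a single section the function reduces to an ordinary time-invariant encoder, which makes it easy to compare the two trellis structures.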
The authors describe the use of low-rate vocoders (4800 b/s, 2400 b/s and 800 b/s) as an alternative to the 16 kb/s CVSD coder. The bit rate left over on the overall 16 kb/s channel can be used to improve the robustness of the communication or to offer new services (multiplexing speech and data, or providing a duplex service). The authors report theoretical studies that take into account both the speech coder and the channel characteristics to derive an error protection scheme. A real-time implementation has been carried out on a vehicular radio set of the PR4G VHF frequency-hopping system. Performance was assessed in field experiments and compared with the standard CVSD coder. Finally, prospects for industrialization are discussed.
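The capacity trade-off described above is simple arithmetic. The sketch computes, for each vocoder rate, the capacity left on the 16 kb/s channel and the overall code rate if all of it were spent on error protection (one of the options the text mentions); the actual protection scheme, which depends on the coder and channel characteristics, is not given here.

```python
CHANNEL_BPS = 16_000  # overall channel rate from the text


def budget(vocoder_bps: int, channel_bps: int = CHANNEL_BPS) -> tuple[int, float]:
    """Spare capacity in b/s and the overall code rate if all spare bits carry redundancy."""
    spare = channel_bps - vocoder_bps
    return spare, vocoder_bps / channel_bps


for rate in (4800, 2400, 800):
    spare, code_rate = budget(rate)
    print(f"{rate} b/s vocoder: {spare} b/s spare, overall code rate {code_rate:.2f}")
```

For example, the 2400 b/s vocoder leaves 13,600 b/s of spare capacity, an overall code rate of 0.15 if every spare bit were used for protection.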
Linear prediction (LP), as used in CELP speech coding, is the most successful speech analysis method. However, it has drawbacks, such as low estimation accuracy due to its least-squares formulation. We have proposed time-varying complex AR (TV-CAR) speech analysis based on the MMSE criterion, a robust criterion, and LASSO analysis, and evaluated it for $F_0$ estimation of speech. We have also proposed $\ell_2$-norm regularized TV-CAR analysis, regularized LP (RLP)-based, time-RLP (TRLP)-based, and combined methods, and evaluated them on $F_0$ estimation based on IRAPT using complex residual signals. In addition, a bone-conducted (BC) pre-filter has been introduced to improve performance. This paper proposes improved $F_0$ estimation by introducing adaptive pre-emphasis and shows its effectiveness.
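The abstract does not say how the pre-emphasis adapts. One common choice, assumed here for illustration, sets the coefficient of the first-order filter $1 - a z^{-1}$ frame by frame to the normalized autocorrelation $r(1)/r(0)$, which is the optimal first-order predictor under the autocorrelation method. The IRAPT $F_0$ estimator and the TV-CAR analysis that would follow are not shown.

```python
import numpy as np


def adaptive_preemphasis(frame: np.ndarray) -> tuple[np.ndarray, float]:
    """Apply 1 - a z^-1 to one frame, with a = r(1) / r(0) estimated from that frame."""
    frame = np.asarray(frame, dtype=float)
    r0 = float(np.dot(frame, frame))
    r1 = float(np.dot(frame[1:], frame[:-1]))
    a = r1 / r0 if r0 > 0.0 else 0.0  # |a| <= 1 by the Cauchy-Schwarz inequality
    out = np.empty_like(frame)
    out[0] = frame[0]
    out[1:] = frame[1:] - a * frame[:-1]
    return out, a


fs = 8000
n = np.arange(400)
low = np.cos(2 * np.pi * 100 * n / fs)    # strongly low-pass frame: a close to 1
high = np.cos(2 * np.pi * 3000 * n / fs)  # high-frequency frame: a negative
print(adaptive_preemphasis(low)[1], adaptive_preemphasis(high)[1])
```

Unlike a fixed coefficient such as 0.95, this rule flattens each frame's spectral tilt by an amount matched to that frame.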
The canonical representation of speech constitutes a perfect reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period, and the voiced and unvoiced components of the excitation, represented as transform coefficients. Each set of parameters can be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. An improved time-frequency fit is obtained by using, for antialiasing cancellation, the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation by a factor of about 13 relative to its raw transform coefficient representation for wideband speech.
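The perfect-reconstruction property of the AR part of such a representation can be checked directly: inverse filtering with $A(z)$ yields the excitation (residual), and the all-pole filter $1/A(z)$ maps it back to the input exactly. The sketch below, with illustrative function names, covers only the AR coefficients and the unmodified excitation; the pitch period, the voiced/unvoiced split, the transform-domain excitation, and the TFUNEX model itself are not represented.

```python
import numpy as np


def lpc(x: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method coefficients a_1..a_p of A(z) = 1 + sum_k a_k z^-k."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    toeplitz = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(toeplitz, -r[1:])  # normal equations R a = -r


def analysis(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Inverse (whitening) filter with zero initial state: e[n] = x[n] + sum_k a_k x[n-k]."""
    return np.convolve(x, np.concatenate(([1.0], a)))[: len(x)]


def synthesis(e: np.ndarray, a: np.ndarray) -> np.ndarray:
    """All-pole filter 1/A(z) with zero initial state: y[n] = e[n] - sum_k a_k y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y[n] = e[n] - past
    return y


rng = np.random.default_rng(0)
x = np.convolve(rng.normal(size=800), [1.0, 0.8, 0.3])[:800]  # hypothetical test signal
a = lpc(x, order=10)
e = analysis(x, a)
print(np.max(np.abs(synthesis(e, a) - x)))  # should be at floating-point rounding level
```

Because the two filters are exact inverses, any compression gain has to come from how the excitation is modeled, which is where the TFUNEX model operates.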
ISBN: 0780366859 (print)
This paper presents a new 1200 bps speech coder built around a tree-searched multistage matrix quantization scheme. With the new matrix quantization method, a spectral distortion of about 1 dB is achieved at rates as low as 18 bits/frame. In the proposed coder, the LSF parameters of two consecutive frames are grouped into a superframe and jointly quantized, while the other speech parameters are quantized frame by frame. New techniques for improving performance include joint quantization of pitch and voiced/unvoiced/mixed decisions, gain interpolation, and residual LSF quantization. For the new matrix-quantization-based coder (MQBC), listening tests show that efficient, high-quality coding is achieved at 1200 bps. Test results are compared with the 2400 bps LPC10e coder and the 2400 bps MELP coder, which was chosen as the new 2400 bps Federal Standard.
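Joint quantization of two frames' LSFs can be sketched as multistage VQ of the stacked 2 × p LSF matrix (flattened to a 2p-dimensional vector), with an M-best tree search that keeps the M lowest-distortion partial paths after each stage. Reading "tree searched" as this M-best search is an assumption; the codebooks below are random placeholders rather than trained ones, and the bit allocation behind the 18 bits/frame figure is not reproduced.

```python
import numpy as np


def msvq_mbest(x: np.ndarray, codebooks: list, m_best: int):
    """Multistage VQ of x with an M-best search; returns (stage indices, reconstruction)."""
    paths = [((), np.zeros_like(x))]  # (indices chosen so far, reconstruction so far)
    for cb in codebooks:
        candidates = []
        for idx, recon in paths:
            # total squared error of every extension of this path by one codeword of this stage
            errs = ((x - recon - cb) ** 2).sum(axis=1)
            for j in np.argsort(errs)[:m_best]:
                candidates.append((float(errs[j]), idx + (int(j),), recon + cb[j]))
        candidates.sort(key=lambda c: c[0])  # keep only the M best partial paths
        paths = [(idx, recon) for _, idx, recon in candidates[:m_best]]
    return paths[0]


p = 10  # assumed LP order
rng = np.random.default_rng(0)
lsf_pair = np.sort(rng.uniform(0.05, 3.0, size=(2, p)), axis=1)  # two consecutive frames (fake data)
x = lsf_pair.ravel()  # superframe vector of dimension 2p
codebooks = [rng.normal(1.5, 0.8, size=(64, 2 * p)), rng.normal(0.0, 0.2, size=(64, 2 * p))]
indices, x_hat = msvq_mbest(x, codebooks, m_best=4)
print(indices, float(np.sum((x - x_hat) ** 2)))
```

With two stages and `m_best` equal to the first-stage codebook size, the search becomes exhaustive; smaller values trade some distortion for much less computation.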
In certain communication environments, digital speech transmission systems must operate in severe acoustic conditions where noise levels exceed 110 dB. In other environments, speakers must wear an oxygen face mask. In both situations, the intelligibility of encoded speech falls below an acceptable level. We have developed a technique for improving speech quality in these situations. Previous speech enhancement methods have processed the corrupted signal after it has been picked up by the microphone, and they have not performed adequately. In our technique, speech anomalies are attenuated by a microphone array before speech and noise are mixed into a single signal. Our microphone array prototype has shown excellent performance: on speech recorded aboard an E-2C aircraft, this noise-canceling microphone array improved the speech-to-noise ratio by as much as 18 dB. When the same technique was used in a face mask, muffled speech was almost completely restored to high-quality speech.
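The abstract does not describe the array processing. As a generic illustration only, the simplest microphone-array method, delay-and-sum beamforming, aligns each channel on the talker's direction and averages: speech adds coherently while noise that is uncorrelated across microphones averages down, giving up to about $10\log_{10} N$ dB of SNR gain for N microphones. Real cockpit noise is partly correlated across microphones, so actual gains differ; the integer steering delays are assumed known.

```python
import numpy as np


def delay_and_sum(channels: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Delay-and-sum beamformer with integer steering delays.

    channels: array of shape (num_mics, num_samples).
    delays[i]: how many samples later the talker's speech reaches mic i than the reference mic.
    The last max(delays) output samples receive contributions from fewer than all mics.
    """
    num_mics, n = channels.shape
    out = np.zeros(n)
    for ch, d in zip(channels, delays):
        d = int(d)
        out[: n - d] += ch[d:]  # advance this channel by d samples to align the speech
    return out / num_mics


rng = np.random.default_rng(0)
n, mics = 4000, 8
speech = np.sin(2 * np.pi * 0.01 * np.arange(n + 10))  # stand-in "speech" signal
delays = np.arange(mics)  # hypothetical steering delays, 0..7 samples
clean = np.stack([speech[10 - d : 10 - d + n] for d in delays])
noisy = clean + rng.normal(size=clean.shape)  # independent noise at each microphone
out = delay_and_sum(noisy, delays)
m = n - delays.max()
ref = speech[10 : 10 + m]
snr_in = 10 * np.log10(np.sum(ref**2) / np.sum((noisy[0, :m] - ref) ** 2))
snr_out = 10 * np.log10(np.sum(ref**2) / np.sum((out[:m] - ref) ** 2))
print(f"SNR gain: {snr_out - snr_in:.1f} dB")  # expected near 10*log10(8), about 9 dB
```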
In this paper, we describe several important experiments on large-vocabulary distributed continuous speech recognition (LVDCSR) systems in Brazilian Portuguese using LSF- and LPC-derived features. The ITU-T G.723.1 codec is employed and investigated as a practical use case of this technology. Results are presented for both speaker-dependent and speaker-independent modes, and for cases where the same text or different texts were used for training and testing.
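For readers unfamiliar with the LSF representation, the standard conversion from LP coefficients forms the symmetric and antisymmetric polynomials $P(z) = A(z) + z^{-(p+1)}A(z^{-1})$ and $Q(z) = A(z) - z^{-(p+1)}A(z^{-1})$, whose roots lie on the unit circle when $A(z)$ is minimum phase; the LSFs are their root angles in $(0, \pi)$. The sketch uses generic polynomial root-finding for clarity; practical codecs typically use faster search procedures, and this is not the G.723.1 reference code.

```python
import numpy as np


def lpc_to_lsf(a: np.ndarray) -> np.ndarray:
    """LSFs (radians, ascending) of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Assumes A(z) is minimum phase, so the roots of P(z) and Q(z) lie on the unit circle.
    """
    poly = np.concatenate(([1.0], np.asarray(a, dtype=float), [0.0]))  # A(z) padded to degree p+1
    p_sym = poly + poly[::-1]   # P(z) = A(z) + z^-(p+1) A(z^-1)
    q_anti = poly - poly[::-1]  # Q(z) = A(z) - z^-(p+1) A(z^-1)
    angles = np.angle(np.concatenate([np.roots(p_sym), np.roots(q_anti)]))
    # keep the upper half of each conjugate pair and drop the trivial roots at z = +1 and z = -1
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])


zeros = [0.9 * np.exp(1j * 0.3), 0.9 * np.exp(-1j * 0.3), 0.7 * np.exp(1j * 1.2), 0.7 * np.exp(-1j * 1.2)]
a = np.poly(zeros).real[1:]  # coefficients a_1..a_4 of a minimum-phase 4th-order A(z)
print(lpc_to_lsf(a))
```

The LSFs of P(z) and Q(z) interlace, so a sorted, strictly increasing vector in $(0, \pi)$ is a quick sanity check on the result.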
This paper explores data embedding in G.711 μ-law speech signals using spread-spectrum techniques. Based on an optimized spread-spectrum scheme, a simple but effective solution for high-capacity embedding is presented. Simulations show that the proposed scheme, when combined with a measure of frequency masking effects, can achieve an embedding rate of about 100 bits per second with a 7% bit error rate (BER), or 1000 bps with a 10% BER.
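The basic spread-spectrum mechanism can be sketched as follows: each bit is spread over L samples by a ±1 pseudo-noise (PN) sequence, scaled by a strength α, added to the host speech, and recovered by correlating with the same PN sequence. At the 8 kHz sampling rate of G.711, L = 80 samples per bit gives 100 b/s and L = 8 gives 1000 b/s; fewer samples per bit means less correlation gain, which is consistent with the higher BER reported at 1000 bps. This sketch works on linear samples and ignores μ-law requantization, the paper's optimization, and its frequency-masking measure; the seed-based PN generation is a hypothetical shared key.

```python
import numpy as np


def pn_sequence(length: int, seed: int) -> np.ndarray:
    """Shared +/-1 pseudo-noise sequence; the seed plays the role of a secret key (assumption)."""
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=length)


def embed(host: np.ndarray, bits, chips_per_bit: int, alpha: float, seed: int = 0) -> np.ndarray:
    """Add alpha * (+/-1 per bit) * PN to the first len(bits) * chips_per_bit host samples."""
    pn = pn_sequence(chips_per_bit * len(bits), seed)
    symbols = np.repeat(2.0 * np.asarray(bits) - 1.0, chips_per_bit)  # 0/1 -> -1/+1
    marked = np.asarray(host, dtype=float).copy()
    marked[: len(pn)] += alpha * symbols * pn
    return marked


def extract(received: np.ndarray, num_bits: int, chips_per_bit: int, seed: int = 0) -> np.ndarray:
    """Correlate each L-sample block with the PN sequence and decide on the sign."""
    pn = pn_sequence(chips_per_bit * num_bits, seed)
    corr = (received[: len(pn)] * pn).reshape(num_bits, chips_per_bit).sum(axis=1)
    return (corr > 0).astype(int)


fs = 8000  # G.711 sampling rate
chips = fs // 100  # 80 samples per bit -> 100 b/s
rng = np.random.default_rng(1)
host = rng.normal(0.0, 1.0, fs)  # one second of stand-in "speech"
bits = rng.integers(0, 2, 100)
recovered = extract(embed(host, bits, chips, alpha=0.5), len(bits), chips)
print("bit errors:", int(np.sum(recovered != bits)))
```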