In this paper, block backward adaptive linear predictors are used to improve the performance of perceptual audio codecs. Based on an investigation of different linear prediction algorithms, a signal-dependent adaptive-switched predictor is developed. This predictor not only delivers significant coding gain for stationary signals but also recovers quickly from transient signals. At a bitrate of 64 kbit/s, the performance of the new codec for most critical test sequences is significantly better than MPEG-1 Layer II.
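To illustrate what "block backward adaptive" means, here is a minimal sketch, not the paper's codec. It assumes a first-order predictor, no quantization, and no switching logic. The function names and the block length are hypothetical. The point it shows is that the coefficient for each block is estimated only from samples the decoder already has, so no predictor side information needs to be transmitted.

```python
def estimate_coeff(samples):
    """Order-1 predictor coefficient r(1)/r(0) estimated from one block."""
    r0 = sum(v * v for v in samples)
    r1 = sum(samples[i] * samples[i - 1] for i in range(1, len(samples)))
    return r1 / r0 if r0 > 0 else 0.0


def encode_residual(x, block=4):
    """Predict each block with a coefficient derived from the previous block."""
    residual, coeffs = [], []
    a = 0.0  # nothing is known before the first block
    for start in range(0, len(x), block):
        cur = x[start:start + block]
        coeffs.append(a)
        for i, s in enumerate(cur):
            prev = x[start + i - 1] if start + i > 0 else 0.0
            residual.append(s - a * prev)
        # Backward adaptation: update from the block just coded.
        a = estimate_coeff(cur)
    return residual, coeffs


def decode_residual(residual, block=4):
    """Mirror of the encoder; rebuilds the same coefficients from its own output."""
    x = []
    a = 0.0
    for start in range(0, len(residual), block):
        n = min(block, len(residual) - start)
        for i in range(n):
            prev = x[-1] if x else 0.0
            x.append(residual[start + i] + a * prev)
        a = estimate_coeff(x[start:start + n])
    return x
```

In a real codec the residual would be quantized and the encoder would adapt on the reconstructed signal rather than the input, so that both sides stay in step.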
This paper presents an improved mixed LPC vocoder at 2000 bps using a Multi-Band Excitation analysis-by-synthesis algorithm. The new vocoder determines the voiced/unvoiced characteristics harmonic by harmonic in a frame, and takes the first voiced/unvoiced transition as the cut-off frequency, which is more accurate and efficient than traditional cut-off frequency detection. The synthetic speech below the cut-off frequency is excited by a series of voiced harmonics, while the signal above the cut-off frequency is simulated by a noise source. The final output speech is the sum of these two outputs. To increase the naturalness and clarity of the synthesized speech, the model applies phase prediction and spectral enhancement in the synthesizer. It is also possible to reduce the bit rate to 1200 bps. Informal listening tests indicate that the output speech has higher intelligibility and quality than that of the 2.4 kbps LPC-10e standard, and is comparable with the 4.8 kbps FS1016 CELP vocoder.
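The cut-off idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the vocoder described above. It assumes the first harmonic is voiced, so the first voiced/unvoiced transition is the first unvoiced flag, and it places the cut-off halfway between the last voiced and first unvoiced harmonic. It uses random-phase sinusoids as a crude stand-in for the noise source. All function names, the 8 kHz sampling rate, and the 160-sample frame are hypothetical.

```python
import math
import random


def leading_voiced_count(voiced_flags):
    """Number of harmonics before the first voiced-to-unvoiced transition."""
    for i, voiced in enumerate(voiced_flags):
        if not voiced:
            return i
    return len(voiced_flags)


def synthesize_frame(f0, amps, voiced_flags, fs=8000, n=160, seed=0):
    """Return (cut-off frequency in Hz, frame) for one frame of harmonics."""
    rng = random.Random(seed)
    n_voiced = leading_voiced_count(voiced_flags)
    cutoff_hz = (n_voiced + 0.5) * f0  # assumed midpoint convention
    # Random phases stand in for a noise source above the cut-off.
    noise_phase = [rng.uniform(0.0, 2.0 * math.pi) for _ in amps]
    frame = []
    for t in range(n):
        voiced_part = 0.0
        noise_part = 0.0
        for h, amp in enumerate(amps):
            angle = 2.0 * math.pi * (h + 1) * f0 * t / fs
            if h < n_voiced:
                voiced_part += amp * math.cos(angle)
            else:
                noise_part += amp * math.cos(angle + noise_phase[h])
        frame.append(voiced_part + noise_part)  # output is the sum of both
    return cutoff_hz, frame
```

Note that every harmonic above the cut-off is treated as noise here, even one flagged as voiced, which is the simplification a single cut-off frequency implies.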
A human information processing system is composed of neurons switching at speeds about a million times slower than computer gates. Yet humans are more efficient than computers at complex tasks such as speech and visual interpretation. A neural network (NN) method was developed to reproduce one of the abilities of the human brain: speaker recognition. The input patterns used for this network were the reflection coefficients (k_p) of the speech signal. The speech was analyzed using an autoregressive LPC technique, and the k_p values were found by applying the Levinson recursion algorithm. Next, a simple transformation of these input patterns onto a hypersphere in augmented space was made using a multilayer perceptron (MLP) neural model. The Kohonen approach is commonly used for computing a distance in multidimensional input space so that the input patterns are projected onto a hypersphere with unit radius. Although this technique is very efficient for clustering patterns, it has one significant drawback: normalizing the input pattern loses the information about its magnitude. The proposed modified network has a relatively simple architecture but is shown to be very effective in performing speaker recognition.
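As a rough illustration of the two steps mentioned above, the sketch below runs the Levinson (Levinson-Durbin) recursion on an autocorrelation sequence to obtain reflection coefficients, and then shows how unit-norm projection discards magnitude. The function names are hypothetical, and the sign convention for the reflection coefficients is an assumption (conventions differ between texts). It is not the network described in the abstract.

```python
import math


def levinson_reflection(r, order):
    """Levinson-Durbin recursion.

    r holds autocorrelation values r[0..order]. Returns the reflection
    coefficients k_1..k_p, the predictor coefficients a_1..a_p for
    x_hat[n] = sum_j a_j * x[n - j], and the final prediction error.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err
        ks.append(k)
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return ks, a[1:], err


def to_unit_sphere(v):
    """Project a pattern onto the unit hypersphere (its magnitude is lost)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]
```

For example, `to_unit_sphere([3.0, 4.0])` and `to_unit_sphere([6.0, 8.0])` give the same point, which is the drawback the abstract describes: two patterns that differ only in magnitude become indistinguishable.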
In this paper, the fixed-point accuracy analysis and VLSI architecture of an FS1016 CELP decoder are presented. The code excited linear predictive (CELP) coder is the most effective technique among various linear predictive coding methods for speech compression. Hence, designing a low-cost, low-power CELP decoder chip for portable systems and wireless digital communication environments is becoming increasingly important. The decoder VLSI architecture achieves (1) excellent accuracy, owing to the accuracy studies for finite word length, (2) power savings and high-speed operation, resulting from the combined advantages of pipelining and current processing for LSE's interpolating and cosine operation, (3) reduced table size, by applying the memoryless realization for the stochastic codebook and the partial sums technique, and (4) compliance with the FS1016 CELP coder specification.
Bark-scale warped linear prediction (WLP) is a promising core for a monophonic perceptual audio codec. In this paper, the WLP scheme is extended to process complex-valued signals (CWLP). Three different methods of converting a stereo signal to one complex-valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g. perceptuality and stereo signal processing) into one computational element in order to find a more holistic and economical way of processing.
The paper presents a novel technique for underwater acoustic voice communications. The speech signal is compressed prior to transmission using linear predictive coding, and the appropriate speech parameters are transmitted by digital pulse position modulation. The main emphasis is on the reception of the multipath-dominant signal and on the demodulation process.
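For readers unfamiliar with digital pulse position modulation, a minimal sketch follows. It assumes an idealized channel, one unit pulse per symbol frame, and a strongest-slot detector. The function names and the 3-bit symbol size are hypothetical. It ignores the multipath reception problem that the paper concentrates on.

```python
def ppm_encode(bits, bits_per_symbol=3):
    """Map each group of bits to a single pulse in a frame of 2**k slots."""
    if len(bits) % bits_per_symbol != 0:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    slots = 1 << bits_per_symbol
    signal = []
    for i in range(0, len(bits), bits_per_symbol):
        value = 0
        for b in bits[i:i + bits_per_symbol]:
            value = (value << 1) | b  # most significant bit first
        frame = [0.0] * slots
        frame[value] = 1.0  # pulse position carries the symbol value
        signal.extend(frame)
    return signal


def ppm_decode(signal, bits_per_symbol=3):
    """Pick the strongest slot in each frame and convert it back to bits."""
    slots = 1 << bits_per_symbol
    bits = []
    for i in range(0, len(signal), slots):
        frame = signal[i:i + slots]
        value = max(range(len(frame)), key=lambda s: frame[s])
        for shift in range(bits_per_symbol - 1, -1, -1):
            bits.append((value >> shift) & 1)
    return bits
```

Because the information lies in where the pulse falls rather than in its amplitude, uniform attenuation does not change the decision in this sketch. Delayed echoes that land in other slots would, which is presumably why reception under multipath is the hard part.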
This paper describes the new U.S. Federal Standard at 2400 bps. The mixed excitation linear prediction (MELP) coder was chosen by the DoD Digital Voice Processing Consortium to replace the existing 2400 bps Federal Standard FS 1015 (LPC-10). This new standard provides equal or improved performance over the 4800 bps Federal Standard FS 1016 (CELP) at a rate equivalent to LPC-10. The MELP coder is based on the traditional LPC model, but includes additional features to improve its performance.
ISBN: (Print) 0818679190
Conventional time-scale modification methods have the problem that, as the modification rate gets higher, the time-scale modified speech signal becomes less intelligible, because they ignore the effect of articulation rate on speech characteristics. We propose a variable time-scale modification method based on the knowledge that the timing information of transient portions of a speech signal plays an important role in speech perception. After identifying the transient and steady portions of a speech signal, the proposed method reaches the target rate by modifying the steady portions only. The results of a subjective preference test indicate that the proposed method outperforms the conventional SOLA method.
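The arithmetic implied by "modifying steady portions only" can be sketched as follows. If the transient portions keep their original total duration T_t, and the steady portions (total T_s) must absorb the whole change, then for an overall speed-up factor r the steady portions need a rate of T_s / ((T_t + T_s) / r - T_t). The function name, the segment representation, and the labels are hypothetical. How the paper detects transients and performs the actual modification is not shown.

```python
def steady_portion_rate(segments, overall_rate):
    """Rate to apply to steady segments so the whole utterance hits overall_rate.

    segments is a list of (kind, duration_seconds) pairs, where kind is
    "transient" or "steady". Transient segments are left unmodified.
    """
    transient = sum(d for kind, d in segments if kind == "transient")
    steady = sum(d for kind, d in segments if kind == "steady")
    target_total = (transient + steady) / overall_rate
    steady_target = target_total - transient
    if steady_target <= 0:
        raise ValueError("target rate cannot be met by changing steady portions only")
    return steady / steady_target
```

For instance, with 0.2 s of transients and 0.8 s of steady speech, doubling the overall rate requires the steady portions to be played at 0.8 / 0.3, roughly 2.67 times the original rate. The error case marks the limit of the approach: once the target duration is no longer than the transient duration, the rate cannot be reached without touching the transients.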
A method for recovering the LPC spectrum from a microphone array input signal corrupted by ambient noise is proposed. This method is based on the CSS (coherent subspace) method, which is designed for DOA (direction of arrival) estimation of broadband array input signals. The noise energy is reduced in the subspace domain by the maximum likelihood method. To enhance the performance of noise reduction, elimination of the noise-dominant subspace using projection is further employed, which is effective when the SNR is low and classification of noise and signals in the subspace domain is difficult. The results of the simulation show that some small formants, which cannot be estimated by the conventional delay-and-sum beamformer, were well estimated by the proposed method.
This paper presents a new and efficient method for modeling voiced, mixed excitation spectra in sinusoidal coding (SC) and prototype interpolation coding (PIC) systems. Speech harmonics are classified as "weak-voiced" or "strong-voiced" by simply examining the short-term residual magnitude spectrum. This information is encoded effectively in terms of fixed-width frequency bands and is used to control sets of periodic and random sine wave oscillators, which model the short-term mixed excitation nature of speech. In this way the model allows the mixing of periodic and random signal energy on a harmonic basis. The proposed methodology has been used in a 2.4 kbit/s speech coder, whose recovered speech quality is better than that of the 4.8 kbit/s DoD standard.