Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques that use psychoacoustic models to efficiently exploit the masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.
It is shown that the charge pumping technique allows the extraction of the depth profile of the trap concentration at the Si-SiO$_2$ interface. This profile is found to be of the form $N_t(x) = N_{ts}\exp(-x/d) + N_{to}$, where $x$ is the distance into the oxide from the interface and $d$ is the characteristic decay length. The method is discussed, as well as the expression obtained for the trap density.
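The profile above can be evaluated directly. The following is a minimal sketch, not the authors' extraction procedure: the function name `trap_concentration` and all parameter values are hypothetical and chosen only for illustration, and the argument `x` is assumed to be the depth into the oxide, with `d` the decay length in the same unit.

```python
import math


def trap_concentration(x, n_ts, n_to, d):
    """Evaluate N_t(x) = N_ts * exp(-x / d) + N_to.

    x and d must share one length unit; n_ts and n_to must share one
    concentration unit. All names here are illustrative assumptions.
    """
    return n_ts * math.exp(-x / d) + n_to


# Hypothetical parameter values, not taken from the paper.
for depth in (0.0, 1.0, 5.0):
    print(depth, trap_concentration(depth, n_ts=1.0e19, n_to=1.0e17, d=1.0))
```

At the interface the result is the sum of both terms, and far from it the exponential term vanishes so the value approaches the constant background `n_to`.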
The goal of this paper is to propose a new perceptually based objective technique that uses radial basis function neural networks, instead of regression algorithms, to estimate the nonlinear mapping function that best represents the relationship between the input (perceptual parameters) and output (speech quality) variables in a database. In the proposed technique, the perceptual parameters are obtained by: (1) emulating several known features of the perceptual processing of speech sounds by the human ear (including critical-band masking, equal loudness, and the intensity-loudness power law operations) to map the speech power spectrum into the auditory power spectrum (Bark domain); (2) deriving the perceptual LPC coefficients from the auditory spectrum, which are used to calculate, for each frame, the cepstrum distance between the input and the output coded speech signals; (3) using the radial basis function neural network to map the perceptual cepstrum distance per frame into the corresponding estimated speech quality. After extensive experimentation and validation, the results indicate that the proposed technique is effective for estimating coded speech quality.
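Step (2) can be illustrated with the standard recursion that converts LPC predictor coefficients to cepstral coefficients, followed by a per-frame distance. This is a sketch under stated assumptions: the abstract does not say which distance measure is used, so a plain Euclidean distance is assumed here, and the function names and coefficient values are hypothetical. The predictor convention is H(z) = 1 / (1 - sum_k alpha_k z^-k).

```python
import math


def lpc_to_cepstrum(alpha, n_ceps):
    """Cepstrum c_1..c_n_ceps of H(z) = 1 / (1 - sum_k alpha[k-1] * z^-k)."""
    p = len(alpha)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_distance(c_ref, c_test):
    """Euclidean distance between two cepstral vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_ref, c_test)))


# Hypothetical LPC coefficients for one frame of input and coded speech.
frame_in = lpc_to_cepstrum([0.9, -0.2], n_ceps=8)
frame_out = lpc_to_cepstrum([0.85, -0.15], n_ceps=8)
print(cepstral_distance(frame_in, frame_out))
```

For a first-order predictor with coefficient a, the recursion reproduces the closed form c_n = a^n / n, which is a convenient sanity check.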
The Puzzle Project is an interactive software system that solves jigsaw puzzles. The voice interface includes speech synthesis and word recognition. The attributes of the puzzle pieces are determined using image processing techniques and wavelet decomposition. Two algorithms are used to solve the puzzles: an expert system and fuzzy logic. This paper describes the steps required to find the solution to the puzzle from image processing to decision-making algorithms. It also explains the techniques involved in designing the voice interface.
We propose methods to achieve bit-rate and complexity scalability in CELP coders/decoders. These methods can be implemented as plug-ins or add-ons to existing CELP systems. The associated scale factors can be readjusted even at run time. In particular, using the proposed methods, a wideband (sampling rate 16 kHz) coder/decoder has been developed that is capable of reconstructing good- to high-quality speech at bit rates ranging from 14 to 24 kbit/s. The complexity of the decoder can vary by a factor of 2 or more, depending on the composition of the transmitted bitstream. Although the system also incorporates a complexity-scalable encoder, it is not discussed in this paper. This system has been submitted to MPEG-4, where it has been incorporated in the MPEG-4 Audio Verification Model.
This paper describes a proprietary implementation of a real-time, software-based, HY-2-compatible channel vocoder algorithm. The HY-2 algorithm was used prior to linear predictive coding (LPC) vocoders for low-bit-rate voice communications at 2400 bps. The implementation is a unique and instructive example of a parametric vocoder that produces high-quality speech.
Utilizing the ordering property of LSF (Line Spectral Frequency) parameters, we propose a new method for scalar quantization of LSF coefficients, which uses a new set of quantization parameters called LSFI instead of direct LSF or Differential LSF (LSFD). Based on experimental analysis, the new quantization method is further improved by combining it with LSFD parameters (we call this improved version the LSFID scheme). The experimental results show that the proposed methods are effective in decreasing quantization distortion. Compared with LSFD, the LSFID quantizer can save at least 1 bit per frame to achieve 1 dB quantization.
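The ordering property and the LSFD baseline mentioned above can be sketched as follows. This illustrates only the differential (LSFD) representation; the LSFI and LSFID parameters are not defined in the abstract and are not reproduced here. Function names and the example vector are hypothetical, and frequencies are assumed to be in radians within (0, pi).

```python
import math


def is_ordered(lsf):
    """Ordering property: 0 < w_1 < w_2 < ... < w_p < pi."""
    bounds = [0.0] + list(lsf) + [math.pi]
    return all(a < b for a, b in zip(bounds, bounds[1:]))


def lsf_to_lsfd(lsf):
    """Differential LSF: each value minus its predecessor (first minus 0)."""
    prev = 0.0
    diffs = []
    for w in lsf:
        diffs.append(w - prev)
        prev = w
    return diffs


def lsfd_to_lsf(lsfd):
    """Invert the differential representation with a running sum."""
    total = 0.0
    lsf = []
    for d in lsfd:
        total += d
        lsf.append(total)
    return lsf


# Hypothetical 4th-order LSF vector in radians.
lsf = [0.3, 0.8, 1.5, 2.4]
lsfd = lsf_to_lsfd(lsf)
print(is_ordered(lsf), lsfd)
```

Because an ordered LSF vector always yields strictly positive differences, a scalar quantizer for the differences needs to cover only positive values, which is one reason differential schemes are a common baseline.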
A compression algorithm is presented that exploits the special properties of ultrasonic radio frequency (RF) data. The compression is done in two steps. First, linear predictive coding (LPC) is applied, using a one-step predictor. Second, the remaining prediction error is stored using only the word length necessary to represent it. A lossy extension of the algorithm is also presented, which stores only the upper bits of the error signal. The algorithm has been tested with both speckle-phantom data and in vivo data. The data could be compressed to approximately 30-55% of the original data size using the lossless algorithm. In comparison, a conventional compression tool achieves 65-75% of the original data size.
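The two lossless steps can be sketched as below. This is a simplified illustration, not the paper's algorithm: it assumes a fixed first-order predictor that predicts each sample as the previous one (the abstract does not give the predictor coefficients), it assumes integer samples stored in two's complement, and all function names and sample values are hypothetical.

```python
def prediction_residual(samples):
    """One-step prediction error with the assumed predictor x_hat[n] = x[n-1]."""
    prev = 0
    residual = []
    for x in samples:
        residual.append(x - prev)
        prev = x
    return residual


def reconstruct(residual):
    """Invert prediction_residual exactly (lossless) with a running sum."""
    prev = 0
    samples = []
    for e in residual:
        prev += e
        samples.append(prev)
    return samples


def min_word_length(values):
    """Smallest two's-complement word length (bits) that holds every value."""
    bits = 1
    for v in values:
        magnitude = v if v >= 0 else -v - 1
        bits = max(bits, magnitude.bit_length() + 1)
    return bits


# Hypothetical slowly varying integer samples.
samples = [100, 102, 105, 103, 101]
residual = prediction_residual(samples)
# The first residual equals the first sample, so it is kept at full width.
print(min_word_length(samples), min_word_length(residual[1:]))
```

For slowly varying signals the residual spans a much smaller range than the samples themselves, so fewer bits per value are needed after the first sample.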
In this paper, we propose a very low bit-rate speech codec using recognition and synthesis schemes. A total of 2512 speech units, including 48 phones and 2463 diphones, are utilized in the recognition process. A three-state continuous hidden Markov model, excluding the start and final states, is applied to model these speech units. In addition to the recognized phonetic index, the corresponding phonetic frame length is also part of the compressed information. To obtain better quality in the reconstructed speech, pitch periods and pitch gains are used to preserve the speaker's personal characteristics. In the synthesis process, the time-domain pitch-synchronous overlap-add scheme is utilized to synthesize a high-quality speech waveform. In our tests, a recognition accuracy of more than 90% can be achieved when the user speaks in a normal manner. The reconstructed speech quality can exceed a mean opinion score of 3.0 and a diagnostic rhyme test score of 92.
We present a low-delay vector predictive transform coder with quantization noise modeling. The coder uses overlapped transform coding and a Kalman filter with a backward-estimated LPC model of the speech signal, combined with an additive noise model of the quantization noise. The noise model is driven by the vector quantizer. Each quantizer cell has its own error correlation matrix, which is kept in a table and addressed using the quantizer index. Simulation results indicate that overlapped transform coding combined with noise modeling can improve the quality of the decoded speech signal.