In 1996, the U.S. Department of Defense Digital Voice Processing Consortium (DDVPC) selected Texas Instruments' mixed excitation linear prediction (MELP) algorithm as the recommended new Federal Standard for 2400 bps voice communications. The algorithm selection process involved quality, intelligibility, communicability, and recognizability testing in many acoustic noise, error, and tandem conditions. Algorithm complexity was also measured. This paper compares the performance scores, diagnostic information, and complexity of MELP to the 4800 bps Federal Standard (FS1016) code excited linear prediction (CELP) algorithm, the 16 kbps continuously variable slope delta modulation (CVSD) algorithm, and the venerable Federal Standard (FIPS Pub. 137) 2400 bps linear predictive coding (LPC-10) algorithm.
We propose methods to achieve bit-rate and complexity scalability in CELP coders/decoders. These methods can be implemented as plug-ins or add-ons to existing CELP systems. The associated scale factors can be readjusted even during run-time. In particular, using the proposed methods, a wideband (sampling rate 16 kHz) coder/decoder has been developed that is capable of reconstructing good to high quality speech at bit rates ranging from 14 to 24 kbit/s. The possible complexity range of the decoder can amount to a factor of 2 or more, depending on the constitution of the transmitted bitstream. Although the system also incorporates a complexity scalable encoder, it is not discussed in this paper. This system has been submitted to MPEG-4, where it has been incorporated in the MPEG-4 Audio Verification Model.
LPC based speech coders operating at bit rates below 3.0 kbits/sec are usually associated with buzzy or metallic artefacts in the synthetic speech. These are mainly attributable to the simplifying assumptions made about the excitation source, which are usually required to maintain such low bit rates. A new LPC vocoder is presented which splits the LPC excitation into two frequency bands using a variable cut-off frequency. The lower band is responsible for representing the voiced parts of speech, whilst the upper band represents unvoiced speech. In doing so the coder's performance during both mixed voicing speech and speech containing acoustic noise is greatly improved, producing soft natural sounding speech. The paper also describes new parameter determination and quantisation techniques vital to the operation of this coder at such low bit rates.
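The split-band idea can be illustrated with a minimal sketch. It is not the coder described in the abstract: the function name `split_band_excitation`, the sinusoidal synthesis, and all parameter values are assumptions chosen for illustration. Harmonics of the pitch frequency fill the band below the cut-off, and random-phase components stand in for noise above it.

```python
import math
import random

def split_band_excitation(n, fs, f0, cutoff, seed=0):
    """Build n samples of a two-band excitation (illustrative only).

    Below `cutoff` (Hz): harmonics of the pitch frequency f0 (voiced part).
    Above `cutoff`: random-phase sinusoids on the DFT-bin grid (noise-like part).
    """
    rng = random.Random(seed)
    # Voiced band: every harmonic of f0 that does not exceed the cut-off.
    harmonics = [h * f0 for h in range(1, int(cutoff // f0) + 1)]
    # Unvoiced band: one random-phase component per bin above the cut-off.
    bin_hz = fs / n
    noise = [(k * bin_hz, rng.uniform(0.0, 2.0 * math.pi))
             for k in range(1, n // 2) if k * bin_hz > cutoff]
    out = []
    for t in range(n):
        voiced = sum(math.cos(2.0 * math.pi * f * t / fs) for f in harmonics)
        unvoiced = sum(math.cos(2.0 * math.pi * f * t / fs + ph) for f, ph in noise)
        out.append(voiced + unvoiced)
    return out

# Hypothetical frame: 20 ms at 8 kHz, 100 Hz pitch, 1.5 kHz cut-off.
frame = split_band_excitation(n=160, fs=8000, f0=100, cutoff=1500)
```

Raising `cutoff` makes the frame more periodic and lowering it makes it more noise-like, which is the role a variable cut-off frequency plays in such a scheme.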
This paper describes the proprietary implementation of a real-time software based HY-2 compatible channel vocoder algorithm. The HY-2 algorithm was used prior to linear predictive coding (LPC) vocoders for low bit rate voice communications at 2400 bps. This implementation is a unique and instructive implementation of a parametric vocoder, producing high quality speech.
ISBN: (print) 0780343719
Speech recognition is an increasingly popular method for Chinese character input. A fast and reliable hierarchical bounding box method for searching the speech database is proposed. The method borrows from ideas in computer graphics, where the hierarchical bounding box concept is used for fast ray-object intersection tests (Hearn et al. 1997).
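The abstract does not give the search procedure, so the following is only a sketch of the general hierarchical bounding box idea applied to nearest-neighbour search over feature vectors. The names `build`, `box_dist2`, and `nearest`, the median split, and the squared Euclidean distance are all assumptions. A subtree is skipped whenever its box is already farther from the query than the best match found so far.

```python
def build(points, leaf_size=2):
    """Recursively enclose points (tuples of floats) in axis-aligned boxes."""
    dim = len(points[0])
    lo = [min(p[i] for p in points) for i in range(dim)]
    hi = [max(p[i] for p in points) for i in range(dim)]
    node = {"lo": lo, "hi": hi, "points": None, "children": []}
    if len(points) <= leaf_size:
        node["points"] = points
        return node
    # Split at the median along the widest axis (leaf_size must be >= 1).
    axis = max(range(dim), key=lambda i: hi[i] - lo[i])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    node["children"] = [build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)]
    return node

def box_dist2(q, lo, hi):
    """Squared distance from query q to the nearest point of a box."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))

def nearest(node, q, best=(float("inf"), None)):
    """Return (squared distance, point) of the nearest stored point to q."""
    if box_dist2(q, node["lo"], node["hi"]) >= best[0]:
        return best  # the whole box is too far away: prune it
    if node["points"] is not None:
        for p in node["points"]:
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best[0]:
                best = (d, p)
        return best
    # Visit the closer child first so the bound tightens early.
    for child in sorted(node["children"],
                        key=lambda c: box_dist2(q, c["lo"], c["hi"])):
        best = nearest(child, q, best)
    return best
```

The pruning test is the same box-rejection step used in ray-object intersection: cheap box checks avoid most of the expensive per-item comparisons.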
Utilizing the ordering property of LSF (Line Spectral Frequency) parameters, we propose a new method for scalar quantization of LSF coefficients that uses a new set of quantization parameters, called LSFI, instead of direct LSF or differential LSF (LSFD) parameters. Based on experimental analysis, the new method is further improved by combining it with LSFD parameters (we call this improved version the LSFID scheme). The experimental results show that the proposed methods are effective in decreasing quantization distortion. Compared with LSFD, the LSFID quantizer can save at least 1 bit per frame to achieve 1 dB quantization.
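The abstract does not define LSFI, so the sketch below covers only the LSFD baseline it is compared against. It is a closed-loop differential scalar quantizer that relies on the LSFs being strictly increasing. The uniform step size, the function names, and the normalized-frequency units are assumptions for illustration.

```python
def quantize_lsfd(lsf, step=0.01):
    """Quantize the gaps between successive LSFs (assumed strictly increasing)."""
    indices = []
    prev = 0.0  # previously reconstructed LSF, so errors do not accumulate
    for f in lsf:
        # Forcing at least one step keeps the decoded LSFs in order.
        idx = max(1, round((f - prev) / step))
        indices.append(idx)
        prev += idx * step
    return indices

def dequantize_lsfd(indices, step=0.01):
    """Rebuild LSFs by accumulating the quantized gaps."""
    out = []
    prev = 0.0
    for idx in indices:
        prev += idx * step
        out.append(prev)
    return out

# Hypothetical LSF vector in normalized frequency.
lsf = [0.05, 0.12, 0.20, 0.33, 0.47]
decoded = dequantize_lsfd(quantize_lsfd(lsf))
```

Because every index is positive, the decoded vector is guaranteed to be ordered, which is the property a differential scheme exploits.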
ISBN: (print) 0780336941
Speech synthesis techniques are classified into three groups: waveform coding, source coding, and hybrid coding. Waveform coding and hybrid coding are used for sentence-based synthesis by the synthesis-by-analysis method; the difficulty of altering pitch makes them inappropriate for synthesis-by-rule. However, if the pitch period can be altered when waveform coding is used, synthesis-by-rule becomes possible while maintaining intelligibility and naturalness comparable to the original speech. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by scaling the time axis, with phase compensation performed using the zero-inserting and pitch-halving method.
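As a rough illustration of time-axis scaling only, one pitch period can be stretched or compressed by resampling it to a new length. The linear interpolation and the function name `rescale_period` are assumptions. The zero-inserting and pitch-halving phase compensation is not specified in the abstract and is not attempted here.

```python
def rescale_period(period, new_len):
    """Resample one pitch period onto new_len samples by linear interpolation."""
    old_len = len(period)
    if old_len < 2 or new_len < 2:
        raise ValueError("need at least two samples on each side")
    out = []
    for i in range(new_len):
        # Map output index i onto the original time axis.
        pos = i * (old_len - 1) / (new_len - 1)
        j = min(int(pos), old_len - 2)
        frac = pos - j
        out.append((1.0 - frac) * period[j] + frac * period[j + 1])
    return out

# Hypothetical 5-sample period stretched to 9 samples (a lower pitch).
stretched = rescale_period([0.0, 1.0, 2.0, 3.0, 4.0], 9)
```

A longer output period lowers the pitch and a shorter one raises it; the waveform shape within the period is preserved up to interpolation error.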
The focus of this work is on the performance analysis of a text dependent closed set speaker identification system for the Italian language. Two identification algorithms, based on LPC and LPC-cepstral feature extractors followed by a continuous density hidden Markov model (CD-HMM) classifier, have been implemented and tested on the Italian database SIVA the MUSER. The database consists of 360 phone calls made by 20 different male speakers from different Italian regions. The false identification probability for the two algorithms has been evaluated for different training sets, different spoken words and a variable number of states of the CD-HMM classifier. Results show that, in any of the considered conditions, the LPC-cepstral based system performs better than the LPC based one and that, in the best working condition, the false identification probability turns out to be of the order of 1.5 per cent.
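For context on the two feature sets, LPC-cepstral coefficients are commonly derived from LPC coefficients with a standard recursion. The sketch below assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k) and omits the gain term; the abstract does not say which convention or cepstral order the authors used, and the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    a[k-1] holds a_k for the model H(z) = 1 / (1 - sum_k a_k z^-k).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (log gain) is left out of this sketch
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_k (k/n) c_k a_{n-k}, restricted to 1 <= n-k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 4)
```

The single-pole case has the closed form c_n = a^n / n, which makes a convenient sanity check for the recursion.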
It is shown that the charge pumping technique allows the extraction of the Si-SiO$_2$ interface depth trap concentration profile. This profile is found to be of the form $N_t(x) = N_{ts} \exp(-x/d) + N_{to}$, where d is the distance in the oxide from the interface. The method is discussed, as well as the expression obtained for the trap density.
Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.