ISBN: 0818679190 (print)
This paper presents AT&T's candidate coder for the ITU-T's new wideband speech coding standard at 16, 24 and 32 kb/s. This coder achieves high speech quality with a low coder complexity. The basic idea of the coder is to perform closed-loop pitch prediction on perceptually weighted speech, and then quantize the prediction residual using perceptually based transform coding techniques. A first version of the coder based on DFT was thoroughly tested and submitted to the ITU-T in February 1996, and it was selected as one of two surviving candidates to advance to the next phase. A revised version based on MDCT was later submitted in October 1996. Both versions are described.
A low-rate speech coding algorithm, harmonic vector excitation coding (HVXC), is proposed for MPEG-4 standardization. It employs an efficient coding scheme based on harmonic and stochastic vector representations of linear predictive coding (LPC) residuals. Combining weighted vector quantization of the harmonic spectral envelope of the LPC residual signal for voiced segments with vector excitation coding for unvoiced segments provides good speech quality at very low bit rates. MPEG-4 formal listening tests in December 1995 showed that the subjective speech quality of HVXC at 2.0 kbps was better than that of FS1016 CELP at 4.8 kbps.
The focus of this work is on the performance analysis of a text dependent closed set speaker identification system for the Italian language. Two identification algorithms, based on LPC and LPC-cepstral feature extractors followed by a continuous density hidden Markov model (CD-HMM) classifier, have been implemented and tested on the Italian database SIVA the MUSER. The database consists of 360 phone calls made by 20 different male speakers from different Italian regions. The false identification probability for the two algorithms has been evaluated for different training sets, different spoken words and a variable number of states of the CD-HMM classifier. Results show that, in any of the considered conditions, the LPC-cepstral based system performs better than the LPC based one and that, in the best working condition, the false identification probability turns out to be of the order of 1.5 per cent.
New methods are proposed to improve the noise robustness of linear predictive (LPC) analysis of speech. These methods are based on the concept of noise compensation. By iteratively subtracting the noise power from the autocorrelation function of the speech, the proposed methods both reduce the bias of the LPC coefficients and guarantee the stability of the LPC inverse filter.
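The noise-compensation idea can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes white noise (so only the lag-0 autocorrelation is corrected), and the back-off rule in `compensated_lpc` (halving the subtracted amount until the Levinson-Durbin recursion stays stable) is a hypothetical stand-in for the iterative procedure the abstract mentions. All function and parameter names are illustrative.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) with a[0] == 1.0, or None when the prediction error
    becomes non-positive, i.e. some reflection coefficient has magnitude
    >= 1 and the resulting filter would be unstable.
    """
    if r[0] <= 0.0:
        return None
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            return None
    return a, err


def compensated_lpc(x, order, noise_power, shrink=0.5, max_tries=20):
    """Subtract an assumed white-noise power from r[0] before LPC analysis.

    Hypothetical back-off: if the compensated autocorrelation yields an
    unstable filter, subtract less and try again.
    Returns (coefficients, noise power actually subtracted).
    """
    r = autocorrelation(x, order)
    amount = noise_power
    for _ in range(max_tries):
        result = levinson_durbin([r[0] - amount] + r[1:], order)
        if result is not None:
            return result[0], amount
        amount *= shrink
    # Fall back to uncompensated analysis (or a flat filter for silence).
    result = levinson_durbin(r, order)
    coeffs = result[0] if result is not None else [1.0] + [0.0] * order
    return coeffs, 0.0
```

Checking the prediction error at every recursion step is what keeps the sketch on the stable side: a positive error throughout is equivalent to all reflection coefficients having magnitude below one.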
Pitch-synchronous subtle waveform fluctuations of sustained vowels are shown to be the acoustic cue for naturalness when the sustained vowels do not include fundamental frequency fluctuations. From the viewpoint of speech synthesis, more natural-sounding sustained vowels are expected to be obtainable by incorporating appropriately modeled waveform fluctuations. In this study, the waveform fluctuations of inverse-filtered normal sustained vowels were analyzed with the aim of high-quality speech synthesis. The analyses suggested that the waveform fluctuations could be modeled by the same rule for all speech samples obtained from ten male subjects. In addition, psychoacoustic experiments confirmed the validity of the method, which generates completely artificial waveform fluctuations and remarkably enhances the quality of sustained vowels.
In general, a variable-rate coder can obtain the same speech quality as a fixed-rate coder while reducing the average bit rate. We have developed a variable-rate multimodal speech coder with an average bit rate of 3 kb/s for a speech activity factor of 80% and quality comparable to the GSM full-rate coder. The coder has four coding modes and uses a robust classification method involving the pitch gain, zero crossings, and a peakiness measure. The coder also employs a novel gain-matched analysis-by-synthesis technique for very low rate coding of unvoiced frames and an improved noise-level-dependent postfilter. This paper describes the details of our algorithm and presents the results from subjective listening tests.
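The three classification features named above can be sketched as follows. This is an illustration only: the abstract does not define its peakiness measure, its four modes, or any thresholds, so the RMS-to-mean-absolute ratio used for `peakiness`, the mode labels, and every threshold in `classify_frame` are assumptions chosen for the example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def peakiness(frame):
    """RMS divided by mean absolute value (assumed definition).

    Close to 1 for flat-envelope frames, large for frames dominated by
    a few isolated peaks.
    """
    mean_abs = sum(abs(s) for s in frame) / len(frame)
    if mean_abs == 0.0:
        return 0.0
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms / mean_abs


def classify_frame(frame, pitch_gain, energy_floor=1e-4,
                   gain_threshold=0.7, zcr_threshold=0.3,
                   peak_threshold=1.8):
    """Pick one of four hypothetical modes from the three features.

    pitch_gain is assumed to come from a separate pitch predictor.
    """
    energy = sum(s * s for s in frame) / len(frame)
    if energy < energy_floor:
        return "silence"
    if pitch_gain >= gain_threshold and zero_crossing_rate(frame) < zcr_threshold:
        return "voiced"
    if peakiness(frame) > peak_threshold:
        return "transition"   # spiky frame, e.g. an onset or plosive
    return "unvoiced"
```

A real coder would tune such thresholds on data and typically smooth the decision across frames; the sketch only shows how the features might combine into a mode decision.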
Measured responsivity and illuminated I-V data are presented for III-V In$_x$Ga$_{1-x}$As-based photovoltaic cells with bandgaps of 1.42 eV (GaAs), 0.74 eV (lattice-matched In$_{0.53}$Ga$_{0.47}$As), and 0.55 eV ("extended" lattice-mismatched In$_{0.7}$Ga$_{0.3}$As) relevant to their application as laser power converters (LPCs) sensitive to infrared light out to a long cutoff wavelength of 2.3 µm. The authors present an analysis that shows the optimum LPC size to use with a Gaussian illumination profile. They compare some of the tradeoffs of more complex multijunction LPCs with simpler single-junction LPCs used with commercial DC-DC power converter chips.
LPC based speech coders operating at bit rates below 3.0 kbits/sec are usually associated with buzzy or metallic artefacts in the synthetic speech. These are mainly attributable to the simplifying assumptions made about the excitation source, which are usually required to maintain such low bit rates. A new LPC vocoder is presented which splits the LPC excitation into two frequency bands using a variable cut-off frequency. The lower band is responsible for representing the voiced parts of speech, whilst the upper band represents unvoiced speech. In doing so the coder's performance during both mixed voicing speech and speech containing acoustic noise is greatly improved, producing soft natural sounding speech. The paper also describes new parameter determination and quantisation techniques vital to the operation of this coder at such low bit rates.
ISBN: 0780343719 (print)
Speech recognition is an increasingly popular method for Chinese character input. A fast and reliable hierarchical bounding box method for searching the speech database is proposed. The method borrows from ideas in computer graphics, where the hierarchical bounding box concept is used for fast ray-object intersection tests (Hearn et al. 1997).
ISBN: 0780336941 (print)
Speech synthesis techniques are classified into three groups: waveform coding, source coding, and hybrid coding. Waveform coding and hybrid coding are used for sentence-based synthesis by the synthesis-by-analysis method; the difficulty of altering the pitch makes them inappropriate for synthesis-by-rule. However, if the pitch period can be altered when waveform coding is used, synthesis-by-rule becomes possible while maintaining good intelligibility and naturalness comparable to the original speech. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by scaling the time axis, with phase compensation performed using the zero-inserting and pitch-halving method.
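Time-axis scaling of a pitch period can be sketched in its simplest form as resampling by linear interpolation. This is only an illustration of the scaling step under that assumption; it does not implement the phase compensation, zero-inserting, or pitch-halving parts of the proposed method, whose details the abstract does not give. The function names and the `factor` convention are hypothetical.

```python
def rescale_period(period, new_length):
    """Resample one pitch period to new_length samples.

    Linear interpolation between neighbouring samples; the first and
    last samples of the period are preserved.
    """
    if len(period) < 2 or new_length < 2:
        raise ValueError("need at least two samples in and out")
    step = (len(period) - 1) / (new_length - 1)
    out = []
    for n in range(new_length):
        pos = n * step
        i = min(int(pos), len(period) - 2)   # left neighbour index
        frac = pos - i
        out.append((1.0 - frac) * period[i] + frac * period[i + 1])
    return out


def alter_pitch(periods, factor):
    """Scale every pitch period and concatenate the results.

    factor > 1 shortens each period (raises the pitch); factor < 1
    lengthens it. `periods` is assumed to be a list of already
    segmented pitch periods, one list of samples per period.
    """
    output = []
    for period in periods:
        new_length = max(2, round(len(period) / factor))
        output.extend(rescale_period(period, new_length))
    return output
```

Note that scaling each period this way also shifts the spectral envelope along with the pitch, which is one reason a practical method needs additional compensation beyond plain resampling.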