It has been indicated that pitch-synchronous subtle waveform fluctuations in sustained vowels are an acoustic cue for naturalness when the vowels contain no fundamental frequency fluctuations. From the viewpoint of speech synthesis, more natural-sounding sustained vowels are therefore expected to be synthesizable by incorporating appropriately modeled waveform fluctuations. In this study, the waveform fluctuations of inverse-filtered normal sustained vowels were analyzed with the aim of high-quality speech synthesis. The analyses suggested that the waveform fluctuations could be modeled by the same rule for all speech samples, which were obtained from ten male subjects. In addition, psychoacoustic experiments confirmed the validity of a method that generates completely artificial waveform fluctuations, which markedly enhanced the quality of the sustained vowels.
This paper presents a source-controlled, variable-rate CELP-type speech codec. First, a voice activity detection block distinguishes active speech frames from silence and background noise. Active speech is then classified into voiced and unvoiced frames. Voiced frames use variable-bit-rate pitch-lag quantization that depends on the characteristics of the speech, whereas unvoiced frames are coded without pitch information. A variable-bit-rate fixed-codebook excitation, with a variable number of excitation pulses, is determined for each speech frame; its bit rate is set by the input speech characteristics and by the performance of the codec's linear analysis stage. The average bit rate is around 7.0 kbit/s for active speech, and the overall bit rate ranges from 0 to 7.85 kbit/s. The codec produces toll-quality speech equal to that of the 32 kbit/s ADPCM (G.726) standard.
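The frame classification described above (silence vs. active speech, then voiced vs. unvoiced) can be illustrated with a deliberately simple detector. This is a hypothetical sketch, not the paper's VAD: it assumes short-time energy for the activity decision and zero-crossing rate for the voicing decision, and the thresholds `energy_threshold` and `zcr_threshold` are illustrative placeholders.

```python
def classify_frame(frame, energy_threshold=1e-3, zcr_threshold=0.25):
    """Label one frame as 'silence', 'unvoiced' or 'voiced'.

    Hypothetical detector: mean-square energy decides voice activity and the
    zero-crossing rate decides voicing. Thresholds are placeholders, not
    values from the paper.
    """
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    if energy < energy_threshold:
        return "silence"  # no active speech detected in this frame
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    # Noise-like (unvoiced) frames cross zero far more often than periodic ones.
    return "unvoiced" if zcr > zcr_threshold else "voiced"
```

A real codec would use a noise-adaptive VAD and would then choose the pitch-lag and excitation bit allocation per class; only the three-way labeling is shown here.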
This paper describes a 1200 bps voice codec based on 10th-order linear prediction analysis, split vector quantization of line spectral frequencies, and differential pitch quantization. Robust codebooks, together with error protection and concealment techniques, are used to minimize the effect of channel errors. Listening tests show that the codec achieves high speech intelligibility and speaker recognizability with natural voice quality.
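Differential pitch quantization can be sketched as follows, assuming a hypothetical scheme in which the first lag is sent as an absolute integer and every later lag as a small clipped delta index. The parameters `delta_bits` and `step` are illustrative, not the codec's actual allocation. The encoder tracks the decoder's reconstruction so that quantization errors do not accumulate.

```python
def encode_pitch(lags, delta_bits=3, step=1.0):
    """Differentially quantize a pitch-lag track (hypothetical scheme).

    The first lag is sent as an absolute integer; each later lag is sent as a
    clipped delta index relative to the previous *decoded* lag.
    """
    lo, hi = -(2 ** delta_bits) // 2, (2 ** delta_bits) // 2 - 1
    codes, prev = [], None
    for lag in lags:
        if prev is None:
            prev = round(lag)
            codes.append(prev)  # absolute value for the first frame
        else:
            idx = max(lo, min(hi, round((lag - prev) / step)))
            codes.append(idx)
            prev = prev + idx * step  # mirror what the decoder reconstructs
    return codes


def decode_pitch(codes, step=1.0):
    """Invert encode_pitch by accumulating delta indices onto the first lag."""
    lags = [codes[0]]
    for idx in codes[1:]:
        lags.append(lags[-1] + idx * step)
    return lags
```

For example, `encode_pitch([50, 51, 53, 52])` yields `[50, 1, 2, -1]`. A jump larger than the delta range is clipped (`[50, 60]` decodes as `[50, 53.0]`), which is one reason practical coders resend an absolute lag periodically.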
This report describes a speaker verification system based on a vector quantization (VQ) approach that incorporates dynamic time warping (DTW), cohort models, and a discriminator to separate true speakers from impostors. The system is designed for telephone network applications and performs well across different telephone handsets and network conditions. It is text dependent: each subscriber uses a personal pass phrase to verify his or her identity. A distortion vector is computed from the VQ encoding and DTW distortions with respect to speaker-dependent and cohort codebooks, and its components are applied to a linear discriminator to validate the speaker's identity. Evaluated on a speaker verification database, this approach performed significantly better than a basic VQ system. The system achieved an equal error rate (EER) of 0.92% when the true speaker and the impostors spoke different pass phrases and 4.30% when they spoke the same phrase.
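The DTW-plus-discriminator idea can be sketched in a reduced form. This is an assumption-laden illustration, not the report's system: it uses only DTW distortions (the VQ encoding distortions are omitted), compares the utterance against one speaker template and the mean over cohort templates, and uses a hypothetical linear discriminator whose `weights` and `bias` would in practice be trained.

```python
import math


def dtw_distance(x, y):
    """Accumulated Euclidean DTW cost between two feature-vector sequences."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def verify(utterance, speaker_template, cohort_templates, weights=(-1.0, 1.0), bias=0.0):
    """Accept if a linear score over the distortion vector is positive.

    Distortion vector = (DTW to claimed speaker, mean DTW to cohort).
    weights/bias are hypothetical; the defaults accept when the utterance is
    closer to the claimed speaker than to the cohort on average.
    """
    d_spk = dtw_distance(utterance, speaker_template)
    d_coh = sum(dtw_distance(utterance, c) for c in cohort_templates) / len(cohort_templates)
    score = weights[0] * d_spk + weights[1] * d_coh + bias
    return score > 0, (d_spk, d_coh)
```

Normalizing against cohort models, as in the report, makes the decision less sensitive to handset and channel effects that raise all distortions together.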
A low-rate speech coding algorithm, harmonic vector excitation coding (HVXC), is proposed for MPEG-4 standardization. It employs an efficient coding scheme based on harmonic and stochastic vector representations of linear predictive coding (LPC) residuals. Combining weighted vector quantization of the harmonic spectral envelope of the LPC residual for voiced segments with vector excitation coding for unvoiced segments provides good speech quality at very low bit rates. MPEG-4 formal listening tests in December 1995 showed that the subjective speech quality of HVXC at 2.0 kbps was better than that of FS1016 CELP at 4.8 kbps.
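Weighted vector quantization, as mentioned above for the harmonic spectral envelope, selects the codevector minimizing a weighted squared error. The sketch below shows only that search; it assumes the envelope has already been brought to a fixed dimension and that the per-component `weights` (for example, from a perceptual weighting) are supplied by the caller. It is not the HVXC codebook or weighting.

```python
def weighted_vq_search(x, codebook, weights):
    """Return the index of the codevector minimizing sum_k w_k * (x_k - c_k)^2."""
    best_index, best_dist = -1, float("inf")
    for i, c in enumerate(codebook):
        dist = sum(w * (xk - ck) ** 2 for w, xk, ck in zip(weights, x, c))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index
```

With codebook `[[0, 0], [1, 1], [2, 0]]` and input `[1.8, 0.9]`, equal weights pick index 1, while weights `[10, 0.1]` pick index 2: the weighting shifts accuracy toward the components judged more important.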
ISBN (print): 0818679190
This paper presents AT&T's candidate coder for the ITU-T's new wideband speech coding standard at 16, 24, and 32 kb/s. The coder achieves high speech quality with low complexity. Its basic idea is to perform closed-loop pitch prediction on perceptually weighted speech and then quantize the prediction residual using perceptually based transform coding techniques. A first version of the coder, based on the DFT, was thoroughly tested and submitted to the ITU-T in February 1996, and it was selected as one of two surviving candidates to advance to the next phase. A revised version based on the MDCT was submitted in October 1996. Both versions are described.
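The pitch-prediction step can be illustrated with a standard long-term predictor search: for each candidate lag, the optimal gain is the normalized correlation, and the lag maximizing the resulting error reduction is kept. This is a sketch under stated assumptions, not AT&T's implementation: `past` stands in for the previously reconstructed weighted signal (which is what makes the loop closed), lags shorter than the frame are not handled, and the residual would then go to the transform coder, which is not shown.

```python
def pitch_predict(past, target, min_lag, max_lag):
    """Choose lag T and gain g minimizing ||target - g * seg_T||^2.

    seg_T = past[len(past) - T : len(past) - T + n]. Assumes
    len(target) <= min_lag <= max_lag <= len(past).
    """
    n = len(target)
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        start = len(past) - lag
        seg = past[start:start + n]
        energy = sum(s * s for s in seg)
        if energy == 0.0:
            continue
        corr = sum(t * s for t, s in zip(target, seg))
        score = corr * corr / energy  # error reduction with the optimal gain
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    if best_lag is None:
        return None, 0.0, list(target)  # silent history: nothing to predict from
    start = len(past) - best_lag
    residual = [t - best_gain * s for t, s in zip(target, past[start:start + n])]
    return best_lag, best_gain, residual
```

On a signal that repeats exactly every 20 samples, the search returns lag 20 with unit gain and an all-zero residual; ties with multiples of the period keep the shortest lag.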
New methods are proposed to improve the noise robustness of linear predictive (LPC) analysis of speech. The methods are based on the concept of noise compensation: by iteratively subtracting the noise power from the autocorrelation function of the speech, they both reduce the bias of the LPC coefficients and guarantee the stability of the LPC inverse filter.
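The noise-compensation idea can be sketched with the Levinson-Durbin recursion. This is a hypothetical illustration, not the authors' iteration: it assumes additive white noise of known power (which raises only the zero-lag autocorrelation), subtracts that power from `r[0]`, and, if over-subtraction yields a reflection coefficient with |k| >= 1, shrinks the subtracted power by the illustrative factor `shrink` and retries.

```python
def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r[0..p].

    Returns (a, stable). stable is False once a reflection coefficient reaches
    |k| >= 1, i.e. A(z) is no longer minimum phase and 1/A(z) would be unstable.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        if abs(k) >= 1.0:
            return a, False
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, True


def noise_compensated_lpc(r, order, noise_power, shrink=0.5, max_iter=30):
    """Hypothetical sketch: subtract an assumed white-noise power from r[0].

    If the compensated solution is unstable, the subtracted power is reduced
    and the solve repeated. Returns (coefficients, noise power subtracted).
    """
    sigma2 = noise_power
    for _ in range(max_iter):
        r0 = r[0] - sigma2
        if r0 > 0.0:
            a, stable = levinson_durbin([r0] + list(r[1:order + 1]), order)
            if stable:
                return a, sigma2
        sigma2 *= shrink
    a, _ = levinson_durbin(list(r[:order + 1]), order)  # uncompensated fallback
    return a, 0.0
```

For an AR(1) process with coefficient 0.9 plus unit-variance white noise, the uncompensated order-2 solution gives a1 of about -0.56, while subtracting the true noise power recovers a1 = -0.9 and a2 = 0. Deliberately overestimating the noise (100 instead of 1) is backed off until the filter is stable.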
The goal of this paper is to propose a new perceptually based objective technique that uses radial basis function (RBF) neural networks, instead of regression algorithms, to estimate the nonlinear mapping function that best represents the relationship between the input (perceptual parameters) and output (speech quality) variables in a database. In the proposed technique, the perceptual parameters are obtained by (1) emulating several known features of the human ear's perceptual processing of speech sounds (including critical-band masking, equal loudness, and the intensity-loudness power law) to map the speech power spectrum into the auditory power spectrum (Bark domain); (2) deriving perceptual LPC coefficients from the auditory spectrum and using them to calculate, for each frame, the cepstral distance between the input speech and the coded output speech; and (3) using the RBF neural network to map the per-frame perceptual cepstral distance to the corresponding estimated speech quality. After extensive experimentation and validation, the results indicate that the proposed technique is effective for estimating coded speech quality.
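Steps (2) and (3) above can be sketched as follows. The sketch assumes the LPC coefficients are already available for each frame (the Bark-domain auditory processing of step (1) is not shown), uses the standard LPC-to-cepstrum recursion with a plain Euclidean cepstral distance (one common form), and maps a distance to quality with a Gaussian RBF network whose `centers`, `widths`, `weights`, and `bias` are hypothetical inputs that would come from training on listening-test scores.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c1..cN of the all-pole model 1/A(z), A(z) = 1 + sum a_k z^-k.

    Standard recursion; the gain term c0 is omitted.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def cepstral_distance(a_ref, a_test, n_ceps=12):
    """Euclidean distance between the LPC cepstra of two frames."""
    c_ref = lpc_to_cepstrum(a_ref, n_ceps)
    c_test = lpc_to_cepstrum(a_test, n_ceps)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c_ref, c_test)))


def rbf_quality(d, centers, widths, weights, bias=0.0):
    """Map a per-frame distance d to a quality estimate with Gaussian RBFs."""
    return bias + sum(
        w * math.exp(-((d - mu) ** 2) / (2.0 * s * s))
        for mu, s, w in zip(centers, widths, weights)
    )
```

As a check on the recursion, a single pole at 0.9 (`a = [1.0, -0.9]`) gives c_n = 0.9^n / n, i.e. 0.9, 0.405, 0.243 for n = 1, 2, 3.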
The Puzzle Project is an interactive software system that solves jigsaw puzzles. Its voice interface includes speech synthesis and word recognition. The attributes of the puzzle pieces are determined using image processing techniques and wavelet decomposition, and two algorithms are used to solve the puzzles: an expert system and fuzzy logic. This paper describes the steps required to solve a puzzle, from image processing to the decision-making algorithms, and explains the techniques involved in designing the voice interface.
Measured responsivity and illuminated I-V data are presented for III-V InxGa1-xAs-based photovoltaic cells with bandgaps of 1.42 eV (GaAs), 0.74 eV (lattice-matched In0.53Ga0.47As), and 0.55 eV ("extended" lattice-mismatched In0.7Ga0.3As), relevant to their application as laser power converters (LPCs) sensitive to infrared light out to a long cutoff wavelength of 2.3 μm. The authors present an analysis that shows the optimum LPC size to use with a Gaussian illumination profile. They also compare the trade-offs of more complex multijunction LPCs against those of simpler single-junction LPCs used with commercial DC-DC power converter chips.
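One ingredient of sizing a laser power converter for Gaussian illumination is how much of the beam the cell intercepts. The sketch below assumes a circular cell centered on a circular Gaussian beam with 1/e² intensity radius w, for which the enclosed power fraction is 1 - exp(-2a²/w²). The paper's actual optimization criterion is not given in the abstract; this shows only the collection term, which rises with diminishing returns as the cell grows.

```python
import math


def captured_fraction(cell_radius, beam_radius):
    """Fraction of a centered circular Gaussian beam's power on a circular cell.

    beam_radius is the 1/e^2 intensity radius w; the enclosed fraction for a
    cell of radius a is 1 - exp(-2 a^2 / w^2).
    """
    return 1.0 - math.exp(-2.0 * (cell_radius / beam_radius) ** 2)
```

A cell whose radius equals w captures about 86.5% of the power; enlarging it to 1.5w raises this only to about 98.9%, so any size-dependent losses in the cell must be weighed against a shrinking optical gain.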