In this paper we report on a study of a technique for 32-band subband/transform coding at 16 kb/s. This approach occupies the middle range of algorithm complexity and frequency resolution between Sub-Band Coding (SBC) and Adaptive Transform Coding (ATC). Two designs for 16 kb/s 32-band coders have been simulated on a laboratory computer. The results of informal listening tests indicate that the new designs offer performance comparable to existing ATC techniques while having complexities roughly three times that of existing 4- and 5-band sub-band coders.
This paper describes the design of a toll-quality 4-kbit/s speech coder based on phase-adaptive PSI-CELP. This adaptation method not only gives pitch periodicity to the random excitation but also synchronizes the basic point of the stored random vector with the pitch phase. We further improve the proposed coder by introducing a backward gain prediction scheme. In subjective evaluation experiments, there is no significant difference between the quality of the ITU-T G.726 32-kbit/s coder and that of the proposed 4-kbit/s coder for clean speech at normal and low input levels and under tandem connection. In noisy environments, the MOS results of the ACR test also show no significant difference between the G.726 coder and the 4-kbit/s coder.
Line Spectrum Pair (LSP) was first introduced by Itakura [1,2] as an alternative LPC spectral representation. It was found that this new representation has such interesting properties as (1) all zeros of the LSP polynomials are on the unit circle, (2) the corresponding zeros of the symmetric and anti-symmetric LSP polynomials are interlaced, and (3) the reconstructed LPC all-pole filter preserves its minimum-phase property if (1) and (2) are kept intact through a quantization procedure. In this paper we prove all these properties via a "phase function." The statistical characteristics of LSP frequencies are investigated by analyzing a speech database. In addition, we derive an expression for spectral sensitivity with respect to a single LSP frequency deviation, so that some insight into quantization effects can be obtained. Finally, results on multi-pulse LPC using LSP for spectral information compression are presented.
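Properties (1) and (2) can be checked numerically on a small example. The sketch below is not taken from the paper: it assumes the common textbook construction P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), and the function names and the fourth-order test filter are illustrative assumptions.

```python
import numpy as np

def lsp_polynomials(a):
    """Return the symmetric (P) and anti-symmetric (Q) LSP polynomials.

    `a` holds the LPC coefficients [1, a1, ..., ap] of A(z) in powers of z^-1.
    Assumed construction: P = A + z^-(p+1) A(1/z), Q = A - z^-(p+1) A(1/z).
    """
    a_ext = np.concatenate([np.asarray(a, dtype=float), [0.0]])
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]

def on_unit_circle(poly, atol=1e-8):
    """True if every root of the polynomial has magnitude 1 (within atol)."""
    return bool(np.allclose(np.abs(np.roots(poly)), 1.0, atol=atol))

def lsf_angles(poly, tol=1e-6):
    """Root angles strictly inside (0, pi); the trivial roots at z = +1 and z = -1 are dropped."""
    ang = np.angle(np.roots(poly))
    return np.sort(ang[(ang > tol) & (ang < np.pi - tol)])

def interlaced(p_poly, q_poly):
    """True if the P and Q root angles alternate when sorted together."""
    merged = sorted([(w, "P") for w in lsf_angles(p_poly)] +
                    [(w, "Q") for w in lsf_angles(q_poly)])
    labels = [lab for _, lab in merged]
    return all(x != y for x, y in zip(labels, labels[1:]))

# Hypothetical minimum-phase A(z): all roots chosen inside the unit circle.
roots = [0.9 * np.exp(1j * 0.3 * np.pi), 0.9 * np.exp(-1j * 0.3 * np.pi),
         0.8 * np.exp(1j * 0.6 * np.pi), 0.8 * np.exp(-1j * 0.6 * np.pi)]
a = np.poly(roots).real
p_poly, q_poly = lsp_polynomials(a)
print(on_unit_circle(p_poly), on_unit_circle(q_poly), interlaced(p_poly, q_poly))
```

For a minimum-phase A(z), all three checks are expected to hold; moving a root of A(z) outside the unit circle is a quick way to see them fail.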
In an effort to select a speech representation for our next-generation concatenative text-to-speech synthesizer, two candidates are investigated: TD-PSOLA and the harmonic plus noise model (HNM). A formal listening test has been conducted, and the two candidates have been rated for intelligibility, naturalness and pleasantness. Suitability for database compression and computational load are also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without the buzziness or other oddities observed with TD-PSOLA.
ISBN: (print) 0818608781
An approach to linear prediction coefficient (LPC) analysis based on the normalization of vocal-tract length is presented. The approach is of significance for speech recognition of arbitrary speakers. In this approach, the ratio of the vocal-tract lengths of a new speaker and a reference speaker is first estimated from the training speech data of several typical vowels. The LPC parameters normalized by this ratio can then be calculated for any speech data. Compared with previous methods of speech parameter normalization, this approach does not need to estimate formant frequencies and is simple and reliable in theory. Limited experiments on the recognition of nine Chinese vowels for four speakers indicate that this new approach can achieve 5% to 20% improvements in the correct recognition rate.
This paper presents a vowel recognition method for spoken Thai. The Thai language has 9 short unmixed vowels (a, i, ω, u, o, e, ε, γ, [unk]); 9 long unmixed vowels (aa, ii, ωω, uu, oo, ee, εε, γγ, [unk][unk]); 3 short mixed vowels (ia, ωa, ua); and 3 long mixed vowels (i:a:, ω:a:, u:a:). The proposed method uses three-stage decision making: step 1 distinguishes long from short vowels using the coefficients of a third-order polynomial regression of signal energy as the feature set and 5-NN as the classification method; step 2 classifies each voice segment (frame) into one of the 9 basic vowels using 18 critical-band intensities as the feature set and 9-NN as the classification method; finally, step 3 decides whether each frame contains a mixed or unmixed vowel by thresholding. This solution differs from conventional speech recognition mainly in that decisions are made for each frame, whereas conventional speech recognition chooses the best decision for a sequence of frames forming a word or a sentence. The algorithm is evaluated on 3024 voice samples from male and female subjects, and each step is evaluated successively.
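Step 1 of the scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the frame length, hop size, normalized time axis, Euclidean distance, and all function names are hypothetical choices that the abstract does not specify.

```python
from collections import Counter

import numpy as np

def energy_contour(signal, frame_len=256, hop=128):
    """Short-time energy per frame (frame_len and hop are assumed values)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def energy_poly_features(signal, frame_len=256, hop=128):
    """Four coefficients of a third-order polynomial fitted to the energy contour.

    Assumption: the time axis is normalized to [0, 1]; the signal must be long
    enough to give at least four frames.
    """
    energy = energy_contour(signal, frame_len, hop)
    t = np.linspace(0.0, 1.0, len(energy))
    return np.polyfit(t, energy, 3)

def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training vectors (Euclidean distance assumed)."""
    train_x = np.asarray(train_x, dtype=float)
    dist = np.linalg.norm(train_x - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In use, `energy_poly_features` would be applied to each labeled training utterance to build `train_x`, and `knn_predict(train_x, train_y, energy_poly_features(new_signal))` would return the "long" or "short" label. Step 2 could reuse `knn_predict` with `k=9` on critical-band intensity vectors.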
Fully depleted (FD) SOI CMOS is a contender for low-voltage IC applications. However, as FD/SOI MOSFETs are scaled, floating-body effects, which previously seemed insignificant, become important. In this paper, we report kinks in the measured subthreshold current-voltage characteristics of highly scaled FD/SOI MOSFETs, and we describe and model the underlying physical mechanism, showing how it differs from the familiar kink effect in partially depleted (PD) devices. The insight afforded qualifies the meaning of FD/SOI and implies new design issues for low-voltage SOI CMOS.
The paper describes the software architecture of an Italian text-to-speech synthesis system based on the joining of LPC coded diphones. The automatic voice response system is designed according to multichannel and real time criteria. For each output channel, the following operations are performed: pre-processing of the input string of characters, translation into the proper sequence of diphones, generation of prosodic contours and real-time control of a hardware speech synthesizer.
In the context of signal reconstruction and coding, a new robust parametric formulation of linear predictive coding (LPC) is introduced. The linear prediction filter coefficients are transformed into a set of weighted line frequencies. The positive weights play a dual role: they form a new set of parameters and, at the same time, indicate the relative importance of the associated line frequencies. This new representation for LPC is shown to be always stable under quantization.