This paper describes a digital processing method for modifying tone contrast, defined as the difference in frequency between the peaks and valleys of pitch curves in natural utterances. Speech signals with modified tones were presented to hearing-impaired Chinese listeners, who were asked to identify four alternative Mandarin words. Modified speech with enhanced tone contrast yielded moderate gains in the percentage of correctly identified words compared with unmodified speech, while reducing tone contrast generally lowered the percentage of correct identifications. These findings support the assertion that a hearing aid with tone modifications is effective for hearing-impaired Chinese listeners.
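As a rough illustration of the tone-contrast definition above, the following sketch stretches or compresses a pitch contour so that its peak-to-valley difference changes by a chosen factor. The abstract does not say how the actual processing was done; scaling about the midpoint between peak and valley, the function names, and the sample contour are all assumptions made for this example.

```python
def tone_contrast(f0_hz):
    """Tone contrast as defined in the abstract: peak minus valley frequency."""
    return max(f0_hz) - min(f0_hz)


def scale_tone_contrast(f0_hz, factor):
    """Scale the peak-to-valley range of a pitch contour by `factor`.

    Assumption: the contour is stretched about the midpoint between its
    peak and valley. This is one simple choice, not the paper's method.
    """
    peak, valley = max(f0_hz), min(f0_hz)
    mid = (peak + valley) / 2.0
    return [mid + factor * (f - mid) for f in f0_hz]


# Hypothetical dipping pitch contour in Hz (voiced frames only).
contour = [200.0, 180.0, 160.0, 190.0, 220.0]
enhanced = scale_tone_contrast(contour, 1.5)  # factor > 1 enhances contrast
reduced = scale_tone_contrast(contour, 0.5)   # factor < 1 reduces contrast
```

A factor of 1 leaves the contour unchanged, which makes the unmodified condition a special case of the same function.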
This paper presents an approach to speech vector quantization of sources exhibiting intervector dependency. We present the optimal decoder based on a collection of received indices, together with the optimal encoder for such decoding. The optimal decoder can be implemented as a table look-up decoder; however, the size of the decoder codebook grows very fast with the size of the collection of utilized indices. This leads us to introduce a method for storing an approximation to the set of optimal decoder vectors, based on linear mapping of a block code vector quantization. In this approach a heavily reduced set of parameters is employed to represent the codebook. Furthermore, we illustrate that the proposed scheme has an interpretation as nonlinear predictive quantization. Numerical results indicate high gain over memoryless coding and memory quantization based on linear predictive coding. The results also show that the sub-optimal approach performs close to the optimal.
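To make the growth of the table look-up decoder concrete, the sketch below counts the entries needed when the decoder is keyed by every possible tuple of received indices. The codebook size of 256 and the helper names are assumptions chosen for illustration; the abstract gives no specific sizes.

```python
from itertools import product


def decoder_table_size(codebook_size, num_indices):
    """Entries in a look-up decoder keyed by a tuple of `num_indices` indices."""
    return codebook_size ** num_indices


def enumerate_decoder_keys(codebook_size, num_indices):
    """List every index tuple the decoder table would need an entry for."""
    return list(product(range(codebook_size), repeat=num_indices))


# Hypothetical 8-bit encoder codebook (256 indices).
sizes = {k: decoder_table_size(256, k) for k in (1, 2, 3)}
# sizes == {1: 256, 2: 65536, 3: 16777216}

# A tiny case that can be enumerated directly: 4 indices, pairs of indices.
small_keys = enumerate_decoder_keys(4, 2)
```

Each extra index in the collection multiplies the table size by the codebook size, which is the motivation for storing only an approximation to the decoder vectors.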
Most published evaluations of LPC systems use only one or two speakers. Since LPC quality and intelligibility are known to depend on the speaker, this is an inadequate test of a synthesis system. We recorded eight men and nine women chosen from a speech database of 81 speakers who were independently rated by two phoneticians for the presence or absence of the following voice characteristics: nasality, harshness, creak, whisper, and pitch extreme. The 17 talkers represented a balanced sample of strong positives or negatives of the five voice characteristics. Each speaker was recorded on one fifty-word set from the Modified Rhyme Test. Monosyllabic word intelligibility tests were administered to 88 listeners (with four listeners per speaker set). Results from the intelligibility tests for different speakers show that vocal characteristics and resultant LPC quality are linked. Nasality and whisper are most strongly correlated with decreased LPC intelligibility.
Code-excited linear predictive (CELP) coders hold promise for achieving high-quality speech at low bit rates. We propose a ternary-excitation-based CELP coder with a new structure to achieve toll-quality speech at 4 kbps. Speech quality is maintained by allocating more bits to the codebook index, allowing for larger codebooks, which provide better speech quality as quantization levels increase. To allocate more bits to the codebook index, a backward-adaptive 10th-order LPC predictor is used. The regular structure of the lattice codebook and the convexity of the error surface have been exploited to greatly improve the efficiency of the search algorithm. The storage requirement of the codebook is eliminated by transmitting the position of three weights used in generating the ternary codebook instead of the codebook index. The speech quality obtained using the new CELP structure is studied, and the results are compared with those of the LBG and Gaussian codebooks.
The paper investigates the use of neural networks in recognizing the phonation of speech sounds. The proposed method classifies the Malay plosive sounds of adults and children based on phonation in a speaker-independent manner. It achieves encouraging results, with an average accuracy of 98%.
The present paper describes the implementation and experimental results of the DAQP02 system on a chip (SoC) for petroleum pipeline inspection. This integrated circuit is able to read and process information about physical phenomena inside pipelines. The DAQP02 has two multiplexed analog inputs, one 8-bit A/D converter, an 8-bit data bus and a 22-bit address bus. Due to its small dimensions and low power consumption, it is an efficient in-line inspection tool. The integrated circuit was manufactured in 0.8 μm CMOS double-poly, double-metal, n-well technology.
The STU-III program has been very successful in providing secure, high-quality communications. It has so far, however, been restricted to strategic networks. Tactical networks use devices such as SINCGARS, VINSON, and MINTERM (tactical terminal), which cannot interoperate with the STU-III network. The paper discusses the modifications to MINTERM that provide it with a STU-III-compatible mode. It is shown that most of the changes can be made in software and that the modified MINTERM can interoperate with the STU-III network over a large variety of media.
A digital circuit multiplication system (DCMS), which is equipped with an echo canceller to perform activity detection by applying reciprocal action between an echo canceller and a speech detector, is described. A high-quality speech detector, developed by introducing a hangover controller and linear predictive coding (LPC) analysis, and a trial model of the DCMS using advanced adaptive differential pulse-code modulation (ADPCM) which allowed 4-bit coding for voiceband data (VBD) signals, are presented. Efficiency improved by 10% using 4-bit coding for VBD.
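For readers unfamiliar with hangover in speech detection, the sketch below shows the generic idea: after the raw detector stops flagging speech, the output stays active for a few more frames so that weak trailing sounds are not clipped. The abstract does not describe how its hangover controller works, so the fixed hangover length, the function name, and the example flags are assumptions.

```python
def apply_hangover(raw_flags, hangover_frames):
    """Extend each active run by up to `hangover_frames` inactive frames.

    Generic fixed-length hangover, shown for illustration only; the
    controller in the described system may behave differently.
    """
    smoothed = []
    remaining = 0  # hangover frames still available after the last active frame
    for active in raw_flags:
        if active:
            remaining = hangover_frames
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed


# Hypothetical per-frame decisions from a raw speech detector.
raw = [False, True, True, False, False, False, False, True, False]
held = apply_hangover(raw, hangover_frames=2)
```

A longer hangover reduces clipping of word endings but keeps the channel occupied for longer, so in a circuit multiplication setting the length is a trade-off between quality and efficiency.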
A combined quantization-interpolation of speech line spectrum pair (LSP) parameters is proposed. To utilize the linear dependency between successive LSP frames, the proposed algorithm attempts to locate the frames where there is a significant spectral change; these frames are encoded by vector quantization. The remaining frames are reconstructed by linear interpolation between the vector-quantized frames. A two-pass algorithm is used to couple the quantization and interpolation operations. Simulation results for the performance of the combined quantization-interpolation approach indicate that, at rates below 350 bits/s (for coding the spectral parameters), the proposed scheme outperforms the multiframe coding scheme of D. P. Kemp et al. (ICASSP-91, pp. 609-12). The effect of maximum delay on the overall performance is studied, and the performance of the proposed system over noisy channels is examined.
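The reconstruction step described above, linear interpolation between vector-quantized frames, can be sketched as follows. Frame selection and the two-pass coupling are not covered, and the function name, the dictionary layout, and the two-dimensional example vectors are assumptions made for this illustration.

```python
def interpolate_lsp_frames(key_frames, num_frames):
    """Rebuild all frames by linear interpolation between key frames.

    key_frames maps a frame index to its (already quantized) LSP vector.
    Assumption: the first and last frames are key frames.
    """
    anchors = sorted(key_frames)
    if not anchors or anchors[0] != 0 or anchors[-1] != num_frames - 1:
        raise ValueError("first and last frames must be key frames")
    frames = [None] * num_frames
    for index in anchors:
        frames[index] = list(key_frames[index])
    for left, right in zip(anchors, anchors[1:]):
        start, end = key_frames[left], key_frames[right]
        for t in range(left + 1, right):
            w = (t - left) / (right - left)  # 0 at the left key frame, 1 at the right
            frames[t] = [(1 - w) * a + w * b for a, b in zip(start, end)]
    return frames


# Hypothetical 2-dimensional LSP vectors at key frames 0 and 4.
keys = {0: [0.2, 0.5], 4: [0.4, 0.9]}
rebuilt = interpolate_lsp_frames(keys, num_frames=5)
```

Only the key frames cost bits in this scheme; the frames between them are implied, which is where the rate saving comes from when successive frames change almost linearly.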
An optimal method for organizing acoustic features to recognize phonemes in continuous speech is described. Each level of acoustic features, including power and its variational pattern, and the linear predictive coding Mel-cepstrum and its pattern of temporal change, is clustered hierarchically on the basis of the mutual information between the acoustic feature vector and phoneme labels assigned for the speech wave. Multilevel clustering is used to discriminate phonemes by detecting the most reliable features in the context and by using the effective combination of acoustic characteristics. Phoneme recognition for each frame is discussed. The conditional entropy is evaluated for the phoneme labels of the frame, given the various acoustic features for the neighboring frames. Phoneme discrimination can be performed effectively using the conditional entropy. In the preliminary test the phoneme recognition rate was 81.6%, and the vowel recognition rate was 92.4% at the frame level. In a completely talker-independent experiment the recognition rates were 76.8% and 89.7%, respectively.
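The two quantities named above, mutual information between feature clusters and phoneme labels and the conditional entropy of the labels given the features, can be estimated from paired frame observations as in the sketch below. It uses simple relative-frequency estimates; the cluster names, labels, and function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of the empirical label distribution."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(features, labels):
    """H(labels | features), estimated from paired per-frame observations."""
    n = len(labels)
    grouped = {}
    for feature, label in zip(features, labels):
        grouped.setdefault(feature, []).append(label)
    return sum(len(group) / n * entropy(group) for group in grouped.values())


def mutual_information(features, labels):
    """I(features; labels) = H(labels) - H(labels | features)."""
    return entropy(labels) - conditional_entropy(features, labels)


# Hypothetical frame-level data: feature cluster ids and phoneme labels.
phonemes = ["a", "a", "i", "i"]
informative = ["c0", "c0", "c1", "c1"]    # cluster determines the phoneme
uninformative = ["c0", "c1", "c0", "c1"]  # cluster says nothing about the phoneme
```

With the informative clustering the conditional entropy drops to zero and the mutual information equals the full label entropy of 1 bit; with the uninformative one the mutual information is zero, so a clustering criterion based on mutual information would prefer the first.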