We describe an experiment where listeners were asked to detect two specific forms of stress in talkers' recorded voices heard via six different simulated communication systems. Both task-induced stress and dramatized urgency were used. Communication systems included low-rate digital speech coding combined with bit errors, packet loss, and packet loss concealment. Twenty-four listeners participated in a total of 11,520 detection trials. A parallel investigation of word intelligibility in sentence context used 576 trials. Intelligibility results showed wide variance due to communication system and stress detection results showed less variance. More specifically, we found that listener detection of dramatized talker urgency was 4.7 times more robust to communication system degradations than word intelligibility in sentence context.
A description is given of experimental work investigating the acoustic attenuation of an oxygen mask typical of those worn by military aircrew. Dominant noise transmission paths over the mask are identified, and the behaviour of the expiration valve during speech is reported. This valve, which constitutes the most important noise transmission path, opens and closes at a surprisingly high rate during speech. This introduces a time variance into the mask's noise transmission, which is shown to complicate the application of adaptive cancellation techniques to the aircrew oxygen mask problem.
In this paper, a vector quantization-block constrained trellis coded quantization (VQ-BCTCQ) is presented to quantize line spectrum frequency (LSF) parameters of the wideband speech codec. Both the predictive structure and safety-net concept are combined into VQ-BCTCQ to develop the predictive VQ-BCTCQ. The performance of this quantization is compared with that of the linear predictive coding (LPC) vector quantizer used in the AMR-WB codec, and reductions in spectral distortion (SD) and encoding complexity are demonstrated.
This paper focuses on the analysis of the periodic part of speech signals using harmonic models. Three different models are discussed with respect to their effectiveness in modeling the periodic (harmonic) part of speech. The non-periodic part of speech is then obtained by subtracting in the time domain the harmonic part from the original speech signal.
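The time-domain subtraction described in the abstract above can be illustrated with a minimal sketch. It is not one of the three models discussed in the paper: it assumes a known, constant fundamental frequency over the frame and fits constant-amplitude harmonics by ordinary least squares. The function name `harmonic_part`, the sampling rate, the frame length, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def harmonic_part(x, f0, fs, n_harmonics):
    """Least-squares fit of a sum of harmonics of f0 to the frame x.

    Assumption: f0 is known and constant over the frame (a simplification,
    not the paper's models).
    """
    t = np.arange(len(x)) / fs
    k = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)          # shape (N, K)
    basis = np.hstack([np.cos(phase), np.sin(phase)])  # shape (N, 2K)
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coeffs

# Synthetic "speech-like" frame: two harmonics plus a small noise component.
fs = 8000.0   # hypothetical sampling rate in Hz
f0 = 120.0    # hypothetical fundamental frequency in Hz
t = np.arange(240) / fs
rng = np.random.default_rng(0)
periodic = 0.8 * np.sin(2 * np.pi * f0 * t) + 0.3 * np.cos(2 * np.pi * 2 * f0 * t)
noise = 0.05 * rng.standard_normal(t.size)
frame = periodic + noise

harmonic = harmonic_part(frame, f0, fs, n_harmonics=5)
# Non-periodic part: subtract the harmonic part from the original in the time domain.
residual = frame - harmonic
```

Because the fit is an orthogonal projection onto the harmonic basis, the residual in this toy case can never carry more energy than the added noise. Real speech needs a time-varying fundamental frequency and amplitudes, which this sketch does not model.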
The internet Low Bit-rate Codec (iLBC) is inherently robust to packet loss, which is an essential property for Voice over Internet Protocol (IP) applications. Another important feature is rate flexibility, which allows the speech codec to adapt its bit rate to constantly changing network conditions. Previously, multi-rate operation of the iLBC was enabled by utilizing the Discrete Cosine Transform (DCT) and entropy coding. In this paper, various approaches to improve performance are presented. The simulation results show that when all the improvement schemes are combined, performance improves at all bit rates compared to the previous results, even though the Huffman table structure is significantly simplified.
Vocoders are increasingly used in modern digital communication systems. Improving their resistance to noise has therefore become an important issue, since their input speech is often unavoidably corrupted by noise when they operate in the real world. This paper investigates the sensitivity of the improved multi-band excitation (IMBE) parameters to white noise and the effect of parameter errors on the quality of the synthesised speech. As a result, a key parameter is found which contributes most to the heavy distortion of the synthesised speech in a noisy environment. The possibility of using speech enhancement to reduce such distortion is then examined. Finally, results are presented for MMSE-based speech enhancement applied to the IMBE vocoder.
In this paper, sufficient conditions for designing error-free integer-modulated filter banks are proposed. Methods for selecting the best filter bank for data compression from a large number of candidates satisfying the sufficient perfect reconstruction conditions are also extensively studied. The image compression performance of the resulting filter banks is tested. The simulation results show that the integer-modulated filter banks obtained by maximizing the coding gain based on the AR(1) signal model perform very close to the real-valued modulated subband filter banks.
A feature extraction chip for speech recognition computes fifteen cepstra every 8 ms at a 64 kHz clock rate and dissipates 30 µW at 0.9 V. It has been implemented as a gate array in a 0.5 µm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 µJ. It achieves recognition rates over 98% in isolated-word speech recognition tasks.
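The figures quoted in the abstract above can be cross-checked with a short back-of-the-envelope calculation. The derived quantities below are not stated in the source. They follow only under the assumption that the chip draws its quoted 30 µW continuously while a word is processed, and all variable names are illustrative.

```python
# Figures quoted in the abstract.
clock_hz = 64e3            # clock rate
frame_s = 8e-3             # one set of fifteen cepstra every 8 ms
power_w = 30e-6            # dissipation at 0.9 V
energy_per_word_j = 10e-6  # average energy per TI46 word

# Derived values (assumption: constant power draw during processing).
cycles_per_frame = clock_hz * frame_s                     # about 512 clock cycles
energy_per_frame_j = power_w * frame_s                    # about 0.24 microjoule
frames_per_word = energy_per_word_j / energy_per_frame_j  # about 41.7 frames
word_duration_s = energy_per_word_j / power_w             # about 0.33 s

print(f"cycles per frame: {cycles_per_frame:.0f}")
print(f"energy per frame: {energy_per_frame_j * 1e6:.2f} uJ")
print(f"frames per word:  {frames_per_word:.1f}")
print(f"implied duration: {word_duration_s:.3f} s")
```

Under that assumption the chip has roughly 512 clock cycles per frame, and the implied average processing time of about a third of a second per word is plausible for isolated words.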
ISBN (digital): 9781728198965
ISBN (print): 9781728198972
Owing to globalization, mixing Thai and English has become common in everyday conversation in Thailand, even among native Thai speakers. Consequently, Thai automatic speech recognition systems deployed in multilingual communities must be able to handle Thai-English code-switching. One of the main challenges in building such a system is selecting a phone set for Thai-English pairs, because a mother-tongue-like accent interferes with the English pronunciation. This paper shows evidence that an acoustic model with a Thai phoneme set improves recognition performance for Thai-English code-mixed speech. The baseline system for comparison was built with a merged phoneme set in which the Thai and English phones were simply combined. The experimental results show that the word error rate with monolingual Thai phones is reduced by 4.5%.
Though vector quantizers are more efficient than scalar quantizers, their use for fine quantization of linear predictive coding (LPC) information (using 24-26 b/frame) is impeded due to their prohibitively high complexity. In the present work, a split vector quantization approach is used to overcome the complexity problem. The LPC vector, consisting of ten line spectral frequencies (LSFs), is divided into two parts and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. Using this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 b/frame with 1-dB average spectral distortion and <2% outlier frames (having spectral distortion greater than 2 dB).
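The split search described in the abstract above can be sketched as follows. This is a rough illustration, not the paper's quantizer. The 4 + 6 split, the equal 12 + 12 bit allocation, the random (untrained) codebooks, and the uniform weights are all assumptions made here; the abstract specifies only that the ten LSFs are divided into two parts and that a weighted distance is used.

```python
import numpy as np

def nearest(codebook, target, weights):
    """Index of the codevector with the smallest weighted squared error."""
    err = np.sum(weights * (codebook - target) ** 2, axis=1)
    return int(np.argmin(err))

def split_vq_encode(lsf, weights, cb_low, cb_high):
    """Quantize the two parts of the LSF vector independently."""
    split = cb_low.shape[1]
    i = nearest(cb_low, lsf[:split], weights[:split])
    j = nearest(cb_high, lsf[split:], weights[split:])
    return i, j

def split_vq_decode(i, j, cb_low, cb_high):
    """Rebuild the ten-dimensional LSF vector from the two indices."""
    return np.concatenate([cb_low[i], cb_high[j]])

# Placeholder codebooks: random, NOT trained on speech (assumption).
rng = np.random.default_rng(0)
bits_per_part = 12  # assumed equal split of the 24 b/frame budget
size = 2 ** bits_per_part
cb_low = np.sort(rng.uniform(0.0, 0.4 * np.pi, size=(size, 4)), axis=1)
cb_high = np.sort(rng.uniform(0.4 * np.pi, np.pi, size=(size, 6)), axis=1)

# Hypothetical uniform weights; the paper derives its weights from the
# localized spectral sensitivity of the LSFs, which is not reproduced here.
weights = np.ones(10)

lsf = np.concatenate([cb_low[17], cb_high[42]])  # a vector the codebooks can hit exactly
i, j = split_vq_encode(lsf, weights, cb_low, cb_high)
reconstructed = split_vq_decode(i, j, cb_low, cb_high)
```

The complexity saving comes from searching two codebooks of 2^12 entries each instead of a single codebook of 2^24 entries, at the cost of ignoring dependencies between the two parts.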