ISBN (print): 9781424407286
In this paper we propose an extension of a very low bit-rate speech coding technique, which exploits the predictability of the temporal evolution of spectral envelopes, to wide-band audio coding applications. Temporal envelopes in critical-band-sized sub-bands are estimated using frequency domain linear prediction applied to relatively long time segments. The sub-band residual signals, which play an important role in achieving high-quality reconstruction, are processed using a heterodyning-based signal analysis technique. For reconstruction, their optimal parameters are estimated using a closed-loop analysis-by-synthesis technique driven by a perceptual model emulating the simultaneous masking properties of the human auditory system. We discuss the advantages of the approach and show some of its properties on challenging audio recordings. The proposed technique is capable of encoding high-quality, variable-rate audio signals at bit-rates below 1 bit/sample.
ISBN (print): 0780382927
Spectral dynamics have attracted the attention of researchers in speech recognition for a long time. As part of the speech feature vector they have been found useful and are therefore part of almost any feature extraction algorithm for speech recognition. However, the usual cepstral dynamics do not directly reflect the dynamics of the speech spectrum, as they are extracted from cepstral parameters. In this paper we show that dynamic parameters obtained directly from the speech spectrum can perform better than conventional dynamic cepstral parameters under low-SNR noisy speech conditions. Results on a compact set of the Aurora task are reported.
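The abstract does not say how the spectral dynamic parameters are computed. As a rough illustration only, the sketch below applies the conventional regression-based delta formula to a sequence of feature frames; feeding it log-spectral frames instead of cepstral frames is an assumption made here, and the function name `delta` and the window `width` are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta features over a list of frames (lists of floats).

    d_t = sum_{n=1..width} n * (x_{t+n} - x_{t-n}) / (2 * sum_{n} n^2),
    with the first and last frames replicated at the edges.
    This is the conventional delta formula, not necessarily the paper's method.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# A one-dimensional feature that rises by 1.0 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_delta = delta(ramp)
```

Away from the edges, a feature that rises by a constant amount per frame yields a delta equal to that slope.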
ISBN (print): 9781479961085
Speech enhancement refers to improving the intelligibility and/or quality of a degraded speech signal using signal processing techniques. Speech enhancement has remained a very difficult problem because the nature and characteristics of the noise in speech signals vary over time and from application to application. Speech enhancement techniques cannot preserve the quality and the intelligibility of a speech signal simultaneously, so a trade-off is generally maintained between the two. In speech communication there are a number of applications where speech enhancement is required, for example VoIP, hands-free communication, hearing aids, answering machines, speech recognition, teleconferencing systems, and car and mobile phones. The main focus of this work is the development of a speech enhancement algorithm that maintains a proper trade-off between quality and intelligibility in the speech signal. This is made possible by using the temporal and spectral information in the speech signal. This work also focuses on the problem of enhancing a compressed version of the speech signal to improve its intelligibility. Performance measures such as Signal-to-Noise Ratio (SNR), Mean Opinion Score (MOS), pitch and formants are used to evaluate the performance of a speech enhancement algorithm, which varies from application to application.
ISBN (print): 0780374029
In this paper, a packet loss concealment scheme based on sinusoidal extrapolation is proposed. It is to be implemented at the receiver end of a packetized speech transmission system and conceals frame erasures in a PCM waveform. The scheme operates on the source-filter components of the speech signal, which are obtained through Linear Prediction (LP) analysis. When one or more speech frames are missing, the LP residual and filter coefficients are extrapolated separately. The residual is extrapolated based on sinusoidal modeling of the last correct residual frame, whereas the latest filter is repeated and bandwidth expanded. The proposed scheme is compared to two benchmark systems by means of subjective testing. The benchmark systems are based on repetitions of the last pitch period; one operates on the source-filter components, the other directly on the speech signal. It is noted that the systems operating on the source-filter components are preferred by the listeners, and that the proposed system is clearly superior to both benchmark systems.
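The abstract states that the latest filter is "repeated and bandwidth expanded" without giving the rule used. A common way to bandwidth-expand an LP filter is to scale the k-th coefficient by gamma^k, which moves the poles toward the origin and broadens the spectral peaks. The sketch below assumes that standard form; the function name `bandwidth_expand` and the value of `gamma` are illustrative and are not taken from the paper.

```python
def bandwidth_expand(lp_coeffs, gamma=0.98):
    """Scale LP coefficients a_1..a_p by gamma**k for k = 1..p.

    lp_coeffs holds a_1..a_p of A(z) = 1 + sum_k a_k z^-k (the leading 1 is
    implicit). gamma in (0, 1) is an assumed value, not one from the paper.
    """
    return [a * gamma ** k for k, a in enumerate(lp_coeffs, start=1)]


# Second-order example with a complex pole pair at radius 0.95 (a_2 = 0.95**2).
coeffs = [-1.7, 0.9025]
expanded = bandwidth_expand(coeffs, gamma=0.9)
```

For a second-order filter with complex poles, a_2 equals the squared pole radius, so scaling a_2 by gamma^2 shrinks the radius by the factor gamma.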
ISBN (print): 9780769531199
In this paper, a non-linear spectral estimation method for noise reduction is presented, which is approximated and implemented by double Radial Basis Function (RBF) networks. The simulation results indicate that the method can greatly improve the quality and intelligibility of speech, and that it has other advantages such as a widely applicable Signal-to-Noise Ratio (SNR) range and a lower computational load. In particular, the method can maintain good accuracy of the signal in the speech waveform, and the quality of the speech signals is noticeably improved.
ISBN (print): 0780364163
Speech coding at very low bit rates has many applications, such as answering machines, IP telephony, mobile communications and military communications. Most low bit rate coders operate at around 2.4 kb/s, as the speech quality degrades too much below this bit rate. In this paper we describe a frequency domain speech coder that is capable of operating at both 2.4 and 1.2 kb/s and produces good quality synthesised speech. Both rates use the same analysis and synthesis building blocks over 20 ms, but the 1.2 kb/s coder jointly quantises three sets of parameters every 60 ms to reduce the bit rate while maintaining speech quality. We also describe the quantisation methods used to lower the bit rate from 2.4 kb/s to 1.2 kb/s while retaining most of the quality of the higher bit rate version.
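The frame sizes and rates quoted above imply a bit budget per block. The short calculation below works it out, assuming the whole channel rate is spent on the coded parameters (no framing or error-protection overhead, which the abstract does not discuss); the helper name `bits_per_block` is hypothetical.

```python
def bits_per_block(bit_rate_bps, block_ms):
    """Number of bits available in one block of block_ms milliseconds."""
    return bit_rate_bps * block_ms // 1000


bits_2400 = bits_per_block(2400, 20)   # one 20 ms frame at 2.4 kb/s
bits_1200 = bits_per_block(1200, 60)   # one 60 ms block at 1.2 kb/s
bits_1200_per_frame = bits_1200 // 3   # average over the three 20 ms frames
```

Under this assumption, the joint quantisation of three parameter sets has to fit in half the per-frame budget of the 2.4 kb/s mode.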
A complete algorithm of a 1200-bits/s digital formant vocoder system is described. This vocoder algorithm draws heavily on the results of recent research in linear predictive coding. The transmitting parameters are fr...
ISBN (print): 1424407281
In this paper we extend a lossy compression technique for surface EMG signals, which is based on the Algebraic Code Excited Linear Prediction (ACELP) paradigm, to compress multi-channel surface EMG recordings by exploiting the correlation between the Line Spectral Frequencies (LSFs). Experimental results show that the LSFs of the inner signals in a multi-channel recording can be efficiently represented with 13 bit/frame, versus the 38 bit/frame needed by independent ACELP coding of each signal, thus saving 66% of the bandwidth needed to transmit these coefficients while maintaining comparable performance in terms of the SNR, Average Rectified Value and Root Mean Square of the waveform, and the mean and median frequencies of the power spectrum.
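The 66% figure follows directly from the two bit allocations quoted in the abstract. The check below uses only those two numbers; the variable names are illustrative.

```python
joint_bits = 13         # bit/frame for LSFs of inner signals (from the abstract)
independent_bits = 38   # bit/frame with independent ACELP coding (from the abstract)

saving = 1 - joint_bits / independent_bits
saving_percent = round(saving * 100)
```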
ISBN (print): 9781424414833
A delay-free audio coding scheme based on ADPCM with adaptive pre- and post-filtering is presented. The pre- and post-filters are realized as a cascade of shelving filters, designed to match the characteristics of human perception. The filters are adapted by dynamic compression of the respective sub-bands. The adaptation is backward-adaptive, i.e. it is driven by the reconstructed signal, which eliminates the need to transmit the filter coefficients and allows delay-free operation. This pre- and post-filtering significantly improves the audio quality compared to a plain ADPCM codec, as underlined by objective measurements. Since the base ADPCM used is also delay-free, the resulting coding system works without any algorithmic delay.
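The backward-adaptive principle can be illustrated with a toy ADPCM loop in which the step size is updated only from the transmitted codes, so the decoder can track the encoder without side information or look-ahead. This is a minimal sketch and not the codec from the paper: the first-order predictor, the step-update rule, the constants and all function names are assumptions, and the shelving pre- and post-filters are not modelled.

```python
def _step_update(step, code, levels):
    # Grow the step after large codes, shrink it after small ones.
    # The factors 1.5 and 0.9 and the clamping range are assumed values.
    factor = 1.5 if abs(code) >= levels // 2 else 0.9
    return min(max(step * factor, 1e-4), 1.0)


def adpcm_encode(samples, levels=4):
    """Quantise the prediction error sample by sample (no look-ahead)."""
    pred, step, codes = 0.0, 0.05, []
    for x in samples:
        code = round((x - pred) / step)
        code = max(-levels, min(levels, code))
        codes.append(code)
        pred = pred + code * step                # reconstructed sample
        step = _step_update(step, code, levels)  # adapts from the code only
    return codes


def adpcm_decode(codes, levels=4):
    """Mirror the encoder state using nothing but the received codes."""
    pred, step, out = 0.0, 0.05, []
    for code in codes:
        pred = pred + code * step
        out.append(pred)
        step = _step_update(step, code, levels)
    return out


signal = [0.0, 0.05, 0.1, 0.2, 0.15, 0.0, -0.1]
codes = adpcm_encode(signal)
decoded = adpcm_decode(codes)
```

Because both sides run the same update from the same codes, no step sizes or coefficients need to be transmitted, and each sample is coded as soon as it arrives.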
ISBN (print): 9783642038815
This paper presents the speech recognition of Malay alveolar consonants spoken by Malay children, using neural networks in a speaker-independent manner. The alveolar consonants consist of /d/, /t/, /l/, /r/, /s/, /z/ and /n/. The alveolar consonants are combined with six Malay vowels to form consonant-vowel (CV) syllable sounds. A total of 100 children are involved in the speech recording, giving 4200 speech sounds. The speech sounds are recorded at a sampling rate of 20 kHz with 16-bit resolution. Linear predictive coding (LPC) is used for speech feature extraction. A Multi-layer Perceptron (MLP), which is one of the most popular neural networks, is used to classify the alveolar consonants. Experiments are conducted to determine the optimal signal length of the consonants and the number of hidden neurons of the MLP. A maximum recognition rate of 62.14% is obtained at signal lengths of 150 ms and 160 ms.
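The corpus size quoted above is consistent with each child recording every consonant-vowel syllable once. That reading is an assumption, since the abstract gives only the totals; the variable names below are illustrative.

```python
num_consonants = 7   # /d/, /t/, /l/, /r/, /s/, /z/, /n/
num_vowels = 6       # six Malay vowels
num_children = 100

syllables_per_child = num_consonants * num_vowels
total_sounds = syllables_per_child * num_children
```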