We have previously proposed robust parameter estimation algorithms for a time-varying complex AR (TV-CAR) model of analytic speech signals, based on GLS (generalized least squares) and ELS (extended least squares), and have shown that these methods achieve speech spectrum estimation that is robust against additive white Gaussian noise. In these methods, only the forward prediction error is used to calculate the MSE criterion. This paper proposes improved TV-CAR speech analysis methods based on forward and backward linear prediction, in which the backward prediction error is also used to calculate the MSE criterion, viz., the MMSE- and GLS-based algorithms using forward and backward prediction. Experiments with natural speech and with natural speech corrupted by white Gaussian noise demonstrate that the improved methods achieve more accurate and more stable spectral estimation.
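The idea of adding the backward prediction error to the least-squares criterion can be sketched as follows. This is a deliberately simplified illustration, not the TV-CAR method of the abstract: it assumes a real-valued signal and time-invariant coefficients, and the function names `solve_linear` and `forward_backward_lpc` are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def forward_backward_lpc(x, p):
    """Return a[1..p] minimizing the summed squared forward and backward errors.

    Forward error:  e_f[n] = x[n]   + sum_k a[k] * x[n-k]
    Backward error: e_b[n] = x[n-p] + sum_k a[k] * x[n-p+k]
    (real-valued, time-invariant simplification; an assumption of this sketch)
    """
    R = [[0.0] * p for _ in range(p)]
    r = [0.0] * p
    for n in range(p, len(x)):
        fwd = [x[n - k] for k in range(1, p + 1)]      # regressors predicting x[n]
        bwd = [x[n - p + k] for k in range(1, p + 1)]  # regressors predicting x[n-p]
        for i in range(p):
            r[i] += x[n] * fwd[i] + x[n - p] * bwd[i]
            for j in range(p):
                R[i][j] += fwd[i] * fwd[j] + bwd[i] * bwd[j]
    # Normal equations of the combined criterion: R a = -r
    return solve_linear(R, [-v for v in r])
```

For a noiseless sinusoid `x[n] = cos(w0 * n)`, a second-order fit should recover `a = [-2 * cos(w0), 1]`, since the sinusoid satisfies the same recursion in both time directions.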
The sensitivity of military standard MELP (MIL-STD-3005) to randomly deleted frames on an IP network is studied. Frame deletion occurs when packets are lost or arrive too late to be useful in a voice-over-IP (VoIP) application. Network errors including packet loss, burst loss, jitter and out-of-order packets are simulated. The quality of reconstructed speech is measured by spectral distortion.
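A random and bursty frame-deletion pattern of the kind described could be generated as in the sketch below. The two-state (good/bad) Markov loss model is an assumption chosen for illustration; the abstract does not say which loss model was used, and the function names and parameters are hypothetical.

```python
import random


def simulate_frame_loss(n_frames, p_good_to_bad, p_bad_to_good, seed=0):
    """Return a list of booleans, True where a frame is deleted.

    Assumed two-state Markov model: frames are lost while the channel is
    in the 'bad' state, so losses tend to arrive in bursts.
    """
    rng = random.Random(seed)
    lost = []
    in_bad_state = False
    for _ in range(n_frames):
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        else:
            if rng.random() < p_good_to_bad:
                in_bad_state = True
        lost.append(in_bad_state)
    return lost


def loss_rate(lost):
    """Fraction of frames that were deleted."""
    return sum(lost) / len(lost) if lost else 0.0


def mean_burst_length(lost):
    """Average length of consecutive runs of deleted frames."""
    bursts = []
    run = 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0
```

Setting `p_bad_to_good` close to 1 approximates independent random loss, while smaller values produce longer bursts.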
Accurate linear prediction coefficient (LPC) estimation is one of the key requirements for low bit-rate voice coding. Under harsh acoustic conditions, LPC estimation can become unreliable. This results in poor quality of the encoded speech and introduces annoying artifacts. The paper presents a two-branch speech enhancement preprocessing scheme for low bit-rate voice coders. The scheme consists of two parallel denoising blocks. One block enhances the degraded speech for LPC estimation. The other block increases the perceptual quality of the speech to be coded. The goal of the paper is to design the two-branch scheme. Test results show that the two-branch scheme can provide better perceptual quality than conventional one-branch speech enhancement techniques in noisy environments.
Analysis-by-synthesis (AbS) speech codecs such as multi-pulse LPC and CELP compute the coefficients of the synthesis filter by minimizing the linear prediction (LP) error, and the excitation function by minimizing the closed-loop synthesis error. In the presence of additive noise, the LPC coefficients are affected by the noise, resulting in synthesized speech of degraded quality. This paper points out the noise advantages of computing both the filter parameters and the excitation function by minimizing the closed-loop synthesis error. By minimizing the synthesis error, the analysis and synthesis stages become more compatible. Furthermore, it is shown that, unlike LP-based estimation, minimization of the synthesis error could lead to noise-free filter parameters for uncorrelated (colored) noise. An approach for reducing the synthesis error is presented, and simulation results on both synthetic and real speech are given.
We have made an attempt to study the spectral characteristics of two North East Indian languages, Assamese and Boro, coming from different genres. We have taken a few words from Assamese and Boro with similar, partially similar, and dissimilar characteristics in their nature of utterance. The spectral analysis reveals that the two languages have a high degree of similarity in both voiced and unvoiced regions for words with the same meaning. LPC modeling of a random selection of similar, partially similar, and dissimilar phonemes may not provide characteristic features of the two languages, but cepstral coefficient analysis of the LPC model may provide important information for studying the linguistic and phonetic distinction between the two languages and, finally, for speaker identification and word verification.
An optimized software implementation of a high quality MPEG AAC-LC (low complexity) audio encoder is presented in this paper. The standard reference encoder is improved by utilizing several algorithmic optimizations (fast psychoacoustic model, new tonality estimation, new time domain block switching, optimized quantizer and Huffman coder) and very careful code optimizations for PC CPU architectures with a SIMD (single-instruction-multiple-data) instruction set. The psychoacoustic model uses the MDCT filterbank for energy estimation and peak detection as a measure of tonality. Block size decision is based on local perceptual entropies as well as LPC analysis of the time signal. Algorithmic optimizations in the quantizer include loop control module modification and optimized Huffman search. Code optimization is based on parallel processing, replacing vector algebra and math functions with their optimized equivalents from the Intel® Signal Processing Library (SPL). The implemented codec outperforms consumer MP3 encoders at 30% less bitrate, while at the same time achieving encoding times several times faster than real-time.
In this paper, we present an approach for real-time speech-driven 3D face animation using neural networks. We first analyze a 3D facial movement sequence of a talking subject and learn a quantitative representation of the facial deformations, called the 3D motion units (MUs). A 3D facial deformation can be approximated by a linear combination of the MUs weighted by the MU parameters (MUPs), which are the visual features of the facial deformation. The facial movement sequence is synchronized with an audio track. The audio track is digitized and the audio features of each frame are calculated. A real-time audio-to-MUP mapping is constructed by training a set of neural networks using the calculated audio-visual features. The audio-visual features are divided into several groups based on the audio features. One neural network is trained per group to map the audio features to the corresponding MUPs. Given a new audio feature vector, we first classify it into one of the groups and select the corresponding neural network to map the audio feature vector to MUPs, which are used for face animation. The quantitative evaluation shows the effectiveness of the proposed approach.
ISBN: (print) 0780374940
Reusing IPs requires designers to perform interface protocol related tasks such as writing test benches and designing interface protocol conversion circuits, e.g., wrappers for IPs. The results of those tasks usually include the interface protocol components for the corresponding IPs, similar to the bus protocol components of bus functional models. The interface protocols of most IPs can be abstracted as transactions. This paper presents a transaction-oriented interface protocol description language which models interface protocol components that recognize or execute transactions over the given interface ports. In addition, we describe a target structure of the synthesizable interface protocol component together with its application to an IP wrapper design. The proposed approach not only reduces rework on the interface protocol components but also enables a methodology that can be called "transaction-based interface design or synthesis".
The line spectrum pair (LSP) was first introduced by Itakura (1975) as an alternative representation for the LPC spectral information. It was later found that LSPs have very desirable quantization and interpolation properties. Interpolation of LSPs at the frame boundaries results in synthesized speech that is smoother and of higher quality. This paper compares two methods of interpolation: 1) interpolating the LSPs derived from the LPC coefficients vs. 2) computing the LSPs from interpolated roots of the LPC polynomial. Experimental results obtained using four- to six-second-long male and female sentences suggest that these two sets of LSPs are effectively equivalent.
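The first interpolation method (deriving LSPs from the LPC coefficients, then interpolating them) can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used in the paper: it assumes a stable, real-coefficient A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, finds the line spectral frequencies by a grid search with bisection on the unit circle, and uses linear interpolation. All function names are hypothetical.

```python
import cmath
import math


def _rotated_response(a, w):
    """g(w) = exp(j*w*(p+1)/2) * A(exp(jw)).

    On the unit circle, the sum polynomial P(z) is proportional to Re g
    and the difference polynomial Q(z) is proportional to Im g.
    """
    p = len(a)
    A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return cmath.exp(1j * w * (p + 1) / 2.0) * A


def _zeros_in_open_interval(f, n_grid=2048, n_bisect=60):
    """Find sign-change zeros of f strictly inside (0, pi)."""
    roots = []
    step = math.pi / n_grid
    lo = 0.5 * step
    f_lo = f(lo)
    for i in range(1, n_grid):
        hi = step * (i + 0.5)
        f_hi = f(hi)
        if f_lo * f_hi < 0.0:
            left, right, f_left = lo, hi, f_lo
            for _ in range(n_bisect):
                mid = 0.5 * (left + right)
                f_mid = f(mid)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
        lo, f_lo = hi, f_hi
    return roots


def lpc_to_lsf(a):
    """Return the p line spectral frequencies (radians, ascending) for a[1..p]."""
    p_roots = _zeros_in_open_interval(lambda w: _rotated_response(a, w).real)
    q_roots = _zeros_in_open_interval(lambda w: _rotated_response(a, w).imag)
    return sorted(p_roots + q_roots)


def interpolate_lsf(lsf_prev, lsf_next, alpha):
    """Linear interpolation between two frames' LSFs, with 0 <= alpha <= 1."""
    return [(1.0 - alpha) * u + alpha * v for u, v in zip(lsf_prev, lsf_next)]
```

Because each frame's LSFs are in ascending order, any convex combination of two frames is also ascending, which is one reason interpolating in this domain is convenient.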
In the G.729 speech coder, linear predictive coding (LPC) parameters are converted into line spectral frequencies (LSF) and then quantized. The spectral distortion due to quantization error at various stages is calculated using the log spectral distortion (LSD) measure.
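A common form of the log spectral distortion between an unquantized and a quantized LPC filter is the root-mean-square difference of their log power spectra in dB. The sketch below assumes that definition, a uniform frequency grid over [0, pi), and unit filter gain; the exact variant used in the paper is not stated, and the function names are hypothetical.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_freqs=256):
    """Log power spectrum (dB) of 1/A(z), with A(z) = 1 + sum_k a[k] z^-k.

    `a` holds a[1..p]; the leading coefficient 1 is implied.
    """
    spectrum = []
    for i in range(n_freqs):
        w = math.pi * i / n_freqs
        A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def log_spectral_distortion(a_ref, a_test, n_freqs=256):
    """RMS difference (dB) between the two all-pole log spectra."""
    s_ref = lpc_log_spectrum_db(a_ref, n_freqs)
    s_test = lpc_log_spectrum_db(a_test, n_freqs)
    mean_sq = sum((x - y) ** 2 for x, y in zip(s_ref, s_test)) / n_freqs
    return math.sqrt(mean_sq)
```

Calling `log_spectral_distortion(a_ref, a_quantized)` per frame and averaging over frames gives a single distortion figure for a quantizer; identical coefficient sets yield zero.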