In this paper, we present pattern recognition methods for classifying respiratory sounds into normal and wheeze classes. Using receiver operating characteristic (ROC) curves, we evaluate and compare feature extraction techniques based on the Fourier transform, linear predictive coding, the wavelet transform and Mel-frequency cepstral coefficients (MFCC), in combination with classification methods based on vector quantization, Gaussian mixture models (GMM) and artificial neural networks. We propose an optimized threshold to discriminate the wheeze class from the normal one. A post-processing filter is also employed, which considerably improves classification accuracy. Experimental results show that our approach, which combines MFCCs with GMM, is well suited to classifying respiratory sounds into normal and wheeze classes. McNemar's test demonstrated a significant difference between the results obtained by the presented classifiers (p < 0.05). (C) 2009 Elsevier Ltd. All rights reserved.
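The decision stage described above consists of an optimized threshold followed by a post-processing filter. It can be sketched as follows, under assumptions the abstract does not state. The per-frame log-likelihoods under a wheeze GMM and a normal GMM are taken as already computed, and the threshold is applied to their difference. The post-processing filter is assumed to be a sliding median. The threshold and window widths are hypothetical.

```python
from statistics import median

def classify_frames(ll_wheeze, ll_normal, threshold):
    """Label each frame 1 (wheeze) or 0 (normal) by comparing the
    log-likelihood ratio of the wheeze and normal models to a threshold."""
    return [1 if w - n > threshold else 0 for w, n in zip(ll_wheeze, ll_normal)]

def median_postfilter(labels, width=5):
    """Smooth binary frame labels with a sliding median of odd width.
    Edges are padded by repeating the first/last label, so short isolated
    runs (at most width // 2 frames) away from the edges are removed."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    if not labels:
        return []
    half = width // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    return [int(median(padded[i:i + width])) for i in range(len(labels))]

# Usage with made-up per-frame log-likelihoods (not data from the paper).
raw = classify_frames([0.2, 3.1, 0.1, 2.5, 2.7, 3.0], [1.0] * 6, threshold=0.5)
smoothed = median_postfilter(raw, width=3)
```

Here the isolated wheeze decision in frame 1 is removed and the one-frame gap in frame 2 is filled. The median filter suits this task because it suppresses spurious single-frame decisions without shifting the boundaries of longer wheeze segments.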
This paper presents ARDMA, a novel digital data modulation and demodulation algorithm based on the principles of autoregressive (AR) modeling of speech production. First, the characteristics of a sustained voiced speech signal are analyzed using autoregressive modeling. Two sets of linear prediction (LPC) coefficients are obtained and converted to line spectral frequencies (LSF). The input binary data stream drives the selection of the LSF coefficients, which are then used as the coefficients of the synthesis filter that generates the modulation signal. This filter is excited with a specially designed excitation signal that reproduces the basic characteristics of the typical excitation signal of the human vocal tract. The result is a speech-like modulation signal, which is sent through the voice channel of the GSM system. The demodulator analyzes the incoming modulation signal using autoregressive modeling. It determines the most likely LSF vector used to modulate each symbol and converts it to the corresponding string of binary data. The performance of the proposed modulation scheme was compared with that of conventional frequency shift keying (FSK). ARDMA outperforms FSK at higher bit rates for the three GSM speech coders compared. (c) 2008 Elsevier Inc. All rights reserved.
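The modulation idea can be illustrated with a toy sketch that departs from the paper in three ways:

- It selects directly between two hypothetical sets of LPC coefficients, omitting the LSF conversion.
- It uses a plain pulse train as a stand-in excitation.
- It demodulates by choosing the coefficient set whose inverse filter leaves the least residual energy. This is a simplified substitute for the AR analysis and LSF matching described above.

```python
def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y

def residual_energy(frame, a):
    """Energy of the inverse (analysis) filter A(z) output over one frame."""
    p = len(a)
    energy = 0.0
    for n in range(p, len(frame)):
        e = frame[n] + sum(ak * frame[n - 1 - k] for k, ak in enumerate(a))
        energy += e * e
    return energy

def pulse_train(length, period):
    """Hypothetical voiced-like excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

# Hypothetical stable 2nd-order coefficient sets, one per bit value,
# with A(z) = 1 + a1 z^-1 + a2 z^-2 (pole radii 0.9 and 0.8).
FILTERS = {0: [-1.2, 0.81], 1: [0.4, 0.64]}

def modulate(bits, frame_len=80, period=40):
    signal = []
    for b in bits:
        # Each symbol frame is synthesized from a zero filter state.
        signal.extend(synthesize(pulse_train(frame_len, period), FILTERS[b]))
    return signal

def demodulate(signal, frame_len=80):
    bits = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Pick the coefficient set whose inverse filter whitens the frame best.
        bits.append(min(FILTERS, key=lambda b: residual_energy(frame, FILTERS[b])))
    return bits
```

For example, `demodulate(modulate([0, 1, 1, 0, 1]))` recovers the input bits over a clean channel. A real voice channel with a speech coder in the loop would distort the frames, which is what the paper evaluates.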
ISBN (print): 9781424423538
In this work, accurate spectral envelope estimation is applied to voice conversion in order to achieve high-quality timbre conversion. True-Envelope-based estimators allow model-order selection, which adapts the spectral features to the characteristics of the speaker. Optimal residual signals can also be computed by locally adapting the model order as a function of F0. A new perceptual criterion is proposed to measure the impact of the spectral conversion error. Compared with linear prediction, the proposed envelope models show improved spectral conversion performance as well as higher converted-speech quality.
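The True-Envelope estimator is usually described as iterative cepstral smoothing. The target log spectrum is repeatedly raised to the pointwise maximum of itself and its cepstrally smoothed version, until the smoothed curve lies on or above every spectral bin. The sketch below follows that description with a naive O(N^2) DFT. The cepstral order is a fixed parameter here, whereas the paper adapts it to F0. The defaults for `tol` and `max_iter` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Naive inverse DFT (includes the 1/N factor)."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]

def cepstral_smooth(log_spectrum, order):
    """Keep cepstral coefficients 0..order and their mirror images only."""
    n = len(log_spectrum)
    cep = idft(log_spectrum)
    liftered = [c if (k <= order or k >= n - order) else 0.0 for k, c in enumerate(cep)]
    return [v.real for v in dft(liftered)]

def true_envelope(log_spectrum, order, tol=0.1, max_iter=200):
    """Iterative True-Envelope estimate of a full-length (N-bin, even-symmetric)
    log-amplitude spectrum."""
    target = list(log_spectrum)
    envelope = cepstral_smooth(target, order)
    for _ in range(max_iter):
        # Stop once the smoothed curve is within tol of (or above) every bin.
        if max(a - v for a, v in zip(log_spectrum, envelope)) < tol:
            break
        # Raise the target to the envelope wherever the envelope is higher.
        target = [max(t, v) for t, v in zip(target, envelope)]
        envelope = cepstral_smooth(target, order)
    return envelope
```

A log spectrum that is already cepstrally smooth at the chosen order, such as one cosine period over N bins with order 2, is returned unchanged after the first smoothing pass.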
ISBN (print): 9781607500728
We applied and compared two supervised pattern recognition techniques, the Multilayer Perceptron (MLP) and the Support Vector Machine (SVM), to classify seismic signals recorded on Stromboli volcano. The available data are first preprocessed to obtain a compact representation of the raw seismic signals: each input vector consists of 71 components containing both spectral and temporal information extracted from the early part of the signal. We implemented two classification strategies to discriminate three different seismic events: landslide, explosion-quake, and volcanic microtremor signals. The first method is a two-layer MLP network with a cross-entropy error function and logistic activation functions for the output units. The second method is an SVM whose multi-class setting is achieved through a one-vs-all (1vsAll) architecture with a Gaussian kernel. The experiments show that although the MLP produces very good results, the SVM accuracy is always higher. This holds for both the best performance (99.5%) and the average performance (98.8%), obtained over different sampling permutations of the training and test sets.
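The SVM classification stage can be sketched as a one-vs-all decision over binary machines with a Gaussian kernel. Each machine has the decision function f_c(x) = sum_i alpha_i y_i K(s_i, x) + b_c, and the class with the largest f_c wins. The sketch assumes the machines are already trained. The support vectors, dual coefficients, biases and gamma below are hypothetical two-dimensional toy values (the paper uses 71-component vectors), not parameters learned from the Stromboli data.

```python
import math

def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Binary SVM decision function: sum_i (alpha_i * y_i) * K(s_i, x) + b."""
    return sum(c * gaussian_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias

def one_vs_all_predict(x, models, gamma):
    """Pick the class whose one-vs-all machine gives the largest decision value."""
    return max(models, key=lambda label: decision_value(x, *models[label], gamma))

# Hypothetical toy machines: label -> (support vectors, alpha_i * y_i, bias).
models = {
    "landslide": ([[0.0, 0.0]], [1.0], -0.5),
    "explosion-quake": ([[5.0, 5.0]], [1.0], -0.5),
    "microtremor": ([[0.0, 5.0]], [1.0], -0.5),
}
```

With `gamma=0.5`, the input `[0.1, 0.2]` is assigned to "landslide" and `[4.9, 5.1]` to "explosion-quake".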
ISBN (print): 9783642005985
This paper presents a blind watermark detection scheme for the additive watermark embedding model. The proposed estimation-correlation-based watermark detector first estimates the embedded watermark by exploiting the non-Gaussianity of real-world audio signals and the mutual independence between the host signal and the embedded watermark. A correlation-based detector then determines the presence or absence of the watermark. For watermark estimation, blind source separation (BSS) based on underdetermined independent component analysis (UICA) is used. A low watermark-to-signal ratio (WSR) is one of the limitations of blind detection for the additive embedding model. The proposed detector therefore uses two-stage processing to improve the WSR at the blind detector. The first stage removes the spectral envelope of the audio from the watermarked signal using linear predictive (LP) filtering. The second stage uses the residual from the LP filtering stage to estimate the embedded watermark using BSS based on UICA. Simulation results show that the proposed detector performs significantly better than existing estimation-correlation-based detection schemes.
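A simplified version of the two-stage idea, LP whitening followed by correlation, can be sketched as below. The BSS/UICA estimation stage is omitted, and the candidate watermark is assumed known to the correlator. This makes it a whitened correlation detector rather than the paper's blind detector. The LP order, the threshold and the synthetic signals are hypothetical.

```python
import math
import random

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a with A(z) = 1 + sum_k a[k] z^-(k+1)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a

def lp_residual(x, a):
    """Apply the inverse filter A(z); output starts at sample len(a)."""
    p = len(a)
    return [x[n] + sum(a[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]

def normalized_correlation(u, v):
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return num / den if den else 0.0

def detect(received, watermark, order=10, threshold=0.1):
    """Whiten the received signal with its own LP filter, whiten the candidate
    watermark with the same filter, then correlate (BSS/UICA stage omitted)."""
    a = levinson_durbin(autocorr(received, order), order)
    rho = normalized_correlation(lp_residual(received, a), lp_residual(watermark, a))
    return rho > threshold, rho

# Synthetic demo (hypothetical signals): a strongly correlated AR(1) "host"
# plus a +/-0.5 pseudo-random watermark.
rng = random.Random(0)
host = [0.0]
for _ in range(3999):
    host.append(0.95 * host[-1] + rng.gauss(0.0, 1.0))
watermark = [rng.choice((-0.5, 0.5)) for _ in host]
marked = [h + w for h, w in zip(host, watermark)]
present, rho_marked = detect(marked, watermark)
absent, rho_clean = detect(host, watermark)
```

Whitening matters because the raw host is dominated by low-frequency energy. After LP filtering, the host residual is roughly white, so the watermark's contribution to the correlation is no longer buried under the host spectrum.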
ISBN (print): 9781424423538
This paper compares two prediction structures for predictive perceptual audio coding in the context of the Ultra Low Delay (ULD) coding scheme. One structure is based on the commonly used AR signal model, leading to an IIR predictor in the decoder. The other is based on an MA signal model, leading to an FIR predictor in the decoder. We find that the AR-based predictor performs slightly better over an undisturbed transmission channel, but the MA-based predictor performs much better in the presence of transmission errors. For a bit error rate (BER) of 1.0e-5, the proposed MA model predictor achieves a mean Objective Difference Grade (ODG) of -0.66, whereas the AR model predictor reaches only -3.42.
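A toy experiment shows why the FIR (MA-based) decoder tolerates transmission errors better: corrupt one received prediction-error sample and compare the decoder outputs. The sketch assumes fixed, hypothetical first-order coefficients (0.9) rather than the predictors of an actual ULD coder.

```python
def ar_decode(errors, a):
    """IIR decoder for an AR model: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(errors):
        y.append(e + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y

def ma_decode(errors, b):
    """FIR decoder for an MA model: y[n] = e[n] + sum_k b[k] * e[n-1-k]."""
    return [e + sum(b[k] * errors[n - 1 - k] for k in range(len(b)) if n - 1 - k >= 0)
            for n, e in enumerate(errors)]

# Hypothetical received prediction errors, with one corrupted sample at n = 5.
clean = [0.1 * (-1) ** n for n in range(20)]
corrupted = list(clean)
corrupted[5] += 1.0

# Output deviation caused by the single corrupted sample.
ar_diff = [p - q for p, q in zip(ar_decode(corrupted, [0.9]), ar_decode(clean, [0.9]))]
ma_diff = [p - q for p, q in zip(ma_decode(corrupted, [0.9]), ma_decode(clean, [0.9]))]
```

With the MA decoder, the corruption affects only samples 5 and 6. With the AR decoder, the output deviation decays geometrically (by a factor of 0.9 per sample) but never vanishes.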
ISBN (print): 9781424451043
This paper describes a polynomial kernel subspace approach to isolated word recognition (IWR). Linear predictive coding (LPC) coefficients derived from the wavelet sub-bands of each speech frame are used as features. The approach, called Kernel Linear Discriminant Analysis (KLDA), maps the speech features (input space) into a feature space through a non-linear mapping onto the principal components. This non-linear mapping between the input space and the feature space is performed implicitly using the kernel trick, and it increases the discrimination ability of the pattern classifier. The wavelet sub-band based LPC features (WLPC) are low-dimensional, which reduces the memory requirement, and KLDA provides fast classification and recognition. Experimental results on an isolated word database show that the proposed technique is computationally efficient and performs well with less training data.
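The kernel trick mentioned above can be illustrated with the polynomial kernel k(x, z) = (x . z + c)^d. For d = 2 and 2-D inputs, its value equals a dot product in an explicit 6-dimensional feature space that kernel methods never need to build. The sketch shows only this equivalence and the Gram matrix that a kernel method such as KLDA operates on. The KLDA eigenproblem itself is not shown, and c and d are hypothetical choices.

```python
import math

def poly_kernel(x, z, c=1.0, degree=2):
    """Polynomial kernel k(x, z) = (x . z + c) ** degree."""
    return (sum(p * q for p, q in zip(x, z)) + c) ** degree

def explicit_map_deg2(x, c=1.0):
    """Explicit degree-2 feature map for 2-D inputs: phi(x) . phi(z) == (x . z + c) ** 2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    rc = math.sqrt(2.0 * c)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, rc * x1, rc * x2, c]

def gram_matrix(samples, kernel):
    """Kernel (Gram) matrix K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(s, t) for t in samples] for s in samples]
```

For x = [1, 2] and z = [3, -1], both `poly_kernel(x, z)` and the dot product of the explicit maps equal 4. The kernel form costs one dot product regardless of how large the implicit feature space grows with the degree.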
ISBN (print): 9781424423538
In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually found sequentially, which is suboptimal. In this paper, we propose an analysis model that finds the two predictors jointly. It adds a regularization term to the minimization process that imposes sparsity constraints on a high-order predictor. The result is a linear predictor that can easily be factorized into short-term and long-term predictors. This estimation method is then incorporated into an Algebraic Code-Excited Linear Prediction (ACELP) scheme. There, it outperforms traditional cascade methods and other joint optimization methods, offering lower distortion and higher perceptual speech quality.
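The abstract does not give the exact cost function. One common way to impose sparsity on a high-order predictor is an L1 penalty on its coefficients, and the sketch below solves that assumed formulation by cyclic coordinate descent. The order, the weight `lam` and the synthetic signal are hypothetical. The signal has short-term structure at lag 1 and a "pitch" lag of 12 samples. The factorization into short-term and long-term predictors is not shown.

```python
import math
import random

def sparse_lp(x, order, lam, sweeps=50, tol=1e-8):
    """L1-regularized least-squares linear prediction via cyclic coordinate descent.
    Minimizes 0.5 * sum_n (x[n] - sum_k c[k] * x[n-1-k])**2 + lam * sum_k |c[k]|,
    so the prediction is x_hat[n] = sum_k c[k] * x[n-1-k]."""
    rows = range(order, len(x))
    cols = [[x[n - 1 - k] for n in rows] for k in range(order)]  # lag-(k+1) regressor
    resid = [x[n] for n in rows]                                 # residual for c = 0
    norms = [sum(v * v for v in col) for col in cols]
    c = [0.0] * order
    for _ in range(sweeps):
        max_step = 0.0
        for k, col in enumerate(cols):
            if norms[k] == 0.0:
                continue
            # Correlation of regressor k with the residual excluding its own term.
            rho = sum(v * (r + v * c[k]) for v, r in zip(col, resid))
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / norms[k]  # soft threshold
            step = new - c[k]
            if step != 0.0:
                resid = [r - step * v for r, v in zip(resid, col)]
                c[k] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return c

# Synthetic demo (hypothetical): short-term lag 1 plus a "pitch" lag of 12 samples.
rng = random.Random(1)
x = [0.0] * 12
for _ in range(2000):
    x.append(0.5 * x[-1] + 0.4 * x[-12] + rng.gauss(0.0, 1.0))
coeffs = sparse_lp(x, order=16, lam=100.0)
```

On this signal, the two largest coefficients should appear at lags 1 and 12. Most of the others are driven to zero or near zero, mirroring the near-sample and far-sample structure that a cascade of short-term and long-term predictors would model.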
Author: Merouane, Bouzid
USTHB, Elect Fac, Speech Commun & Signal Proc Lab, Algiers 16111, Algeria
ISBN (print): 9781424444564
In this paper, we present an optimized trellis coded vector quantization (OTCVQ) system designed for efficient and robust coding of LSF spectral parameters. The aim of this system, initially called the "LSF-OTCVQ Encoder", is to achieve low bit rate transparent quantization of the FS1016 LSF parameters. After the effectiveness of the LSF-OTCVQ encoder was demonstrated for ideal transmission over a noiseless channel, we focused on improving its robustness for real transmission over noisy channels. To implicitly protect the transmission indices of the LSF-OTCVQ encoder incorporated in the FS1016 coder, we used joint source-channel coding carried out by channel-optimized vector quantization.
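Channel-optimized vector quantization changes the encoder's selection rule. Instead of choosing the nearest codeword, it picks the index whose expected distortion over possible channel errors is smallest. The sketch below shows that rule under an assumed binary symmetric channel. The trellis search of TCVQ is not shown, and the 1-D, 4-entry codebook is a hypothetical toy rather than an LSF codebook.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(u, v))

def bsc_transition(bits, eps):
    """P(j received | i sent) for `bits`-bit indices over a binary symmetric
    channel with bit error probability eps."""
    n = 1 << bits
    def prob(i, j):
        flips = bin(i ^ j).count("1")
        return eps ** flips * (1.0 - eps) ** (bits - flips)
    return [[prob(i, j) for j in range(n)] for i in range(n)]

def covq_encode(x, codebook, transition):
    """Choose the index i that minimizes sum_j P(j | i) * d(x, codebook[j])."""
    def expected_distortion(i):
        return sum(p * sq_dist(x, codebook[j]) for j, p in enumerate(transition[i]))
    return min(range(len(codebook)), key=expected_distortion)

# Hypothetical 2-bit toy codebook (natural binary index assignment).
codebook = [[0.0], [1.0], [20.0], [1.0]]
noiseless = covq_encode([0.0], codebook, bsc_transition(2, 0.0))
noisy = covq_encode([0.0], codebook, bsc_transition(2, 0.1))
```

With a noiseless channel, the rule reduces to nearest-neighbor encoding (index 0 for x = 0.0). With a 10% bit error rate, it prefers index 1, whose single-bit-error neighbors (indices 0 and 3) both decode close to x.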
ISBN (print): 9781424443451
In recent studies, the Unscented Kalman Filter (UKF) has been applied to several nonlinear systems. These include speech processing problems such as formant trajectory estimation, state and parameter Kalman estimation for speech enhancement, and the estimation of Line Spectral Frequency (LSF) trajectories. In this paper, we apply the UKF to the estimation of LSF trajectories for both synthetic and real noisy speech. The Expectation Maximization (EM) approach is used to iteratively estimate the LSF parameters. Furthermore, the square-root implementation of the UKF is used because it provides numerical stability and guarantees the positive semi-definiteness of the state covariance.
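At the core of the UKF is the unscented transform, which propagates a mean and variance through a nonlinear function using a small set of sigma points. The sketch below covers the scalar case only. The square-root UKF, the state-space model for LSF trajectories and the EM loop are not shown. With alpha = 1 and beta = 0, the scaled transform reduces to the original symmetric one, and kappa = 2 follows the common n + kappa = 3 heuristic. These parameter values are assumptions.

```python
import math

def unscented_transform_1d(mean, var, f, alpha=1.0, beta=0.0, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using 2n + 1 = 3 sigma points."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    sigma = [mean, mean + spread, mean - spread]
    wm0 = lam / (n + lam)                       # weight of the central point (mean)
    wc0 = wm0 + (1.0 - alpha ** 2 + beta)       # weight of the central point (covariance)
    wi = 1.0 / (2.0 * (n + lam))                # weight of each outer point
    wm = [wm0, wi, wi]
    wc = [wc0, wi, wi]
    ys = [f(s) for s in sigma]
    y_mean = sum(w * y for w, y in zip(wm, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(wc, ys))
    return y_mean, y_var
```

For a linear function the transform is exact. For f(x) = x^2 with x ~ N(0, 1), it returns mean 1 and variance 2, matching the true moments.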