We propose a system that improves the intelligibility of speech produced by people with hearing impairment. We found that such speech often has problems with intonation, the duration of unvoiced consonants, and the tone of voiced phonemes. The system systematically compensates for the problematic components of the speech using their counterparts in normal speech. It corrects the intonation using TD-PSOLA and elongates consonants by repeating the original waveform, using an inverting technique so that the result is continuously connected, until the duration reaches a threshold. Experimental results show that the proposed method improves the intelligibility of the speech from 28% to 35% by making phonemes more articulated and by making double consonants, the syllabic consonant 'n', and unvoiced consonants more clearly perceived.
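The consonant elongation step can be illustrated with a minimal sketch. It assumes that the "inverting technique" means appending time-reversed copies of the segment, so that each join meets at an identical sample value; the abstract does not confirm this reading. The function name `elongate` and the parameter `min_len` are hypothetical.

```python
def elongate(segment, min_len):
    """Extend a consonant segment to at least `min_len` samples.

    Assumption: "inverting" is read as time reversal. Copies are appended
    alternately reversed and forward, so the last sample of one copy equals
    the first sample of the next and the waveform has no jump at the joins.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    out = list(segment)
    piece = list(segment)
    while len(out) < min_len:
        piece = piece[::-1]      # flip direction for the next copy
        out.extend(piece)
    # Trim any overshoot, but never shorten the original segment.
    return out[:max(min_len, len(segment))]


print(elongate([1, 2, 3], 8))    # [1, 2, 3, 3, 2, 1, 1, 2]
```

A segment that already meets the threshold is returned unchanged.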
In this paper, new feature extraction methods that use reduced-order linear predictive coding (LPC) coefficients for speech recognition are proposed. The coefficients are derived from speech frames decomposed using the Discrete Wavelet Transform (DWT). The literature assumes that a speech frame of 10 ms to 30 ms is stationary; in practice, however, different parts of the speech signal may convey different amounts of information, so the frame may not be perfectly stationary. LPC coefficients derived from the subband decomposition of a speech frame provide a better representation than modeling the frame directly. Experiments show that the proposed approaches provide features that are both effective (better recognition rate) and efficient (reduced feature vector dimension). The speech recognition system is implemented using a continuous Hidden Markov Model (HMM). The proposed algorithms are evaluated on the NIST TI-46 isolated-word database.
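As a rough illustration of subband LPC features, the sketch below splits a frame with a one-level Haar DWT and fits a low-order LPC model to each subband using the autocorrelation method with the Levinson-Durbin recursion. The wavelet, the number of decomposition levels, and the order of 4 are assumptions chosen for brevity; the abstract does not specify them, and the function names are hypothetical.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT. Returns (approximation, detail); a trailing odd sample is dropped."""
    s = 1 / math.sqrt(2)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns a[1..order] such that x[n] is predicted by sum(a[k] * x[n-k]).
    """
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:             # silent or perfectly predicted subband
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err            # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:]


def subband_lpc_features(frame, order=4):
    """Concatenate low-order LPC coefficients of the two Haar subbands (illustrative only)."""
    approx, detail = haar_dwt(frame)
    return lpc(approx, order) + lpc(detail, order)


frame = [math.sin(0.3 * n) + 0.1 * math.sin(2.5 * n) for n in range(160)]
print(len(subband_lpc_features(frame)))   # 8
```

Two subbands of order 4 give an 8-dimensional vector, which shows how a low order per subband keeps the feature dimension small.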
In this paper, we describe several important experiments concerning large vocabulary distributed continuous speech recognition (LVDCSR) systems in Brazilian Portuguese using LSF- and LPC-derived features. The ITU-T G.723.1 codec is employed and investigated as a practical use case for this technology. Results are presented for both speaker-dependent and speaker-independent modes, as well as for situations where the same text or different texts were used for training and testing.
We propose a linear predictive coding technique for multichannel electromyographic (EMG) recordings. The signals are acquired using a two-dimensional grid of electrodes, which generates strongly correlated signals. Previous work considered only spectral redundancy across the signal matrix. In this paper we exploit the correlation present in the residual signals, i.e., the signals after short-term prediction. The proposed technique achieves a compression ratio of about 1:9, i.e., slightly better than spectral-only decorrelation methods, but with a strong increase of approximately 3.2 dB in the SNR of the reconstructed waveform.
With the wide application of mobile communication networks, end-to-end secure voice has become a very important research issue and has attracted considerable interest. A simple and effective method for end-to-end secure communication is to encrypt the speech signal at the user ends. However, encryption creates a hard problem: encrypted speech data cannot be transmitted directly over a mobile voice channel. In this paper, we propose a scheme for transmitting encrypted speech over the GSM voice channel. The scheme gives encrypted speech access to the mobile voice channel through a proposed special modem, called smodem, based on the linear predictive coding (LPC) technique. It also ensures that the encrypted data pass through GSM networks with sufficient accuracy for the received data to be decrypted correctly at the receiver. Moreover, the proposed scheme requires no change to the existing clear voice communication procedure, has short call establishment time and propagation delay, interoperates well over heterogeneous networks, and is favorable to hiding the encrypted information. The experimental results show that the proposed scheme is feasible.
Voice-centric interfaces are widely available in modern mobile phones, including low-cost versions. The applications have evolved from speaker-dependent name dialing, which requires user enrollment of frequently dialed names, to speaker-independent capabilities including continuous digit dialing, command and control of phone functions, and name dialing directly from the phone's contacts directory. Recent advances include capabilities like voice-enabled SMS, e-mail, and even mobile search with voice. This evolution has been enabled by advances in speech recognition robustness, network capabilities, and increased computational power in small devices. Systems may now be used in hands-busy/eyes-busy conditions, including speakerphone and Bluetooth scenarios. In this paper, we provide an overview of embedded speech-recognition-centric applications in mobile phones, focusing specifically on current status, industry trends, and challenges in customer acceptance. Although voice interfaces are natural and attractive in theory, a majority of users do not use the voice-enabled features available in their mobile phones. We discuss some of the reasons for this user behavior and recommend actions to be taken.
We propose a novel feature set for speaker recognition that is based on the voice source signal. The feature extraction process uses closed-phase LPC analysis to estimate the vocal tract transfer function. The LPC spectrum envelope is converted to cepstrum coefficients, which are used to derive the voice source features. Unlike approaches based on inverse filtering, our procedure is robust to LPC analysis errors and low-frequency phase distortion. We have performed text-independent closed-set speaker identification experiments on the TIMIT and YOHO databases using a standard Gaussian mixture model technique. Compared to using mel-frequency cepstrum coefficients alone, the misclassification rate for the TIMIT database was reduced from 1.51% to 0.16% when they were combined with the proposed voice source features. For the YOHO database the misclassification rate decreased from 13.79% to 10.07%. The new feature vector also compares favourably to other proposed voice source feature sets.
ISBN: (print) 9781424421855
Oversized particles in spent slurry are analyzed to examine particle agglomeration during the chemical mechanical polishing (CMP) process. In tungsten (W) polishing, due to the greater chance of collision for large particles, slurries with high abrasive content yield fewer oversized particles during polishing. In oxide wafer polishing, fumed silica follows an agglomeration behavior similar to that in tungsten polishing, while colloidal silica shows the reverse trend. Defectivity measurements on oxide wafers confirm the correlation between the large particle count (LPC) of the spent slurry and the number of defects on the wafers, which means that agglomerates produced during polishing may cause defects in CMP.
The Support Vector Machine (SVM) is one of the state-of-the-art tools for linear and nonlinear pattern classification. One of the design issues in an SVM classifier is reducing the number of support vectors without compromising the classification accuracy. In this paper, a novel technique known as Diminishing Learning (DL) is proposed for an SVM-based multi-class pattern recognition system. In this technique, a sequential classifier is proposed in which the classes that require stringent boundaries are tested one by one, and once the tests for these classes fail, the stringency of the classifier is progressively relaxed. The effect on recognition accuracy of the sequence in which the classes are trained and tested is also studied. The proposed technique is applied to an SVM-based isolated digit recognition system and is studied using the speaker-dependent TI46 database of isolated digits. Two feature extraction techniques, one using LPC and the other using MFCC, are applied to the speech from this database, and the features are mapped using SOFM. The mapped features are then used by the SVM classifier to evaluate the recognition accuracy with and without the DL technique. The study finds that diminishing learning reduces the number of support vectors by 35.5% and 39.5% for the SVM classifier with LPC and MFCC feature inputs, respectively. For LPC feature inputs, recognition accuracies of 96% and 97% are achieved with and without the DL technique, respectively. For MFCC feature inputs, a recognition accuracy of 100% is achieved both with and without the DL technique. The study confirms that the order in which the classes are trained and tested affects the recognition accuracy; for the TI46 database, choosing the optimum order yields an increase of about 7% in recognition accuracy.
This paper proposes an efficient method for estimating the frame energy of speech from an enhanced variable rate coder (EVRC) bitstream for network-based speech processing applications in transcoder free operation (TrFO) environments, where speech signals are represented as speech coding parameters. The energy of a speech frame is decomposed into the energy of the excitation and the energy of the vocal tract filter, and an estimation method is derived for each component. Among the many parameters of the EVRC bitstream, the fixed codebook gain and adaptive codebook gain are used to estimate the excitation energy, and line spectrum pair (LSP) information is used to estimate the energy of the vocal tract filter. Experimental results demonstrated the novelty of the proposed method. The correlation coefficient between the actual and estimated frame energy can be maintained at 0.994 with just 5% of the multiplication operations of full decoding.
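The decomposition into excitation energy and vocal tract filter energy can be pictured with a generic sketch, which is not the paper's EVRC-specific derivation. It assumes that LPC predictor coefficients have already been recovered from the LSP parameters and that the excitation is roughly white, so that frame energy is approximately the excitation energy multiplied by the energy of the synthesis filter's impulse response. All names and the 64-tap truncation are hypothetical.

```python
def synthesis_filter_energy(a, taps=64):
    """Energy of the truncated impulse response of 1/A(z).

    A(z) = 1 - sum(a[k] * z**-k), with `a` holding a[1..p] (predictor convention).
    """
    h = []
    for n in range(taps):
        y = 1.0 if n == 0 else 0.0          # unit impulse input
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * h[n - k]          # recursive (all-pole) part
        h.append(y)
    return sum(v * v for v in h)


def estimate_frame_energy(excitation_energy, a):
    """Approximate frame energy as excitation energy times filter energy gain (white-excitation assumption)."""
    return excitation_energy * synthesis_filter_energy(a)


# One-pole example: h[n] = 0.5**n, so the filter energy gain is 1 / (1 - 0.25) = 4/3.
print(round(estimate_frame_energy(3.0, [0.5]), 6))   # 4.0
```

In a real codec the excitation energy would itself be estimated from the codebook gains, which this sketch takes as a given input.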