Voice biometrics is an economical method of machine-based person authentication, made practical by low-cost, high-powered computers. In this paper, we investigate the problem of spectral resolution in female speech for speaker identification. A speaker recognition system is presented to compare the relative performance of LP-based features (LPC and LPCC) and filterbank-based features such as Mel-frequency cepstral coefficients (MFCC) for the identification of female speakers. Results are reported for a database collected in Bengali from 15 female speakers.
In this paper, a new scheme for the estimation of formant frequencies of noise-corrupted speech signals is presented. A once-repeated autocorrelation function (ORACF) of the observed noisy speech signal is proposed for use in a linear-prediction-based formant estimation method. It is shown that the ORACF significantly reduces the effect of additive noise and that, if the ORACF is used instead of the conventional ACF in a modified form of the least-squares Yule-Walker equations, better formant estimation performance is achieved. Moreover, a frequency-domain algorithm is incorporated in the proposed scheme to avoid possible estimation errors when extracting a formant with low energy. The proposed algorithm has been tested on synthetic and natural vowels, as well as on some naturally spoken sentences, in the presence of additive noise. The experimental results demonstrate that the proposed scheme performs better than some existing methods at low signal-to-noise ratios (SNR).
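The abstract does not define the ORACF precisely. One plausible reading of "once-repeated" is that the autocorrelation estimator is applied a second time to the ACF sequence itself. The sketch below follows that reading only; the biased estimator, the use of the one-sided ACF as input to the second pass, and the function names are all assumptions for illustration, not the paper's method.

```python
import math


def autocorrelation(x, max_lag):
    """Biased ACF estimate r[k] = (1/N) * sum_n x[n] * x[n + k], k = 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def once_repeated_autocorrelation(x, max_lag):
    """Assumed ORACF: the ACF of the (one-sided) ACF of x."""
    first_pass = autocorrelation(x, len(x) - 1)   # ACF over all available lags
    return autocorrelation(first_pass, max_lag)   # apply the estimator once more


# Illustrative input: a sinusoid with a period of 20 samples.
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
oracf = once_repeated_autocorrelation(signal, 40)
```

Under this reading the second pass keeps the periodic structure of the signal, so the resulting sequence could in principle stand in for the ACF in Yule-Walker-type equations; how the paper actually modifies those equations is not stated in the abstract.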
Source coding and encryption are linked theoretically by the aim of removing redundancy. However, so far, no attempt has been made to combine source coding and encryption, except in lossless compression models. When applied to speech coding, this appears to be a novel research idea with the potential for new designs and implementations. The authors consider the effect of combined encoding and encryption on analysis-by-synthesis (AbS) LPC-based techniques, a class of time-domain speech compression algorithms widely used in commercial as well as military communication applications. A novel pre-processing speech scrambling algorithm (PSSA) is proposed which, given speech, produces a scrambled signal with speech-like characteristics. The resulting signal can then be compressed by a low-bit-rate speech codec.
An excitation modeling scheme based on a multitap adaptive or fixed codebook is proposed for low-delay encoding of speech signals. The algorithm can produce high-quality speech at a bit rate of 8 kbit/s, but it still lacks the toll quality of LD-CELP (low-delay code excited linear prediction) at 16 kbit/s. A simplified vector quantization search procedure was incorporated to reduce complexity with little subjective degradation. The performance results indicated a need for several types of efficient excitation modeling, depending on the nature of the input signal.
Frame predictive vector quantization is developed to compress the bit rate for coding the LPC filter coefficients to under 250 bits/sec. An innovative LPC compression technique, matrix quantization, is also developed to compress the LPC filter coefficients to a rate under 150 bits/sec. Subjective evaluation with the diagnostic rhyme test (DRT) finds the proposed techniques to be feasible for intelligible speech transmission at bit rates between 400 bits/sec and 200 bits/sec.
Peterson, Wang, and Sivertsen[1] suggested the use of units called "dyads" as the basic unit for speech synthesis. This paper describes an approach to speech synthesis by rule which uses a unit that is similar to, but smaller than, the dyad as defined by Peterson et al. This new unit specifies only the transition between the two phones of the dyad, while the "steady state" portions are obtained by connecting the end points of adjacent transitions with straight lines. Further simplifications of the dyadic concept include a reduced collection of dyadic transitions and the storage of only the end points of the dyadic transitions; the transitions themselves are then obtained by interpolation between these end points. This paper describes a complete rule synthesis scheme which uses these simplified dyads in combination with a word pronouncing dictionary and suitable prosodic rules.
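If only the end points of each transition are stored, and both the transitions and the steady-state portions are filled in with straight lines, a synthesis parameter track reduces to a piecewise-linear curve through those end points. The sketch below illustrates that idea for a single parameter. The function name, the choice of a formant-like value in Hz, the millisecond time axis, and the sample numbers are all hypothetical; the abstract does not say which parameters are stored or how they are sampled.

```python
def interpolate_track(breakpoints, step_ms):
    """Sample a piecewise-linear track through (time_ms, value) breakpoints.

    `breakpoints` must be sorted by strictly increasing time. Each stored
    transition contributes its two end points; the straight line between the
    end of one transition and the start of the next plays the role of the
    "steady state" portion.
    """
    track = []
    seg = 0                      # index of the current line segment
    t = breakpoints[0][0]
    end = breakpoints[-1][0]
    while t <= end:
        # Advance to the segment that contains time t.
        while t > breakpoints[seg + 1][0]:
            seg += 1
        (t0, v0), (t1, v1) = breakpoints[seg], breakpoints[seg + 1]
        track.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        t += step_ms
    return track


# Two hypothetical transitions: 700 -> 500 Hz over 0-40 ms, 500 -> 300 Hz over 100-140 ms.
end_points = [(0, 700), (40, 500), (100, 500), (140, 300)]
track = interpolate_track(end_points, 20)
```

In this example the segment from 40 ms to 100 ms is flat because the adjacent transitions happen to meet at the same value; when they do not, the connecting line slopes gently between them.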
The line spectrum pairs (LSP) provide an efficient representation of the synthesis filter used in linear predictive coding of speech. In this paper, an attempt is first made to find the best distance measure for vector quantization by carrying out objective studies over the same training sequence. Fast VQ algorithms for the LSP parameters are then compared in terms of complexity, using the Euclidean distance measure. The well-known ordering property of LSP parameters is exploited to improve the efficiency of the minimum-distortion encoder for VQ in terms of the norm associated with its distance. As conventional full search is too complex for practical implementation, the originality of this work consists in using the norm to limit the size of the area which contains the nearest neighbor of an input vector to be quantized. This method results in a substantial reduction in search complexity with only a minor degradation in average spectral distortion.
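The abstract does not spell out how the norm bounds the search area. A generic way to do this, sketched below as an assumption rather than as the paper's algorithm, relies on the triangle inequality: for Euclidean distance, | ||x|| - ||c|| | <= ||x - c||, so any codeword whose norm differs from the input's norm by at least the best distance found so far cannot be the nearest neighbor. The function names and the toy two-dimensional codebook are hypothetical.

```python
import bisect
import math


def build_norm_index(codebook):
    """Return (sorted_norms, codeword_ids), ordered by Euclidean norm."""
    pairs = sorted((math.hypot(*cw), i) for i, cw in enumerate(codebook))
    return [n for n, _ in pairs], [i for _, i in pairs]


def nearest_codeword(x, codebook, norms, ids):
    """Norm-bounded nearest-neighbor search; returns (best_id, n_evaluated)."""
    x_norm = math.hypot(*x)
    hi = bisect.bisect_left(norms, x_norm)   # first codeword with norm >= ||x||
    lo = hi - 1
    best_id, best_dist, evaluated = None, math.inf, 0
    while lo >= 0 or hi < len(norms):
        lo_gap = x_norm - norms[lo] if lo >= 0 else math.inf
        hi_gap = norms[hi] - x_norm if hi < len(norms) else math.inf
        # Every remaining codeword is at least min(gap) away: stop early.
        if min(lo_gap, hi_gap) >= best_dist:
            break
        if lo_gap <= hi_gap:
            pos, lo = lo, lo - 1
        else:
            pos, hi = hi, hi + 1
        dist = math.dist(x, codebook[ids[pos]])
        evaluated += 1
        if dist < best_dist:
            best_id, best_dist = ids[pos], dist
    return best_id, evaluated


# Toy codebook; real LSP vectors would have one entry per LSP frequency.
codebook = [(0.1, 0.2), (0.3, 0.5), (1.0, 1.2), (2.0, 2.5), (3.0, 3.1)]
norms, ids = build_norm_index(codebook)
best, n_evaluated = nearest_codeword((0.28, 0.52), codebook, norms, ids)
```

This particular bound is exact, so it returns the same codeword as a full search while usually computing far fewer distances. The "minor degradation" reported in the abstract suggests the paper's method may trade some accuracy for a tighter search area, which this sketch does not attempt to reproduce.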
This paper contributes insight into the sources of variability in vowel formant estimation, a major analytic activity in sociophonetics, by reviewing the outcomes of two simulations that manipulated the settings used for linear predictive coding (LPC)-based vowel formant estimation. Simulation 1 explores the range of frequency differences obtained when minor adjustments are made to LPC settings, and measurement time points around the settings used by trained analysts, in order to determine the range of variability that should be expected in sociophonetic vowel studies. Simulation 2 examines the variability that emerges when LPC settings are varied combinatorially around constant default settings, rather than settings set by trained analysts. The impacts of different LPC settings are discussed as a way of demonstrating the inherent properties of LPC-based formant estimation. This work suggests that differences more fine-grained than about 10 Hz in F1 and 15-20 Hz in F2 are within the range of LPC-based formant estimation variability.
ISBN: 9781424481835 (print)
In conventional automatic speech recognition systems, the linguistic information in the speech signal is usually extracted from short-time frames of about 10-30 ms. In this paper we propose two novel methods for extracting the long-term information of the speech signal. Both methods are based on "sub-band FDLP", which divides a long-time frame of the signal into several sub-bands. Using the MFCC algorithm, we are able to represent the long-term temporal features of each sub-band. Our results show that the proposed methods can improve the recognition rate by 1.73%. The proposed methods were evaluated on the FarsDat database, and their robustness under different noise conditions was tested.
The current front-end for distributed speech recognition (DSR) systems provided by the European Telecommunications Standards Institute (ETSI) is mainly based on state-of-the-art MFCC features. The method proposed in this paper aims to improve the performance of the present ETSI DSR-XAFE (XAFE: extended Audio Front-End). For this purpose, two sets of acoustical features, namely formant-like features and MFCC features, are integrated under the multi-stream framework to form a feature vector which is more robust against additive noise. It is shown that, for noisy speech, combining cepstral coefficients with the main spectral peaks (also known as formant-like features) under the multi-stream framework leads to a significant improvement in word recognition accuracy relative to that obtained with MFCCs alone.