This paper addresses problems in classifying singing voice quality. A database of singers' sample recordings is constructed, and parameters are extracted from the recorded voices of trained and untrained singers. The parameterization is based on both voice source and formant analysis of the singing voice. The parameters are explained in terms of their physical interpretation and analyzed statistically, using the Fisher statistic, to reduce their number. In this way a feature vector of the singing voice is formed. Decision systems based on neural networks and rough sets are used for voice type and voice quality classification, and the results of automatic classification by the two systems are compared. The feasibility of automatically classifying voice type and quality is assessed. The proposed methodology provides a means of discerning trained from untrained singers.
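The abstract does not state which form of the Fisher statistic is used, so the following is only a minimal sketch under one common assumption: a two-class Fisher ratio (squared difference of class means divided by the sum of class variances), computed per parameter and used to rank parameters before forming the feature vector. The function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Two-class Fisher ratio for one parameter (assumed form, not from the paper)."""
    spread = pvariance(class_a) + pvariance(class_b)
    gap = mean(class_a) - mean(class_b)
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if gap != 0 else 0.0
    return gap ** 2 / spread


def rank_features(trained, untrained):
    """Return (feature indices sorted by descending score, raw scores).

    trained / untrained are lists of equal-length parameter vectors.
    """
    n_features = len(trained[0])
    scores = [
        fisher_score([v[i] for v in trained], [v[i] for v in untrained])
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy data: parameter 0 separates the groups, parameter 1 does not.
trained = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]]
untrained = [[3.0, 5.0], [3.2, 5.2], [2.8, 4.8]]
order, scores = rank_features(trained, untrained)
```

Parameters with low scores would be the candidates to drop when reducing the feature vector; the cut-off itself is a design choice the abstract does not specify.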
A new approach to temporal decomposition (TD) of speech, called "spectral stability based event localizing temporal decomposition" and abbreviated S²BEL-TD, is presented. The original TD method proposed by Atal (1983) is known to suffer from high computational cost and from instability in the number and locations of events. In S²BEL-TD, event localization is based on a maximum spectral stability criterion, which overcomes the event instability of Atal's method. S²BEL-TD also avoids the computationally costly singular value decomposition routine used in Atal's method, resulting in a computationally simpler TD algorithm. Simulation results show that an average spectral distortion of about 1.5 dB can be achieved with LSF as the spectral parameter. We also show that the temporal pattern of the speech excitation parameters can be well described using the S²BEL-TD technique.
Several pre-processing algorithms modify the residual speech signal to facilitate efficient estimation of speech model parameters. This, however, can cause misalignment between the modified residual signal and the time-variant linear prediction (LP) filter used during the synthesis stage. The misalignment may cause audible artifacts, particularly at onsets, where the frequency response of successive LP filters changes rapidly. We propose a new solution that controls the LP filter gain at the subframe level. The technique is applied before and after time modification of speech and is therefore called pre-analysis and post-processing. A pitch smoothing technique is used to illustrate the effect of the proposed technique.
The line spectrum pair (LSP) representation of linear predictive coding (LPC) parameters is widely used in speech coding applications. An efficient method for LPC-to-LSP conversion is Kabal's method, in which the LSPs are the roots of two polynomials $P'_p(x)$ and $Q'_p(x)$ and are found by a zero-crossing search followed by successive bisections and interpolation. The precision of the resulting LSPs is higher than most applications require, but the number of bisections cannot be decreased without compromising the zero-crossing search. This paper shows that, for 10th-order LPC, five intervals can be calculated, each containing exactly one zero crossing of $P'_{10}(x)$ and one zero crossing of $Q'_{10}(x)$, which avoids the zero-crossing search. This allows a trade-off between LSP precision and computational complexity and yields considerable computational savings.
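The search-then-bisect procedure described above can be sketched as follows. This is an illustrative simplification, not Kabal's exact formulation and not the interval shortcut proposed in the paper: it assumes an even order p, evaluates the symmetric and antisymmetric polynomials directly as real functions of the frequency ω on the unit circle (rather than in the x = cos ω domain), and uses hypothetical parameter names `grid` and `bisections`.

```python
import math


def lpc_to_lsf(a, grid=200, bisections=12):
    """Line spectral frequencies (radians, ascending) for a = [1, a1, ..., ap], p even."""
    p = len(a) - 1
    if p < 2 or p % 2 != 0 or a[0] != 1:
        raise ValueError("expects a = [1, a1, ..., ap] with even p")
    ext = list(a) + [0.0]                      # a_{p+1} = 0
    half = (p + 1) / 2.0
    # First half of the symmetric (P) and antisymmetric (Q) coefficient sets.
    sym = [ext[k] + ext[p + 1 - k] for k in range(p // 2 + 1)]
    asym = [ext[k] - ext[p + 1 - k] for k in range(p // 2 + 1)]

    def f_p(w):  # P on the unit circle, up to a nonzero phase factor
        return sum(c * math.cos((half - k) * w) for k, c in enumerate(sym))

    def f_q(w):  # Q on the unit circle, up to a nonzero phase factor
        return sum(c * math.sin((half - k) * w) for k, c in enumerate(asym))

    def zeros(f):
        step = math.pi / grid
        found = []
        lo, f_lo = 0.5 * step, f(0.5 * step)   # grid stays inside (0, pi)
        for i in range(1, grid):
            hi = (i + 0.5) * step
            f_hi = f(hi)
            if (f_lo < 0) != (f_hi < 0):       # zero crossing in [lo, hi]
                left, f_left, right, f_right = lo, f_lo, hi, f_hi
                for _ in range(bisections):
                    mid = 0.5 * (left + right)
                    f_mid = f(mid)
                    if (f_mid < 0) == (f_left < 0):
                        left, f_left = mid, f_mid
                    else:
                        right, f_right = mid, f_mid
                # Final linear interpolation inside the bracketing interval.
                found.append(left - f_left * (right - left) / (f_right - f_left))
            lo, f_lo = hi, f_hi
        return found

    return sorted(zeros(f_p) + zeros(f_q))


# Hypothetical 2nd-order example; closed form gives cos(w) = 0.85 and 0.05.
lsf = lpc_to_lsf([1.0, -0.9, 0.2])
```

In this sketch the grid search dominates the cost: every grid point is evaluated before any bisection starts, which is the step the paper's precomputed intervals are meant to avoid.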
Source coding and encryption are linked theoretically by the aim of removing redundancy. So far, however, no attempt has been made to combine source coding and encryption, except in lossless compression models. Applied to speech coding, this appears to be a novel research idea with the potential to yield new designs and implementations. The authors consider the effect of combined encoding and encryption on analysis-by-synthesis (AbS) LPC-based techniques, a class of time-domain speech compression algorithms widely used in commercial as well as military communication applications. A novel pre-processing speech scrambling algorithm (PSSA) is proposed which, given speech, produces a scrambled signal with speech-like characteristics. The resulting signal can then be compressed by a low bit rate speech codec.
This paper presents a new scheme for estimating the formant frequencies of noise-corrupted speech signals. A once-repeated autocorrelation function (ORACF) of the observed noisy speech signal is proposed for use in a linear-prediction-based formant estimation method. It is shown that the ORACF significantly reduces the effect of additive noise and that using the ORACF instead of the conventional ACF in a modified form of the least-squares Yule-Walker equations gives better formant estimation performance. In addition, a frequency-domain algorithm is incorporated in the proposed scheme to avoid possible estimation errors when extracting a low-energy formant. The proposed algorithm has been tested on synthetic and natural vowels, as well as some naturally spoken sentences, in the presence of additive noise. The experimental results show that the proposed scheme outperforms some existing methods at low signal-to-noise ratio (SNR).
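As a rough illustration of the ORACF idea, the sketch below applies the autocorrelation operation twice. The abstract does not give the exact definition, so the one-sided, unnormalized ACF used here, and the function names, are assumptions rather than the paper's formulation.

```python
def acf(x, max_lag=None):
    """One-sided, unnormalized autocorrelation r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    if max_lag is None:
        max_lag = n - 1
    return [
        sum(x[i] * x[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def oracf(x, max_lag=None):
    """Once-repeated ACF (assumed form): the ACF of the full one-sided ACF of x."""
    return acf(acf(x), max_lag)


# Tiny worked example: acf([1, 1]) is [2, 1], and acf([2, 1]) is [5, 2].
first_pass = acf([1, 1])
second_pass = oracf([1, 1])
```

In a linear-prediction setting, the lags of `oracf(frame)` would take the place of the ordinary ACF lags in the Yule-Walker-type equations; the modified least-squares form mentioned in the abstract is not reproduced here.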
A two-sided linear prediction (TSLP) model has been shown to achieve higher prediction gain than the conventional linear prediction (LPC) model [David and Ramamurthi, 1991] while requiring fewer coefficients. Unfortunately, speech synthesis cannot use the TSLP model directly because it needs future samples, which are not available during synthesis. Autoregressive spectral matching (ARSM) is proposed to render the TSLP model suitable for speech synthesis. A vector sum excitation method is used to generate the excitation for the new model, and its performance is comparable to that of standard VSELP.
Frame predictive vector quantization is developed to reduce the bit rate for coding the LPC filter coefficients to under 250 bits/sec. An innovative LPC compression technique, matrix quantization, is also developed to compress the LPC filter coefficients to a rate under 150 bits/sec. Subjective evaluation with the diagnostic rhyme test (DRT) finds the proposed techniques feasible for intelligible speech transmission at bit rates between 200 bits/sec and 400 bits/sec.
Peterson, Wang, and Sivertsen [1] suggested the use of units called "dyads" as the basic unit for speech synthesis. This paper describes an approach to speech synthesis by rule that uses a unit similar to, but smaller than, the dyad as defined by Peterson et al. The new unit specifies only the transition between the two phones of the dyad, while the "steady state" portions are obtained by connecting the end points of adjacent transitions with straight lines. Further simplifications of the dyadic concept include a reduced collection of dyadic transitions and the storage of only the end points of each transition; the transitions themselves are then obtained by interpolation between these end points. The paper describes a complete rule synthesis scheme that uses these simplified dyads in combination with a word pronouncing dictionary and suitable prosodic rules.
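The end-point storage scheme described above can be illustrated with a small sketch. It assumes linear interpolation within each transition (the abstract says only "interpolation") and a single control parameter per track; the tuple layout, function names, and the example values (times in seconds, a formant-like frequency in Hz) are hypothetical.

```python
def breakpoints(transitions):
    """Flatten stored transitions (t_start, v_start, t_end, v_end) into (t, v) points."""
    points = []
    for t0, v0, t1, v1 in transitions:
        points.append((t0, v0))
        points.append((t1, v1))
    return points


def value_at(points, t):
    """Piecewise-linear track through the breakpoints.

    Segments inside a transition interpolate between its stored end points;
    segments between transitions are the straight-line "steady state" parts.
    """
    if t <= points[0][0]:
        return points[0][1]
    for (ta, va), (tb, vb) in zip(points, points[1:]):
        if ta <= t <= tb:
            if tb == ta:
                return vb
            return va + (vb - va) * (t - ta) / (tb - ta)
    return points[-1][1]


# Two hypothetical transitions with a steady-state gap from 0.05 s to 0.15 s.
track = breakpoints([(0.0, 500.0, 0.05, 700.0), (0.15, 900.0, 0.20, 300.0)])
mid_transition = value_at(track, 0.025)
mid_steady = value_at(track, 0.10)
```

Storing only four numbers per transition is what keeps the inventory small; the price is that any curvature inside a transition is lost under the linear assumption made here.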