In this study, we examined articulatory inversion using audiovisual information based on a Gaussian mixture model (GMM). In this method, the joint distribution of the articulatory movement and audio (and/or visual) data ...
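GMM-based inversion methods of this kind commonly map an observed acoustic (and/or visual) vector x to an articulatory vector y through the conditional expectation E[y | x] under the joint GMM. The sketch below assumes the joint GMM parameters (weights, means and full covariances over the stacked vector [x; y]) are already trained. The function name `gmm_conditional_mean` is hypothetical, and this is not necessarily the exact mapping used in the study.

```python
import numpy as np

def gmm_conditional_mean(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x; y].

    weights: (M,), means: (M, D), covs: (M, D, D). The first dx
    dimensions of z are the observed (acoustic/visual) features.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    M = len(weights)
    log_resp = np.empty(M)
    cond_means = []
    for m in range(M):
        mu_x, mu_y = means[m][:dx], means[m][dx:]
        s_xx = covs[m][:dx, :dx]
        s_yx = covs[m][dx:, :dx]
        diff = x - mu_x
        # log(w_m * N(x; mu_x, S_xx)), used for the posterior p(m | x)
        _, logdet = np.linalg.slogdet(s_xx)
        maha = diff @ np.linalg.solve(s_xx, diff)
        log_resp[m] = np.log(weights[m]) - 0.5 * (logdet + maha + dx * np.log(2 * np.pi))
        # Per-component conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x)
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, diff))
    # Normalise the posteriors in the log domain for numerical stability
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return np.sum(resp[:, None] * np.array(cond_means), axis=0)
```

With a single component this reduces to ordinary linear (Gaussian) regression; with several components the posterior p(m | x) softly selects the locally appropriate linear mapping.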
Pulse Repetition Interval (PRI) tracking is of particular interest and importance in electronic warfare. In this work, a specific case of PRI tracking, sinusoidal PRI, is studied. Extended Kalman Filter and Unscented Kal...
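As a rough illustration of how an extended Kalman filter (EKF) can track a sinusoidally modulated PRI, the sketch below assumes a hypothetical state [mean PRI, modulation amplitude, phase, phase increment per pulse] with the measurement model PRI_k = mean + amplitude * sin(phase). The state model, noise levels and function names are assumptions for illustration, not the formulation used in the work above, and the unscented variant is not shown.

```python
import numpy as np

# State s = [c, a, phi, omega]: mean PRI, modulation amplitude,
# current phase and phase increment per pulse (hypothetical model).
F = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],   # phi_{k+1} = phi_k + omega_k
              [0.0, 0.0, 0.0, 1.0]])

def h(s):
    """Predicted PRI measurement for state s."""
    return s[0] + s[1] * np.sin(s[2])

def jacobian_h(s):
    """Row vector dh/ds evaluated at s."""
    return np.array([1.0, np.sin(s[2]), s[1] * np.cos(s[2]), 0.0])

def ekf_track(pris, s0, P0, Q, r):
    """Run an EKF over measured PRIs; s0/P0 are the prior for the first pulse."""
    s, P = np.asarray(s0, float), np.asarray(P0, float)
    states = []
    for z in pris:
        # Measurement update, linearising h around the current estimate
        H = jacobian_h(s)
        S = H @ P @ H + r            # innovation variance (scalar)
        K = P @ H / S                # Kalman gain, shape (4,)
        s = s + K * (z - h(s))
        P = P - np.outer(K, H @ P)
        states.append(s.copy())
        # Time update to the next pulse (the transition is linear here)
        s = F @ s
        P = F @ P @ F.T + Q
    return np.array(states)
```

Because only the measurement is nonlinear in this parameterisation, the EKF linearisation enters solely through `jacobian_h`; an unscented filter would instead propagate sigma points through `h`.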
This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from au...
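One common way to attach an energy to each formant estimate is to read the LPC spectral envelope G / |A(e^{jw})| at the formant frequency. The sketch below assumes LPC coefficients a = [1, a1, ..., ap] and a gain G are already available; the helper name is hypothetical and this need not match the estimator used in the work.

```python
import numpy as np

def lpc_envelope_db(a, gain, freqs_hz, fs):
    """Level (dB) of the all-pole envelope G / |A(e^{jw})| at freqs_hz.

    a = [1, a1, ..., ap] are the coefficients of A(z) = 1 + sum_k a_k z^{-k}.
    """
    a = np.asarray(a, dtype=float)
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs   # rad/sample
    k = np.arange(len(a))
    # Evaluate A(e^{jw}) = sum_k a_k e^{-jwk} for every requested frequency
    A = np.exp(-1j * np.outer(w, k)) @ a
    return 20 * np.log10(gain / np.abs(A))
```

Evaluating the envelope at the estimated formant frequencies gives one spectral amplitude per formant, which can then be appended to the frequency features.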
In this work, we propose a method for accurate and reliable estimation of vocal tract resonance (VTR) frequencies. The proposed method is based on finding candidates for the VTR frequencies using LPC analysis and estima...
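A typical LPC-based candidate step converts the complex roots r of A(z) in the upper half-plane into frequencies F = angle(r) * fs / (2 pi) and bandwidths B = -ln|r| * fs / pi. The bandwidth threshold `max_bw` and the function name are illustrative assumptions, and the subsequent estimation stage of the proposed method is not shown.

```python
import numpy as np

def vtr_candidates(a, fs, max_bw=400.0):
    """Candidate resonance frequencies and bandwidths (Hz) from LPC polynomial a.

    a = [1, a1, ..., ap]; max_bw is a hypothetical bandwidth cut-off.
    """
    # np.roots solves z^p + a1 z^{p-1} + ... + ap = 0, i.e. the poles of 1/A(z)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    keep = bws < max_bw                         # discard broad, weak resonances
    order = np.argsort(freqs[keep])
    return freqs[keep][order], bws[keep][order]
```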
ISBN: 9781604234497 (print)
This paper analyzes vocal tract resonance (VTR) frequency trajectories and their relationship to formants from a new point of view. Because the physical geometry of the vocal tract can change abruptly or continuously, VTRs may change in number, suddenly shift position, or leak into regions where they usually do not exist. We define visible VTRs (VVTRs) as the VTRs that can be seen in the spectrogram, and we propose a Kalman-filter-based algorithm that can handle all of these changes in VVTRs. The suggested properties of VVTR trajectories and the performance of the algorithm are demonstrated on several examples.
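The abstract does not detail how the algorithm handles VVTRs that appear, disappear or jump. One standard Kalman-filter device such a tracker could build on is to skip the measurement update in frames where a track has no visible VTR, so the prediction carries the track while its variance grows. The scalar random-walk model, the `None` convention for a missing observation and the function name are assumptions, not the authors' algorithm.

```python
def track_with_gaps(observations, x0, p0, q, r):
    """Scalar random-walk Kalman filter that tolerates missing frames.

    observations: per-frame VTR frequency in Hz, or None when the
    resonance is not visible in that frame (hypothetical convention).
    Returns a list of (estimate, variance) pairs.
    """
    x, p = x0, p0
    out = []
    for z in observations:
        p = p + q                      # predict: x_k = x_{k-1} + w, w ~ N(0, q)
        if z is not None:              # update only when the VTR is visible
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
        out.append((x, p))
    return out
```

The growing variance during invisible stretches means the first observation after a gap receives a larger gain, which lets the track re-lock onto a resonance that has moved while hidden.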
This study addresses the tracking of formant frequencies using Kalman filtering. Assuming that formant frequencies change slowly over time, their behavior can be modeled as the output of a dynamic system and...
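The slowly-varying assumption maps naturally onto a random-walk state model F_t = F_{t-1} + w_t observed through noisy per-frame estimates. The sketch below tracks several formants jointly under that assumption; the noise variances, the initialisation from the first frame and the function name are hypothetical choices, not values from the study.

```python
import numpy as np

def kalman_formant_track(Z, q=50.0**2, r=150.0**2):
    """Track K formants over T frames with a random-walk Kalman filter.

    Z: (T, K) array of raw per-frame formant estimates in Hz.
    q, r: hypothetical process / measurement noise variances (Hz^2).
    """
    Z = np.asarray(Z, dtype=float)
    T, K = Z.shape
    Q, R, I = q * np.eye(K), r * np.eye(K), np.eye(K)
    x = Z[0].copy()                    # initialise from the first frame
    P = R.copy()
    X = np.empty_like(Z)
    for t in range(T):
        P = P + Q                      # predict: F_t = F_{t-1} + w_t
        G = P @ np.linalg.inv(P + R)   # Kalman gain (observation matrix H = I)
        x = x + G @ (Z[t] - x)
        P = (I - G) @ P
        X[t] = x
    return X
```

The ratio q / r controls the trade-off: a small ratio gives smooth trajectories that respond slowly to genuine formant movements, a large ratio follows the raw estimates more closely.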
In this paper, an efficient algorithm for pitch determination of speech signals is presented. The pitch period of speech signals does not change sharply over time in voiced and voiced-unvoiced transition regions. Dep...
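The absence of sharp pitch-period changes suggests confining each frame's search to a neighbourhood of the previous frame's period. The sketch below does this with plain autocorrelation peak picking; the autocorrelation criterion, the +/- `max_jump` window and all names are assumptions for illustration, not the algorithm presented in the paper.

```python
import numpy as np

def best_lag(frame, lo, hi):
    """Lag in [lo, hi] (samples) with the largest autocorrelation value."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    lags = np.arange(lo, hi + 1)
    r = [np.dot(frame[:-k], frame[k:]) for k in lags]
    return int(lags[int(np.argmax(r))])

def track_pitch(frames, fs, f0_min=60.0, f0_max=400.0, max_jump=5):
    """Per-frame F0 in Hz; after the first frame, each search is confined to
    +/- max_jump samples around the previous period (continuity constraint)."""
    lo_all = int(fs / f0_max)
    hi_all = int(fs / f0_min)
    prev, f0s = None, []
    for frame in frames:
        if prev is None:
            lo, hi = lo_all, hi_all
        else:
            lo, hi = max(lo_all, prev - max_jump), min(hi_all, prev + max_jump)
        prev = best_lag(frame, lo, hi)
        f0s.append(fs / prev)
    return f0s
```

Restricting the search window both reduces computation and suppresses octave errors, at the cost of needing a reset (a full-range search) when a genuine pitch jump or a new voiced segment occurs; that reset logic is not shown.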
In this paper, work on developing a Turkish microphone speech corpus at the Middle East Technical University (METU) is presented. Before collecting the audio corpus, the sound properties of Turkish were investigated ...
In this paper, we present a dynamic programming approach to voice transformation (VT). The goal of VT is to modify the speech of a source speaker such that it is perceived as if spoken by a target speaker. The speech model used in this work is based on the MELP (Mixed Excitation Linear Prediction) speech coding algorithm. The designed system derives speaker-specific codebooks of line spectral frequencies (LSFs) for both the source and target speakers from MELP's multi-stage vector quantization LSF codebook. These codebooks are used to train a mapping histogram, which is then used to transform LSFs from one speaker to the other. The baseline system uses the maxima of the histograms for LSF transformation. This baseline has two shortcomings: the target LSF space is limited, and mapping consecutive frames independently causes spectral discontinuities. Both are overcome by applying the dynamic programming approach, which models the long-term behaviour of the target speaker's LSFs while preserving the relationship between consecutive frames of the source LSFs during transformation. Objective and subjective evaluations show that the dynamic programming approach improves the performance of the system in terms of both speech quality and speaker similarity.
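The contrast between the histogram-maximum baseline and a dynamic programming search can be sketched as a Viterbi-style search over target codewords. Here the per-frame cost is the negative log of the row-normalised mapping histogram and the transition cost is the squared distance between consecutive target LSF codewords, weighted by `lam`. Both cost terms and all names are assumptions, since the abstract does not give the exact criterion.

```python
import numpy as np

def baseline_map(src_idx, hist):
    """Baseline: each source codeword maps to its histogram maximum."""
    return [int(np.argmax(hist[s])) for s in src_idx]

def dp_map(src_idx, hist, tgt_codebook, lam=1.0, eps=1e-12):
    """Viterbi search for the target codeword sequence that balances the
    mapping histogram against spectral continuity between frames.

    hist: (n_src, n_tgt) co-occurrence counts; tgt_codebook: (n_tgt, dim) LSFs.
    """
    hist = np.asarray(hist, dtype=float)
    cb = np.asarray(tgt_codebook, dtype=float)
    # Local cost: negative log of P(target | source) estimated from the histogram
    local = -np.log(hist / hist.sum(axis=1, keepdims=True) + eps)
    # Transition cost: weighted squared LSF distance between consecutive targets
    trans = lam * ((cb[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
    cost = local[src_idx[0]].copy()
    back = []
    for s in src_idx[1:]:
        total = cost[:, None] + trans          # rows: previous target, cols: current
        back.append(np.argmin(total, axis=0))
        cost = total.min(axis=0) + local[s]
    # Trace back the lowest-cost path from the final frame
    path = [int(np.argmin(cost))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

With `lam=0` the search reduces to the per-frame baseline; increasing `lam` trades histogram fidelity for smoother target LSF trajectories.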
The goal of voice transformation (VT) is to modify the speech of a source speaker such that it is perceived as if spoken by a target speaker. In this paper, we present a speaker-specific line spectral frequency (LSF) quantization based on principal component analysis (PCA) and k-means clustering for VT. An LPC-based source-filter model is used to model the speech. Transformation is applied to the spectral characteristics of the speaker, while pitch scaling is applied to the residual signal. PCA is used to determine the principal components of the source and target LSFs to obtain a more efficient quantization. Only the high-variance dimensions are quantized, and these dimensions are used to obtain the histogram matrix that maps the two speakers during training. To select the best target codeword sequence corresponding to a source codeword sequence in a sentence, a dynamic programming approach is used. The dynamic programming approach approximates the long-term behavior of the target speaker's LSFs while preserving the relationship between consecutive frames of the source LSFs. Objective and subjective evaluations have shown that both dimension reduction of the LSFs before quantization and dynamic programming improve voice transformation performance.
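The training side of this scheme (reducing LSF vectors with PCA, quantizing only the high-variance dimensions with k-means, and counting co-occurrences of source and target codewords into a histogram matrix) might look like the sketch below. It assumes the source and target frames are already time-aligned (alignment not shown), uses a plain Lloyd k-means with a fixed initialisation, and all names and parameter values are hypothetical.

```python
import numpy as np

def pca_fit(X, n_dims):
    """Mean and top-n_dims principal directions (as rows) of X."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing variance
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_dims]

def kmeans(Y, k, n_iter=50):
    """Plain Lloyd k-means initialised with the first k points."""
    C = Y[:k].copy()
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old centroid if a cluster empties
                C[j] = Y[labels == j].mean(axis=0)
    return C

def quantize(Y, C):
    """Index of the nearest codeword for each row of Y."""
    return np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(-1), axis=1)

def train_mapping(src_lsf, tgt_lsf, n_dims=2, k=4):
    """Histogram matrix H[i, j] = number of aligned frames with source code i
    and target code j."""
    codes = []
    for X in (src_lsf, tgt_lsf):
        mu, W = pca_fit(X, n_dims)
        Y = (X - mu) @ W.T                    # keep only the high-variance dimensions
        codes.append(quantize(Y, kmeans(Y, k)))
    H = np.zeros((k, k), dtype=int)
    np.add.at(H, (codes[0], codes[1]), 1)
    return H
```

Quantizing in the reduced PCA space means the codebook spends its codewords on the directions where the speakers' LSFs actually vary, which is the efficiency argument the abstract makes.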