When space-time covariance matrices are estimated from finite data, any intersections of ground-truth eigenvalues will be obscured, and the exact eigenvalues become spectrally majorised with probability one. In this paper...
A handset compensation technique for speaker verification from coded telephone speech is proposed. The proposed technique combines handset selectors with stochastic feature transformation to reduce the acoustic mismat...
In telephone-based speaker identification, variation in handset characteristics can introduce severe speech variability even for speech uttered by the same speaker. This paper proposes a method to compensate for the varia...
In telephone-based speaker verification, channel conditions can vary significantly from session to session. It is therefore desirable to estimate the channel conditions online and compensate for the acoustic distortion without prior knowledge of the channel characteristics. Because no a priori knowledge is used, the estimation accuracy depends greatly on the length of the verification utterances. This paper extends the Blind Stochastic Feature Transformation (BSFT) algorithm that we recently proposed so that it can handle the short-utterance scenario. The idea is to estimate a set of prior transformation parameters from a development set whose verification utterances cover a wide variety of channel conditions. The prior transformations are then incorporated into the online estimation of the BSFT parameters in a Bayesian (maximum a posteriori) fashion. The resulting transformation parameters therefore depend on both the prior transformations and the verification utterances. The prior transformations play a more important role for short utterances and a less important role for long ones. We refer to the extended algorithm as Bayesian BSFT (BBSFT) and applied it to the 2001 NIST SRE task. Results show that Bayesian BSFT outperforms BSFT for utterances shorter than or equal to 4 seconds.
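The way the prior and the utterance trade off can be illustrated with a deliberately simplified sketch. It is not the BSFT or BBSFT algorithm from the abstract: it assumes a scalar, bias-only transformation, a Gaussian likelihood with known variance, and a Gaussian prior whose strength is a hypothetical pseudo-count `tau`. Under those assumptions the MAP estimate is an interpolation between the prior bias and the per-utterance maximum-likelihood bias.

```python
def map_bias(frame_offsets, prior_bias, tau):
    """MAP estimate of a scalar bias under a Gaussian prior.

    frame_offsets: per-frame offsets observed in the utterance (hypothetical input)
    prior_bias:    bias estimated beforehand from a development set
    tau:           prior strength, expressed as an equivalent number of frames
    """
    n = len(frame_offsets)
    if n == 0:
        # No data at all: fall back on the prior.
        return prior_bias
    ml_bias = sum(frame_offsets) / n   # maximum-likelihood estimate from the utterance
    weight = n / (n + tau)             # grows towards 1 as the utterance gets longer
    return weight * ml_bias + (1.0 - weight) * prior_bias


# Short utterance: the estimate stays close to the prior.
short = map_bias([2.0] * 10, prior_bias=0.0, tau=10.0)
# Long utterance: the estimate is dominated by the data.
long_ = map_bias([2.0] * 990, prior_bias=0.0, tau=10.0)
```

With `tau` fixed, the data weight `n / (n + tau)` is what makes the prior matter more for short utterances and less for long ones.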
Authors:
Chang, CI; Brumbley, C; IEEE
Remote Sensing Signal and Image Processing Laboratory, Dept. of Computer Science and Electrical Engineering, University of Maryland Baltimore County
Linear unmixing is a widely used remote sensing image processing technique for subpixel classification and detection, in which a scene pixel is generally modeled as a linear mixture of the spectral signatures of the materials present within the pixel. An approach called linear unmixing Kalman filtering (LUKF) is presented, which incorporates the concept of linear unmixing into Kalman filtering to achieve signature abundance estimation, subpixel detection and classification for remotely sensed images. In this case, the linear mixture model used in linear unmixing is implemented as the measurement equation in Kalman filtering. The state equation, which is required for Kalman filtering but absent in linear unmixing, is then used to model the signature abundance. By utilizing these two equations, the proposed LUKF can not only detect abrupt changes in various signature abundances within pixels, but also detect and classify desired target signatures. The effectiveness and robustness of the LUKF are demonstrated on simulated data and on real scene images: Satellite Pour l'Observation de la Terre (SPOT) and Hyperspectral Digital Imagery Collection (HYDICE) data.
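The pairing of a mixture-model measurement equation with an abundance state equation can be sketched as follows. This is a minimal illustration, not the authors' LUKF: it assumes only two signatures `m1` and `m2` whose abundances sum to one (so the state is a single scalar `a`), a random-walk state equation, and white measurement noise. The function name and the noise parameters `q` and `r_var` are hypothetical.

```python
def lukf_step(a_prev, p_prev, pixel, m1, m2, q=1e-3, r_var=1e-2):
    """One Kalman predict/update step for a scalar abundance.

    Measurement equation (linear mixture): pixel = a*m1 + (1 - a)*m2 + noise
    State equation (assumed random walk):  a_k = a_{k-1} + w,  Var(w) = q
    """
    # Predict: a random walk keeps the mean and inflates the variance.
    a_pred = a_prev
    p_pred = p_prev + q

    # Rewrite the mixture as z = a * d + noise, with d = m1 - m2 and z = pixel - m2.
    d = [x - y for x, y in zip(m1, m2)]
    z = [x - y for x, y in zip(pixel, m2)]

    # Information-form update for a scalar state with a vector measurement.
    p_post = 1.0 / (1.0 / p_pred + sum(di * di for di in d) / r_var)
    innovation = sum(di * (zi - di * a_pred) for di, zi in zip(d, z))
    a_post = a_pred + p_post * innovation / r_var
    return a_post, p_post


# Track the abundance across a run of pixels that all contain 70% of m1.
m1, m2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
a, p = 0.5, 1.0
for _ in range(20):
    a, p = lukf_step(a, p, [0.7, 0.3, 0.0], m1, m2)
```

A larger `q` lets the estimate follow abrupt abundance changes from pixel to pixel more quickly, at the cost of noisier estimates.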
Feature transformation plays an important role in robust speaker verification over telephone networks. This paper compares several feature transformation techniques and evaluates their verification performance and com...
An object-based video coding scheme for video conferencing systems is proposed. There are two main processes: a segmentation process and a face detection process. The segmentation process is used to segment each frame of a video ...
Because of differences in educational background, accent, and so on, each person has a unique way of pronouncing words. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique aims to establish a link between articulatory properties (e.g., manners and places of articulation) and the phoneme sequences produced by a speaker. This is achieved by aligning two articulatory feature (AF) streams with a phoneme sequence determined by a phoneme recognizer, and formulating the probabilities of articulatory classes conditioned on the phonemes as speaker-dependent probabilistic models. The scores obtained from the AF-based pronunciation models are then fused with those obtained from a spectral-based speaker verification system, with the frame-by-frame fused scores weighted by the confidence of the pronunciation models. Evaluations based on the SPIDRE corpus demonstrate that AF-based CPM systems can recognize speakers even with short utterances and are readily combined with spectral-based systems to further enhance the reliability of speaker verification.
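The idea of modeling articulatory classes conditioned on phonemes can be sketched with simple counting. This is an illustrative assumption rather than the paper's formulation: it uses a single AF stream instead of two, add-`alpha` smoothed relative frequencies as the conditional probabilities, and an average log-likelihood ratio against a background model as the score. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_cpm(aligned, classes, alpha=1.0):
    """Estimate P(af_class | phoneme) from aligned (phoneme, af_class) pairs."""
    counts = defaultdict(Counter)
    for phoneme, af_class in aligned:
        counts[phoneme][af_class] += 1
    model = {}
    for phoneme, c in counts.items():
        total = sum(c.values()) + alpha * len(classes)
        # Add-alpha smoothing so unseen classes keep a non-zero probability.
        model[phoneme] = {k: (c[k] + alpha) / total for k in classes}
    return model


def cpm_score(aligned, speaker_model, background_model):
    """Average log-likelihood ratio of the speaker model against the background."""
    llr, n = 0.0, 0
    for phoneme, af_class in aligned:
        # Skip phonemes that either model never saw during training.
        if phoneme in speaker_model and phoneme in background_model:
            ratio = speaker_model[phoneme][af_class] / background_model[phoneme][af_class]
            llr += math.log(ratio)
            n += 1
    return llr / n if n else 0.0


classes = ["stop", "fricative"]
speaker = train_cpm([("t", "stop")] * 8 + [("t", "fricative")] * 2, classes)
background = train_cpm([("t", "stop")] * 5 + [("t", "fricative")] * 5, classes)
score = cpm_score([("t", "stop")], speaker, background)
```

A positive score means the observed articulation of a phoneme is more typical of the claimed speaker than of the background population. Every `af_class` passed in is assumed to appear in `classes`.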
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and on prior knowledge about the score statistics, estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. Because the fusion weights depend on the prior scores, in this paper we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% compared with our previous approach, in which the prior scores are non-adaptive.
This paper studies the use of profile alignment and support vector machines for subcellular localization. In the training phase, the profiles of all protein sequences in the training set are constructed by PSI-BLAST a...