Search results - Inner Mongolia University Library

21st IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

Authors: Ma, Yong; Bao, Chang-chun; Liu, Jia. Affiliations: Beijing Univ Technol Sch Elect Informat & Control Engn Speech & Audio Signal Proc Lab Beijing 100124 Peoples R China; Tsinghua Univ Dept Elect Engn Natl Tsing Lab Informat Sci & Technol Beijing 100084 Peoples R China

ISBN: (print) 9781457716232

An efficient speaker segmentation and clustering method based on improved spectral clustering is proposed in this paper. Traditional speaker segmentation and clustering is performed by hierarchical clustering algorithms with the Bayesian information criterion (BIC) metric and the cross likelihood ratio (CLR) metric after the speakers are segmented. Since this method has high computational complexity and may result in a suboptimal solution, we use spectral clustering to overcome this problem and improve the performance of the clustering algorithm. First, the affinity matrix is constructed from the mean supervector feature transformed by KL kernel mapping. Then the scaling parameter is selected adaptively. Experiments performed on the NIST 1998 multi-speaker corpus show that the proposed method outperforms the baseline system.
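
The affinity-matrix step with an adaptively chosen scale can be illustrated with a minimal sketch. It assumes Euclidean distance between toy 2-D vectors in place of the KL-kernel-mapped mean supervectors, and it assumes a local-scaling heuristic (distance to the k-th nearest neighbour) for the scale; the abstract does not state the paper's actual selection rule, and the function name `local_scaling_affinity` is hypothetical.

```python
import math


def local_scaling_affinity(points, k=2):
    """Build a symmetric affinity matrix with a per-point scale.

    sigma_i is the distance from point i to its k-th nearest neighbour.
    This is an assumed heuristic, not necessarily the paper's rule.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Index 0 of each sorted row is the point itself (distance 0).
    sigma = [sorted(row)[min(k, n - 1)] for row in dist]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = max(sigma[i] * sigma[j], 1e-12)
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / scale)
    return affinity


# Two well-separated toy groups standing in for per-segment feature vectors.
segments = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
A = local_scaling_affinity(segments)
```

In a full spectral clustering pipeline, the eigenvectors of the normalized version of this matrix would then be clustered; that step is omitted here.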

Keywords: speaker segmentation and clustering; Bayesian information criterion; spectral clustering

Speaker diarization of French broadcast news

33rd IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Gupta, Vishwa; Boulianne, Gilles; Kenny, Patrick; Ouellet, Pierre; Dumouchel, Pierre. Affiliation: Ctr Rech Informat Montreal Montreal PQ H3T 1P1 Canada

ISBN: (print) 9781424414833

We report results on speaker diarization of French broadcast news and talk shows on current affairs. This speaker diarization process is a multistage segmentation and clustering system. One of the stages is agglomerative clustering using state-of-the-art speaker identification (SID) methods. For the GMMs used in this stage, we tried many different feature parameters, including MFCCs, Gaussianized MFCCs, Gaussianized MFCCs with cepstral mean subtraction, and Gaussianized MFCCs with cepstral mean subtraction containing only frames with high energy. We found that this last set of feature parameters gave the best results. Compared to Gaussianized MFCCs, these features reduced the diarization error rate (DER) by 12% on a development set and by 19% on a test set. We also combined clusters resulting from Gaussianized and non-Gaussianized feature sets. This cluster combination resulted in another 4% reduction in DER for both the development and the test sets. The best DER we have achieved is 15.4% on the development set, and 14.5% on the test set.
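
As a rough illustration of cepstral mean subtraction restricted to high-energy frames, the sketch below keeps a fraction of the highest-energy frames and subtracts the per-coefficient mean computed over them. The selection rule (`keep_fraction`), the order of the two operations, and the function name `cms_high_energy` are assumptions for illustration; the abstract does not give the authors' exact procedure, and Gaussianization is not included here.

```python
def cms_high_energy(frames, energies, keep_fraction=0.5):
    """Keep the highest-energy frames, then subtract the per-coefficient mean."""
    if not frames or len(frames) != len(energies):
        raise ValueError("frames and energies must be non-empty and equal in length")
    n_keep = max(1, int(round(len(frames) * keep_fraction)))
    by_energy = sorted(range(len(frames)), key=lambda i: energies[i], reverse=True)
    # Restore time order among the selected frames.
    top = sorted(by_energy[:n_keep])
    kept = [frames[i] for i in top]
    dim = len(kept[0])
    mean = [sum(f[d] for f in kept) / len(kept) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in kept]


# Toy 2-coefficient "cepstral" frames with one energy value per frame.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
energies = [0.1, 0.9, 0.2, 0.8]
normalized = cms_high_energy(frames, energies)
```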

Keywords: speaker diarization; speaker segmentation and clustering; BIC clustering; SID clustering

Acoustic beamforming for speaker diarization of meetings

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, vol. 15, no. 7, pp. 2011-2022

Authors: Anguera, Xavier; Wooters, Chuck; Hernando, Javier. Affiliations: Telefon ID Madrid 28043 Spain; Univ Politecn Cataluna E-08028 Barcelona Spain

When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
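
The basic idea behind classic acoustic beamforming can be sketched as plain delay-and-sum: estimate each channel's delay relative to a reference channel, align, and average. This is a simplified sketch with integer-sample delays and a brute-force cross-correlation search; it does not implement the paper's blind reference-channel selection, TDOA Viterbi postprocessing, or dynamic output weighting, and the names `estimate_delay` and `delay_and_sum` are hypothetical.

```python
def estimate_delay(reference, channel, max_lag):
    """Return the lag (in samples) at which channel best matches reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, ref in enumerate(reference):
            u = t + lag
            if 0 <= u < len(channel):
                score += ref * channel[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, ref_index=0, max_lag=5):
    """Align every channel to the reference channel and average them."""
    reference = channels[ref_index]
    delays = [estimate_delay(reference, ch, max_lag) for ch in channels]
    delays[ref_index] = 0  # the reference is aligned with itself by definition
    output = []
    for t in range(len(reference)):
        total, count = 0.0, 0
        for ch, lag in zip(channels, delays):
            u = t + lag
            if 0 <= u < len(ch):
                total += ch[u]
                count += 1
        output.append(total / count)
    return delays, output


# Toy example: the second microphone hears the same pulse two samples later.
mic_a = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
delays, beamformed = delay_and_sum([mic_a, mic_b])
```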

Keywords: acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering

Combining gaussianized/non-gaussianized features to improve speaker diarization of telephone conversations

IEEE SIGNAL PROCESSING LETTERS, 2007, vol. 14, no. 12, pp. 1040-1043

Authors: Gupta, Vishwa; Kenny, Patrick; Ouellet, Pierre; Boulianne, Gilles; Dumouchel, Pierre. Affiliation: Ctr Rech Informat Montreal Montreal PQ H3A 1B9 Canada

We report results on speaker diarization of telephone conversations. This speaker diarization process is similar to the multistage segmentation and clustering system used in broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using a Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification (SID) methods and Viterbi re-segmentation using Gaussian mixture models (GMMs). We repeat these multistage segmentation and clustering steps twice: once with mel-frequency cepstral coefficients (MFCCs) as feature parameters for the GMMs used in gender labeling, SID, and Viterbi re-segmentation steps and another time with Gaussianized MFCCs as feature parameters for the GMMs used in these three steps. The resulting clusters from the parallel runs are combined in a novel way that leads to a significant reduction in the diarization error rate (DER). On a development set containing 30 telephone conversations, this combination step reduced the DER by 20%. On another test set containing 30 telephone conversations, this step reduced the DER by 13%. The best error rate we have achieved is 6.7% on the development set and 9.0% on the test set.
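
Gaussianization of a feature stream is commonly done by rank-based warping to standard-normal quantiles. The sketch below shows that idea for a single coefficient over a whole sequence; it is an assumption that this matches the authors' variant (the abstract does not specify one, and practical systems often warp over a sliding window). The function name `gaussianize` is hypothetical.

```python
from statistics import NormalDist


def gaussianize(values):
    """Map one feature stream to standard-normal quantiles by rank.

    Ties are broken by position in the sequence.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    warped = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        warped[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return warped


# A skewed toy stream; after warping, the values are symmetric around zero.
skewed = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
warped = gaussianize(skewed)
```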

Keywords: Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID) clustering; speaker segmentation and clustering

Model complexity selection and cross-validation em training for robust speaker diarization

32nd IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Anguera, Xavier; Shinozaki, Takahiro; Wooters, Chuck; Hernando, Javier. Affiliations: Int Comp Sci Inst Berkeley CA 94704 USA; Tech Univ Catalonia UPC Barcelona 08034 Spain; Univ Washington Dept Elect Engn Seattle WA 98195 USA; Kyoto Univ Kyoto 6068501 Japan

ISBN: (print) 1424407281

Accurate modeling of speaker clusters is important in the task of speaker diarization. Creating accurate models involves both selection of the model complexity and optimum training given the data. Using models with fixed complexity and trained using the standard EM algorithm poses a risk of overfitting, which can lead to a reduction in diarization performance. In this paper a technique proposed by the author to estimate the complexity of a model is combined with a novel training algorithm called "Cross-Validation EM" to control the number of training iterations. This combination leads to more robust speaker modeling and results in an increase in speaker diarization performance. Tests on the NIST RT (MDM) datasets for meetings show a relative improvement of 10.6% on the test set.

Keywords: speaker diarization; speaker segmentation and clustering; complexity selection; cross-validation EM training

Multiple feature combination to improve speaker diarization of telephone conversations

IEEE Workshop on Automatic Speech Recognition and Understanding

Authors: Gupta, Vishwa; Kenny, Patrick; Ouellet, Pierre; Boulianne, Gilles; Dumouchel, Pierre. Affiliation: Centre de Recherche Informatique de Montréal (CRIM)

ISBN: (print) 9781424417452

We report results on speaker diarization of telephone conversations. This speaker diarization process is similar to the multistage segmentation and clustering system used in broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using a Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification (SID) methods and Viterbi re-segmentation using Gaussian mixture models (GMMs). The Viterbi re-segmentation using GMMs is new, and it reduces the diarization error rate (DER) by 10%. We repeat these multistage segmentation and clustering steps twice: once with MFCCs as feature parameters for the GMMs used in gender labeling, SID and Viterbi re-segmentation steps, and another time with Gaussianized MFCCs as feature parameters for the GMMs used in these three steps. The resulting clusters from the parallel runs are combined in a novel way that leads to a significant reduction in the DER. On a development set containing 30 telephone conversations, this combination step reduced the DER by 20%. On another test set containing 30 telephone conversations, this step reduced the DER by 13%. The best error rate we have achieved is 6.7% on the development set, and 9.0% on the test set.
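
For readers unfamiliar with the DER figures quoted above, the sketch below computes a simplified frame-level version: missed speech, false-alarm speech, and speaker confusion are counted per frame under the best one-to-one mapping between hypothesis and reference labels, then divided by the number of reference speech frames. Official scoring is time-weighted and typically applies a collar around boundaries, so this toy version is an assumption-laden approximation; the function name `frame_der` is hypothetical, and the exhaustive mapping search only suits a handful of speakers.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER; labels are per-frame speaker ids, None is non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    speech = sum(r is not None for r in reference)
    if speech == 0:
        raise ValueError("reference contains no speech frames")
    ref_ids = sorted({r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})
    # Padding with None lets a hypothesis label map to no reference speaker.
    targets = ref_ids + [None] * len(hyp_ids)
    best_errors = None
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is None:
                continue  # correctly detected non-speech
            mapped = mapping[h] if h is not None else None
            # Covers missed speech, false alarm, and speaker confusion.
            if r is None or h is None or mapped != r:
                errors += 1
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / speech


reference = ["A", "A", "A", "B", "B", None]
hypothesis = ["x", "x", "y", "y", "y", "y"]
der = frame_der(reference, hypothesis)
```

In this toy case the best mapping is x to A and y to B, leaving one confused frame and one false-alarm frame against five reference speech frames.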

Keywords: speaker diarization; speaker segmentation and clustering; BIC clustering; SID clustering

Robust speaker diarization in a multi-speaker environment using autocorrelation-based noise subtraction

7th IEEE International Symposium on Signal Processing and Information Technology

Authors: Mirrezaie, S. M.; Ahadi, S. M.; Kashi, A. Affiliation: Amir Kabir Univ Technol Dept Elect Engn Tehran 15914 Iran

ISBN: (print) 9781424418343

This paper presents research on speaker diarization in multi-speaker environments. It looks into the algorithms and the implementation of an off-line speaker segmentation and indexing system for recorded speech data where usually more than one speaker is present. Speaker diarization is a well-studied topic in the domain of broadcast news recordings. Most of the proposed systems involve hierarchical clustering of the data, where the number of speakers and their identities are known a priori. Speaker diarization is the task of assigning a unique label to all speech segments in an audio stream by the same speaker. There are two key challenges: processing speed and robustness in the presence of noise. In this paper we address the robustness issue by using a method already successful in speech recognition applications. Using ANS (Autocorrelation-Based Noise Subtraction) for robust genetic algorithm-based speaker diarization, we compare the results with the baseline MFCC-based system in clean and noisy conditions.

Keywords: robust speaker diarization; speaker segmentation and clustering; meetings indexing; noisy speech

An overview of automatic speaker diarization systems

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, vol. 14, no. 5, pp. 1557-1565

Authors: Tranter, Sue E.; Reynolds, Douglas A. Affiliations: Univ Cambridge Dept Engn Cambridge CB2 1PZ England; MIT Lincoln Lab Lexington MA 02420 USA

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.

Keywords: speaker diarization; speaker segmentation and clustering

Multistage speaker diarization of broadcast news

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, vol. 14, no. 5, pp. 1505-1512

Authors: Barras, Claude; Zhu, Xuan; Meignier, Sylvain; Gauvain, Jean-Luc. Affiliation: CNRS Comp Sci Lab Mech & Engn Sci F-91403 Orsay France

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides a high cluster purity, but has a tendency to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives a 40% to 50% reduction relative to a single-stage BIC clustering system.
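
The merge test at the heart of BIC agglomerative clustering can be sketched with the standard delta-BIC criterion. The version below models each cluster as a single 1-D Gaussian for brevity; real systems use multivariate cepstral features and full covariance matrices. The `penalty_weight` parameter and the function names are assumptions for illustration, and the abstract does not state the exact form used in the LIMSI system.

```python
import math


def log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(a, b, penalty_weight=1.0):
    """Delta BIC for merging two 1-D clusters modelled as single Gaussians.

    Under this sign convention, a negative value favours merging.
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    gain = 0.5 * (n * log_var(a + b) - n_a * log_var(a) - n_b * log_var(b))
    # A 1-D Gaussian has two free parameters: mean and variance.
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


# Toy clusters: two drawn around the same value, one around a different value.
same_a = [1.0, 1.2, 0.8, 1.1, 0.9]
same_b = [1.05, 0.95, 1.15, 0.85, 1.0]
other = [5.0, 5.2, 4.8, 5.1, 4.9]
merge_score = delta_bic(same_a, same_b)
split_score = delta_bic(same_a, other)
```

An agglomerative loop would repeatedly merge the pair with the lowest delta BIC and stop once no pair scores below zero.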

Keywords: Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID); speaker segmentation and clustering

Step-by-step and integrated approaches in broadcast news speaker diarization

COMPUTER SPEECH AND LANGUAGE, 2006, vol. 20, no. 2-3, pp. 303-330

Authors: Meignier, S; Moraru, D; Fredouille, C; Bonastre, JF; Besacier, L. Affiliations: Univ Avignon Dept Comp CNRS LIA F-84911 Avignon 9 France; UFJ CLIPS IMAG F-38041 Grenoble 9 France; CNRS CLIPS IMAG F-38041 Grenoble 9 France; Univ Maine CNRS LIUM F-72085 Le Mans 9 France

This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT'03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments which are then grouped into speaker classes. Two approaches are described and compared for speaker diarization. The first one relies on a classical two-step speaker diarization strategy based on a detection of speaker turns followed by a clustering process, while the second one uses an integrated strategy where both segment boundaries and speaker tying of the segments are extracted simultaneously and challenged during the whole process. These two methods are used to investigate various strategies for the fusion of diarization results. Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization. The objective is to take advantage of the a priori acoustic information in the diarization process. Along with enriching the resulting segmentation with information about speaker gender, channel quality or background sound, this approach brings gains in speaker diarization performance thanks to the diversity of acoustic conditions found in broadcast news. The last part of this paper describes some ongoing work carried out by the CLIPS and LIA laboratories and presents some results obtained since 2002 on speaker diarization for various corpora. (c) 2005 Elsevier Ltd. All rights reserved.

Keywords: speaker indexing; speaker segmentation and clustering; speaker diarization; E-HMM; integrated approach; step-by-step approach
