ISBN (print): 9781538625163
Vocal rehabilitation devices used by patients after laryngectomy produce unnatural-sounding speech. Our study aims to increase the quality of these synthetically generated voices by imparting human-like characteristics. A simplified source-filter model, linear predictive coding coefficients, and line spectral frequencies were used to model the vocal tract and manipulate the acoustic features of the resulting speech. Two different mapping functions were employed to convert between the features of a synthetically generated voice and those of a human voice: a Gaussian mixture model and a linear regression model. The models were trained on a set of 50 human and 50 synthetic voice utterances. Both mapping functions yielded significant changes in the transformed synthetic voices, and their spectra were similar to those of the human voices. The linear regression mapping produced slightly better results than the Gaussian mixture model mapping. Listening tests confirmed this result, but indicated that voices re-synthesized from the transformed model coefficients, while improved over the synthetic voice, still sounded unnatural. This may imply that the vocal tract model lacks information that produces the subjective perception of "artificial speech". Future work will investigate a more elaborate model that includes the speech-production excitation and radiation signals and the transformation of their features. Such models have the potential to improve the conversion of a synthetically generated electrolarynx voice into a human-sounding one.
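The mapping idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: LPC coefficients are extracted per utterance via the autocorrelation method, and a linear regression is fit from synthetic-voice features to human-voice features. The frame data, model order, and least-squares formulation here are all assumptions for demonstration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """Autocorrelation-method LPC coefficients via a Toeplitz solve."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])  # a_1..a_p

rng = np.random.default_rng(0)
order = 8
# Stand-ins for 50 aligned synthetic/human utterance frames (real speech in the paper).
synth_frames = rng.standard_normal((50, 400))
human_frames = synth_frames + 0.1 * rng.standard_normal((50, 400))

X = np.array([lpc(f, order) for f in synth_frames])   # synthetic-voice features
Y = np.array([lpc(f, order) for f in human_frames])   # human-voice features

# Linear-regression mapping (weights plus bias) from synthetic to human feature space.
Xb = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
Y_hat = Xb @ W  # transformed synthetic features, to be re-synthesized
```

The trained matrix `W` plays the role of the paper's linear regression mapping; the GMM-based alternative would replace this single global transform with a mixture of locally linear ones.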
ISBN (print): 9781509045914
This paper compares three different features at various feature orders to determine the best feature for gunshot detection under adverse noise conditions. The compared features are LPC, LPCC, and MFCC, with orders from 8 to 30. All features were extracted from sounds at signal-to-noise ratios of 30, 20, 10, and 0 dB; the background noise was simulated with white noise. Experimental results indicate that the LPC coefficients are the most efficient features, especially at low noise levels. On the other hand, MFCC performed well in the noisier environments at 10 dB and 20 dB.
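The noise conditions above can be reproduced by scaling white noise to a target SNR before mixing. A minimal sketch (the test signal and sample rate are illustrative assumptions):

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=np.random.default_rng(1)):
    """Scale white noise so the mix has the requested signal-to-noise ratio in dB."""
    noise = rng.standard_normal(len(signal))
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

# One second of a 440 Hz tone at 16 kHz as a stand-in for a gunshot recording.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
for snr in (30, 20, 10, 0):  # the four SNR conditions from the paper
    noisy = add_white_noise(sig, snr)
```

Because the scale factor is computed from the empirical noise power, the mixed signal meets the target SNR exactly rather than only in expectation.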
ISBN (print): 9781479983490
Speech emotion recognition is one of the recent challenges in speech processing and Human-Computer Interaction (HCI), addressing various operational needs of real-world applications. Besides human facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium for conveying emotion, providing in-depth information about the different cognitive states of a human being. In this context, a novel approach is introduced using a combination of prosody features (pitch, energy, zero-crossing rate), quality features (formant frequencies, spectral features, etc.), derived features (Mel-frequency cepstral coefficients (MFCC), linear predictive coding coefficients (LPCC)), and a dynamic feature (Mel-energy spectrum dynamic coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happiness, neutral, sadness, and surprise, in five native Assamese languages. The overall results of the conducted experiments reveal that the combined-feature approach achieved an average accuracy of 82.26% in speaker-independent cases.
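The prosody features named above (pitch, energy, zero-crossing rate) can be computed per frame roughly as follows. This is a generic sketch, not the paper's extractor; the frame length, pitch range, and autocorrelation-based F0 estimate are assumptions.

```python
import numpy as np

def prosody_features(frame, fs):
    """Frame-level energy, zero-crossing rate, and autocorrelation pitch estimate."""
    energy = np.mean(frame ** 2)
    # Each sign change contributes |diff(sign)| = 2, so divide by 2 per crossing.
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # Pitch: autocorrelation peak restricted to a plausible F0 range (50-400 Hz).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / 400), int(fs / 50)
    f0 = fs / (lo + np.argmax(ac[lo:hi]))
    return np.array([energy, zcr, f0])

fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 200 * t)  # 200 Hz test tone
feats = prosody_features(frame, fs)
```

A full system would stack these with the quality, derived (MFCC, LPCC), and dynamic (MEDC) features into one vector per utterance before feeding the multilevel SVM.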
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is therefore necessary to search for more robust feature extraction methods that give better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, Mel-frequency cepstral coefficients (MFCC), linear prediction coding coefficients (LPCC), perceptual linear prediction (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated on the TIDIGITS human voice corpus, recorded from 208 different adult speakers, in both training and testing. The theoretical basis for the speech processing and classifier procedures is presented, and the recognition results are reported as word recognition rates.
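The RASTA processing mentioned above band-pass filters each log-spectral trajectory over time to suppress slowly varying channel distortion. A minimal sketch using the commonly cited RASTA filter coefficients (applied here to a synthetic log-energy trajectory, not the full PLP pipeline):

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_traj):
    """Band-pass filter a log-spectral trajectory to remove slow channel drift."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR numerator (zero DC gain)
    a = np.array([1.0, -0.98])                        # leaky integrator pole
    return lfilter(b, a, log_traj)

# A constant offset models convolutional channel distortion in the log domain;
# the filter should remove it while passing the modulation component.
traj = np.sin(np.arange(200) * 0.3) + 5.0
filtered = rasta_filter(traj)
```

Because the numerator coefficients sum to zero, the filter has zero gain at DC, which is what removes the constant channel offset after the initial transient dies out.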
The Wake-Up-Word Speech Recognition (WUW-SR) task is computationally very demanding, particularly the feature extraction stage, whose output is decoded with the corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR system. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-frequency cepstral coefficients (MFCC), linear predictive coding coefficients (LPC), and enhanced Mel-frequency cepstral coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, we present the details of converting the three sets of spectrograms, 1) MFCC, 2) LPC, and 3) ENH_MFCC, to their equivalent features. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR system is shown in Figure 1. The three sets of speech features are extracted at the front-end, then compressed and transmitted to the server via a dedicated channel, where they are subsequently decoded.
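The compress-and-transmit step described above can be illustrated with a simple scheme. The abstract does not specify the codec, so the 8-bit linear quantization below is purely an assumed stand-in showing how feature frames shrink before transmission:

```python
import numpy as np

def quantize(frames):
    """Map float feature frames to 8-bit codes plus the (lo, hi) range needed to decode."""
    lo, hi = frames.min(), frames.max()
    q = np.round((frames - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi

def dequantize(q, lo, hi):
    """Server-side reconstruction of the feature frames from the 8-bit codes."""
    return q.astype(np.float64) / 255 * (hi - lo) + lo

rng = np.random.default_rng(2)
mfcc = rng.standard_normal((100, 13))  # stand-in for 100 frames of 13 MFCCs
q, lo, hi = quantize(mfcc)             # front-end: 1 byte per coefficient
restored = dequantize(q, lo, hi)       # back-end: decode before HMM scoring
```

This cuts the payload from 8 bytes to 1 byte per coefficient at the cost of a bounded quantization error of half a step, (hi - lo)/510.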