Enhancing English accent identification in automatic speech recognition using spectral features and hybrid CNN-BiLSTM model

Authors: Ahmed, Ghayas; Lawaye, Aadil Ahmad; Jain, Vishal; Chatterjee, Jyotir Moy; Mahajan, Shubham

Affiliations: Department of Computer Sciences, Baba Ghulam Shah Badshah University, J&K, Rajouri, India; Department of Computer Science & Engineering, Sharda University, Noida, India; Dehradun, India; Department of Computer Science & Engineering, Amity University, Haryana, India

Publication: Multimedia Tools and Applications (Multimedia Tools Appl)

Year/Volume/Issue: 2025

Pages: 1-28

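Subject classification: 08 [Engineering]; 070206 [Science - Acoustics]; 0803 [Engineering - Optical Engineering]; 0701 [Science - Mathematics]; 0702 [Science - Physics]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]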

Subject: Spectrographs

Abstract: Automatic Speech Recognition (ASR) has been a dominant research area in Natural Language Processing for the last few decades, and recent years have brought further progress. A speaker's accent is a prominent factor affecting ASR performance. An accent is a distinct way of pronouncing words that may vary from speaker to speaker depending on social class, formality, age, vocal properties, gender, geography, and the influence of a native or other language. Recognizing an accent requires complex characteristics and features of the voice, such as voice quality, prosody, and phoneme pronunciation, which are difficult to extract and analyze. To address these difficulties, researchers have focused on spectral representations such as the Mel spectrogram and Mel-Frequency Cepstral Coefficients (MFCC). This study analyzes which features achieve the highest accuracy on the English accent classification task. Experiments are performed on log Mel filter bank and MFCC features with four different window functions. Features are extracted from raw audio with each window function and used to train a hybrid Convolutional Neural Network and Bidirectional Long Short-Term Memory (CNN-BiLSTM) model on a custom dataset with nine different accents of English, namely American, Australian, British, Indian (Oriya, Bangla, Telugu, Malayalam), and Welsh. The accuracy of each feature is evaluated and compared. The log Mel filter bank achieves the highest accuracy, 99.75% training and 99.91% validation, with the Hann window function, while MFCC with the Bartlett window function achieves 99.98% training and 99.33% validation accuracy. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
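
The feature-extraction step described in the abstract (window a frame of raw audio, then compute log Mel filter bank energies and MFCCs) can be illustrated with a minimal sketch. This is not the authors' code. The sample rate, frame length, filter count, number of coefficients, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, since the abstract does not state them. Only the Hann and Bartlett windows are shown because those are the two the abstract names. A naive DFT stands in for an FFT so the example needs nothing beyond the Python standard library.

```python
import cmath
import math


def window(name, n):
    """Return n symmetric window coefficients for the named window."""
    if n == 1:
        return [1.0]
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    if name == "bartlett":
        return [1.0 - abs(2 * i / (n - 1) - 1.0) for i in range(n)]
    raise ValueError("unknown window: " + name)


def power_spectrum(frame):
    """One-sided power spectrum via a naive DFT (slow, but dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(s) ** 2 / n)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    pts_hz = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [p * n_fft / sample_rate for p in pts_hz]  # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            rising = (k - left) / (centre - left)
            falling = (right - k) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel(frame, window_name, n_filters, sample_rate):
    """Log Mel filter bank energies of one frame after windowing."""
    coeffs = window(window_name, len(frame))
    windowed = [x * c for x, c in zip(frame, coeffs)]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energy so an empty filter does not produce log(0).
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]


def mfcc(log_energies, n_coeffs):
    """Unnormalised DCT-II of the log Mel energies."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n) for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Hypothetical input: one 256-sample frame of a 440 Hz tone at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
for name in ("hann", "bartlett"):
    feats = log_mel(frame, name, n_filters=10, sample_rate=sample_rate)
    print(name, [round(c, 2) for c in mfcc(feats, n_coeffs=5)])
```

In a full pipeline, per-frame vectors like these would be stacked over time into a 2-D feature map, which is the usual input shape for a CNN front end followed by a BiLSTM. Swapping the window name changes only the tapering applied before the spectrum is computed, which is the variable the study compares.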
