
Enhancing English accent identification in automatic speech recognition using spectral features and hybrid CNN-BiLSTM model

Authors: Ahmed, Ghayas; Lawaye, Aadil Ahmad; Jain, Vishal; Chatterjee, Jyotir Moy; Mahajan, Shubham

Affiliations: Department of Computer Sciences, Baba Ghulam Shah Badshah University, J&K, Rajouri, India; Department of Computer Science & Engineering, Sharda University, Noida, India; Dehradun, India; Department of Computer Science & Engineering, Amity University, Haryana, India

Publication: Multimedia Tools and Applications (Multimedia Tools Appl)

Year/Volume/Issue: 2025

Pages: 1-28


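Subject classification: 08 [Engineering]; 070206 [Science: Acoustics]; 0803 [Engineering: Optical Engineering]; 0701 [Science: Mathematics]; 0702 [Science: Physics]; 0812 [Engineering: Computer Science and Technology (engineering or science degree may be conferred)]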

Subject: Spectrographs

Abstract: Automatic Speech Recognition (ASR) has been a dominant research area in Natural Language Processing for the last few decades, and recent years have brought further progress. A speaker's accent is a prominent factor affecting ASR performance. An accent is a distinct way of pronouncing words, which may vary from speaker to speaker depending on social class, formality, age, vocal properties, gender, geography, and the influence of a native or other language. Recognizing an accent requires various complex characteristics and features of the voice, such as voice quality, prosody, and phoneme pronunciation, which are difficult to extract and analyze. To address these difficulties, researchers have focused on spectral representations such as the Mel spectrogram and Mel-Frequency Cepstral Coefficients (MFCC). In this study, we analyze which features achieve the highest accuracy for the English accent classification task. We perform our experiments on log Mel filter bank and MFCC features with four different window functions. We extract features from raw audio with each window function and train a hybrid Convolutional Neural Network and Bidirectional Long Short-Term Memory network (CNN-BiLSTM) on a custom dataset containing nine different accents of English, namely American, Australian, British, Indian (Oriya, Bangla, Telugu, Malayalam), and Welsh. The accuracy of each feature is evaluated and compared. The log Mel filter bank achieves the highest accuracy, 99.75% in training and 99.91% in validation, with the Hann window function, while MFCC with the Bartlett window function achieves 99.98% training and 99.33% validation accuracy. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
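The feature extraction step described in the abstract (log Mel filter bank and MFCC features computed under different window functions) can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the authors' pipeline. The abstract names only the Hann and Bartlett windows, so the Hamming and Blackman entries are assumptions, as are the 16 kHz sample rate, 25 ms frames, 10 ms hop, 40 Mel bands, and 13 cepstral coefficients. All function names are hypothetical.

```python
import numpy as np

# Hann and Bartlett are named in the abstract; Hamming and Blackman are
# assumed stand-ins for the two window functions it does not name.
WINDOWS = {
    "hann": np.hanning,
    "bartlett": np.bartlett,
    "hamming": np.hamming,
    "blackman": np.blackman,
}


def hz_to_mel(hz):
    """Convert frequency in Hz to the Mel scale (common HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_mels, n_fft, sr):
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel(signal, sr=16000, window="hann", frame_len=400, hop=160,
            n_fft=512, n_mels=40):
    """Log Mel filter bank energies, one row per frame (all defaults assumed)."""
    win = WINDOWS[window](frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * win
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)


def mfcc(signal, n_mfcc=13, **kwargs):
    """MFCCs as an unnormalized DCT-II of the log Mel energies."""
    lm = log_mel(signal, **kwargs)
    n_mels = lm.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return lm @ basis.T


if __name__ == "__main__":
    # Synthetic one-second tone standing in for a real utterance.
    t = np.arange(16000) / 16000.0
    tone = np.sin(2 * np.pi * 440.0 * t)
    for name in WINDOWS:
        print(name, log_mel(tone, window=name).shape, mfcc(tone, window=name).shape)
```

Each call returns a frames-by-features matrix, which is the kind of two-dimensional input a CNN-BiLSTM classifier would typically consume. Only the window function changes between runs, so any difference in downstream accuracy can be attributed to the windowing choice.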
