Author affiliations: Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, J&K, India; Department of Computer Science & Engineering, Sharda University, Noida, India; Dehradun, India; Department of Computer Science & Engineering, Amity University, Haryana, India
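Publication: Multimedia Tools and Applications (Multimedia Tools Appl)
Year / Volume / Issue: 2025
Pages: 1-28
Core indexing:
Subject classification: 08 [Engineering]; 070206 [Science - Acoustics]; 0803 [Engineering - Optical Engineering]; 0701 [Science - Mathematics]; 0702 [Science - Physics]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]
Subject: Spectrographs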
Abstract: Automatic Speech Recognition (ASR) has been a dominant research area in Natural Language Processing for the last few decades, and recent years have brought further progress. A speaker's accent is a prominent factor affecting ASR performance. An accent is a distinct way of pronouncing words that varies from speaker to speaker depending on social class, formality, age, vocal properties, gender, geography, and the influence of a native or other language. Recognizing an accent requires complex characteristics and features of the voice, such as voice quality, prosody, and phoneme pronunciation, which are difficult to extract and analyze. To address these difficulties, researchers have focused on spectral representations such as the Mel spectrogram and Mel-Frequency Cepstral Coefficients (MFCC). This study analyzes which features achieve the highest accuracy for the English accent classification task. We run experiments on log Mel filter bank and MFCC features with four different window functions. We extract features from raw audio with each window function and train a hybrid Convolutional Neural Network and Bidirectional Long Short-Term Memory (CNN-BiLSTM) model on a custom dataset of nine English accents, namely American, Australian, British, Indian (Oriya, Bangla, Telugu, Malayalam), and Welsh. The accuracy of each feature is evaluated and compared. Log Mel filter bank features with the Hann window achieve the highest accuracy, 99.75% in training and 99.91% in validation, while MFCC with the Bartlett window achieves 99.98% training and 99.33% validation accuracy. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
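The feature extraction step described in the abstract (windowing raw audio, then computing log Mel filter bank energies and MFCCs) can be sketched roughly as below. This is an illustrative NumPy sketch, not the authors' pipeline. The abstract names only the Hann and Bartlett windows, so the Hamming and Blackman entries are assumptions, as are the 25 ms frame, 10 ms hop, 512-point FFT, 40 Mel bands, and 13 cepstral coefficients. The function names are hypothetical, and a synthetic tone stands in for speech.

```python
import numpy as np

# Window generators. Only Hann and Bartlett are named in the abstract;
# Hamming and Blackman are assumed here to make up four candidates.
WINDOWS = {
    "hann": np.hanning,
    "bartlett": np.bartlett,
    "hamming": np.hamming,    # assumption
    "blackman": np.blackman,  # assumption
}


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel(signal, sample_rate, window="hann",
            frame_len=400, hop=160, n_fft=512, n_mels=40):
    """Log Mel filter bank energies, shape (n_frames, n_mels)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * WINDOWS[window](frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filter_bank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)


def mfcc(log_mel_feats, n_coeffs=13):
    """Unnormalized DCT-II of the log Mel energies along the Mel axis."""
    n_mels = log_mel_feats.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel_feats @ basis.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for speech
    for name in WINDOWS:
        feats = log_mel(tone, sr, window=name)
        print(name, feats.shape, mfcc(feats).shape)
```

Each window yields a feature matrix of the same shape, so the same classifier can be trained on each variant and the accuracies compared, which is the kind of comparison the abstract reports.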