Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques

Authors: Tiwari, Manish; Verma, Deepak Kumar

Affiliation: Department of Computer Science and Engineering, University Institute of Engineering and Technology, Chhatrapati Shahu Ji Maharaj University, UP, Kanpur, India

Publication: International Journal of Speech Technology (Int J Speech Technol)

Year, volume, issue: 2024, Vol. 27, No. 4

Pages: 1013-1026

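Subject classification: 0810 [Engineering: Information and Communication Engineering]; 08 [Engineering]; 070206 [Science: Acoustics]; 0702 [Science: Physics]; 0812 [Engineering: Computer Science and Technology (engineering or science degree may be conferred)]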

Subject: Convolutional neural networks

Abstract: This research article introduces a novel approach to text-independent speaker recognition by integrating Mel-Frequency Cepstral Coefficients (MFCC) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks, with noise removal facilitated by Convolutional Neural Networks (CNNs). The primary objective is to improve the robustness and precision of speaker recognition systems in real-world environments where background noise is prevalent. The proposed method begins with the extraction of MFCC features, which effectively capture the timbral characteristics of the speech signal. To enhance these features, we employ a CNN-based noise removal mechanism that reduces background interference, thereby improving the quality of the input signal. The denoised MFCC features are then fed into a Bi-LSTM network, which excels in modeling temporal dependencies and capturing long-range contextual information inherent in speech data. Extensive experiments were conducted on publicly available datasets, demonstrating significant improvements in speaker recognition accuracy under various noise conditions compared to traditional approaches. The integration of CNN for noise removal and Bi-LSTM for temporal feature modeling showcases a synergistic effect, leading to a more robust and reliable speaker recognition system. Our results underscore the effectiveness of combining advanced feature extraction, noise reduction, and deep learning techniques for enhanced speaker recognition in challenging acoustic environments. The accuracy of the proposed method is found to be 98.17% at a signal-to-noise ratio (SNR) of 30 dB. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
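
Note on the SNR figure: the abstract reports accuracy at an SNR of 30 dB but does not say how noisy test signals were produced. As a rough illustration only, the sketch below shows one common way to mix noise into a clean signal at a chosen SNR, using the definition SNR_dB = 10 * log10(P_signal / P_noise). The function names (`mix_at_snr`, `measure_snr_db`), the 220 Hz sine standing in for speech, and the Gaussian noise are all assumptions made for this example, not details from the paper.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(clean, noise, target_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # From SNR_dB = 10 * log10(P_clean / P_noise), solve for the noise power.
    target_noise_power = p_clean / (10.0 ** (target_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measure_snr_db(clean, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10.0 * math.log10(power(clean) / power(noise))


# Assumed toy inputs: one second of a 220 Hz tone and white Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy, scaled = mix_at_snr(clean, noise, 30.0)
achieved = measure_snr_db(clean, scaled)
```

At 30 dB the noise power is one thousandth of the signal power, so this is a comparatively mild noise condition; lower SNR values would be a harder test of the denoising stage.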
