Recognizing the speech of non-native English learners is an important challenge in speech recognition, and existing technology still has clear shortcomings when handling non-native pronunciation. This study combined BERT (Bidirectional Encoder Representations from Transformers) with a CNN-HMM model and introduced an attention mechanism to improve recognition accuracy for non-native English learners. The pre-trained BERT model extracted contextual information from the speech signal, a CNN (Convolutional Neural Network) extracted local features, and a Hidden Markov Model (HMM) performed sequence modeling, while the attention mechanism strengthened the model's ability to capture key features. Experimental results show that the proposed BERT-CNN-HMM model reaches a recognition accuracy of 88.9% at normal speaking speed, significantly better than the 78.5% achieved by the traditional HMM model. Under low, medium, and high noise, the proposed model achieves accuracies of 86.9%, 80.5%, and 72.8%, respectively, higher than all comparison models. Across different accents, its accuracy remains above 83.8%, indicating strong adaptability and generality. These results show that the proposed model can significantly improve speech recognition accuracy for non-native English learners and offers a new direction for further research in speech recognition.
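The abstract does not state how the BERT, CNN, and attention outputs are passed to the HMM. One common arrangement in hybrid neural/HMM recognizers is for the network to produce a per-frame score for each HMM state, which the HMM then decodes with the Viterbi algorithm. The sketch below assumes that arrangement; the function name `viterbi`, the two-state toy model, and all probabilities are hypothetical and not taken from the study. It works in log space to avoid numerical underflow on long utterances.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(
    log_emissions: Sequence[Sequence[float]],  # [T][S] per-frame log scores (assumed to come from the neural front-end)
    log_trans: Sequence[Sequence[float]],      # [S][S] log P(state j at t+1 | state i at t)
    log_init: Sequence[float],                 # [S] log P(state at t=0)
) -> Tuple[List[int], float]:
    """Return the most likely HMM state path and its log score."""
    num_states = len(log_init)
    # delta[s]: best log score of any path that ends in state s at the current frame
    delta = [log_init[s] + log_emissions[0][s] for s in range(num_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(log_emissions)):
        new_delta: List[float] = []
        ptrs: List[int] = []
        for j in range(num_states):
            # Best predecessor state i for landing in state j at frame t
            best_i = max(range(num_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emissions[t][j])
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the highest-scoring final state
    last = max(range(num_states), key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


def to_log(rows):
    """Convert a nested list of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical toy values, not from the study: 2 states, 4 frames.
log_pi = [math.log(0.9), math.log(0.1)]
log_tr = to_log([[0.6, 0.4], [0.1, 0.9]])
log_em = to_log([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])

path, score = viterbi(log_em, log_tr, log_pi)
print(path, math.exp(score))
```

In a real hybrid system the network's state posteriors are typically divided by the state priors (giving scaled likelihoods) before decoding; the toy example feeds probabilities directly for brevity.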