ISBN: (print) 9781728195438
In this paper, the authors develop a Convolutional Neural Network (CNN) architecture adapted to the Speech-to-Text research field. This type of network was chosen for its capacity to extract relevant features and for its popularity in classification problems. A particular model for a Speech-to-Text application has been designed. The model parameters (i.e., the sizes of the filters and kernels) and the number of layers were chosen experimentally, and the model that ensured the highest accuracy was selected. The model takes raw waveforms of spoken digits as input and outputs text containing the predicted digit. The network is capable of providing the right digit regardless of the speaker's gender or age. Overfitting was avoided by using Dropout layers and an early stopping function. To select the best model, the authors took into account two basic criteria: the accuracy of the model and its execution time. With computation time in mind, a first-order cost function was chosen. The best optimizer was selected by testing different gradient descent optimization algorithms. The application was developed in the Python programming language.
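The early stopping mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the validation-loss values are all hypothetical, chosen only to show the general idea of halting training once the validation loss stops improving.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(val_losses: Iterable[float],
                         patience: int = 3,
                         min_delta: float = 0.0) -> Tuple[int, float]:
    """Return (best_epoch, best_loss), scanning per-epoch validation losses.

    Scanning stops as soon as the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs (hypothetical criterion).
    """
    best_epoch, best_loss = -1, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop "training" here; later epochs are never run
    return best_epoch, best_loss


# Hypothetical validation losses, one per epoch (illustrative values only).
history = [0.90, 0.62, 0.48, 0.45, 0.47, 0.46, 0.49, 0.30]
best_epoch, best_loss = early_stopping_epoch(history, patience=3)
print(best_epoch, best_loss)
```

With these illustrative values the scan stops after three epochs without improvement, so the final entry in `history` is never reached. In a real training loop, the weights saved at `best_epoch` would typically be the ones kept.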