Background. Automatic speech recognition is now built into many applications used in daily life, so a successful speech recognition system can make everyday routines considerably more convenient. While many such systems have been established for widely spoken languages like English, progress has been insufficient for less common languages such as Turkish. Moreover, the agglutinative structure of Turkish makes designing a speech recognition system for it more challenging than for other language groups. Our study therefore proposes deep learning models for automatic speech recognition in Turkish, complemented by a language model. Methods. The deep learning models combine convolutional neural networks, gated recurrent units, long short-term memory networks, and transformer layers. The Zemberek library was used to build the language model that improves system performance, and Bayesian optimization was applied to tune the hyper-parameters of the deep learning models. Model performance was evaluated with the standard metrics for automatic speech recognition systems, word error rate and character error rate. Results. With optimal hyper-parameters applied to the models built from the various layers, and without a language model, the Turkish Microphone Speech Corpus dataset yields a word error rate of 22.2 and a character error rate of 14.05, while the Turkish Speech Corpus dataset yields a word error rate of 11.5 and a character error rate of 4.15. Incorporating the language model brought notable improvements. Specifically, for the Turkish Microphone Speech Corpus dataset, the word
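The two metrics named in this abstract, word error rate and character error rate, are conventionally defined as the edit distance between the reference and the recognized text, divided by the reference length. The following is a minimal sketch of that conventional definition, not the authors' evaluation code; the function names and the Turkish example sentence are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: the recognizer drops one word of a four-word sentence.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava güzel"
print(word_error_rate(reference, hypothesis))  # 0.25
print(char_error_rate(reference, hypothesis))  # 0.2
```

Both rates are often reported multiplied by 100. Because Turkish is agglutinative, a single wrong suffix counts as a full word error but only a few character errors, which is one reason the two metrics are reported together.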
Traditional recommendation algorithms face many challenges when applied across different languages and environments. This study explores a recommendation algorithm based on point-of-interest (POI) similarity and machine translation matching to improve the accuracy and personalization of recommendations. The paper constructs a POI database of tourist attraction information, calculates the similarity between attractions, and uses machine translation to match users' language preferences with the attraction information. The proposed algorithm combines the POI similarity results with the machine translation matches to generate personalized recommendations for tourist attractions. In the experimental evaluation, the algorithm provides more accurate and personalized recommendations based on the user's interests and preferred language.
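The abstract does not say which similarity measure is used between attractions. As one common choice, the sketch below ranks POIs by cosine similarity over feature vectors; the measure, the feature layout, and the attraction names are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(pois, name, k=2):
    """Return the k POIs most similar to `name` as (name, score) pairs."""
    target = pois[name]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in pois.items() if other != name]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical features: [historical, beach, indoor]
pois = {
    "museum_a": [1.0, 0.0, 1.0],
    "museum_b": [1.0, 0.0, 0.9],
    "beach_c": [0.0, 1.0, 0.0],
}
print(most_similar(pois, "museum_a", k=1))
```

In a pipeline like the one described, the ranked list would then be filtered or re-ranked by the language match before being shown to the user.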
Converting sign language into a form of natural language is a recent area of machine learning research. Many research efforts have treated sign language as a gesture or facial recognition problem. However, these efforts ignore the linguistic structure and the context of natural sentences. Traditional translation methods suffer from low translation quality and poor scalability of their underlying models, and they are time-consuming. The contribution of this paper is twofold. First, it proposes a deep learning approach for bidirectional translation using GRU and LSTM, with the Bahdanau and Luong attention mechanisms used in the proposed models. Second, it evaluates the proposed models on two sign language corpora, ASLG-PC12 and Phoenix-2014T. Experiments on 16 models show that the proposed model outperforms previous work on the same corpus. On the ASLG-PC12 corpus, when translating from text to gloss, the GRU model with Bahdanau attention gives the best result, with a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score of 94.37% and a BLEU (Bilingual Evaluation Understudy)-4 score of 83.98%. When translating from gloss to text, the GRU model with Bahdanau attention again achieves the best result, with a ROUGE score of 87.31% and a BLEU-4 score of 66.59%. On the Phoenix-2014T corpus, for text-to-gloss translation, the GRU model with Bahdanau attention gives the best ROUGE score (42.96%), while the GRU model with Luong attention gives the best BLEU-4 score (10.53%). When translating from gloss to text, the GRU model with Luong attention achieves the best result in both ROUGE (45.69%) and BLEU-4 (19.56%).
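The two attention mechanisms compared in this abstract differ mainly in how they score each encoder state against the decoder state: Bahdanau attention is additive, while the simplest Luong variant is a dot product. The sketch below shows the two textbook scoring functions in plain Python; the weight matrices, dimensions, and function names are illustrative assumptions, not the paper's configuration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_scores(decoder_state, encoder_states):
    """Luong 'dot' score: decoder_state . h for each encoder state h."""
    return [dot(decoder_state, h) for h in encoder_states]


def bahdanau_scores(decoder_state, encoder_states, w_dec, w_enc, v):
    """Bahdanau additive score: v . tanh(w_dec @ s + w_enc @ h)."""
    projected_dec = matvec(w_dec, decoder_state)
    scores = []
    for h in encoder_states:
        projected_enc = matvec(w_enc, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_dec, projected_enc)]
        scores.append(dot(v, hidden))
    return scores


def context_vector(weights, encoder_states):
    """Attention-weighted sum of the encoder states."""
    dim = len(encoder_states[0])
    return [sum(w * h[i] for w, h in zip(weights, encoder_states))
            for i in range(dim)]


# Hypothetical two-dimensional states and identity projections.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

luong_weights = softmax(luong_dot_scores(decoder_state, encoder_states))
bahdanau_weights = softmax(
    bahdanau_scores(decoder_state, encoder_states, identity, identity, [1.0, 1.0]))
print(luong_weights, context_vector(luong_weights, encoder_states))
print(bahdanau_weights, context_vector(bahdanau_weights, encoder_states))
```

In the original formulations, Bahdanau attention scores against the previous decoder state and Luong attention against the current one; the sketch leaves that choice to the caller.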
ISBN (digital): 9781728150338
ISBN (print): 9781728150345
This paper describes a short-term traffic flow forecasting approach that combines an efficient probability-based diffusion mechanism with an embedding of surrounding information to handle both ordinary and abnormal situations. A discrete diffusion model alone is insufficient for handling spatio-temporal traffic flow data sampled at regular time intervals, as information may be lost in volatile environments. The hybrid model uses DCRNN as the discrete diffusion model and GRU as the embedding, combining both for accurate prediction. Preliminary results show improvement at both microscopic and macroscopic scales, indicating the potential of the hybrid approach for accurate and efficient short-term traffic flow forecasting.