Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence ...
ISBN (digital): 9798331516826
ISBN (print): 9798331516833
Nasalance, defined as the ratio of nasal energy to total acoustic energy during speech, is an important metric in speech science and clinical phonetics. Measurement of nasalance, however, requires specialized equipment, which has severely limited its widespread application. In this study, we explored methods of predicting nasalance from speech waveforms. We designed an oral-nasal separation mask with thermal flow sensors to record the airflows from the mouth and nose separately during speech production, alongside a microphone recording speech sounds. Nasalance was calculated from the oral and nasal airflows, and multilayer perceptron (MLP) models were trained to predict nasalance from speech waveforms. We compared Mel-spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and Wav2vec 2.0 features as inputs to the MLPs. The results demonstrated that the Wav2vec 2.0-based features achieve the highest Pearson Product Moment Correlation Coefficient (PPMC) of 0.7459, outperforming both the Mel-spectrogram and MFCC baselines. These findings emphasize the potential of leveraging pre-trained deep learning models such as Wav2vec 2.0 to predict nasalance directly from raw audio data, reducing reliance on expensive instruments and improving diagnostic capabilities in speech pathology. Moreover, this paper underscores the promise of deep learning methods in advancing clinical assessment and opens up new avenues for applying computational techniques to better understand and treat speech disorders.
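The two quantities the abstract defines, the nasalance ratio and the PPMC evaluation metric, can be sketched in a few lines of pure Python. This is an illustrative sketch only, not the paper's code; the function names and the toy frame values are invented for the example.

```python
import math

def frame_energy(samples):
    """Mean squared amplitude of one analysis frame."""
    return sum(s * s for s in samples) / len(samples)

def nasalance(nasal_frame, oral_frame):
    """Nasalance: nasal energy over total (nasal + oral) energy, in [0, 1]."""
    n = frame_energy(nasal_frame)
    o = frame_energy(oral_frame)
    return n / (n + o) if (n + o) > 0 else 0.0

def ppmc(xs, ys):
    """Pearson Product Moment Correlation Coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy frames: strong nasal airflow vs. strong oral airflow.
nasal_heavy = nasalance([0.9, -0.8, 0.7], [0.1, -0.1, 0.1])
oral_heavy = nasalance([0.1, -0.1, 0.1], [0.9, -0.8, 0.7])
print(nasal_heavy > oral_heavy)  # → True
```

In the paper's setup, the reference nasalance would come from the airflow sensors, the prediction from the MLP, and PPMC would be computed between the two sequences.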
Extraction of supportive premises for a mathematical problem can contribute substantially to improving automatic reasoning systems. One bottleneck in automated theorem proving is the lack of a proper semantic in...
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 millio...
This paper describes the participation of the DUTH-ATHENA team of Democritus University of Thrace and Athena Research center in the eRisk 2021 task, which focuses on measuring the level of depression based on Reddit u...
In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic spee...
In Natural Language Processing, entity linking (EL) has centered around Wikipedia, but remains underexplored for the job market domain. Disambiguating skill mentions can help us to get insight into the labor market de...
With careful manipulation, malicious agents can reverse engineer private information encoded in pre-trained language models. Security concerns motivate the development of quantum pre-training. In this work, we propose a highly portable quantum language model (PQLM) that can easily transmit information to downstream tasks on classical machines. The framework consists of a cloud PQLM built with random Variational Quantum Classifiers (VQC) and local models for downstream applications. We demonstrate the ad hoc portability of the quantum model by extracting only the word embeddings and effectively applying them to downstream tasks on classical machines. Our PQLM exhibits comparable performance to its classical counterpart on both intrinsic evaluation (loss, perplexity) and extrinsic evaluation (multilingual sentiment analysis accuracy) metrics. We also perform ablation studies on the factors affecting PQLM performance to analyze model stability. Our work establishes a theoretical foundation for a portable quantum pre-trained language model that could be trained on private data and made available for public use with privacy protection guarantees.
ISBN (print): 9781665462730
This paper presents Isarn dialect word segmentation based on a recurrent neural network. In this study, Isarn text written in Thai script is taken as input. We explored the effectiveness of three types of recurrent layers: simple recurrent neural networks (RNN), gated recurrent units (GRU), and long short-term memory (LSTM). The F1-scores of RNN, GRU, and LSTM are 95.36, 96.05, and 95.70, respectively. The experimental results showed that using GRU as the recurrent layer achieved the best performance. To deal with words borrowed from Thai, transfer learning was applied to improve the performance of the model by fine-tuning a pre-trained model, given the limited size of the Isarn corpus. The model trained through the transfer learning approach outperformed the model trained on the Isarn dataset alone.
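The F1-scores the abstract reports are typically computed over exact word spans: a predicted word counts as correct only if both its start and end offsets match the gold segmentation. A minimal pure-Python sketch of that scoring scheme (function names and the example sentence are illustrative, not from the paper):

```python
def word_spans(segmented):
    """Map a segmentation (list of words) to a set of (start, end) character spans."""
    spans, pos = set(), 0
    for word in segmented:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans

def segmentation_f1(gold, pred):
    """Word-level F1: a predicted word is correct iff its span matches gold exactly."""
    g, p = word_spans(gold), word_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Gold splits the text into three words; the prediction merges the last two,
# so only one of two predicted words is correct (P=0.5, R=1/3, F1=0.4).
print(segmentation_f1(["this", "is", "fine"], ["this", "isfine"]))  # → 0.4
```

In the paper's setting the words would be Isarn text in Thai script; the scoring logic is identical since only character offsets are compared.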
We present a large multi-signer video corpus for Greek Sign Language (GSL), suitable for the development and evaluation of GSL recognition algorithms. The database has been collected as part of the “SL-ReDu” project, which focuses on the educational use-case of systematic teaching of GSL as a second language (L2). The project aims to assist this process by allowing self-monitoring and objective assessment of GSL learners’ productions through the use of recognition technology, thus requiring data resources relevant to this use-case. To this end, we present the SL-ReDu GSL corpus, an extensive RGB+D video collection of 21 informants with a duration of 36 hours, recorded under studio conditions, consisting of: (i) isolated signs; (ii) continuous signing (annotated at the sentence level); and (iii) fingerspelling of words. We provide a detailed description of the design and acquisition methods used to develop it, along with corpus statistics and a comparison to existing sign language datasets. The SL-ReDu GSL corpus, as well as proposed frameworks for recognition experiments on it, are publicly available at https://***/corpus.