ISBN (print): 9783031780134; 9783031780141
This work brings together two popular building blocks of neural architectures, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies across several modalities and form a core backbone of modern advances in machine learning. Convolutional architectures have been popular for extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated or adapted in a causal setup beyond speech applications for large-scale language modeling.
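As an illustration of what "causal" means for a convolutional filter, the following is a minimal sketch in plain Python. It is not the architecture from the paper; the function name `causal_conv1d` and the toy inputs are assumptions made for this example. The output at position t uses only inputs at positions t, t-1, ..., so changing a future input leaves earlier outputs untouched.

```python
def causal_conv1d(x, kernel):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k].

    Positions before the start of the sequence are treated as zero
    (equivalent to left-padding), so y[t] never depends on x[t + 1:].
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # skip taps that would reach before the sequence start
                acc += w * x[t - k]
        y.append(acc)
    return y


# Toy sequence and filter (hypothetical values, chosen for easy arithmetic).
x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25]
y = causal_conv1d(x, kernel)

# Perturb only the last (future) input and recompute.
x_future_changed = [1.0, 2.0, 3.0, 100.0]
y_changed = causal_conv1d(x_future_changed, kernel)
```

Here `y[:3]` and `y_changed[:3]` are identical, which is the property that lets such a filter be used in left-to-right language modeling.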
ISBN (print): 9783031780134; 9783031780141
Alcohol is a progressive central nervous system depressant. Increased alcohol consumption leads to alterations in cognitive processes and also affects speech production. In this study we present a corpus of n=35 patients diagnosed with Alcohol Dependency Syndrome (ADS) and n=35 matched healthy controls, and attempt to automatically distinguish the two speaker groups based on their spontaneous speech. Using wav2vec 2.0 embeddings as features, we were able to identify the two speaker categories with high accuracy (EER scores between 9% and 20%, and AUC scores above 0.885). We also examined the differences between the two speech tasks (a general spontaneous task and an alcohol-related one) performed by the subjects. Lastly, we analyzed the number of pauses present in the speech of the subjects. Based on our results, even three simple pause-related attributes are sufficient for the automatic identification of the ADS subjects with acceptable performance on both speech tasks.
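For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute AUC and EER from classifier scores. The abstract does not say how the authors computed them, so the function names, the threshold sweep, and the toy scores are all assumptions made for illustration.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def eer(pos_scores, neg_scores):
    """Equal error rate: sweep thresholds over the observed scores and
    return the mean of the false-acceptance and false-rejection rates
    at the threshold where the two are closest."""
    best_gap, best_rate = None, None
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        frr = sum(p < thr for p in pos_scores) / len(pos_scores)   # positives rejected
        far = sum(n >= thr for n in neg_scores) / len(neg_scores)  # negatives accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Hypothetical scores: higher means "more likely ADS".
ads_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.5, 0.3, 0.2]
toy_auc = auc(ads_scores, control_scores)
toy_eer = eer(ads_scores, control_scores)
```

On these toy scores, 14 of the 16 pairs are ranked correctly, and the false-acceptance and false-rejection rates meet at a threshold of 0.6.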
ISBN (print): 9789819794362; 9789819794379
Radiology Report Generation aims to generate accurate diagnostic reports based on medical images. Existing approaches based on the Transformer paradigm and grid features have achieved strong performance. However, this paradigm inevitably loses fine-grained visual representations and ignores multi-level semantic information. Therefore, in this paper, we propose a Semantic Aware and Attention Refine Transformer (SA(3)RT) model that improves radiology report generation by utilizing multi-granularity semantic information. Specifically, the semantic-aware unsupervised region recognition module relies on clustering algorithms to make efficient and effective use of grid features, so that the model focuses on global-local visual representations. In a multi-level fashion, where different layers learn complementary semantic information, the attention-aware refinement module exploits the semantic relationships between tokens at multiple attention levels to fuse low- and high-level semantic information. Experiments are performed on four radiology report generation datasets: COV-CTR, COVID-19 CT, IU X-Ray and MIMIC-CXR. The experiments show that the SA(3)RT model achieves results competitive with state-of-the-art methods and that it generalizes well to radiology report generation across different disease domains. Code is available at https://***/Xiaojin-Hua/SA3RT.
ISBN (print): 9789819794362; 9789819794379
Convolutional Neural Networks (CNNs) have demonstrated effectiveness in knowledge graph embedding, but existing CNN-based methods encounter two main challenges. Firstly, CNN-based models with simple architectures are unable to extract latent features. Secondly, these models enhance feature extraction by adding extra modules, which inevitably increases training cost with limited performance improvement. To address these challenges, we go beyond traditional CNNs and propose a novel knowledge graph embedding model, which utilizes the powerful capability of Quaternion Convolutional Neural Networks (QCNNs) for effective representation learning. Specifically, we learn representations of entities and relations in a quaternion space and utilize QCNN to extract the inherent information of the entity-relation matrix. We evaluate the performance of our model on multiple knowledge graph completion benchmark datasets. Experimental results show that our model achieves effective improvements compared to existing CNN-based models. Moreover, in terms of training time, our model is faster than other outstanding models. The code of all experiments is available at https://***/llqy123/ConvQE.
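Quaternion-valued networks generally build on the Hamilton product, the non-commutative multiplication of quaternions. The sketch below shows only that operation; it is not the ConvQE model, whose layers the abstract does not describe, and the function name `hamilton_product` is an assumption made for this example.

```python
def hamilton_product(q1, q2):
    """Multiply two quaternions given as (a, b, c, d) = a + b*i + c*j + d*k."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Unit quaternions i, j, k.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

ij = hamilton_product(i, j)  # equals k
ji = hamilton_product(j, i)  # equals -k, so the product is not commutative
ii = hamilton_product(i, i)  # equals -1
```

Because each output component mixes all four input components, a single quaternion multiplication couples the parts of an embedding in a way that four independent real multiplications would not.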
ISBN (print): 9789819794423; 9789819794430
This paper proposes a multi-level discourse coherence evaluation framework aimed at addressing the complex challenges in assessing the coherence of Chinese essays. By integrating advanced deep learning technologies, including TextRCNN-LERT, UIE, and GLM4 models, we successfully constructed a comprehensive evaluation process from logical error detection to topic modeling and feedback generation. Experimental results show that this framework not only effectively identifies logical errors in essays, accurately extracts topic sentences and evaluates their logical relationships, but also generates specific and targeted feedback suggestions, significantly improving the accuracy and practicality of Chinese essay coherence evaluation.
ISBN (print): 9789819794300; 9789819794317
Adverse drug reactions (ADRs) are a serious medical issue, so early ADR extraction from Electronic Medical Records (EMRs) is necessary. Most current research on ADR extraction from EMRs is oriented toward sentence-level, non-real and single-source data, leaving a gap between research and practice. To solve this problem, we propose LLMADR, a novel method based on fine-tuning style-aligned large language models (LLMs) for ADR extraction from document-level, real, multi-source Chinese EMRs. We utilize the comprehension and generation capabilities of LLMs to accomplish ADR extraction from document-level EMRs, where interference from irrelevant information and long-distance ADRs exist, and we craft prompts to guide LLMs in aligning multi-source EMRs with varying styles before training and inference, thereby enhancing the generalization capability of our model. Furthermore, we construct CADR, a document-level Chinese ADR dataset drawn from two medical organizations without simplification of the EMRs, for training and evaluation. Comparative experiments on CADR show that, from both classification and extraction perspectives, LLMADR performs better than several mainstream models and generalizes better.
ISBN (print): 9783031780134; 9783031780141
In this work, we conducted a comparative test of 20 sets of pre-trained vectors for computationally estimating valence ratings of words in the Russian language. Word valence was estimated using neural network predictors: a vector representing a word was fed to the input of a multilayer feed-forward neural network that calculated the valence rating of this word. The currently largest Russian dictionary with valence ratings, KartaSlovSent, was used as the source of word valence ratings for training the models. The highest accuracy of valence rating estimation was obtained using a set of fastText vectors trained on the Common Crawl corpus, which includes 103 billion words. Spearman's correlation coefficient between human ratings and machine ratings was 0.859. The high estimation accuracy and the large size of the dictionary allow one to use this set of vectors to extrapolate human valence ratings to the widest range of words in the Russian language. It is also worth mentioning 4 sets of vectors presented on the RusVectores project page and trained on the texts of the Araneum Russicum Maximum and Taiga corpora. Despite the significantly smaller size of the training corpus, these sets of vectors yield only slightly lower accuracy. The lowest results were obtained for sets of vectors trained on corpora of news texts.
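The evaluation metric named above, Spearman's correlation coefficient, is the Pearson correlation computed on ranks. The following is a minimal sketch in plain Python with average ranks for ties; the function names and the toy ratings are assumptions for illustration and are not taken from the study's data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical human and predicted valence ratings for five words.
human = [1.0, 2.5, 3.0, 4.2, 5.0]
predicted = [1.2, 3.5, 2.0, 3.9, 4.8]  # second and third words are swapped in rank
rho = spearman(human, predicted)
```

Because only the ordering matters, a predictor that is monotonically related to the human ratings scores 1.0 even if its numeric scale differs.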
ISBN (print): 9789819794331; 9789819794348
This paper presents a new tool learning dataset, Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances that demonstrate the practical application of the tools. To generate data on a large scale while ensuring reliability, we propose a self-instruct method for generating tools and instances that allows precise control over the process. Moreover, Seal-Tools contains hard instances that call multiple tools to complete the job, some of which involve nested tool calls. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Seal-Tools can therefore serve as a new benchmark for evaluating the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our fine-tuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://***/fairyshine/Seal-Tools.
ISBN (print): 9789819794423; 9789819794430
In traditional lexical chain extraction tasks, researchers typically focus on identifying simple lexical items based on surface grammatical relations, often overlooking compound words with underlying semantic frameworks. To address this limitation, the task of Nominal Compound Chain Extraction (NCCE) has emerged. This task aims to identify and cluster nominal compounds sharing the same semantic theme, thereby providing richer semantic information and facilitating a deeper understanding of the latent themes within documents. In this study, we fine-tune the large language model Qwen2-0.5B, employ data augmentation techniques, and introduce Chain-of-Thought (CoT) information from large models as an auxiliary aid, significantly enhancing the model's document comprehension capabilities.
ISBN (print): 9789819794393; 9789819794409
Textual personality detection aims to identify personality traits by analyzing user-generated content. To achieve this effectively, it is essential to thoroughly examine user-generated content from various perspectives. However, previous studies have struggled with automatically extracting and effectively integrating information from multiple perspectives, thereby limiting their performance on personality detection. To address these challenges, we propose the Multi-view Mixture-of-Experts Model for Textual Personality Detection (MvP). MvP introduces a Multi-view Mixture-of-Experts (MoE) network to automatically analyze user posts from various perspectives. Additionally, it employs User Consistency Regularization to mitigate conflicts among different perspectives and learn a multi-view generic user representation. The model's training is optimized via a multi-task joint learning strategy that balances supervised personality detection with self-supervised user consistency constraints. Experimental results on two widely-used personality detection datasets demonstrate the effectiveness of the MvP model and the benefits of automatically analyzing user posts from diverse perspectives for textual personality detection.
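To make the Mixture-of-Experts idea concrete, the sketch below shows a generic softmax-gated mixture in plain Python: each expert produces a vector, and a gate turns per-expert logits into weights that combine them. This is a textbook formulation, not the MvP network, whose gating and experts the abstract does not specify; all names and values here are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(expert_outputs, gate_logits):
    """Combine per-expert output vectors using softmax gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


# Two hypothetical "views" of a user post, each encoded by one expert.
expert_outputs = [[1.0, 0.0], [0.0, 1.0]]

# Equal gate logits weight both experts equally.
balanced = mixture_of_experts(expert_outputs, [0.0, 0.0])

# A much larger logit for the first expert makes its output dominate.
skewed = mixture_of_experts(expert_outputs, [10.0, 0.0])
```

In a trained model the gate logits would themselves be computed from the input, so different posts can draw on different experts.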