Efficient Transformer models typically employ local and global attention methods, or utilize hierarchical or recurrent architectures, to process long text inputs in natural language processing tasks. However, these models sacrifice efficiency, accuracy, or compatibility when extended to longer sequences. To retain both the accuracy of global attention and the efficiency of local attention, while remaining compatible enough to be applied easily to an existing pre-trained model, in this paper we propose multi-level local attention (Mulla attention), a hierarchical local attention that acts on the input sequence and on multiple pooled sequences of different granularity simultaneously, thus performing long-range modeling while maintaining linear or log-linear complexity. We apply Mulla attention to LongT5 and implement our LongT5-Mulla sequence-to-sequence model, introducing no new parameters except for positional embeddings. Experiments show that our model surpasses all baseline models, including the two original LongT5 variants, on the 8–16k-input long text summarization task on the Multi-News, arXiv and WCEP-10 datasets, with improvements of at least +0.22, +0.01 and +0.52 percentage points (pp) in averaged Rouge scores respectively, while at the same time effectively processing longer sequences of 16–48k tokens with at least 52.6% lower memory consumption than LongT5-tglobal and averaged Rouge scores +0.56–1.62 pp higher than LongT5-local. These results demonstrate that our proposed LongT5-Mulla model can effectively process long sequences and extend the maximum input length for long text tasks from 16k to 48k while maintaining accuracy and efficiency.
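The following is a minimal, illustrative sketch (not the authors' implementation) of the idea behind Mulla attention: queries attend locally over the raw sequence and additionally over mean-pooled copies of the sequence at coarser granularities, which supplies long-range context at low cost. The window size, pool sizes, single-head setup, and the use of full attention over the (already short) pooled levels are simplifying assumptions.

```python
# Sketch of multi-level local attention: local attention on the raw sequence plus
# attention over mean-pooled sequences of coarser granularity. Illustrative only.
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window):
    """Each query attends only to keys within +/- window positions (O(n * window))."""
    n, d = q.shape
    out = torch.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / d ** 0.5
        out[i] = F.softmax(scores, dim=-1) @ v[lo:hi]
    return out

def mulla_attention(x, w_qkv, window=64, pool_sizes=(4, 16)):
    """Combine local attention over the input with attention over pooled levels."""
    q, k, v = (x @ w for w in w_qkv)
    out = local_attention(q, k, v, window)
    for p in pool_sizes:
        # mean-pool keys/values to a coarser sequence of length n // p
        kp = F.avg_pool1d(k.T.unsqueeze(0), p).squeeze(0).T
        vp = F.avg_pool1d(v.T.unsqueeze(0), p).squeeze(0).T
        scores = q @ kp.T / q.shape[-1] ** 0.5
        out = out + F.softmax(scores, dim=-1) @ vp
    return out

# toy usage
d = 32
x = torch.randn(256, d)
w_qkv = [torch.randn(d, d) / d ** 0.5 for _ in range(3)]
print(mulla_attention(x, w_qkv).shape)  # torch.Size([256, 32])
```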
Due to the data inefficiency and low speech quality of grapheme-based end-to-end text-to-speech (TTS), having a separate high-performance TTS linguistic frontend is still commonly regarded as necessary. However, a TTS frontend is itself difficult to build and maintain, since it requires abundant linguistic knowledge for its construction. In this article, we start by bootstrapping an integrated sequence-to-sequence (Seq2Seq) TTS frontend using a pre-existing pipeline-based frontend and large amounts of unlabelled normalized text, achieving promising memorization and generalisation abilities. To overcome the performance limitation imposed by the pipeline-based frontend, this work proposes a Forced Alignment (FA) method to decode the pronunciations from transcribed speech audio and then use them to update the Seq2Seq frontend. Our experiments demonstrate the effectiveness of our proposed FA method, which can significantly improve the word token accuracy from 52.6% to 91.2% for out-of-dictionary words. In addition, it can also correct the pronunciation of homographs from transcribed speech audio and potentially improve the homograph disambiguation performance of the Seq2Seq frontend.
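A tiny sketch of the idea behind FA-based pronunciation decoding: each candidate pronunciation of a word is scored by a forced aligner against the transcribed audio, and the best-scoring candidate becomes a training label for the Seq2Seq frontend. The aligner interface below (`align_log_likelihood`) is a hypothetical stand-in, not a real library call.

```python
# Select the candidate pronunciation whose forced alignment against the audio scores
# best, and keep it as a (grapheme, phoneme) training pair for the Seq2Seq frontend.
def pick_pronunciation(word, candidates, audio, align_log_likelihood):
    scored = [(align_log_likelihood(audio, prons), prons) for prons in candidates]
    best_score, best_prons = max(scored)
    return word, best_prons

# hypothetical usage: two candidate pronunciations of "read" (present vs. past tense)
candidates = [["R", "IY1", "D"], ["R", "EH1", "D"]]
```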
Conversational Question Generation (CQG) aims to generate conversational questions given a passage and the conversation history. Previous work on CQG presumes a contiguous span as the answer and generates a question targeting it. However, this limits the application scenarios, because answers in practical conversations are usually abstractive free-form text instead of extractive spans. In addition, most state-of-the-art CQG systems are based on pretrained language models consisting of hundreds of millions of parameters, bringing challenges to real-life applications due to latency and capacity constraints. To address these problems, in this work we introduce the Tiny Answer-Guided Network (TAGNET), a lightweight Bi-LSTM-based model for CQG. We explicitly take the target answers as input, which interact with the passages and conversation history in the encoder and guide question generation through a gated attention mechanism in the decoder. Besides, we distill knowledge from larger pretrained language models into our smaller network to trade off performance against efficiency. Experimental results show that TAGNET achieves performance comparable to large pretrained language models (retaining 95.9% of teacher performance) while using 5.7x fewer parameters and running with 10.4x lower inference latency. TAGNET outperforms the previous best-performing model of similar parameter size by a large margin, and further analysis shows that TAGNET generates more answer-specific conversational questions.
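A rough sketch of how a gated attention mechanism can let an answer representation guide decoding, in the spirit of the description above (an illustration, not TAGNET's code): a sigmoid gate decides, per dimension, how much of the attended encoder context versus the answer representation enters the decoder step.

```python
# Answer-guided gated attention for a single decoding step; shapes are illustrative.
import torch
import torch.nn as nn

class GatedAnswerAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, dec_state, enc_outs, answer_repr):
        # standard dot-product attention over the encoder outputs
        scores = enc_outs @ dec_state                        # (src_len,)
        context = torch.softmax(scores, dim=-1) @ enc_outs   # (hidden,)
        # gate computed from the decoder state and the answer representation
        g = torch.sigmoid(self.gate(torch.cat([dec_state, answer_repr])))
        return g * context + (1 - g) * answer_repr           # answer-guided context

# toy usage
h = 64
attn = GatedAnswerAttention(h)
ctx = attn(torch.randn(h), torch.randn(10, h), torch.randn(h))
print(ctx.shape)  # torch.Size([64])
```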
Adverse drug reactions (ADRs), which are harmful physical reactions of patients to drug treatments, are inherent to the nature of drugs; the reactions can occur with any drug and are becoming a leading cause of patient morbidity and mortality during medical procedures. ADRs can be hazardous and even fatal to patients. In traditional methods, ADRs are detected through clinical trials. To obtain a comprehensive collection of ADRs, sufficient experimental samples and time are required before a drug comes to market, which is not realistic. Moreover, even if extensive clinical trials are performed, many undetected ADRs might still be discovered after a drug is released to the market. ADRs can lead to disastrous consequences, which creates a dramatically increased need for precise prediction of potential ADRs as early as possible. In this paper, we propose an encoder-decoder framework based on an attention mechanism and the long short-term memory (LSTM) model to predict potential ADRs. We regard the prediction of ADRs as a sequence-to-sequence problem and improve the attention-based encoder-decoder framework to learn the interrelationships between ADRs. Unlike other classical methods that utilize molecular drug structures, our model is based solely on ADRs, making it an independent but parallel approach to traditional methods. We use a masking method to generate the target data and 5-fold cross-validation to verify the performance of our proposed model. Based on the Top-k accuracy test results, our model outperforms the baseline models in predicting potential ADRs.
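A minimal sketch of one way such mask-based target generation could look: part of a drug's known ADR sequence is hidden and used as the decoder target, while the visible reactions form the encoder input. The split ratio and ADR names below are illustrative assumptions, not the paper's exact procedure.

```python
# Hide a fraction of a drug's known ADRs and treat them as the prediction target.
import random

def make_example(adr_sequence, mask_ratio=0.3, seed=None):
    rng = random.Random(seed)
    n_masked = max(1, int(len(adr_sequence) * mask_ratio))
    masked_idx = set(rng.sample(range(len(adr_sequence)), n_masked))
    source = [a for i, a in enumerate(adr_sequence) if i not in masked_idx]
    target = [a for i, a in enumerate(adr_sequence) if i in masked_idx]
    return source, target  # encoder input, decoder target

print(make_example(["nausea", "headache", "rash", "dizziness", "fatigue"], seed=0))
```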
Previous work in slogan generation focused on utilising slogan skeletons mined from existing slogans. While some generated slogans can be catchy, they are often not coherent with the company's focus or style across their marketing communications because the skeletons are mined from other companies' slogans. We propose a sequence-to-sequence (seq2seq) Transformer model to generate slogans from a brief company description. A naive seq2seq model fine-tuned for slogan generation is prone to introducing false information. We use company name delexicalisation and entity masking to alleviate this problem and improve the generated slogans' quality and truthfulness. Furthermore, we apply conditional training based on the first words' part-of-speech tag to generate syntactically diverse slogans. Our best model achieved a ROUGE-1/-2/-L F-1 score of 35.58/18.47/33.32. Besides, automatic and human evaluations indicate that our method generates significantly more factual, diverse and catchy slogans than strong long short-term memory and Transformer seq2seq baselines.
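As an illustration of company-name delexicalisation and entity masking, the sketch below replaces the company name with a placeholder token and masks other named entities that commonly cause factual errors. It assumes spaCy with the en_core_web_sm model installed and uses illustrative placeholder tokens; it is not the paper's exact preprocessing.

```python
# Delexicalise the company name and mask entities before feeding text to the seq2seq model.
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def delexicalise(description, company_name):
    # replace the company name with a placeholder so the model cannot copy it wrongly
    text = re.sub(re.escape(company_name), "<company>", description, flags=re.IGNORECASE)
    # mask remaining named entities that often lead to hallucinated facts
    doc = nlp(text)
    for ent in reversed(doc.ents):  # reversed so character offsets stay valid
        if ent.label_ in {"GPE", "DATE", "CARDINAL", "ORG"}:
            text = text[:ent.start_char] + f"<{ent.label_.lower()}>" + text[ent.end_char:]
    return text

print(delexicalise("Acme Corp has served Boston since 1998.", "Acme Corp"))
# e.g. "<company> has served <gpe> since <date>."
```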
Named entity recognition (NER) is a fundamental task in natural language processing, which aims to detect mentions of real-world entities in text and classify them into predefined types. Recently, research on overlapped and discontinuous named entity recognition has received increasing attention. However, we note that few studies have considered both overlapped and discontinuous entities. In this paper, we propose a novel sequence-to-sequence model based on machine reading comprehension that is capable of recognizing both overlapped and discontinuous entities. The model uses the machine reading comprehension formulation to encode prior information about the entity category. The input sequence then passes through a question-answering model to predict the mention relevance of the given source sentences to the query. Finally, we incorporate the mention relevance into the BART-based generation model. We conducted experiments on three types of NER datasets to show the generality of our model. The experimental results demonstrate that our model beats almost all current top-performing baselines and achieves a substantial performance boost over current SOTA models on overlapped and discontinuous NER datasets.
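A small sketch of the machine reading comprehension formulation referred to above: each entity category is turned into a natural-language query that is paired with the source sentence, so the encoder sees explicit category information. The query wordings and separator token are illustrative assumptions.

```python
# Build one (query, sentence) input per entity category in the MRC-style formulation.
QUERIES = {
    "PER": "Which words refer to a person?",
    "ORG": "Which words refer to an organization?",
    "LOC": "Which words refer to a location?",
}

def build_mrc_inputs(sentence):
    return [(f"{query} [SEP] {sentence}", label) for label, query in QUERIES.items()]

for text, label in build_mrc_inputs("Barack Obama visited Microsoft in Seattle."):
    print(label, "->", text)
```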
Dialogue data usually consist of pairs of a query and its response, but no previous response generators have exploited the responses explicitly during training, even though a response provides significant information about the meaning of a query. Therefore, this paper proposes a sequence-to-sequence response generator with a response-aware encoder. The proposed generator exploits golden responses by reflecting them in the query representation. For this purpose, the response-aware encoder adds a relevancy scorer layer to the Transformer encoder that calculates the relevancy of query tokens to a response. However, golden responses are available only during training of the response generator and are unavailable at inference time. As a solution to this problem, joint learning of a teacher and a student relevancy scorer is adopted. That is, at training time both the teacher and the student relevancy scorers are optimized, but the decoder generates a response using only the relevancy of the teacher scorer, whereas at inference time the decoder uses that of the student scorer. Since the student scorer is trained to minimize its difference from the teacher scorer, it can be used to compute the relevancy of a prospective response. The proposed model is the first attempt to use a golden response directly to generate a query representation, whereas previous studies reflected the responses only implicitly and indirectly. As a result, it achieves a higher dialogue evaluation score than the current state-of-the-art model on the Reddit, Persona-Chat, and DailyDialog datasets.
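The sketch below illustrates the teacher/student relevancy idea under simplifying assumptions: the teacher scores query tokens against a pooled representation of the golden response, the student predicts relevancy from the query tokens alone, and a distillation loss pulls the student toward the teacher so it can stand in at inference time. Module shapes and losses are illustrative, not the paper's architecture.

```python
# Joint teacher/student relevancy scoring with a simple MSE distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 64
teacher = nn.Bilinear(hidden, hidden, 1)   # relevancy of a query token to the response
student = nn.Linear(hidden, 1)             # predicts relevancy from the query token alone

query_tokens = torch.randn(12, hidden)     # encoder states of the query
response_repr = torch.randn(hidden)        # pooled representation of the golden response

teacher_rel = torch.sigmoid(teacher(query_tokens, response_repr.expand(12, hidden)))
student_rel = torch.sigmoid(student(query_tokens))

distill_loss = F.mse_loss(student_rel, teacher_rel.detach())
# total loss = generation loss (which uses teacher_rel) + distill_loss; at inference
# the decoder falls back to student_rel because the golden response is unavailable.
print(distill_loss.item())
```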
ISBN (print): 9798350349122; 9798350349115
Machine translation is the process of using computers to convert one natural language into another, shouldering the important task of building a bridge for language communication. It has long been an active research direction in natural language processing. As the latest paradigm of machine translation, neural machine translation relies entirely on a neural network to perform the translation from the source language to the target language. Thanks to the development of artificial intelligence, rich research results have been achieved in recent years, effectively alleviating the bottleneck problems of statistical machine translation. This paper first compares neural machine translation with other machine translation methods, then introduces the mainstream neural machine translation models, and finally discusses the problems and challenges faced by neural machine translation.
ISBN (print): 9798350359329; 9798350359312
Chinese Spelling Check (CSC) and Chinese Grammatical Error Correction (CGEC) are two important and challenging tasks in the Natural Language Processing (NLP) field. The former aims to detect and correct Chinese misspellings, while the latter focuses on grammatical errors in sentences. Existing methods treat them as two separate tasks, sequence labeling and conditional text generation respectively. As a consequence, a single encoder is typically selected as the backbone network for the CSC task, whereas an encoder-decoder structure becomes a requisite for the CGEC task. However, in real-world applications it is inefficient for a system to first determine whether an input sentence contains spelling or grammatical errors and then select different models according to that decision. In this paper, to address these two tasks effectively, we propose a unified approach, denoted as UCSC-CGEC, based on a standard Transformer encoder-decoder structure. Notably, we choose to use a recent dataset named CSCD-IME instead of SIGHAN to ensure higher data quality for the CSC task. Additionally, to reduce the training difficulty and enhance generation quality, we introduce a copy mechanism. Furthermore, to improve training efficiency and reduce cost, we adopt AdaLoRA, a Parameter-Efficient Fine-Tuning (PEFT) method, rather than fine-tuning the model with the entire parameter set during the training phase. Experiments are conducted on the CSCD-IME and NLPCC2018 datasets, and the results indicate the superiority of our approach when compared to all baseline models.
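The copy mechanism mentioned above can be illustrated with the standard pointer-generator-style mixing of a vocabulary distribution and the attention distribution over source tokens; the sketch below is a generic illustration of that idea, not the UCSC-CGEC implementation, and the sizes and the way p_gen is obtained are illustrative.

```python
# Mix the decoder's vocabulary distribution with copied attention mass over source tokens.
import torch
import torch.nn.functional as F

def copy_distribution(vocab_logits, attn_weights, src_token_ids, p_gen, vocab_size):
    vocab_dist = p_gen * F.softmax(vocab_logits, dim=-1)
    copy_dist = torch.zeros(vocab_size)
    # scatter attention mass back onto the vocabulary ids of the source tokens
    copy_dist.index_add_(0, src_token_ids, (1 - p_gen) * attn_weights)
    return vocab_dist + copy_dist

vocab_size, src_len = 100, 8
dist = copy_distribution(
    vocab_logits=torch.randn(vocab_size),
    attn_weights=F.softmax(torch.randn(src_len), dim=-1),
    src_token_ids=torch.randint(0, vocab_size, (src_len,)),
    p_gen=torch.tensor(0.7),
    vocab_size=vocab_size,
)
print(dist.sum())  # ~1.0
```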
ISBN (print): 9798350354966; 9798350354959
This paper presents a respiratory sound compression and reconstruction method based on a convolutional auto-encoder. By utilizing convolutional and transpose convolutional layers, the model can process variable-length sound waveforms, which is an important feature for data transmission from edge-based medical devices to a cloud server, and can reconstruct the signal with high fidelity. This work shows that utilizing a non-variational latent space for respiratory sound compression yields a smaller reconstruction error compared to other state-of-the-art solutions. Additionally, this work proposes a new composite loss function to guide the network training. Tested on the BioCAS 2024 Grand Challenge dataset, this method achieves a Percent Root Mean Square Difference (PRD) of 0.2230, a Correlation Coefficient (CC) of 0.972, and a Signal-to-Noise Ratio Loss (SNRL) of -0.7129 dB at a compression rate of 222.
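A rough sketch of a fully convolutional 1-D auto-encoder, which accepts variable-length inputs because it contains only (transpose) convolutions, together with an illustrative composite loss that combines a time-domain and a spectral term. Channel counts, strides, and the loss weighting are assumptions, not the paper's settings.

```python
# Fully convolutional auto-encoder for 1-D waveforms with an illustrative composite loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 4, kernel_size=9, stride=4, padding=4),  # compressed code
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(4, 32, kernel_size=9, stride=4, padding=4, output_padding=3), nn.ReLU(),
            nn.ConvTranspose1d(32, 16, kernel_size=9, stride=4, padding=4, output_padding=3), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=9, stride=4, padding=4, output_padding=3),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def composite_loss(x, x_hat, alpha=0.5):
    # illustrative composite objective: time-domain MSE plus a spectral magnitude term
    mse = F.mse_loss(x_hat, x)
    spec = F.l1_loss(torch.stft(x_hat.squeeze(1), 256, return_complex=True).abs(),
                     torch.stft(x.squeeze(1), 256, return_complex=True).abs())
    return mse + alpha * spec

model = ConvAE()
wave = torch.randn(1, 1, 4096)          # any length divisible by the total stride (64)
print(model(wave).shape)                # torch.Size([1, 1, 4096])
print(composite_loss(wave, model(wave)).item())
```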