Pretrained language models (PLMs) have become remarkably adept at task and language generalization. Nonetheless, they often fail when faced with unseen languages. In this work, we present LINGUALCHEMY, a regularizatio...
Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs). Thi...
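As a minimal, illustrative sketch of the setting this abstract describes (the field names and `parse_structured_output` helper below are assumptions, not drawn from the paper), structured generation lets a downstream consumer parse and validate a model's output mechanically instead of scraping free text:

```python
# Hypothetical sketch: validating JSON output from an LLM against expected fields.
import json

EXPECTED_FIELDS = {"title", "authors", "year"}  # illustrative schema, not from the paper

def parse_structured_output(raw: str) -> dict:
    """Parse a model's raw text as JSON and check that the required keys exist."""
    record = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

raw_output = '{"title": "Some Paper", "authors": ["A. Author"], "year": 2024}'
print(parse_structured_output(raw_output))
```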
Understanding how Transformer-based language models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the ...
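Although the abstract is truncated here, such projections are commonly taken into vocabulary space (the "logit lens" style of analysis). The sketch below illustrates that idea under that assumption, with toy shapes and random weights standing in for a real model:

```python
# Hedged sketch of the "logit lens" idea: read a hidden state through the
# unembedding matrix to see which vocabulary items it currently favors.
# All shapes and data here are toy assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
W_U = rng.normal(size=(d_model, vocab))  # unembedding: hidden state -> vocab logits
hidden = rng.normal(size=(d_model,))     # a hidden state from some intermediate layer

logits = hidden @ W_U                    # project into vocabulary space
top5 = np.argsort(logits)[-5:][::-1]     # five highest-scoring vocabulary ids
print("top-5 vocab ids favored by this hidden state:", top5)
```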
ISBN: (Print) 9798891760608
For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their coding length under the LM. We then show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset, confirming that being able to compress linguistic information is an important part of successful LM performance. As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data, showing that only some encapsulate the relationship between information-theoretic compression, geometric compression, and ease-of-adaptation.
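The geometric side of this analysis hinges on estimating the intrinsic dimension of linguistic data. As a hedged illustration (the paper's own choice of estimators and data is not reproduced here), the sketch below implements the TwoNN estimator of Facco et al. (2017), one standard estimator of the kind the abstract evaluates, on a toy point cloud; the information-theoretic side would pair this with a dataset's coding length, i.e., its total negative log-likelihood under the LM.

```python
# Minimal sketch of intrinsic-dimension estimation via TwoNN (Facco et al., 2017).
# The data here are synthetic; the paper's actual estimators and corpora are assumptions.
import numpy as np
from scipy.spatial.distance import cdist

def twonn_id(points: np.ndarray) -> float:
    """Estimate intrinsic dimension from the ratio of each point's two nearest-
    neighbor distances; mu = r2/r1 follows a Pareto law with exponent equal to the ID."""
    dists = cdist(points, points)
    np.fill_diagonal(dists, np.inf)        # ignore self-distances
    sorted_d = np.sort(dists, axis=1)
    mu = sorted_d[:, 1] / sorted_d[:, 0]   # second-NN distance over first-NN distance
    return len(mu) / np.sum(np.log(mu))    # maximum-likelihood estimate of the ID

# Toy check: a 2-D manifold embedded linearly in 10-D should give an ID near 2.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))
embed = latent @ rng.normal(size=(2, 10))
print(f"estimated intrinsic dimension: {twonn_id(embed):.2f}")
```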
Loss spikes, a phenomenon in which the loss value suddenly diverges, are a fundamental issue in the pre-training of large language models. This paper supposes that the non-uniformity of the norm of the parameters is on...
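As a rough, assumption-laden illustration of the quantity this abstract points at, the sketch below measures how non-uniform parameter norms are across a toy model's weight tensors; the max/min ratio used as a uniformity measure is our own stand-in, not the paper's method.

```python
# Hypothetical sketch: checking the spread of parameter norms across weight tensors.
# The "model" is just a dict of arrays, with one layer given a deliberately large scale.
import numpy as np

rng = np.random.default_rng(0)
params = {f"layer{i}.weight": rng.normal(scale=s, size=(64, 64))
          for i, s in enumerate([0.02, 0.02, 0.5])}

norms = {name: np.linalg.norm(w) for name, w in params.items()}  # Frobenius norms
values = np.array(list(norms.values()))
print({k: round(v, 2) for k, v in norms.items()})
print("norm spread (max/min):", round(values.max() / values.min(), 2))
```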
Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling id...
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and...
The capabilities of large language models (LLMs) have raised concerns about their potential to create and propagate convincing ***. We study their performance in detecting convincing arguments to gain insights into LL...
Mitigating explicit and implicit biases in Large Language Models (LLMs) has become a critical focus in the field of natural language processing. However, many current methodologies evaluate scenarios in isolation, wit...
ISBN: (Print) 9798331540869; 9798331540852
Mongolian fixed phrase recognition is one of the most fundamental tasks in Mongolian natural language processing; its main purpose is to identify the boundaries and types of fixed phrases with specific meanings in Mongolian text. We first introduce the concept, classification, and evaluation of Mongolian fixed phrases. Second, following the development of research on the task, we summarize earlier methods in two categories, dictionary-based and rule-based; a detailed comparative analysis reveals the effectiveness and limitations of these methods in practical applications. Then, because prior literature and reference studies are scarce, we implement three Mongolian fixed phrase recognition models based on sequence labeling using the available corpus, analyze the results, and briefly review the development of deep-learning-based Mongolian named entity recognition. Finally, we discuss research trends in Mongolian fixed phrase recognition to provide a reference for new methods and future research directions.
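To make the sequence-labeling formulation concrete, here is a minimal sketch, with an illustrative tag set and placeholder tokens that are not drawn from the paper, of how fixed-phrase boundaries and types can be encoded as BIO tags and decoded back into typed spans:

```python
# Hypothetical sketch: BIO tagging for fixed-phrase recognition.
# Tags mark the Beginning and Inside of a phrase of a given type; O is outside.
from typing import List, Tuple

def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Recover (phrase_text, phrase_type) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):            # trailing sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((" ".join(tokens[start:i]), label))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:  # tolerate a dangling I- tag
            start, label = i, tag[2:]
    return spans

tokens = ["t1", "t2", "t3", "t4"]                     # placeholder tokens
tags = ["O", "B-IDIOM", "I-IDIOM", "O"]               # one two-token fixed phrase
print(decode_bio(tokens, tags))                        # [('t2 t3', 'IDIOM')]
```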