This paper investigates explainability in natural legal language processing (NLLP). We study the task of legal outcome prediction of the European Court of Human Rights cases in a ternary classification setup, where a ...
In this work, we systematically expose and measure the inconsistency and knowledge gaps of Large Language Models (LLMs). Specifically, we propose an automated testing framework (called KONTEST) which leverages a knowle...
Recent zero-shot evaluations have highlighted important limitations in the abilities of language models (LMs) to perform meaning extraction. However, it is now well known that LMs can demonstrate radical improvements ...
Large language models (LLMs) have seen increasing popularity in daily use, with their widespread adoption by many corporations as virtual assistants, chatbots, predictors, and many more. Their growing influence raises...
In recent developments within the research community, the integration of Large Language Models (LLMs) in creating fully autonomous agents has garnered significant interest. Despite this, LLM-based agents frequently de...
ISBN: (Print) 9798891760608
NLP researchers frequently face a reproducibility crisis when comparing models on a real-world NLP task. Many studies have empirically shown that standard splits tend to produce poorly reproducible and unreliable conclusions, and they have attempted to improve the splits by using more random repetitions. However, the resulting gains in reproducibility remain limited, because the relationship between reproducibility and the estimator induced by a splitting strategy has not been investigated. In this paper, we formulate the reproducibility of a model comparison as a probabilistic function of the conclusion. Furthermore, we show theoretically that reproducibility is qualitatively dominated by the signal-to-noise ratio (SNR) of the model performance estimator obtained under a corpus splitting strategy: an estimator with a higher SNR tends to yield better reproducibility. Motivated by this theory, we develop a novel mixture estimator of the performance of an NLP model with a regularized corpus splitting strategy based on a blocked 3×2 cross-validation. We conduct numerical experiments on multiple NLP tasks to show that the proposed estimator achieves a high SNR and substantially increases reproducibility. We therefore recommend that NLP practitioners use the proposed method to compare NLP models instead of methods based on the widely used standard splits or on random splits with multiple repetitions.
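For orientation only, the sketch below (Python with scikit-learn) illustrates the two ingredients the abstract refers to: a blocked 3×2 cross-validation, i.e., three independent 2-fold splits of one corpus, and a crude signal-to-noise ratio taken here as the mean of the six fold scores over their standard deviation. This is an assumption-laden illustration of the general setup, not the paper's regularized splitting strategy or mixture estimator, and the dataset, classifier, and SNR stand-in are all hypothetical choices.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def blocked_3x2_scores(X, y, make_model, seeds=(0, 1, 2)):
    # Three independent 2-fold partitions of the same corpus (the "blocks"),
    # yielding six held-out fold scores in total.
    scores = []
    for seed in seeds:
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
        for train_idx, test_idx in cv.split(X, y):
            model = make_model()
            model.fit(X[train_idx], y[train_idx])
            scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return np.array(scores)

# Synthetic stand-in data; any featurized NLP dataset would slot in here.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
scores = blocked_3x2_scores(X, y, lambda: LogisticRegression(max_iter=1000))

estimate = scores.mean()              # performance estimate averaged over the six folds
snr = estimate / scores.std(ddof=1)   # crude stand-in for the estimator's signal-to-noise ratio
print(f"accuracy estimate: {estimate:.4f}, SNR: {snr:.2f}")

In the paper's terms, a splitting strategy whose estimator attains a higher SNR should yield model-comparison conclusions that replicate more reliably across repetitions.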
Massively multilingual models are known to have limited utility in any one language, and to perform particularly poorly on low-resource languages. By contrast, targeted multilinguality has been shown to benefit low-reso...
Language models typically tokenize raw text into sequences of subword identifiers from a predefined vocabulary, a process inherently sensitive to typographical errors, length variations, and largely oblivious to the i...
Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideolog...