Detecting plagiarism in documents is a well-established task in natural language processing (NLP). Broadly, plagiarism detection falls into two types: (1) intrinsic, which checks whether the whole document, or all of its passages, was written by a single author; and (2) extrinsic, in which a suspicious document is compared with a given set of source documents to find sentences or phrases that appear in both. This study addresses the challenge of intrinsic plagiarism detection in Urdu texts, a language with limited resources for comprehensive language models. Acknowledging the absence of sophisticated large language models (LLMs) tailored to Urdu, the study explores the application of various machine learning, deep learning, and language models in a novel framework. A set of 43 stylometry features at six granularity levels was curated to capture linguistic patterns indicative of plagiarism. The selected models include traditional machine learning approaches (logistic regression, decision trees, SVM, KNN, Naive Bayes, gradient boosting, and a voting classifier), deep learning approaches (GRU, BiLSTM, CNN, LSTM, and MLP), and large language models (BERT and GPT-2). The research systematically categorizes these features and evaluates their effectiveness, addressing the inherent challenges posed by the limited availability of Urdu-specific language models. Two experiments were conducted to evaluate the impact of the proposed features on classification accuracy. In experiment one, the entire dataset was used to classify documents as intrinsically plagiarized or non-plagiarized. Experiment two divided the dataset into three topic-based categories: moral lessons, national celebrities, and national events. Both experiments were evaluated through fivefold cross-validation. The results show that the random forest classifier achieved an ex
It has been widely proven that Augmented Reality (AR) brings numerous benefits in learning experiences, including enhancing learning outcomes and motivation. However, not many studies investigate how different forms o...
The manual analysis of job resumes poses specific challenges, including the time-intensive process and the high likelihood of human error, emphasizing the need for automation in content-based recommendations. Recent a...
Internet of Things (IoT) devices are typically powered by small-sized batteries with limited energy storage capacity, requiring regular replacement or recharging. To reduce costs and maintain connectivity in IoT netwo...
Zero Trust Architecture (ZTA) is one of the paradigm changes in cybersecurity, shifting from the traditional perimeter-based model to a perimeterless one. This article studies the core concepts of ZTA, its origins, a few use cases...
The detection of skin cancer holds paramount importance worldwide due to its impact on global health. While deep convolutional neural networks (DCNNs) have shown potential in this domain, current approaches often stru...
The steep technological and performance advances in GPU cards have led to their increasing use in data centers in recent years, especially for machine learning jobs. However, high hardware performance alone does no...
Traditional rule-based Intrusion Detection Systems (IDS) are commonly employed owing to their simple design and ability to detect known threats. Nevertheless, as dynamic network traffic and a new degree of threats exi...
Reliable artificial intelligence (AI) systems not only pose the challenge of providing high-quality intelligent services to customers but also require customers' privacy to be protected as much as possible...
Generic Code Clone Detection (GCCD) is a code clone detection model that uses a distance measure equation, enabling detection of all types of code clones, namely Type-1, Type-2, Type-3 and Type-4 clones, in Java programmin...