Detecting plagiarism in documents is a well-established task in natural language processing (NLP). Broadly, plagiarism detection falls into two types: (1) intrinsic, which checks whether the whole document, or all of its passages, has been written by a single author; and (2) extrinsic, in which a suspicious document is compared with a given set of source documents to find sentences or phrases that appear in both. To advance intrinsic plagiarism detection, this study addresses its critical challenges in Urdu, a language with limited resources for comprehensive language models. Acknowledging the absence of sophisticated large language models (LLMs) tailored for the Urdu language, the study explores the application of various machine learning, deep learning, and language models in a novel framework. A set of 43 stylometric features at six granularity levels was meticulously curated, capturing linguistic patterns indicative of plagiarism. The selected models include traditional machine learning approaches such as logistic regression, decision trees, SVM, KNN, Naive Bayes, gradient boosting, and a voting classifier; deep learning approaches, namely GRU, BiLSTM, CNN, LSTM, and MLP; and large language models, namely BERT and GPT-2. This research systematically categorizes these features and evaluates their effectiveness, addressing the inherent challenges posed by the limited availability of Urdu-specific language models. Two distinct experiments were conducted to evaluate the impact of the proposed features on classification accuracy. In experiment one, the entire dataset was used to classify documents as intrinsically plagiarized or non-plagiarized. In experiment two, the dataset was divided into three topic-based categories: moral lessons, national celebrities, and national events. Both experiments were evaluated using fivefold cross-validation. The results show that the random forest classifier achieved an ex...
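The abstract does not list its 43 features, so the following is only a minimal sketch of the general idea behind stylometry-based intrinsic detection: compute a few style measures per passage and flag passages whose style deviates from the rest of the same document. The three features, the z-score rule, the 2.0 threshold, and the names `stylometric_features` and `style_outliers` are hypothetical choices for illustration; they are not the study's framework, which trains classifiers (such as random forest) on its curated feature set.

```python
import re
import statistics

# Sentence terminators: ASCII '.', '!', '?' plus the Arabic full stop (U+06D4),
# which Urdu uses, and the Arabic question mark (U+061F).
SENTENCE_SPLIT = re.compile(r"[.!?\u06D4\u061F]+")
STRIP_CHARS = ".!?\u06D4\u061F,\u060C"  # includes the Arabic comma (U+060C)


def stylometric_features(passage: str) -> dict[str, float]:
    """Compute three simple (hypothetical) stylometric features for one passage."""
    sentences = [s for s in SENTENCE_SPLIT.split(passage) if s.strip()]
    # Whitespace tokenization, then strip trailing punctuation from each token.
    tokens = [w.strip(STRIP_CHARS) for w in passage.split()]
    tokens = [t for t in tokens if t]
    if not tokens or not sentences:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
        "avg_sent_len": len(tokens) / len(sentences),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


def style_outliers(passages: list[str], threshold: float = 2.0) -> list[int]:
    """Return indices of passages whose z-score on any feature exceeds `threshold`.

    The z-score compares each passage with the mean and population standard
    deviation of that feature over all passages in the same document.
    """
    feats = [stylometric_features(p) for p in passages]
    if not feats:
        return []
    flagged = set()
    for name in feats[0]:
        values = [f[name] for f in feats]
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            continue  # feature is constant across passages: no signal
        for i, value in enumerate(values):
            if abs(value - mean) / stdev > threshold:
                flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    doc = ["ab cd. ef gh."] * 5 + ["abcdefghij klmnopqrst."]
    print(style_outliers(doc))
```

With n passages, a population z-score can never exceed sqrt(n - 1), so with very short documents a fixed threshold may never fire; in practice the threshold (or a learned classifier, as in the study) would need tuning to the data.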
The identification of separable nonlinear models, prevalent in tasks such as signal analysis, image processing, time series analysis, and machine learning, presents a non-convex optimization challenge that necessitate...
1 Introduction On-device deep learning (DL) on mobile and embedded IoT devices drives various applications [1], like robotics image recognition [2] and drone swarm classification [3]. Efficient local data processing preserves privacy, enhances responsiveness, and saves ***, current on-device DL relies on predefined patterns, leading to accuracy and efficiency *** is difficult to provide feedback on data processing performance during the data acquisition stage, as processing typically occurs after data acquisition.
Disguised face identification is challenging since people cover their identities by wearing masks, hats, sunglasses, or other disguises. These disguises dramatically modify facial features, making identifying individual...
Deaf and mute people have difficulty conveying their thoughts and ideas to others. Sign language is their most expressive mode of communication, but the general public is unfamiliar with sign language; therefore, the ...
A graph $G$ is $k$-list equitably colorable if, for any given $k$-uniform list assignment $L$, $G$ is $L$-colorable and each color appears on at most $\lceil |V(G)|/k \rceil$ vertices. Kostochka et al. conjectured that if $G$ is a connected graph...
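The definition combines three conditions (colors come from each vertex's list, adjacent vertices differ, and no color is used more than the ceiling of |V(G)|/k times), which a small checker makes explicit. This is a minimal sketch that verifies a given coloring rather than finding one; the function name `is_equitable_list_coloring` and the input representation (edges as vertex pairs, lists as a dict of sets whose keys are exactly the vertices) are hypothetical assumptions, not taken from the source.

```python
import math
from collections import Counter


def is_equitable_list_coloring(edges, lists, coloring, k):
    """Check whether `coloring` is an equitable L-coloring (hypothetical helper).

    edges:    iterable of (u, v) pairs; every endpoint must be a key of `lists`
    lists:    dict mapping each vertex to its set of allowed colors
    coloring: dict mapping each vertex to its assigned color
    k:        list size; the assignment must be k-uniform
    """
    n = len(lists)  # |V(G)|: the vertices are the keys of `lists`
    if any(len(allowed) != k for allowed in lists.values()):
        raise ValueError("list assignment is not k-uniform")
    # L-coloring, part 1: each vertex gets a color from its own list.
    if any(coloring.get(v) not in lists[v] for v in lists):
        return False
    # L-coloring, part 2: the coloring is proper (adjacent vertices differ).
    if any(coloring[u] == coloring[v] for u, v in edges):
        return False
    # Equitable: each color appears on at most ceil(|V(G)| / k) vertices.
    cap = math.ceil(n / k)
    counts = Counter(coloring[v] for v in lists)
    return all(count <= cap for count in counts.values())


if __name__ == "__main__":
    lists = {v: {1, 2} for v in range(4)}
    star = [(0, 1), (0, 2), (0, 3)]  # K_{1,3}
    # Proper, but color 2 is used 3 times > ceil(4/2) = 2, so not equitable.
    print(is_equitable_list_coloring(star, lists, {0: 1, 1: 2, 2: 2, 3: 2}, 2))
```

In the usage example, the star K_{1,3} with lists {1, 2} admits a proper coloring, but every proper 2-coloring puts one color on all three leaves, so no equitable one exists for k = 2.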
Low back pain is a leading cause of disability globally and is often associated with degenerative lumbar spine conditions. Accurate diagnosis of these conditions is critical but challenging due to the subjective nature o...
Text-to-SQL is a technology that converts natural language questions into executable SQL queries, allowing users to query and manage relational databases more easily. In recent years, large language models have signif...
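To make "executable SQL queries" concrete, the sketch below pairs a natural-language question with the kind of SQL a Text-to-SQL system is expected to produce and runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The schema, rows, and `predicted_sql` string are invented for illustration; the query is hard-coded in place of a model's output, so nothing here reflects a particular large language model or Text-to-SQL system.

```python
import sqlite3

# Toy schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 50000), ("Bo", "Sales", 62000), ("Cy", "IT", 70000)],
)

question = "What is the average salary in the Sales department?"

# Hypothetical stand-in for a Text-to-SQL model's output: a real system would
# generate this string from `question` together with the database schema.
predicted_sql = "SELECT AVG(salary) FROM employees WHERE department = 'Sales'"

# "Executable" means the generated string runs directly against the database.
(avg_salary,) = conn.execute(predicted_sql).fetchone()
print(question, "->", avg_salary)
conn.close()
```

SQLite's `AVG` always returns a floating-point value, so the answer here comes back as a float even though `salary` is stored as an integer.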
Colonoscopy is vital for detecting colorectal polyps, which are closely linked to colorectal cancer. Accurate segmentation of polyps in colonoscopic images is essential for diagnosis and surgical planning but is chall...