Multi-distribution or collaborative learning involves learning a single predictor that works well across multiple data distributions, using samples from each during training. Recent research on multi-distribution lear...
Background: Cervical cancer is the fourth most frequent cancer in women worldwide. Even though cervical cancer deaths have decreased significantly in Western countries, low and middle-income countries account for near...
When the data contain outliers or follow heavy-tailed distributions, traditional penalized least squares is no longer applicable. In addition, rapid developments in science and technology have generated large volumes of real-world data that are high-dimensional, strongly correlated, and redundant. It is therefore necessary to find an effective, robust variable selection method that can handle collinearity. This paper proposes a penalized M-estimation method based on the standard-error-adjusted adaptive elastic-net, which uses M-estimators and their corresponding standard errors as weights. The consistency and asymptotic normality of the method are proved theoretically. For regularization in high-dimensional space, the authors first use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale that is still smaller than the sample size, and then apply the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real-data analyses to examine the finite-sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
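To make the shape of such an objective concrete, here is a minimal sketch of a robust loss combined with a weighted elastic-net penalty. Everything specific in it is an assumption rather than the paper's definition: the Huber function stands in for the unspecified M-estimation loss, the weight form `(se / |beta|) ** gamma` is only one plausible way to combine initial M-estimates with their standard errors, and all function and parameter names are hypothetical.

```python
def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic near zero, linear in the tails (assumed M-loss)."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def adaptive_weights(beta_init, std_err, gamma=1.0, eps=1e-8):
    """Hypothetical weights: large when an initial estimate is small
    relative to its standard error, so that coefficient is penalized more."""
    return [(se / (abs(b) + eps)) ** gamma for b, se in zip(beta_init, std_err)]


def penalized_objective(X, y, beta, weights, lam1, lam2, delta=1.345):
    """Robust loss + weighted L1 penalty + ridge (L2) penalty."""
    loss = 0.0
    for row, yi in zip(X, y):
        fitted = sum(xij * bj for xij, bj in zip(row, beta))
        loss += huber_loss(yi - fitted, delta)
    l1 = sum(w * abs(b) for w, b in zip(weights, beta))
    l2 = sum(b * b for b in beta)
    return loss + lam1 * l1 + lam2 * l2
```

The linear tail of the Huber loss is what limits the influence of outliers, while the ridge term is the part of the elastic-net that stabilizes estimates under collinearity.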
Human activity recognition (HAR) techniques pick out and interpret human behaviors and actions by analyzing data gathered from various sensor devices. HAR aims to recognize and automatically categorize human activitie...
When ensuring the reliability of a device or the suitability of a material, it is necessary to consider the stress cases in the operating environment. This means that the uncertainty about the reality env...
Detecting plagiarism in documents is a well-established task in natural language processing (NLP). Broadly, plagiarism detection is categorized into two types: (1) intrinsic, which checks whether the whole document, or all of its passages, was written by a single author; and (2) extrinsic, in which a suspicious document is compared with a given set of source documents to find sentences or phrases that appear in both. This study addresses the critical challenge of intrinsic plagiarism detection in texts written in Urdu, a language with limited resources for comprehensive language models. Acknowledging the absence of sophisticated large language models (LLMs) tailored for the Urdu language, this study explores the application of various machine learning, deep learning, and language models in a novel framework. A set of 43 stylometry features at six granularity levels was meticulously curated, capturing linguistic patterns indicative of plagiarism. The selected models include traditional machine learning approaches such as logistic regression, decision trees, SVM, KNN, Naive Bayes, gradient boosting and voting classifier; deep learning approaches: GRU, BiLSTM, CNN, LSTM, MLP; and large language models: BERT and GPT-2. This research systematically categorizes these features and evaluates their effectiveness, addressing the inherent challenges posed by the limited availability of Urdu-specific language models. Two distinct experiments were conducted to evaluate the impact of the proposed features on classification accuracy. In experiment one, the entire dataset was used for classification into intrinsically plagiarized and non-plagiarized documents. Experiment two categorized the dataset into three types based on topics: moral lessons, national celebrities, and national events. Both experiments were thoroughly evaluated through fivefold cross-validation. The results show that the random forest classifier achieved an ex
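The fivefold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration, not the study's pipeline: the function name is hypothetical, the folds are contiguous, and the shuffling or stratification that a real evaluation would normally apply first is omitted.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Each sample lands in exactly one test fold; fold sizes differ by
    at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 12 documents split into five folds.
folds = list(k_fold_indices(12, k=5))
```

Each model would be trained on the `train` indices and scored on the `test` indices, with the five scores averaged to give the reported figure.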
Generating cover photos from story text is a nontrivial challenge. Existing approaches focus on generating only images from a given text prompt. To the best of our knowledge, none of these approaches focus on g...
Colorectal cancer is one of the most prevalent cancers in the world. It illustrates the effectiveness of early detection and treatment of precursor polyps to prevent progression to malignancy. Despite the pivotal role...
To efficiently estimate the central subspace in sufficient dimension reduction, response discretization via slicing its range is one of the most used methodologies when inverse regression-based methods are ***, existing slicing schemes are almost all ad hoc and not widely ***, how to define data-driven schemes with certain optimal properties is a longstanding problem in this *** research described here is then ***, we introduce a likelihood-ratio-based framework for dimension reduction, subsuming the popularly used methods including the sliced inverse regression, the sliced average variance estimation and the likelihood acquired ***, we propose a regularized log likelihood-ratio criterion to obtain a data-driven slicing scheme and derive the asymptotic properties of the estimators. A simulation study is carried out to examine the performance of the proposed method and that of existing methods. A data set concerning concrete compressive strength is also analyzed for illustration and comparison.
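For readers unfamiliar with response slicing, the sketch below shows one common ad hoc scheme of the kind the abstract contrasts with its proposal: equal-frequency slicing of the response, followed by per-slice means of a predictor (the quantity that sliced inverse regression builds on). It is not the paper's data-driven scheme, and the function names are hypothetical.

```python
def equal_frequency_slices(y, n_slices):
    """Assign each response value a slice label in 0..n_slices-1 so that
    slices hold (nearly) equal numbers of observations."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_slices // len(y)
    return labels


def slice_means(x, labels, n_slices):
    """Mean of predictor x within each slice of the response."""
    sums = [0.0] * n_slices
    counts = [0] * n_slices
    for xi, h in zip(x, labels):
        sums[h] += xi
        counts[h] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Toy example with six observations and three slices.
y = [0.3, 2.1, 1.0, 5.5, 4.2, 3.3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = equal_frequency_slices(y, 3)
means = slice_means(x, labels, 3)
```

The number of slices is a free choice in this scheme, which is the kind of arbitrariness a data-driven slicing criterion aims to remove.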
Autism spectrum disorder (ASD) affects 1 in 100 children globally. Early detection and intervention can enhance quality of life for individuals diagnosed with ASD. This research uses the support vector machine-recursive feature elimination (SVM-RFE) method for ASD classification, drawing on the phenotypic and Automated Anatomical Labeling (AAL) Brain Atlas datasets of the Autism Brain Imaging Data Exchange preprocessed dataset. The functional connectivity matrix (FCM) is computed for the AAL data, generating 6670 features representing pair-wise brain region activity. The SVM-RFE feature selection method was applied five times to the FCM data, thus determining the optimal number of features to be 750 for the best performing support vector machine (SVM) model, corresponding to a dimensionality reduction of 88.76%. Pertinent phenotypic data features were manually selected and processed. Subsequently, five experiments were conducted, each representing a different combination of the features used for training and testing the linear SVM, deep neural networks, one-dimensional convolutional neural networks, and random forest machine learning models. These models are fine-tuned using grid search cross-validation (CV). The models are evaluated on various metrics using 5-fold CV. The most relevant brain regions from the optimal feature set are identified by ranking the SVM-RFE feature weights. The SVM-RFE approach achieved a state-of-the-art accuracy of 90.33% on the linear SVM model using the Data Processing Assistant for Resting-State Functional Magnetic Resonance Imaging pipeline. The SVM model’s ability to rank the features used based on their importance provides clarity into the factors contributing to the diagnosis. The thalamus right, rectus right, and temporal middle left AAL brain regions, among others, were identified as having the highest number of connections to other brain regions. These results highlight the importance of using traditional ML models fo
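The feature count and the reduction figure in this abstract can be checked with a short sketch. It assumes the AAL atlas provides 116 regions (a number not stated in the abstract, although 116 × 115 / 2 = 6670 matches the reported count), takes Pearson correlation as the connectivity measure, and uses random placeholder signals rather than real fMRI data; the helper names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_features(time_series):
    """Flatten the upper triangle (excluding the diagonal) of the
    region-by-region correlation matrix into one feature vector."""
    n = len(time_series)
    return [pearson(time_series[i], time_series[j])
            for i in range(n) for j in range(i + 1, n)]


N_REGIONS = 116   # assumption: AAL atlas region count
N_SELECTED = 750  # number of features retained, as reported in the abstract

rng = random.Random(0)  # placeholder signals, not real data
series = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(N_REGIONS)]

features = connectivity_features(series)
n_features = len(features)                   # one feature per region pair
reduction = 1 - N_SELECTED / n_features      # fraction of features removed
```

Only the upper triangle is kept because a correlation matrix is symmetric with a constant diagonal, so the remaining entries carry no extra information.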