Autism is one of the neurological disorders that occurs in children. There are many causes of autism, one of which is genetic factors. Therefore, in order to find effective treatments, we need to discover the genes wh...
详细信息
ISBN:
(纸本)9781728130033
Autism is one of the neurological disorders that occurs in children. There are many causes of autism, one of which is genetic factors. Therefore, in order to find effective treatments, we need to discover the genes which relate to autism disease. In this paper, we use a computational approach to train a model that can predict new autism-related candidate genes. The methodology combines different data sources such as protein-protein interaction networks, microRNAs (miRNA)target network and known autism-related genes into an integrated network. The structural properties of this network are represented as a vector dataset and a binary classification problem is formulated. However, because the number of known autism-related genes is very small, we face an imbalance data classification problem. To solve this issue, an under-samplingclustering-based data balancing algorithm has been proposed. Training classifiers with machine learning models such as SVMs, k-NN, and RFs, we obtained results of 1-3% higher in G-mean measures when comparing to cases without using any data balancing strategies. These results implied that our proposed model may contribute to finding new autism-related gene candidates.
The COVID-19 pandemic has been a global public health concern since March 11, 2020. Healthcare systems struggled to meet patients' growing needs for diagnosis, treatment, and care. As healthcare industries struggl...
详细信息
The COVID-19 pandemic has been a global public health concern since March 11, 2020. Healthcare systems struggled to meet patients' growing needs for diagnosis, treatment, and care. As healthcare industries struggled to cope with the overwhelming demands, advanced intelligence and computing technologies have become essential. Artificial intelligence techniques have become essential for identifying and triaging patients, predicting disease severity, and detecting outcomes. The aim of the paper is to propose a gradient boosting-based model to predict the mortality of COVID-19 patients and to improve the prediction accuracy by incorporating resampling strategies. A real COVID-19 data that includes patients' travel, health, geographical, and demographic information is obtained from a public repository. The dataset used in the study has the class imbalance problem, and several approaches are applied to solve the problem. In this study, a gradient boosting-based model for predicting the mortality of COVID-19 patients is proposed. This approach incorporates resampling strategies, such as synthetic minority oversampling technique (SMOTE), random under-sampling, and clustering-based under-sampling, to address the imbalanced class distribution problem in the dataset. Then, gradient boosting machines (GBM) such as extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) are analyzed in terms of accuracy and computational time. Random search method is used to find the optimal hyper-parameters for the algorithms. A stacking-based hybrid model that combines the XGBoost, LightGBM, and CatBoost algorithms was used for comparison in the experiments. In the experiments, the factors that can influence the mortality of COVID-19 patients are investigated. And, it is found that the age of the patient, whether the patient belonged to Wuhan, the difference between when they first noticed symptoms and when they visited the hospital
A clustering-basedundersampling (CUS) and distance-based near-miss method are widely used in current imbalanced learning algorithms, but this method has certain drawbacks. In particular, the CUS does not consider the...
详细信息
A clustering-basedundersampling (CUS) and distance-based near-miss method are widely used in current imbalanced learning algorithms, but this method has certain drawbacks. In particular, the CUS does not consider the influence of the distance factor on the majority of instances, and the near-miss method omits the inter-class(es) within the majority of samples. To overcome these drawbacks, this study proposes an undersampling method combining distance measurement and majority class clustering. Resampling methods are used to develop an ensemble-based imbalanced-learning algorithm called the clustering and distance-based imbalance learning model (CDEILM). This algorithm combines distance-basedundersampling, feature selection, and ensemble learning. In addition, a cluster size-based resampling (CSBR) method is proposed for preserving the original distribution of the majority class, and a hybrid imbalanced learning framework is constructed by fusing various types of resampling methods. The combination of CDEILM and CSBR can be considered as a specific case of this hybrid framework. The experimental results show that the CDEILM and CSBR methods can achieve better performance than the benchmark methods, and that the hybrid model provides the best results under most circumstances. Therefore, the proposed model can be used as an alternative imbalanced learning method under specific circumstances, e.g., for providing a solution to credit evaluation problems in financial applications.
暂无评论