Rough set theory is an important method for dealing with incomplete information systems. In such systems, the most common way to determine the relation between two samples is the tolerance relation. However, the condition under which the tolerance relation judges that two samples may belong to the same category is very lenient, which lowers the reduction rate when the rough set generated by this relation is used to select features. To address this problem, we design the neighborhood equivalence tolerance relation. It differs from other improved tolerance relations in two ways. First, it requires no additional threshold, which avoids the trouble of choosing one. Second, whereas most current improvements for this kind of problem are computationally cumbersome, the relation designed in this paper is simple and effective. Using this relation, we construct a neighborhood rough set model that handles incomplete information, introduce its properties, set out the properties that a reduction set should satisfy, and quantify the importance of conditional attributes with the attribute dependence degree, which provides the basis for designing the feature selection algorithm. Finally, a greedy strategy is used to design a forward feature selection algorithm. Experimental results show that the model is effective in dealing with incomplete information systems. The feature selection algorithm yields the smallest average reduced subset on twelve datasets while maintaining classifier accuracy, which verifies that it can effectively deal with incomplete information systems.
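To make the ideas in this abstract concrete, here is a minimal sketch of greedy forward selection driven by a dependency degree. It assumes the classical tolerance relation (two samples are tolerant if, on every chosen attribute, their values agree or at least one is missing), not the neighborhood equivalence tolerance relation proposed in the paper, whose definition the abstract does not give. The names `MISSING`, `tolerant`, `dependency` and `forward_select` are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence

MISSING = None  # assumed marker for an unknown attribute value

Row = Sequence[Optional[Hashable]]


def tolerant(x: Row, y: Row, attrs: Sequence[int]) -> bool:
    """Classical tolerance: on each attribute the values agree or one is missing."""
    return all(x[a] is MISSING or y[a] is MISSING or x[a] == y[a] for a in attrs)


def dependency(data: List[Row], labels: List[Hashable], attrs: Sequence[int]) -> float:
    """Fraction of samples whose whole tolerance class carries a single label."""
    consistent = 0
    for i, x in enumerate(data):
        klass = [labels[j] for j, y in enumerate(data) if tolerant(x, y, attrs)]
        if all(lbl == labels[i] for lbl in klass):
            consistent += 1
    return consistent / len(data)


def forward_select(data: List[Row], labels: List[Hashable]) -> List[int]:
    """Greedily add the attribute that raises the dependency degree the most,
    stopping once the subset matches the dependency of the full attribute set."""
    n_attrs = len(data[0])
    target = dependency(data, labels, range(n_attrs))
    selected: List[int] = []
    current = dependency(data, labels, selected)
    while current < target:
        best = max(
            (a for a in range(n_attrs) if a not in selected),
            key=lambda a: dependency(data, labels, selected + [a]),
        )
        selected.append(best)
        current = dependency(data, labels, selected)
    return selected


# Toy incomplete decision table: three attributes, None marks a missing value.
data = [(1, 0, None), (1, 1, 0), (0, 1, 0), (0, None, 1)]
labels = ["a", "a", "b", "b"]
print(forward_select(data, labels))
```

Because the classical relation treats a missing value as compatible with anything, attributes with many gaps produce large tolerance classes and contribute little dependency; this is the leniency the abstract describes.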
The rapid progress in fields such as data mining and machine learning, together with the explosive growth of sports big data, has posed new challenges for sports big data research. Most available sports data mining techniques concentrate on extracting and constructing effective features from basic sports data, which cannot be achieved simply by using data statistics. In the targeted mining of sports data in particular, traditional mining techniques still suffer from shortcomings such as low classification accuracy and insufficient refinement. To solve the problem of low accuracy in traditional mining methods, the study combines the random forest algorithm with the artificial raindrop algorithm and adopts a sports data mining method based on feature selection to achieve effective analysis of sports big data. The study builds on a random forest-based method for evaluating exercise effects and uses feature extraction algorithms to study the impact of exercise. It uses the information gain index to rank the importance of features and to determine accurately the degree to which exercise influences various indicators of the human body. In simulation, the proposed algorithm performs best in accuracy and F1 score on the training and testing sets, with accuracies of 0.849 +/- 0.021 and 0.819 +/- 0.022, respectively, and F1 scores of 0.837 +/- 0.020 and 0.864 +/- 0.021, respectively. This indicates that the proposed algorithm has high classification accuracy and that the random forest-based feature selection algorithm established in this study is superior to existing traditional feature extraction methods in both performance and accuracy. The proposed data analysis method achieves accurate and efficient use of sports big data, which is of great significance for the development of the sports education industry.
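The information gain ranking mentioned in this abstract can be sketched as follows. This is only an illustration of plain information gain on discrete features; it does not reproduce the study's combination of random forest and the artificial raindrop algorithm, and the feature names in the example are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest information gain."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: one informative feature and one uninformative one.
labels = [1, 1, 0, 0]
columns = {"shoe_colour": [0, 1, 0, 1], "trains_daily": [1, 1, 0, 0]}
print(rank_features(columns, labels))
```

Continuous indicators would need to be discretized first (or handled by a threshold search), which this sketch leaves out.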
The spatial distribution of soil organic matter (SOM) is highly significant to the assessment of the regional carbon balance, food security and cultivated land quality. Due to climate change and increasing food demand, the intensity of cultivated land development in the Northeast China black soil region is increasing, and accurate mapping of the SOM content in this region is urgently needed. Remote sensing technology has been widely applied in soil mapping, but large-scale, high-precision soil mapping remains a significant challenge. In this study, the Google Earth Engine (GEE) platform is used to generate synthetic soil images from Landsat-8 and Sentinel-2 images capturing bare soil periods at 20-d intervals. Then, spectral indices and bands are used as input variables to evaluate the prediction accuracy of the synthetic images for the different periods using random forest (RF) regression. Finally, two feature selection methods, Boruta and recursive feature elimination (RFE), are applied and their performance is compared. The results indicate that 1) the optimal time window for SOM prediction is day of year (DOY) 120-140 for the Songnen Plain; 2) the performance of SOM prediction based on Landsat-8 synthetic images is better than that based on Sentinel-2 synthetic images; and 3) both feature selection methods improve the SOM prediction accuracy, but RFE has the highest accuracy (Landsat-8 with a coefficient of determination (R^2) of 0.702 and a root mean square error (RMSE) of 0.681%; Sentinel-2 with an R^2 of 0.5963 and an RMSE of 0.793%). This study provides a new model for large-scale, high-spatial-resolution SOM prediction and verifies the importance of the time window to the SOM prediction accuracy.
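For readers unfamiliar with the two accuracy measures reported in this abstract, the sketch below computes R^2 and RMSE using their standard definitions. The function names and the sample values are illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target (here, % SOM)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(r2(observed, predicted), rmse(observed, predicted))
```

A higher R^2 and a lower RMSE both indicate a better fit, which is the sense in which the abstract ranks RFE above Boruta.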
Feature selection algorithms are a cornerstone of machine learning. As the number of samples and of sample properties increases, a feature selection algorithm selects the significant features. The general purpose of feature selection algorithms is to select the properties most relevant to the data classes and thereby increase classification performance. Features can therefore be selected based on their classification performance. In this study, we developed a feature selection algorithm, P-Score, based on the classification performance of decision support vectors. The method can work according to two different selection criteria. We tested the classification performance of the features selected with P-Score using three different classifiers. In addition, we compared P-Score with 13 feature selection algorithms from the literature. According to the results of the study, the P-Score feature selection algorithm is a method that can be used in the field of machine learning. (C) 2020 AGBM. Published by Elsevier Masson SAS. All rights reserved.
Researchers use different methods to investigate and quantify clay minerals. X-ray diffraction is a common and widespread approach for clay mineralogy investigation, but it is time-consuming and expensive, especially in highly calcareous soils. The aim of this research was to predict clay minerals in calcareous soils of southern Iran using a feature selection algorithm and adaptive neuro-fuzzy inference system (ANFIS) methods. Fifty soil samples were collected from different climatic regions of southern Iran, and climatic properties, soil properties and clay minerals were determined, the latter using X-ray diffraction. Feature selection algorithms were used to select the best feature subset for predicting clay mineral types, with the data split into training and testing sets. Results indicated that the best feature subset found by Best-First for predicting illite was cation exchange capacity (CEC), sand, total potassium, silt and agroclimatic index (correlation coefficient (R) = 0.99 for training and testing data); for smectite it was precipitation, temperature, evapotranspiration and CEC (R = 0.89 and 0.87 for training and testing data, respectively); and for palygorskite it was precipitation, temperature, evapotranspiration and calcium carbonate equivalent (CCE) (R = 0.98 for training and testing data). An attempt was made to predict clay mineral types by ANFIS using the data selected by the feature selection algorithm. Evaluation of the method by root mean square error (RMSE), mean absolute error (MAE) and R indicated that the ANFIS method may be suitable for illite, chlorite, smectite and palygorskite prediction (RMSE, MAE and R of 0.001-0.028, 0.004-0.012 and 0.67-0.89, respectively, for training and testing data). Comparison of the data for all clay minerals showed that the ANFIS method did not predict illite and chlorite as well as the other minerals in the studied soils.
ISBN: 9781728107554 (print)
The value of schooling and the academic performance of students are the topmost priority of all academic institutions. Educational Data Mining (EDM) is an evolving area of research that helps academic institutions enhance their students' performance. Feature selection algorithms remove irrelevant and unrelated data from the dataset, thereby improving the performance of the classifiers used in EDM. The aim of this paper is to evaluate student performance using a heuristic technique, Differential Evolution, for feature selection on a student dataset; several other feature selection algorithms that had not previously been applied to this dataset are also used. Classification techniques such as Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN) and Discriminant Analysis (DISC) were used for the evaluation. The Differential Evolution (DE) algorithm is proposed as a better feature selection algorithm for evaluating students' academic performance, and it gave better accuracy than the other feature selection algorithms used. The outcomes of the different feature selection algorithms and classification techniques will help researchers find the best combinations of classifiers and feature selection algorithms. This paper is a step towards enhancing the standard of education in academic institutions and towards guiding researchers in making well-targeted academic interventions.
ISBN: 9781538607909 (print)
Students' academic performance is the main focus of all educational institutions. Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature selection (FS) algorithms remove irrelevant data from the educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set. The results obtained for the different FS algorithms and classifiers will also help new researchers find the best combinations of FS algorithms and classifiers. Selecting relevant features for a student prediction model is a very sensitive issue for educational stakeholders, as they have to make decisions on the basis of the results of prediction models. Furthermore, our paper is an attempt to play a positive role in improving education quality and to guide new researchers in making academic interventions.
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving life, energy, effort, money and time. The right decision prevents physical and material losses, and prediction is practiced in all fields, including medicine, finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of the classifier depends heavily on the training datasets used to train it. Irrelevant and redundant features in the training dataset reduce the accuracy of the classifier. Hence, the irrelevant and redundant features must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm, namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with seven other feature selection algorithms using well-known classifiers, namely naive Bayes (NB), instance-based (IB1) and tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for the classifiers.
Feature selection plays an important role in machine learning and data mining problems. Identifying the best feature selection algorithm for removing irrelevant and redundant features is a complex task. This research addresses it by recommending a feature selection algorithm based on dataset meta-features. The main contribution of the work is the use of Semantic Web principles to develop a recommendation model for the feature selection algorithm. To this end, dataset meta-features are modeled in a domain ontology, and a set of Semantic Web Rule Language (SWRL) predictive rules is proposed to recommend a feature selection algorithm. The result of this research is the feature selection algorithm recommendation based on data characteristics and quality (FSDCQ) ontology, which not only helps with recommendations but also finds the data points with data quality violations. An experiment is conducted on classification datasets from the UCI repository to evaluate the proposed ontology. The usefulness and effectiveness of the proposed method are evaluated by comparing it with the recommendation method most widely used in the literature. Results show that the ontology-based recommendations are as good as those of the widely used recommendation model, k-NN, with added benefits.
ISBN: 9784907764807 (print)
Monkeypox is an uncommon viral infection leading to skin eruptions resembling smallpox. Recent monkeypox outbreaks demonstrate the persistent danger presented by this virus. Accurate and timely diagnosis of monkeypox is important for effective treatment and for the prevention of outbreaks. In this study, we propose integrating feature selection algorithms into a deep features approach for the classification of monkeypox skin lesions. The deep pre-trained models (ResNet50, GoogleNet, and InceptionNetV3) are fine-tuned at the initial stage. Deep model-based features are then extracted and filtered by the feature selection algorithms. Finally, the selected features are classified using traditional classifiers. The results show that classification of the selected deep features achieved high performance and outperformed the original version of the pre-trained model. The highest performance metrics belong to the case of ResNet50-based features and Grey Wolf Optimization, giving 96.8%, 95.3%, 98.0%, and 96.5% in terms of accuracy, precision, sensitivity, and F1-score, respectively.
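The four metrics reported in this abstract can be derived from a binary confusion matrix, as in the sketch below. The function name `binary_metrics` and the example labels are hypothetical; the abstract does not describe how the authors computed their figures, so this shows only the standard definitions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


# Made-up labels: 1 = lesion classified as monkeypox, 0 = other.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(truth, preds))
```

Sensitivity is the share of true positive cases that are detected, which is why it is often the metric of most interest in a diagnostic setting.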