This paper takes educational data mining as its research theme: it mines existing large-scale educational big data, compares the analysis methods of existing data models, and proposes an improved random forest (RF) reference model. A feature weighting scheme is introduced to calculate the information gain of each feature, and an evaluation index is used to improve the existing data analysis. Simulation results show that the improved model classifies more efficiently than the existing models. To resolve the performance bottleneck of a single node handling multiple data classification tasks in the era of big data, the paper proposes a classification and prediction model for graduates' large-scale employment data based on a distributed improved RF algorithm. The MapReduce distributed computing framework is used to serialize (write) and deserialize (load) the trained model between the local disk and the distributed file system, thereby realizing the distributed extension of the large-scale data classification model based on the improved RF model.
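The abstract does not give the exact weighting formula. The Python below is a minimal sketch that assumes categorical features and a hypothetical scheme in which each feature's weight is its information gain normalised to sum to one. It computes IG as H(Y) minus the weighted conditional entropy, then draws each tree's feature subset with probability proportional to those weights. The names `feature_weights` and `sample_features` and the toy graduate records are illustrative and are not taken from the paper.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """IG(Y; A) = H(Y) - sum over values v of |S_v|/|S| * H(Y | A = v), for a categorical feature A."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def feature_weights(rows, labels):
    """Hypothetical scheme: each feature's IG, normalised so that the weights sum to 1."""
    # Clamp at 0 to guard against tiny negative values from floating-point error.
    gains = {f: max(0.0, information_gain(rows, labels, f)) for f in rows[0]}
    total = sum(gains.values())
    if total == 0:  # no informative feature: fall back to uniform weights
        return {f: 1 / len(gains) for f in gains}
    return {f: g / total for f, g in gains.items()}

def sample_features(weights, k, rng):
    """Draw k distinct features for one tree, each draw proportional to the remaining weights."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        feats = list(pool)
        w = [pool[f] for f in feats]
        # If every remaining weight is zero, pass weights=None so the draw is uniform.
        pick = rng.choices(feats, weights=w if sum(w) > 0 else None, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen

# Hypothetical toy records: "intern" separates the classes, "gpa" carries no information.
rows = [
    {"major": "cs", "gpa": "high", "intern": "yes"},
    {"major": "cs", "gpa": "low", "intern": "yes"},
    {"major": "cs", "gpa": "high", "intern": "no"},
    {"major": "math", "gpa": "low", "intern": "no"},
]
labels = ["employed", "employed", "unemployed", "unemployed"]

weights = feature_weights(rows, labels)
print(sorted(sample_features(weights, 2, random.Random(0))))  # → ['intern', 'major']
```

With this scheme, a zero-IG feature such as `gpa` in the toy data is only drawn after every informative feature has been used. A full implementation would pass each sampled subset to a complete decision-tree learner.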
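The distributed training and model-persistence step can be illustrated, under heavy simplification, by simulating MapReduce locally. Each "map" call trains a sub-forest on one data partition and serialises it to bytes with `pickle`, standing in for writing the model from local disk to the distributed file system. The "reduce" call then deserialises every sub-model and merges them into one voting ensemble. The one-feature stump trees, the function names, and the toy partitions are all hypothetical. This sketch is not the paper's improved RF, and a real deployment would run these steps as MapReduce jobs that read from and write to a distributed file system rather than as plain function calls.

```python
import pickle
import random
from collections import Counter

def train_stump_forest(rows, labels, n_trees, rng):
    """Toy sub-forest: each tree is a one-feature lookup table fitted on a bootstrap sample."""
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample (with replacement)
        feature = rng.choice(features)                  # one random feature per tree
        counts = {}
        for i in idx:
            counts.setdefault(rows[i][feature], Counter())[labels[i]] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        default = Counter(labels[i] for i in idx).most_common(1)[0][0]
        forest.append((feature, table, default))
    return forest

def map_task(partition, n_trees, seed):
    """Map step (hypothetical): train on one partition, return the serialised sub-model."""
    rows, labels = partition
    model = train_stump_forest(rows, labels, n_trees, random.Random(seed))
    return pickle.dumps(model)  # bytes that would be written to the distributed file system

def reduce_task(blobs):
    """Reduce step (hypothetical): deserialise each sub-model and concatenate the trees."""
    forest = []
    for blob in blobs:
        forest.extend(pickle.loads(blob))
    return forest

def predict(forest, row):
    """Majority vote; a value unseen by a tree falls back to that tree's default label."""
    votes = Counter(table.get(row[feature], default) for feature, table, default in forest)
    return votes.most_common(1)[0][0]

# Two hypothetical partitions of graduate records, as separate nodes might hold them.
partitions = [
    ([{"intern": "yes", "gpa": "high"}, {"intern": "no", "gpa": "low"}],
     ["employed", "unemployed"]),
    ([{"intern": "yes", "gpa": "low"}, {"intern": "no", "gpa": "high"}],
     ["employed", "unemployed"]),
]
blobs = [map_task(p, n_trees=5, seed=i) for i, p in enumerate(partitions)]
forest = reduce_task(blobs)
print(len(forest))  # → 10
print(predict(forest, {"intern": "yes", "gpa": "high"}))
```

Because each sub-model is serialised independently, partitions can be trained on separate nodes and combined later without any node holding the full dataset. The trade-off is that trees never see data from other partitions.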