ISBN: 9798350328929 (print)
Wireless sensor networks (WSNs) are groups of stand-alone devices that typically feature one or more sensors (for example, light level or temperature), relatively limited computing capabilities, and a wireless connection that enables interaction with a base station. Today, WSNs are being deployed within critical infrastructures such as connected vehicles, drones, smart cities, smart grids, and surveillance systems. A major issue for WSNs is security, particularly in relation to packet transfer across the network's multiple sensor nodes. Intrusion detection is therefore essential given the growing importance of WSN security. To address this issue, this paper develops an effective wrapper feature selection method based on the Firefly algorithm (FFA) for selecting significant attributes. This wrapper-based feature selection solution substantially reduces time consumption while also increasing the network's lifetime and scalability. In the first phase of this work, data preprocessing was performed with a min-max normalization approach; subsequently, FFA was used for feature dimensionality reduction and C5.0 for classification. The simulations were done using the UNSW-NB15 benchmark data, and the proposed firefly with C5.0 (FFA-C5.0) achieved an accuracy of 98.7%.
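The min-max normalization step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition, x' = (x - min) / (max - min), applied to one feature column at a time; the function name and the sample values are hypothetical and are not taken from the paper or from UNSW-NB15.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0
        # rather than dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature values, used only for illustration.
durations = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_normalize(durations)
print(scaled)
```

Scaling each feature to a common range keeps features with large raw magnitudes from dominating a distance-based or wrapper-based selection step.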
Background: Chondrosarcoma (CHS), a bone malignancy, poses a significant challenge due to its heterogeneous nature and resistance to conventional treatments. There is a clear need for advanced prognostic instruments that can integrate multiple prognostic factors to deliver personalized survival predictions for individual patients. This study aimed to develop a novel prediction tool based on recursive partitioning analysis (RPA) to improve the estimation of overall survival for patients with *** from the Surveillance, Epidemiology, and End Results (SEER) database were analyzed, including demographic, clinical, and treatment details of patients diagnosed between 2000 and 2018. Using the C5.0 algorithm, decision trees were created to predict survival probabilities at 12, 24, 60, and 120 months. The performance of the models was assessed through confusion scatter plot, accuracy rate, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Results: The study identified tumor histology, surgery, age, visceral (brain/liver/lung) metastasis, chemotherapy, tumor grade, and sex as critical predictors. Decision trees revealed distinct patterns for survival prediction at each time point. The models showed high accuracy (82.40%-89.09% in the training group and 82.16%-88.74% in the test group) and discriminatory power (AUC: 0.806-0.894 in the training group and 0.808-0.882 in the test group). An interactive web-based Shiny app (URL: ) was developed, simplifying the survival prediction process for *** study successfully employed RPA to develop a user-friendly tool for personalized survival predictions in CHS. The decision tree models demonstrated robust predictive capabilities, with the interactive application facilitating clinical decision-making. Future prospective studies are recommended to validate these findings and further refine the predictive model.
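As a rough illustration of the two evaluation measures named in the abstract above, the sketch below computes the accuracy rate and the AUC for a binary outcome. It assumes the standard rank-based reading of AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half). The labels and scores are invented for the example and have no connection to the SEER data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = event) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(accuracy(labels, scores), auc(labels, scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; a sort-based computation would be the more usual choice at scale.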
ISBN: 9798400711732 (print)
To efficiently extract and leverage the vast amounts of data stored within university academic management systems, a novel approach utilizing data mining techniques for analyzing higher education teaching data has been proposed. This study delves into the performance data recorded in the academic affairs system, aiming to uncover meaningful insights. Initially, data collection and preprocessing are conducted to ensure clean and structured input. Subsequently, a comprehensive evaluation of student performance is performed using factor analysis. For predictive purposes, an enhanced decision tree model is developed by integrating the K-means clustering algorithm with the C5.0 algorithm. This hybrid approach is benchmarked against alternative methodologies to assess its effectiveness. Experimental results demonstrate that the enhanced decision tree achieves the highest prediction accuracy of 64.8%, characterized by minimal tree depth and a reduced number of leaf nodes. This not only enhances model precision but also ensures greater robustness while mitigating the risk of overfitting. These findings highlight the superiority of the proposed method in analyzing and predicting academic performance within the context of higher education.
In recent decades, little work has been done on risk management of the domino effect by establishing relationship models between the influencing factors and the consequences of accidents. In this study, a statistical analysis was performed on 1144 tank farm accidents, including 100 domino accidents and 1044 non-domino accidents, that occurred in China from 1960 to 2018. Unlike the existing statistical analysis literature, the causes of the primary events and the secondary events were analyzed separately to determine the common causes of accident escalation. The C5.0 decision tree algorithm was adopted to extract rules that show the most likely sequences of causal factors for triggering domino accidents. Association rule mining was performed on season factors, units of accidents, operation status, and causal factors to predict the causal factors under different process conditions and scenarios. The results showed that inadequate training, inadequate procedure, and design deficiency were the most important factors in the C5.0 models for the prediction of domino risks. Six decision rules were learned by the C5.0 algorithm and twenty effective rules were learned by association rule mining, which can jointly provide accurate and reliable prevention strategies and decision support for the risk management of domino accidents.
Since the water requirement of maize differs across growth stages, prediction and extraction of association rules related to the water requirements of the plant were performed separately at the initial, development, mid-season, and late-season growth stages. Accordingly, information on the water requirement of maize over 20 years (2000-2019) in the Qazvin plain was used. First, the results of the C5.0, CART, CHAID, and QUEST algorithms for forecasting maize water demand were evaluated. According to the results, the CART algorithm at the initial growth stage and the C5.0 algorithm at the development, mid-season, and late-season growth stages had the best performance in predicting the water requirements of maize. Air humidity and precipitation were the most important factors in predicting the water requirements of maize at the initial growth stage with the CART tree algorithm. According to the results of the C5.0 algorithm, precipitation and air temperature were the most important factors at the development and mid-season growth stages, while sunshine hours and wind speed were the most important factors at the late-season growth stage. Finally, using the Apriori algorithm, association rules between water requirements and the factors affecting them were extracted at the four growth stages of maize. The association rules were evaluated using the indicators of confidence, support, and lift. According to the results of the Apriori algorithm, precipitation at the initial and development growth stages, and air temperature and wind speed at the mid-season and late-season growth stages, respectively, had the strongest relationship with the water requirements of maize. © 2021 Published by Elsevier B.V.
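The three rule-quality indicators named in the abstract above (support, confidence, and lift) can be computed directly from their standard definitions. The sketch below is an illustration under stated assumptions: each record is treated as a set of discretized items, and the item names and records are hypothetical, not drawn from the Qazvin plain data. It does not reproduce the Apriori search itself, only the scoring of one candidate rule.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # records containing A
    count_c = sum(1 for t in transactions if c <= t)         # records containing C
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the baseline rate of C.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical discretized records, used only for illustration.
records = [
    {"low_precip", "high_temp", "high_demand"},
    {"low_precip", "high_demand"},
    {"high_precip", "low_demand"},
    {"low_precip", "low_demand"},
    {"high_precip", "high_temp", "high_demand"},
]
support, confidence, lift = rule_metrics(records, {"low_precip"}, {"high_demand"})
print(support, confidence, lift)
```

A lift above 1 suggests that the antecedent and consequent occur together more often than independence would predict; a lift near 1 suggests little association.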
Land-cover classification using remote sensing imagery is an important part of environmental research because it provides baseline information for ecological vulnerability and risk assessment, disaster management, landscape conservation, local and regional planning, and so on. Rural-land-cover classification is challenging for both object-based image analysis methods and classifiers. The objective of this study is to improve the object-oriented classification accuracy of rural land cover by combining two models based on high-spatial-resolution imagery. We apply the C5.0 algorithm in R to combine support vector machines (SVMs) and random forest (RF) to create the model RS_C5.0. The effectiveness of the model combination is assessed by comparing the classification results with those of a state-of-the-art machine learning algorithm, namely extreme gradient boosting (XGBoost). The comparisons are based on the classification results of both the study area and the case area. Results show that in the classification of the study area, RF performs slightly better than SVM, and XGBoost performs worse than RF but better than SVM. However, in the classification of the case area, SVM performs slightly better than RF, and both SVM and RF perform better than XGBoost. Furthermore, RS_C5.0 obtains the highest overall accuracies and kappa coefficients in the classifications of both the study area and the case area. In terms of training time, XGBoost runs the slowest in both areas, while SVM, RF, and the combined model (RS_C5.0) run much faster than the XGBoost classifier. To summarize, the combination of SVM and RF classifiers using the C5.0 algorithm is found to be a fast and effective way to improve rural-land-cover classification. © 2019 Society of Photo-Optical Instrumentation Engineers (SPIE)
To provide appropriate solutions for problematic smartphone use, we need to first understand its types. This study aimed to identify types of problematic smartphone use based on psychiatric symptoms, using the decision tree method. We recruited 5,372 smartphone users from online surveys conducted between February 3 and February 22, 2016. Based on scores on the Korean Smartphone Addiction Proneness Scale for Adults (S-Scale), 974 smartphone users were assigned to the smartphone-dependent group and 4,398 users were assigned to the normal group. The data-mining technique of the C5.0 decision tree was applied. We used 15 input variables, including demographic and psychological factors. Four psychiatric variables emerged as the most important predictors: self-control (SC; 66%), anxiety (Anx; 25%), depression (Dep; 7%), and dysfunctional impulsivities (Imp; 3%). We identified the following five types of problematic smartphone use: (1) non-comorbid, (2) self-control, (3) SC + Anx, (4) SC + Anx + Dep, and (5) SC + Anx + Dep + Imp. We found that 74% of smartphone-dependent users had psychiatric symptoms. The ratio of participants belonging to the non-comorbid and self-control types was 64%. We proposed that these types of problematic smartphone use may be used for the development of an appropriate service for controlling and preventing such behaviors in adults.
ISBN: 9789897583599 (print)
Traffic classification is an essential tool for network management and security. Traditional techniques such as port-based and payload analysis are ineffective as major Internet applications use dynamic port numbers and encryption. Recent studies have used statistical properties of flows to classify traffic with high accuracy, minimising the overhead limitations associated with other schemes such as deep packet inspection (DPI). The classification accuracy of statistical flow-based approaches, however, depends on the discrimination ability of the traffic features used. To this effect, the present paper customised the popular tcptrace utility to generate classification features based on traffic burstiness and periods of inactivity (idle time) for everyday Internet usage. A C5.0 decision tree classifier was trained using the proposed features for eleven different Internet applications, generated by ten users. Overall, the newly proposed features achieved a high level of accuracy (approximately 98%) in classifying the respective applications.
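To make the idea of burstiness and idle-time features in the abstract above more concrete, the sketch below derives a few such features from a sorted list of packet timestamps. It is only an illustration under stated assumptions: it does not use or reproduce tcptrace, and the function name, the feature names, and the 1-second idle threshold are all hypothetical choices rather than the paper's definitions.

```python
def burst_idle_features(timestamps, idle_threshold=1.0):
    """Summarise bursts and idle periods in a sorted list of packet times (seconds).

    A gap longer than idle_threshold ends the current burst and counts as
    one idle period. The threshold is a hypothetical default.
    """
    if not timestamps:
        return {"bursts": 0, "idle_periods": 0, "total_idle": 0.0, "max_burst_packets": 0}
    bursts, idle_periods, total_idle = 1, 0, 0.0
    current_burst, max_burst = 1, 1
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > idle_threshold:
            idle_periods += 1
            total_idle += gap
            bursts += 1
            current_burst = 1   # the packet after the gap starts a new burst
        else:
            current_burst += 1
        max_burst = max(max_burst, current_burst)
    return {
        "bursts": bursts,
        "idle_periods": idle_periods,
        "total_idle": total_idle,
        "max_burst_packets": max_burst,
    }


# Hypothetical packet arrival times for one flow.
features = burst_idle_features([0.0, 0.25, 0.5, 3.0, 3.25, 8.0])
print(features)
```

Features of this kind depend only on packet timing, so they remain available when ports are dynamic and payloads are encrypted.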
Land-cover monitoring is one of the core applications of remote sensing. Monitoring and mapping changes in the distribution of agricultural land covers provide a reliable source of information that helps environmental sustainability and supports agricultural policies. Synthetic Aperture Radar (SAR) can contribute considerably to this monitoring effort. The first objective of this research is to extend the use of time series of polarimetric data for land-cover classification using a decision tree classification algorithm. With this aim, RADARSAT-2 (quad-pol) and Sentinel-1 (dual-pol) data were acquired over an area of 600 km² in central Spain. Ten polarimetric observables were derived from both datasets, and seven scenarios were created with different sets of observables to evaluate a multitemporal parcel-based approach for classifying eleven land-cover types, most of which were agricultural crops. The study demonstrates that good overall accuracies, greater than 83%, were achieved for all of the proposed scenarios, and that the scenario with all RADARSAT-2 polarimetric observables was the best option (89.1%). Very high accuracies were also obtained when dual-pol data from RADARSAT-2 or Sentinel-1 were used to classify the data, with overall accuracies of 87.1% and 86%, respectively. In terms of individual crop accuracy, rapeseed achieved a producer's accuracy of at least 95% for all scenarios, followed by the spring cereals (wheat and barley), which achieved high producer's accuracies (79.9%-95.3%) and user's accuracies (85.5% and 93.7%).
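The accuracy measures reported in the abstract above follow the usual remote-sensing conventions, which the sketch below illustrates. It assumes a confusion matrix whose rows are reference (ground-truth) classes and whose columns are predicted classes: producer's accuracy is the diagonal count divided by the row total, user's accuracy is the diagonal count divided by the column total, and overall accuracy is the diagonal sum divided by the grand total. The class order and counts are hypothetical and are not results from the study.

```python
def accuracy_report(matrix):
    """Return (overall, producers, users) from a square confusion matrix.

    Rows are reference classes, columns are predicted classes. Every class
    is assumed to have at least one reference and one predicted sample.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(k)) / total
    # Producer's accuracy: share of reference samples of a class mapped correctly.
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    # User's accuracy: share of samples mapped to a class that truly belong to it.
    users = [matrix[i][i] / sum(matrix[r][i] for r in range(k)) for i in range(k)]
    return overall, producers, users


# Hypothetical counts for three classes: rapeseed, wheat, barley.
confusion = [
    [19, 1, 0],
    [0, 16, 4],
    [1, 3, 16],
]
overall, producers, users = accuracy_report(confusion)
print(overall, producers, users)
```

Reporting both measures per class matters because a class can be mapped completely (high producer's accuracy) while still absorbing many samples from other classes (low user's accuracy).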
Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these Intrusion Detection models. Specifically, this thesis studied the adaptability features of three well-known Machine Learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis. This thesis has demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation was undertaken to analyse the effects of feature selection and data balancing processes on a model's accuracy when evaluation traffic with different significant features was used. The effects of threshold adaptation on reducing the accuracy degradation of these models were statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates. This thesis then extended the analysis to apply threshold adaptation on sampled traffic subsets, by using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold. The Random Forest algorithm only needed a sample that was 0.05% of the original evaluation traffic to identify a discriminating threshold with an overall ac