ISBN: (print) 9781479977529
An intrusion detection system detects the presence of an intrusion or attack in a computer network. There are two types of intrusion detection: misuse/signature detection and anomaly detection. This research uses a combination of classification and regression tree (CART) and fuzzy logic to detect intrusions or attacks. CART is used to build the rules, or model, that are then implemented by a fuzzy inference engine. Testing is performed using fuzzy logic without defuzzification, because the resulting rules are used directly for classification. Training, testing, and validation of the model use the KDD Cup 1999 dataset after preprocessing and data cleaning. Testing and validation accuracy is calculated using the confusion matrix. Across several tests, the best model was built with 70% training data, a tree depth of 11, and a minimum leaf-node percentage of 90%, giving an accuracy of 85.68% and an average validation time of 21.92 seconds.
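As an illustration of how accuracy is derived from a confusion matrix, the following is a minimal Python sketch. The function names and the five sample labels are hypothetical and are not taken from the paper or from the KDD Cup 1999 dataset.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of samples whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Hypothetical labels for illustration only.
actual = ["normal", "attack", "attack", "normal", "attack"]
predicted = ["normal", "attack", "normal", "normal", "attack"]

matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))
```

Here one attack is misclassified as normal, so four of the five samples lie on the diagonal of the matrix.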
This study assesses the relative utility of a traditional regression approach, logistic regression (LR), and three classification techniques, classification and regression tree (CART), chi-squared automatic interaction detection (CHAID), and multi-layer perceptron neural network (MLPNN), in predicting inmate misconduct. The four models were tested using a sample of inmates held in state and federal prisons and predictors derived from the importation model on inmate adaptation. A multivalidation procedure and multiple evaluation indicators were used to evaluate and report predictive accuracy. The overall accuracy of the four models varied between 0.60 and 0.66, with an overall AUC range of 0.60-0.70. The LR and MLPNN methods performed significantly better than the CART and CHAID techniques at identifying misbehaving inmates, and the CHAID method outperformed the CART approach in classifying defied inmates. The MLPNN method performed significantly better than the LR technique in predicting inmate misconduct among the training samples.
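For readers unfamiliar with the AUC indicator, the sketch below computes it in Python as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name and the four scores are hypothetical; the study's own data and software are not shown in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for the positive class (e.g. misconduct) and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for illustration only.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.60-0.70 range should be read.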
This study aims to predict amounts of waste generated from detached houses by their types and materials when they are constructed and *** achieve this objective, required data was collected based on material information of the buildings before *** using the established database, CART analysis was conducted, with the buildings' types and materials as analysis factors, to identify what affects the generation of waste concrete and to estimate indicators of waste *** results were as ***, the most influential factor on generation of waste concrete was types of ***, generation of waste concrete of RC-type and wood-type buildings was not affected by materials, while that of masonry-type buildings was affected by roof ***-type buildings were divided into two categories by roof materials: i) buildings with slab and slab with roof tiles as roof materials and ii) buildings with roof tiles and slate as roof ***, amounts of waste concrete of RC-type and wood-type buildings were 0.324 m/m and 0.018 m/m, respectively, and those of masonry-type buildings with roof materials of slab and slab with roof tiles and the other masonry-type buildings were 0.127 m/m and 0.040 m/m, respectively.
Estimation of structural forest attributes, such as volume, basal area, and tree density, using a combination of remote sensing and field data is currently considered a favored option compared to using field survey data alone. In a comparative study, multiple linear regression (MLR) and classification and regression tree (CART) models were used to estimate volume, basal area, and tree density using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and Satellite Pour l'Observation de la Terre (SPOT)-High Resolution Geometric (HRG) imagery in the Darabkola forests, located in the Hyrcanian region of Iran. Results showed that the CART model using SPOT-HRG data achieved the best overall performance for all three forest structural attributes, with adjusted R² = 0.746 and RMSE = 67.9 m³ ha⁻¹ for volume, adjusted R² = 0.771 and RMSE = 3.94 m² ha⁻¹ for basal area, and adjusted R² = 0.871 and RMSE = 34.71 n ha⁻¹ for tree density. In general, the CART model, using both ASTER and SPOT-HRG data, produced better estimates of forest attributes than the MLR model. In addition, forest attribute estimates from SPOT-HRG were better than those obtained from ASTER data. (C) 2014 Society of Photo-Optical Instrumentation Engineers (SPIE)
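The two evaluation metrics reported above can be sketched in Python as follows. This uses the standard definitions of RMSE and adjusted R²; the function names, the five sample values, and the choice of two predictors are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def adjusted_r2(observed, predicted, n_predictors):
    """R² penalised for the number of predictors used by the model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical plot-level volumes (m³ ha⁻¹) for illustration only.
observed = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0]

print(rmse(observed, predicted))
print(adjusted_r2(observed, predicted, n_predictors=2))
```

Because the adjustment divides by n - p - 1, the sketch assumes more observations than predictors plus one.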
In contrast to classical data, which take a single value, symbolic data can take the form of a list, an interval, or even a distribution. Symbolic data are very common in daily life; however, the analysis methods available for them are very limited. For instance, the decision tree is a well-known and useful method for supervised learning tasks such as regression and classification, and many useful algorithms are based on it. However, the decision tree is applicable only to classical data taking a single value, either numerical or categorical. In this dissertation, I extend the classification and regression tree (CART) method to symbolic data.
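To make the single-value limitation concrete, the sketch below shows a standard CART-style split search on one classical numeric feature, using Gini impurity in Python. It is not the dissertation's symbolic extension; the function names and sample values are hypothetical. The comparison `x <= threshold` is the step that presumes each observation is a single number rather than an interval or a distribution.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split `x <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical single-valued observations for illustration only.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, score = best_split(values, labels)
print(threshold, score)
```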
A major source of uncertainty in flood statistics is the variety of flood generation processes, which makes the assumption of homogeneous samples questionable. To overcome this issue, a framework for assessing the influence of catchment and climate attributes on flood-generating processes and their effect on flood statistics has been developed and applied to 252 catchments in New Zealand. Time series of mean daily discharge with lengths ranging from 20 to 81 years were used. Flood events were classified according to their hydrograph shape. Three types were considered based on the different forcing: heavy rainfall of short duration (termed R1), moderate rainfall of medium intensity and duration (R2), and long-duration rainfall sequences of usually larger spatial extent (R3). The dominant flood type in each catchment was then linked to catchment and climate attributes. This made it possible to identify the impact of each flood type on flood statistics and how the flood types have changed over time. The main drivers determining the flood type were rainfall variability and antecedent conditions. Small and steep catchments were dominated by heavy-rainfall floods of shorter duration, while flat and wet catchments were dominated by long-duration floods with larger volumes. Such information can support the selection of effective flood protection and management measures.
Renewable energy, particularly biogas, stands as a pivotal solution amidst global energy challenges, offering sustainable alternatives to fossil fuels. Himachal Pradesh, a mountainous region in North India, exemplifies this shift with a growing network of biogas plants to alleviate energy poverty and environmental degradation. However, the operational success of these biogas plants remains precarious, marked by a significant non-functionality rate. The present study examines 180 biogas plants across Hamirpur and Kangra districts, revealing that 74.81% of these plants are non-operational. Key reasons include inadequate cattle populations, lack of interest, and constructional issues. A classification and regression tree (CART) model was used to identify the reasons; it found that inadequate cattle population, coupled with socio-economic factors such as declining interest and migration, was the primary barrier to sustained biogas plant functionality. These findings highlight the urgent need for targeted interventions, including technological upgrades and policy reforms, to enhance biogas plant sustainability and foster rural energy resilience in hilly terrains. By addressing these challenges, Himachal Pradesh can harness its rich agricultural resources more effectively, thereby advancing towards a greener and more sustainable energy future and informing policymakers on enhancing biogas technology's effectiveness in addressing energy poverty and promoting sustainable practices in rural communities.
The by-product gases generated during steel manufacturing processes, including blast furnace gas, coke oven gas, and Linz-Donawitz gas, exhibit considerable variability in composition and supply. Consequently, achieving stable combustion control of these gases is critical for improving boiler efficiency. This study developed the advanced boiler combustion control model (ABCCM) by combining the random forest (RF) and classification and regression tree (CART) algorithms to optimize the combustion of steam power boilers using steel by-product gases. The ABCCM derives optimal combustion patterns in real time using the RF algorithm and minimizes fuel consumption through the CART algorithm, thereby optimizing the overall gross heat rate. The results demonstrate that the ABCCM achieves a 0.86% improvement in combustion efficiency and a 1.7% increase in power generation efficiency compared to manual control methods. Moreover, the model reduces the gross heat rate by 58.3 kcal/kWh, which translates into an estimated annual energy cost saving of USD 89.6 K. These improvements contribute considerably to reducing carbon emissions, with the ABCCM being able to optimize fuel utilization and minimize excess air supply, thus enhancing the overall sustainability of steelmaking operations. This study underscores the potential of the ABCCM to extend beyond the steel industry.
Predicting patient outcomes based on patient characteristics and care processes is a common task in medical research. Such predictive features are often multifaceted and complex, and are usually simplified into one or more scalar variables to facilitate statistical analysis. This process, while necessary, results in a loss of important clinical detail. While this loss may be prevented by using distance-based predictive methods which better represent complex healthcare features, the statistical literature on such methods is limited, and the range of tools facilitating distance-based analysis is substantially smaller than those of other methods. Consequently, medical researchers must choose to either reduce complex predictive features to scalar variables to facilitate analysis, or instead use a limited number of distance-based predictive methods which may not fulfil the needs of the analysis problem at hand. We address this limitation by developing a Distance-Based extension of classification and regression trees (DB-CART) capable of making distance-based predictions of categorical, ordinal and numeric patient outcomes. We also demonstrate how this extension is compatible with other extensions to CART, including a recently published method for predicting care trajectories in chronic disease. We demonstrate DB-CART by using it to expand upon previously published dose-response analysis of stroke rehabilitation data. Our method identified additional detail not captured by the previously published analysis, reinforcing previous conclusions. We also demonstrate how by combining DB-CART with other extensions to CART, the method is capable of making predictions about complex, multifaceted outcome data based on complex, multifaceted predictive features.
Automation is the core transformation strategy that every industry wants on its roadmap today. Artificial Intelligence (AI) and Machine Learning (ML) are the key components of automation and are increasingly used both in data analysis and in building predictive models from data. Growing privacy concerns, data confidentiality, and disclosure risks have made it harder to access the right, meaningful data. Several privacy-preserving and disclosure-limiting techniques have emerged from research. One such disclosure-limiting technique is synthetic data. Early research efforts have shown that synthetic data is an effective substitute for real data and can be used to train AI and ML models. However, it needs a comprehensive evaluation before the data user can be confident that it is indeed a good substitute for real data. In this paper, we look at three main parameters that together should provide a holistic assessment of the quality of synthetic data: first, how well it preserves privacy and controls disclosure; second, how good its utility is; and third, whether it gives fair, unbiased results when used in machine learning. We review the existing literature on disclosure control methods, synthetic data generators, and validation methodologies and evaluation techniques. We examine how the privacy, utility, and fairness of synthetic data interact with each other and identify areas for future work.