Global Positioning System (GPS) technologies have been increasingly considered as an alternative to traditional travel survey methods for collecting activity-travel data. The algorithms applied to extract activity-travel patterns vary from informal ad-hoc decision rules to advanced machine learning methods and differ in accuracy. This paper systematically compares the relative performance of different algorithms for the detection of transportation modes and activity episodes. In particular, naive Bayesian, Bayesian network, logistic regression, multilayer perceptron, support vector machine, decision table, and C4.5 algorithms are selected and compared on the same data according to their overall error rates and hit ratios. Results show that the Bayesian network performs better than the other algorithms in terms of the percentage of correctly identified instances and Kappa values for both the training and test data, indicating that the Bayesian network is relatively efficient and generalizable in the context of GPS data imputation.
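A minimal sketch of the kind of comparison described above: several classifiers are fitted on the same data and ranked by hit ratio (accuracy) and Cohen's kappa. The synthetic dataset and the scikit-learn models stand in for the paper's GPS-derived features and its exact implementations; the Bayesian network and decision table learners are omitted because they are not in scikit-learn.

```python
# Fit several classifiers on one dataset and compare hit ratio and kappa.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "multilayer perceptron": MLPClassifier(max_iter=1000, random_state=0),
    "support vector machine": SVC(),
    "decision tree (C4.5-like)": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name:28s} hit ratio={accuracy_score(y_test, pred):.3f} "
          f"kappa={cohen_kappa_score(y_test, pred):.3f}")
```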
Advanced persistent attacks, incorporating sophisticated malware, are on the rise against hosts, user applications, and utility software. Modern malware hide their malicious payload by applying packing mechanisms. Packing tools apply code encryption to protect the original malicious payload, and packing is employed in tandem with code obfuscation/encryption/compression to create malware variants. Despite being just a variant of known malware, packed malware invalidates traditional signature-based malware detection, as packing tools create an envelope of packer code around the original base malware. Therefore, unpacking becomes a mandatory phase prior to anti-virus scanning for identifying known malware hidden behind packing layers. Existing unpacking solutions increase the execution time overhead of AV scanners. This paper illustrates an easy-to-use approach that works in two phases to reduce this overhead. The first phase (ESCAPE) discriminates packed code from native (non-packed) code using random block entropy. The second phase (PEAL) validates the inferences of ESCAPE by employing a bi-classification (packed vs. native) model using relevant hex byte features extracted blockwise. The proposed approach is able to shrink the overall execution time of AV scanners by filtering out native samples and avoiding excessive unpacking overhead. Our method has been evaluated against a set consisting of real packed instances of malware and benign programs.
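A hedged sketch of the block-entropy idea behind the first phase: packed or encrypted code tends to have a near-uniform byte distribution, so a high mean entropy across randomly sampled blocks suggests a packed sample. The block size, sample count, and the 7.0-bit threshold below are illustrative assumptions, not the paper's calibrated values.

```python
# Flag a file as likely packed when randomly sampled blocks have high byte entropy.
import math
import random

def shannon_entropy(block: bytes) -> float:
    """Byte entropy in bits (0..8) of a block."""
    if not block:
        return 0.0
    counts = [0] * 256
    for b in block:
        counts[b] += 1
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def looks_packed(path: str, block_size: int = 4096, n_blocks: int = 20,
                 threshold: float = 7.0) -> bool:
    """Sample random blocks of the file and compare mean entropy to a threshold."""
    data = open(path, "rb").read()
    if len(data) <= block_size:
        return shannon_entropy(data) > threshold
    population = len(data) - block_size
    starts = random.sample(range(population), min(n_blocks, population))
    mean_entropy = sum(shannon_entropy(data[s:s + block_size]) for s in starts) / len(starts)
    return mean_entropy > threshold
```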
When working with real-world applications we often find imbalanced datasets, those for which there exists a majority class with normal data and a minority class with abnormal or important data. In this work, we give an overview of the class imbalance problem; we review its consequences, possible causes, and existing strategies to cope with the inconveniences associated with it. As an effort to contribute to the solution of this problem, we propose a new rule induction algorithm named Rule Extraction for MEdical Diagnosis (REMED), a symbolic one-class learning approach. To evaluate the proposed method, we use different medical diagnosis datasets, taking into account quantitative metrics, comprehensibility, and reliability. We compare REMED with C4.5 and RIPPER combined with over-sampling and cost-sensitive strategies. This empirical analysis shows REMED to be quantitatively competitive with C4.5 and RIPPER in terms of the area under the Receiver Operating Characteristic curve (AUC) and the geometric mean, while outperforming them in terms of comprehensibility and reliability. Our experiments show that REMED generates rule systems with a larger degree of abstraction and patterns closer to the well-known abnormal values associated with each considered medical dataset.
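A short sketch of the two evaluation metrics used in the comparison, the AUC and the geometric mean of per-class recalls, computed on a toy imbalanced split. The predictions below are placeholders, not output of REMED, C4.5, or RIPPER.

```python
# Compute AUC and the geometric mean of sensitivity and specificity.
from math import sqrt
from sklearn.metrics import roc_auc_score, recall_score

y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]        # minority class = 1 (abnormal)
y_score = [0.1, 0.2, 0.1, 0.3, 0.4, 0.2, 0.6, 0.3, 0.7, 0.4]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]

auc = roc_auc_score(y_true, y_score)
sensitivity = recall_score(y_true, y_pred, pos_label=1)   # recall on the minority class
specificity = recall_score(y_true, y_pred, pos_label=0)   # recall on the majority class
g_mean = sqrt(sensitivity * specificity)
print(f"AUC={auc:.3f}  sensitivity={sensitivity:.3f}  "
      f"specificity={specificity:.3f}  g-mean={g_mean:.3f}")
```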
Understanding irrigator responses to changes in water availability is critical for building strategies to support effective management of water resources. Using remote sensing data, we examine farmer responses to seasonal changes in water availability in Idaho's Snake River Plain for the time series 1984-2016. We apply a binary threshold based on the seasonal maximum of the Normalized Difference Moisture Index (NDMI) using Landsat 5-8 images to distinguish irrigated from non-irrigated lands. We find that the NDMI of irrigated lands increased over time, consistent with trends in irrigation technology adoption and increased crop productivity. By combining remote sensing data with geospatial data describing water rights for irrigation, we show that the trend in NDMI is not universal, but differs by farm size and water source. Farmers with small farms that rely on surface water are more likely than average to have a large contraction (over -25%) in irrigated area over the 33-year period of record. In contrast, those with large farms and access to groundwater are more likely than average to have a large expansion (over +25%) in irrigated area over the same period.
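A minimal sketch of the per-pixel classification step: NDMI = (NIR - SWIR1) / (NIR + SWIR1) is computed from Landsat surface reflectance, and a binary threshold on the seasonal maximum separates irrigated from non-irrigated pixels. The 0.2 threshold and the toy arrays are illustrative assumptions, not the paper's calibrated values or data.

```python
# NDMI from NIR and SWIR1 bands, then threshold the seasonal maximum per pixel.
import numpy as np

def ndmi(nir: np.ndarray, swir1: np.ndarray) -> np.ndarray:
    """Normalized Difference Moisture Index, with a guard against division by zero."""
    denom = nir + swir1
    return np.where(denom != 0, (nir - swir1) / np.where(denom == 0, 1, denom), 0.0)

def irrigated_mask(seasonal_ndmi_stack: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Pixels whose seasonal maximum NDMI exceeds the threshold are labeled irrigated."""
    seasonal_max = seasonal_ndmi_stack.max(axis=0)   # stack shape: (scenes, rows, cols)
    return seasonal_max > threshold

# Toy example: two scenes over a 2x2 reflectance grid
nir   = np.array([[[0.40, 0.35], [0.20, 0.50]], [[0.45, 0.30], [0.22, 0.55]]])
swir1 = np.array([[[0.20, 0.30], [0.25, 0.20]], [[0.18, 0.28], [0.24, 0.18]]])
print(irrigated_mask(ndmi(nir, swir1)))
```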
In classification, it is generally assumed that the data from one class consist of one pure, compact data cluster. However, in many cases this cluster might consist of multiple subclusters, in other words, within-class multimodality. In such a scenario, it may be difficult or even impossible for a single classifier to find a suitable model using limited data. Training a model using smaller chunks of data is therefore an alternative that helps avoid complex models and reduces the task's complexity. This paper proposes the subconcept Perturbation-based Classifier (sPerC), which finds the best clusters per class using cluster validation measures and trains one meta-classifier per subcluster. This way, each class is represented by a set of meta-classifiers instead of one classifier. Such a design diminishes the complexity of the task, and the divide-and-conquer strategy favors the precision of each meta-classifier. Through a set of comprehensive experiments on 30 datasets, sPerC compared favorably to other classifiers in multi-class classification tasks, showing that creating specialized classifiers per class in different regions of the feature space can be advantageous.
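A hedged sketch of the per-class subcluster discovery step: for each class, candidate k-means partitions are scored with a cluster validity index (silhouette here) and the best k is kept, so each class can later be covered by one classifier per subcluster. This mirrors the general idea of representing a class by several subconcepts; it is not the paper's exact sPerC procedure.

```python
# For each class, pick the number of subclusters that maximizes the silhouette score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def subclusters_per_class(X, y, k_max=5, random_state=0):
    """Return, for each class, the subcluster label of each of its samples."""
    assignments = {}
    for c in np.unique(y):
        Xc = X[y == c]
        best_labels, best_score = np.zeros(len(Xc), dtype=int), -1.0
        for k in range(2, min(k_max, len(Xc) - 1) + 1):
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=random_state).fit_predict(Xc)
            score = silhouette_score(Xc, labels)
            if score > best_score:
                best_labels, best_score = labels, score
        assignments[c] = best_labels   # one classifier would then be trained per subcluster
    return assignments
```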
Many real-world decision-making problems fall into the general category of classification. Algorithms for constructing knowledge by inductive inference from examples have been widely used for some decades. Although these learning algorithms frequently address the same problem of learning from preclassified examples, and much previous work in inductive learning has focused on the algorithms' predictive accuracy, little attention has been paid to the effect of data factors on the performance of a learning system. An experiment was conducted using five learning algorithms on two data sets to investigate how changes in labeling the class attribute can alter the behavior of learning algorithms. The results show that different preclassification rules applied to the training examples can affect either the classification accuracy or the classification structure.
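A small sketch of the experimental idea: the same examples are preclassified under two different labeling rules (here, two cutoffs on an underlying continuous score), and the induced trees are compared on accuracy and on structure (node count). The synthetic data and cutoffs are illustrative assumptions, not the paper's datasets or rules.

```python
# Relabel the same examples under two preclassification rules and compare the trees.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
score = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=1000)

for cutoff in (np.median(score), np.quantile(score, 0.8)):   # two labeling rules
    y = (score > cutoff).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    print(f"cutoff={cutoff:+.2f}  accuracy={tree.score(X_te, y_te):.3f}  "
          f"tree size={tree.tree_.node_count} nodes")
```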
We develop an algorithmic framework for isomorph-free exhaustive generation of designs admitting a group of automorphisms from a prescribed collection of pairwise nonconjugate groups, where each prescribed group has a large index relative to its normalizer in the isomorphism-inducing group. We demonstrate the practicality of the framework by producing a complete classification of the Steiner triple systems of order 21 admitting a nontrivial automorphism group. The number of such pairwise nonisomorphic designs is 62,336,617, of which 958 are anti-Pasch. We also develop consistency-checking methodology for gaining confidence in the correct operation of the algorithm implementation.
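A minimal consistency check in the spirit described, not the generation algorithm itself: verify that a block set is a Steiner triple system (every pair of points lies in exactly one triple) and count Pasch configurations, so anti-Pasch designs can be flagged. The Fano plane is used only as a small worked example.

```python
# Verify an STS and count Pasch configurations (four triples on six points, each point twice).
from itertools import combinations

def is_steiner_triple_system(blocks, v):
    """Every pair from {0,...,v-1} must lie in exactly one triple."""
    seen = set()
    for block in blocks:
        if len(set(block)) != 3 or not all(0 <= p < v for p in block):
            return False
        for pair in combinations(sorted(block), 2):
            if pair in seen:
                return False
            seen.add(pair)
    return len(seen) == v * (v - 1) // 2

def is_pasch(four_blocks):
    """In an STS, four triples covering exactly 6 points, each twice, form a Pasch."""
    points = [p for block in four_blocks for p in block]
    return len(set(points)) == 6 and all(points.count(p) == 2 for p in set(points))

def count_pasch(blocks):
    """Brute-force Pasch count; an anti-Pasch design returns 0."""
    return sum(1 for quad in combinations(blocks, 4) if is_pasch(quad))

# Toy check on the Fano plane, the unique STS of order 7 (which is not anti-Pasch):
fano = [(0,1,2), (0,3,4), (0,5,6), (1,3,5), (1,4,6), (2,3,6), (2,4,5)]
print(is_steiner_triple_system(fano, 7), count_pasch(fano))
```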
The purpose of this work is to demonstrate the possibility of identifying different types of pathological tissue directly through tissue mass spectrometry. Glioblastoma fragments dissected during a neurosurgical operation were investigated. The tumor fragments were examined by immunohistochemistry and were identified as necrotic tissue with necrotized vessels, necrotic tissue with tumor stain, tumor with necrosis (tumor tissue as the major component), tumor, necrotized tumor (necrotic tissue as the major component), parts of tumor cells, boundary brain tissue, and brain tissue hyperplasia. This paper suggests a technique for classifying tumor tissues based on the processing of mass spectrometric profile data. As a result of the processing, classifiers were created that assign the investigated sample to the corresponding tissue type. The classifiers for necrotic and tumor tissues are shown to yield a combined result when the tissue is heterogeneous and consists of both tumor cells and necrotic tissue.
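A hedged sketch of the combined-output idea: two independent binary classifiers, one for tumor tissue and one for necrotic tissue, are applied to the same mass spectrometric profile, and a sample on which both fire is reported as heterogeneous. The feature vectors and logistic models below are placeholders, not the paper's classifiers or spectra.

```python
# Combine two binary tissue classifiers into one label per sample.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
spectra = rng.random((200, 50))                   # stand-in for binned m/z intensities
is_tumor = (spectra[:, 0] > 0.5).astype(int)      # synthetic labels for illustration
is_necrotic = (spectra[:, 1] > 0.5).astype(int)

tumor_clf = LogisticRegression(max_iter=1000).fit(spectra, is_tumor)
necro_clf = LogisticRegression(max_iter=1000).fit(spectra, is_necrotic)

def classify_sample(profile):
    """Combine the two binary decisions into one tissue label."""
    tumor = tumor_clf.predict([profile])[0]
    necrotic = necro_clf.predict([profile])[0]
    if tumor and necrotic:
        return "heterogeneous (tumor + necrosis)"
    if tumor:
        return "tumor"
    if necrotic:
        return "necrotic"
    return "other tissue"

print(classify_sample(spectra[0]))
```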
An experimental bifurcation diagram of a circuit implementing an approximation of the Hindmarsh-Rose (HR) neuron model is presented. Measured asymptotic time series of circuit voltages are automatically classified through an ad hoc algorithm. The resulting two-dimensional experimental bifurcation diagram shows a good match with the numerical results available for both the approximated and the original HR model. Moreover, the experimentally obtained current-frequency curve is very similar to that of the original model. The obtained results are both a proof of concept of a quite general method, developed in the last few years, for the approximation and implementation of nonlinear dynamical systems, and a first step towards the realisation in silico of HR neuron networks with tunable parameters. (C) 2010 Elsevier B.V. All rights reserved.
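A brief sketch of how a current-frequency curve can be obtained from the original three-variable Hindmarsh-Rose model: integrate the system for each value of the injected current I and count spikes of the membrane variable per unit time. The parameter values are the commonly used ones; the paper's circuit approximation and its time-series classification algorithm are not reproduced here.

```python
# Integrate the Hindmarsh-Rose model and count spikes to estimate firing frequency.
import numpy as np
from scipy.integrate import solve_ivp

def hindmarsh_rose(t, state, I, a=1.0, b=3.0, c=1.0, d=5.0, r=0.006, s=4.0, x_rest=-1.6):
    x, y, z = state
    dx = y - a * x**3 + b * x**2 - z + I
    dy = c - d * x**2 - y
    dz = r * (s * (x - x_rest) - z)
    return [dx, dy, dz]

def spike_frequency(I, t_end=2000.0, transient=500.0, threshold=1.0):
    """Spikes per (dimensionless) time unit after discarding the initial transient."""
    sol = solve_ivp(hindmarsh_rose, (0.0, t_end), [-1.6, 0.0, 0.0],
                    args=(I,), max_step=0.05)
    keep = sol.t > transient
    t, x = sol.t[keep], sol.y[0][keep]
    crossings = np.sum((x[:-1] < threshold) & (x[1:] >= threshold))  # upward crossings
    return crossings / (t[-1] - t[0])

for I in (1.5, 2.0, 2.5, 3.0, 3.5):
    print(f"I={I:.1f}  firing rate ~ {spike_frequency(I):.4f} spikes per time unit")
```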
Large-scale projects, such as the construction of railways and highways, usually cause extensive Land Use Land Cover Change (LULCC). The China-Central Asia-West Asia Economic Corridor (CCAWAEC), one key large-scale project of the Belt and Road Initiative (BRI), covers a region that is home to more than 1.6 billion people. Although numerous studies have been conducted on strategies and the economic potential of the Economic Corridor, reviewing LULCC mapping studies in this area has not yet been undertaken. This study provides a comprehensive review of recent research progress and discusses the challenges in LULCC monitoring and driving factor identification in the study area. The review will be helpful for decision-making on sustainable development and construction in the Economic Corridor. To this end, 350 peer-reviewed journal and conference papers, as well as book chapters, were analyzed based on 17 attributes, such as the main driving factors of LULCC, data collection methods, classification algorithms, and accuracy assessment methods. It was observed that: (1) rapid urbanization, industrialization, population growth, and climate change have been recognized as major causes of LULCC in the study area; (2) LULCC has, directly and indirectly, caused several environmental issues, such as biodiversity loss, air pollution, water pollution, desertification, and land degradation; (3) there is a lack of well-annotated national land use data in the region; (4) there is a lack of reliable training and reference datasets to accurately study long-term LULCC in most parts of the study area; and (5) several technical issues still require more attention from the scientific community. Finally, several recommendations were proposed to address the identified issues.