ISBN: 9781509061679 (print)
If two fragments of source code are identical to each other, they are called code clones. Code clones make software harder to maintain and propagate bugs. In this paper, we present a machine learning framework that automatically detects clones in software, including Type-3 clones and the most complicated kind, Type-4 clones. Traditional features used in earlier work are often weak at detecting semantic clones. The novel aspects of our approach are the extraction of features from abstract syntax trees (AST) and program dependency graphs (PDG), the representation of a pair of code fragments as a vector, and the use of classification algorithms. The key benefit is that the approach can find both syntactic and semantic clones effectively. Our evaluation indicates that using the new AST and PDG features is a viable methodology, since they improve clone detection on IJaDataset 2.0.
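The idea of turning a pair of code fragments into a single feature vector can be sketched as follows. This is a minimal illustration only, not the authors' feature set: it uses Python's standard `ast` module as a stand-in for a Java parser, counts AST node types, and encodes a pair as absolute count differences. The names `node_type_counts` and `pair_vector` are hypothetical, and PDG features are omitted.

```python
import ast
from collections import Counter
from typing import List

def node_type_counts(source: str) -> Counter:
    """Count how often each AST node type occurs in a code fragment."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def pair_vector(src_a: str, src_b: str, vocab: List[str]) -> List[int]:
    """Encode a pair of fragments as per-node-type absolute count differences.

    Assumption: absolute difference is just one simple way to combine two
    per-fragment vectors; other combinations are possible.
    """
    a, b = node_type_counts(src_a), node_type_counts(src_b)
    return [abs(a[name] - b[name]) for name in vocab]

# Two fragments that differ only in identifier names, and one unrelated fragment.
frag_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
frag_b = "def add_all(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
frag_c = "def first(xs):\n    return xs[0]\n"

# A fixed vocabulary keeps every pair vector the same length.
vocab = sorted(set().union(*(node_type_counts(f) for f in (frag_a, frag_b, frag_c))))

vec_clone = pair_vector(frag_a, frag_b, vocab)  # renamed copy: all differences are zero
vec_other = pair_vector(frag_a, frag_c, vocab)  # different structure: some nonzero entries
```

Vectors like these, labelled as clone or non-clone, could then be passed to any standard classifier. Node-type counts only capture syntactic similarity, which is why semantic (Type-4) clones call for additional features such as those derived from a PDG.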
ISBN: 9783319270609; 9783319270593 (print)
This paper presents the application of an embedded system with a wireless sensor network to collect atmospheric pollutant data from sensors placed in micro-climates. The resulting dataset provides the information required to test classification algorithms, which helps in developing applications to improve air quality in specific areas.
Background: There is an urgent need for objective criteria adjunctive to standard clinical assessment of acute Traumatic Brain Injury (TBI). Details of the development of a quantitative index to identify structural brain injury based on brain electrical activity will be described. Methods: Acute closed-head-injured and normal patients (n=1470) were recruited from 16 US Emergency Departments and evaluated using brain electrical activity (EEG) recorded from forehead electrodes. Patients had high GCS (median=15), and most presented with low suspicion of brain injury. Patients were divided into a CT positive (CT+) group and a group with CT negative findings or where CT scans were not ordered according to standard assessment (CT-/CT_NR). Three different classifier methodologies, Ensemble Harmony, Least Absolute Shrinkage and Selection Operator (LASSO), and Genetic Algorithm (GA), were utilized. Results: Similar performance accuracy was obtained for all three methodologies, with an average sensitivity/specificity of 97.5%/59.5%, area under the curve (AUC) of 0.90 and average Negative Predictive Validity (NPV) > 99%. Sensitivity was highest for CT+ cases with potentially life-threatening hematomas, where two of the three classifiers reached 100%. Conclusion: The similar performance of these classifiers suggests that the optimal separation of the populations was obtained, given the overlap of the underlying distributions of features of brain activity. High sensitivity to CT+ injuries (highest in hematomas) and specificity significantly higher than that obtained using ED guidelines for imaging support the enhanced clinical utility of this technology and suggest a potential role in the objective, rapid and more optimal triage of TBI patients. Published by Elsevier Ltd.
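For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity and NPV follow from a confusion matrix. The function name `screening_metrics` and the counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard screening measures from confusion-matrix counts.

    tp/fn: CT+ patients flagged / missed by the classifier
    tn/fp: CT- patients correctly cleared / incorrectly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are detected
        "specificity": tn / (tn + fp),  # share of true negatives that are cleared
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }

# Hypothetical counts for illustration only (NOT taken from the study).
m = screening_metrics(tp=90, fn=10, tn=600, fp=300)
```

When positives are rare in the tested population, the NPV can stay high even if specificity is modest, because the number of missed positives is small relative to the number of correctly cleared negatives.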
In our previous research work, we proposed a methodology that uses magnetic-field and multivariate methods to estimate user location in an indoor environment. In this paper, we propose the use of this methodology to evaluate the performance of four different classification algorithms: Random Forest, Nearest Centroid, K Nearest Neighbors and Artificial Neural Networks. Each classifier is used as the cost function of a genetic algorithm (GA) in the feature selection step of the methodology. The motivation for evaluating the classification algorithms is that several indoor location systems (ILSs) use a classification algorithm to estimate the location of the user, but classifier performance varies from application to application. To evaluate the performance of each classification algorithm, the following issues were considered: (1) the time of the training phase needed to obtain the final classification algorithm; (2) the number of features needed to obtain the model; (3) the type of the features in the final model; and (4) the sensitivity and specificity of the model. Our results indicate that, on the evaluated criteria, Nearest Centroid is the classifier best suited for implementation in an end-user application for the ILS. (C) 2014 Published by Elsevier B.V.
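Using a classifier as the cost function of a GA for feature selection can be sketched roughly as below. This is a toy illustration under several assumptions, not the authors' implementation: the data are made up, fitness is plain hold-out accuracy of a hand-written nearest-centroid classifier, the GA uses truncation selection with one-point crossover and single-bit mutation, and the initial population is seeded with the all-features mask. All names are hypothetical.

```python
import random
from statistics import mean

def nearest_centroid_accuracy(train, test, mask):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    rows_by_label = {}
    for x, y in train:
        rows_by_label.setdefault(y, []).append([x[i] for i in idx])
    centroids = {y: [mean(col) for col in zip(*rows)]
                 for y, rows in rows_by_label.items()}

    def predict(x):
        v = [x[i] for i in idx]
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(v, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)

def evolve(fitness, n_features, pop_size=8, generations=10, seed=0):
    """Tiny GA over binary feature masks; requires n_features >= 2."""
    rng = random.Random(seed)
    # Assumption: seed the population with the all-features mask as a baseline.
    pop = [tuple([1] * n_features)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n_features))
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            j = rng.randrange(n_features)       # flip one bit as mutation
            child[j] = 1 - child[j]
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)

# Made-up data: feature 0 separates the two locations, feature 2 is misleading.
train = [([0.0, 5.0, 1.0], "A"), ([1.0, 9.0, 7.0], "A"),
         ([10.0, 4.0, 8.0], "B"), ([11.0, 8.0, 2.0], "B")]
test = [([0.5, 9.0, 9.0], "A"), ([10.5, 1.0, 1.0], "B")]

def fitness(mask):
    return nearest_centroid_accuracy(train, test, mask)

best_mask = evolve(fitness, n_features=3)
```

Swapping `nearest_centroid_accuracy` for another classifier's score is what makes the comparison described above possible. A fuller fitness function might also penalise the number of selected features or account for training time, which are among the criteria the abstract lists.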
User simulation inspection of every appliance on a production line is time-consuming and expensive. A more effective way is to use sensors for fast indirect measurements of selected quality indexes, i.e. appliance features, which may be used for automatic on-line inspection and classification by correlation to manual inspection. Feature vectors for classifying electrical appliances tend to form overlapping, irregular, amorphous clusters in a multidimensional feature space. Three classifier algorithms were formulated to address this difficult classification problem, which is aggravated by the requirement that almost no bad units should be misclassified as good ones. The discriminatory power of two or all three classifiers is combined by a classifier voting strategy. The different classifiers and voting strategies are compared in terms of four performance indexes: the predicted percentage of bad units sent to the customer, the percentage of good units rejected as bad ones, a cost-weighted class contamination index, and the expected percentage of correctly classified units. The approach is applied to a feature database of several hundred labelled refrigerators, demonstrating that three-classifier voting will practically never misclassify a bad unit as a good one.
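One way a voting strategy can favour rejecting bad units is to require agreement before a unit is accepted. The sketch below contrasts a unanimity rule with simple majority voting. It is an assumption for illustration, since the abstract does not specify the voting rules used, and the function names are hypothetical.

```python
from typing import List

def unanimous_good_vote(votes: List[str]) -> str:
    """Accept a unit only if every classifier says 'good'; otherwise reject it.

    Assumption: this conservative rule trades more rejected good units
    for fewer bad units reaching the customer.
    """
    return "good" if all(v == "good" for v in votes) else "bad"

def majority_vote(votes: List[str]) -> str:
    """Accept a unit if more than half of the classifiers say 'good'."""
    good = sum(v == "good" for v in votes)
    return "good" if good > len(votes) / 2 else "bad"

# One dissenting classifier is enough to reject under the unanimity rule.
split = ["good", "good", "bad"]
strict_result = unanimous_good_vote(split)
lenient_result = majority_vote(split)
```

The unanimity rule lowers the chance of shipping a bad unit at the cost of rejecting more good ones, which is the trade-off that the first two performance indexes above measure.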