ISBN: (print) 9781467355834; 9781467355827
The larger size and complexity of software source code create many challenges in bug detection. Data mining based bug detection methods eliminate the bugs present in software source code effectively. Rule violations and copy-paste related defects are the main concerns for a bug detection system. Traditional data mining approaches such as frequent itemset mining and frequent sequence mining are relatively good, but they lack accuracy and pattern recognition ability. Neural networks have emerged as advanced data mining tools in cases where other techniques may not produce satisfactory predictive models. The neural network is trained on the possible set of errors that could be present in software source code. From the training data, the neural network learns how to predict the correct output. The processing elements of neural networks are associated with weights, which are adjusted during the training period.
Bi-directional Associative Memory (BAM) is an artificial neural network that consists of two Hopfield networks. The most important advantage of BAM is the ability to recall a stored pattern from a noisy input, which d...
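The recall-from-noise property can be illustrated with a small sketch. It assumes the textbook BAM formulation (bipolar patterns, Hebbian outer-product weights, alternating threshold updates), which the truncated abstract does not spell out; the function names and the toy patterns are illustrative only.

```python
def sign(values, previous):
    # Bipolar threshold; on a tie (value == 0) keep the previous state.
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(values, previous)]


def train_bam(pairs):
    """Build the weight matrix as the sum of outer products x * y^T."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += x[i] * y[j]
    return weights


def recall(weights, x, max_iters=10):
    """Alternate forward and backward passes until the pair stops changing."""
    n, m = len(weights), len(weights[0])
    y = [1] * m
    for _ in range(max_iters):
        new_y = sign([sum(x[i] * weights[i][j] for i in range(n)) for j in range(m)], y)
        new_x = sign([sum(weights[i][j] * new_y[j] for j in range(m)) for i in range(n)], x)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y


# Two stored associations (toy data, not from the paper).
x1, y1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
x2, y2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
W = train_bam([(x1, y1), (x2, y2)])

noisy_x1 = [1, -1, 1, -1, 1, 1]  # last bit flipped
recovered_x, recovered_y = recall(W, noisy_x1)
```

In this toy case the flipped bit is corrected after one forward and one backward pass, because the noisy input still overlaps more strongly with the first stored pattern than with the second.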
Malicious PDF files have been used to harm computer security during the past two to three years, and modern antivirus products are proving not to be completely effective against this kind of threat. In this paper an innovative te...
Previous experiments with low dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as...
Ontology is a formal, explicit specification of a shared conceptual model and provides a way for computers to exchange, search and identify characteristics. Data mining draws on work from areas including database ...
There is a need to facilitate access to the required information on the web and to adapt it to the users' preferences and requirements. This paper presents a system that, based on a collaborative filtering approac...
ISBN: (print) 9781618397461
Several machine learning techniques were evaluated for the prediction of parameters relevant in pharmacology and drug discovery, including rat and human microsomal intrinsic clearance as well as plasma protein binding represented as the fraction of unbound compound. The algorithms assessed in this study include artificial neural networks (ANN), support vector machines (SVM) with the extension for regression, k-nearest neighbor (KNN), and Kohonen networks. The data sets, obtained through literature data mining, were described through a series of scalar, two- and three-dimensional descriptors including 2-D and 3-D autocorrelation and radial distribution function. The feature sets were optimized for each data set and each machine learning technique individually using sequential forward feature selection. The data sets range from 400 to 600 compounds with experimentally determined values. Intrinsic clearance (CLint) is a measure of metabolism by cytochrome P-450 enzymes, primarily in the vesicles of the smooth endoplasmic reticulum. These important enzymes contribute to the metabolism of an estimated 75% of the most frequently prescribed drugs in the U.S. The fraction of unbound compound (fu) greatly influences pharmacokinetics, efficacy, and toxicology. In this study, machine learning models were constructed by systematically optimizing feature sets and algorithmic parameters to calculate these parameters of interest, with cross-validated correlation/RMSD values reaching 9.53 over the normalized data set. These fully in silico models are useful in guiding early stages of drug discovery, such as analogue prioritization prior to synthesis and biological testing, while reducing costs associated with the in vitro determination of these parameters. These models are made freely available for academic use.
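Sequential forward feature selection, as named in the abstract, can be sketched as a greedy loop. This is a generic illustration, not the authors' implementation: the `score` callback stands in for whatever cross-validated metric a study would use, and the descriptor names and gain values below are hypothetical toy data.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedily add the feature that most improves score(subset).

    Stops when no remaining feature improves the score or when
    max_features have been selected.
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda item: item[0])
        if top_score <= best_score:
            break  # no extension helps, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy stand-in for a cross-validated model score (hypothetical values).
toy_gain = {"autocorr_2d": 30, "autocorr_3d": 25, "radial_dist": 10, "noise": -5}


def toy_score(subset):
    return sum(toy_gain[name] for name in subset)


chosen, final_score = sequential_forward_selection(toy_gain, toy_score)
```

With this toy score the loop picks the three helpful descriptors in order of gain and stops before adding `noise`. In practice `score` would retrain and cross-validate the model for each candidate subset, which is why the procedure is run separately for each data set and each learning technique.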
ISBN: (print) 9780819490261
The purpose of this paper is to classify the sole patterns from a 3D shoe model composed of scattered point cloud data. Sole patterns can be divided into five categories based on the texture of each pattern. The point cloud data is sliced into a number of layers, and the unordered data points in each layer are projected onto a viewing plane to get a 2D shoeprint, in which a texture element can be further segmented by region growing. Each segmented texture element is then classified into one of two types, non-closed curve or closed curve, by detecting whether there are point cloud data in each external unit of the region and looking for the nearest points to the region. Finally, the texture element is assigned to one of the five categories by analyzing its geometrical characteristics.
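The slicing and projection step can be sketched as follows. The abstract does not state the slicing axis, the layer thickness, or the viewing plane, so this sketch assumes uniform layers along z and an orthographic projection onto the xy plane; the function name and sample points are illustrative.

```python
from collections import defaultdict


def slice_and_project(points, layer_thickness):
    """Group 3D points into z-layers and drop z to obtain 2D points per layer.

    Assumes layers of equal thickness measured from the lowest z value.
    """
    z_min = min(z for _, _, z in points)
    layers = defaultdict(list)
    for x, y, z in points:
        layer_index = int((z - z_min) // layer_thickness)
        layers[layer_index].append((x, y))  # orthographic projection onto xy
    return dict(layers)


# Toy point cloud (hypothetical coordinates).
cloud = [(0, 0, 0.0), (1, 0, 0.1), (2, 2, 0.6), (3, 1, 1.2)]
shoeprint_layers = slice_and_project(cloud, layer_thickness=0.5)
```

Each layer's 2D point list is the kind of input on which a region-growing segmentation could then operate.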
Ordinal data is omnipresent in almost all multiuser-generated feedback - questionnaires, preferences etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In partic...
ISBN: (print) 9788086943794
Statistical analysis and pattern recognition have become a daunting endeavour in the face of the enormous amount of information in datasets that are continually being made available. Since complete manual annotation is infeasible, one seeks active learning methods for data organization, selection and prioritization that can help the user label the samples. These methods, however, classify and reorganize the entire dataset at each iteration, and as the datasets grow, they become highly inefficient from the user's point of view. In this work, we propose an active learning paradigm which considerably reduces the non-annotated dataset to a small set of relevant samples for learning. During active learning, random samples are selected from this small learning set and the user annotates only the misclassified ones. The training set grows with newly labelled samples at each iteration and improves the classifier for the next one. When the user is satisfied, the classifier can be used to annotate the rest of the dataset. To illustrate the effectiveness of this paradigm, we developed an instance based on the optimum path forest (OPF) classifier, relying on clustering and classification for the learning process. Using this method, we were able to iteratively generate classifiers that improve quickly, require few iterations, and attain high accuracy while keeping user involvement to a minimum. We also show that the method provides better accuracies on unseen test sets with less user involvement than a baseline approach based on the OPF classifier and random selection of training samples from the entire dataset.
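The iteration described above (sample from the reduced learning set, let the user correct only the misclassified samples, grow the training set) can be sketched as a simple loop. This is an illustration under stated assumptions, not the paper's method: a 1-nearest-neighbour rule on scalar values stands in for the OPF classifier, the clustering-based reduction of the dataset is not modelled, and the `oracle` callback plays the role of the user.

```python
import random


def nearest_neighbor_predict(training, x):
    # Stand-in classifier (NOT OPF): label of the closest training value.
    return min(training, key=lambda item: abs(item[0] - x))[1]


def active_learning(learning_set, oracle, seed_training,
                    batch_size=4, iterations=5, rng=None):
    """Grow a training set from random batches of the learning set.

    The classifier proposes a label for each sampled item; the user
    (oracle) only has to correct the wrong proposals.
    """
    rng = rng or random.Random(0)
    training = list(seed_training)
    pool = list(learning_set)
    corrections = 0
    for _ in range(iterations):
        if not pool:
            break
        batch = rng.sample(pool, min(batch_size, len(pool)))
        for x in batch:
            pool.remove(x)
            label = nearest_neighbor_predict(training, x)
            true_label = oracle(x)
            if label != true_label:
                label = true_label  # user corrects a misclassified sample
                corrections += 1
            training.append((x, label))
    return training, corrections


def toy_oracle(x):
    # Hypothetical ground truth: two classes split at 0.5.
    return "a" if x < 0.5 else "b"


toy_learning_set = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_seed = [(0.0, "a"), (1.0, "b")]
training_set, user_corrections = active_learning(toy_learning_set, toy_oracle, toy_seed)
```

The `corrections` counter is a rough proxy for user effort: samples the classifier already labels correctly cost the user nothing beyond confirmation, which is the source of the reduced involvement the abstract reports.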