ISBN: 9781424441990 (print)
Many machine learning algorithms can be applied only to data described by categorical attributes, so discretization of continuous attributes is an important preprocessing step in knowledge extraction. Traditional clustering-based discretization algorithms need a pre-determined number of clusters k and are typically applied in an unsupervised learning framework. This paper describes SX-means (Supervised X-means), a new clustering-based algorithm for supervised discretization of continuous attributes. The algorithm dynamically modifies clusters using knowledge of the class distribution, and the procedure does not stop until a proper k is found. Because the number of clusters k is not pre-determined by the user and the class distribution is taken into account, the randomness of the result is greatly reduced. Experimental evaluation of several discretization algorithms on six artificial data sets shows that the proposed algorithm is more efficient and generates a better discretization scheme. Comparing the output of C4.5, the resulting tree is smaller, with fewer classification rules and higher classification accuracy.
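For orientation, the traditional baseline that the abstract contrasts with (clustering-based discretization with a user-supplied k, no class labels) might be sketched as follows. This is not SX-means: the function names, the deterministic initialisation, and the midpoint rule for cut points are all assumptions made for illustration.

```python
import bisect

def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm) with a deterministic start."""
    xs = sorted(values)
    # Assumed initialisation: evenly spaced picks from the sorted values.
    centroids = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            groups[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        new = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

def cut_points(centroids):
    """Assumed rule: place a cut halfway between adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

def discretize(x, cuts):
    """Map a continuous value to the index of its interval."""
    return bisect.bisect_right(cuts, x)

# Hypothetical toy attribute with three visible groups.
values = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
cuts = cut_points(kmeans_1d(values, k=3))
labels = [discretize(v, cuts) for v in values]
```

The dependence on a hand-picked k is visible in the call `kmeans_1d(values, k=3)`; the abstract's point is that SX-means determines k itself and uses the class distribution while doing so.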
Many application domains make use of specific data structures such as sequences and graphs to represent knowledge. These data structures are ill-fitted to the standard representations used in machine learning and data mining algorithms: propositional representations are not expressive enough, and first-order ones are not efficient enough. In order to efficiently represent and reason on these data structures, and the complex patterns that are related to them, we use domain-specific logics. We show these logics can be built by the composition of logical components that model elementary data structures. The standard strategies of top-down and bottom-up search are ill-suited to some of these logics, and lack flexibility. We therefore introduce a dichotomic search strategy, which is analogous to a dichotomic search in an ordered array. We prove this provides more flexibility in the search, while retaining completeness and non-redundancy. We present a novel algorithm for learning using domain-specific logics and dichotomic search, and analyse its complexity. We also describe two applications which illustrate the search for motifs in sequences, where these motifs have arbitrary length and length-constrained gaps. In the first application, sequences represent the trains of the East-West challenge; in the second application, they represent the secondary structure of Yeast proteins for the discrimination of their biological functions.
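The array analogy mentioned in the abstract can be made concrete with a short sketch. It shows only dichotomic (binary) search over an ordered array with a monotone predicate, not the paper's search over logical formulas; the function name and the example predicate are assumptions.

```python
def first_true(lo, hi, pred):
    """Smallest index i in [lo, hi) with pred(i) true, or hi if there is none.

    Assumes pred is monotone over the range: false for a prefix of the
    indices, then true for the rest.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid       # the boundary is at mid or to its left
        else:
            lo = mid + 1   # the boundary is strictly to the right
    return lo

# Hypothetical ordered array: locate the first element that is at least 7.
ordered = [1, 3, 5, 7, 9, 11]
boundary = first_true(0, len(ordered), lambda i: ordered[i] >= 7)
```

Each step halves the remaining interval instead of moving one element at a time from either end, which is the contrast with purely top-down or bottom-up traversal.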
ISBN: 9798400707032 (print)
Participants in the supply chain may hold different information, so decisions are often made on incomplete or inaccurate information. To address this, a process-based and machine learning based collaborative scheduling algorithm for all materials is proposed. A health monitoring process for the material supply chain is designed based on an R-tree dynamic indexing algorithm. On this basis, artificial neural networks are applied to mine the data of the entire material supply chain. Through data mining, the various data in the supply chain can be integrated and analyzed to improve information transparency and accuracy and to reduce information asymmetry. A dual-layer scheduling model is adopted to achieve dual-layer collaborative scheduling of materials. The experimental results show that the proposed method effectively improves the accuracy of data mining across the entire material supply chain, and that the material utilization rate under this method is always higher than 95%.
A challenge for statistical learning is to deal with large data sets, e.g. in data mining. The training time of ordinary Support Vector Machines is at least quadratic, which raises a serious research challenge if we want to deal with data sets of millions of examples. We propose a "hard parallelizable mixture" methodology which yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a "gater" model in such a way that it becomes easy to learn an "expert" model separately in each region of the partition. A probabilistic extension and the use of a set of generative models allows representing the gater so that all pieces of the model are locally trained. For SVMs, time complexity appears empirically to grow locally linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version of the algorithm, the iterative algorithm provably goes down in a cost function that is an upper bound on the negative log-likelihood.
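A back-of-the-envelope calculation suggests why partitioning helps. The sketch below assumes training cost is exactly quadratic in the number of examples (the abstract only says "at least quadratic") and ignores the cost of the gater and of repeated iterations; the function names are hypothetical.

```python
def monolithic_cost(n_examples):
    """Cost of training one model on all examples, assuming cost = n^2."""
    return n_examples ** 2

def partitioned_cost(n_examples, n_experts):
    """Cost when the data is split evenly across independently trained experts.

    Returns (total work summed over experts, wall-clock cost if all
    experts train in parallel), both under the same n^2 assumption.
    """
    per_expert = n_examples / n_experts
    total = n_experts * per_expert ** 2
    wall_clock = per_expert ** 2
    return total, wall_clock

total, wall_clock = partitioned_cost(1_000_000, 100)
# Doubling the data while keeping 10,000 examples per expert doubles the work.
total_doubled, _ = partitioned_cost(2_000_000, 200)
```

Under these assumptions the total work is n^2 / m for m experts, so if the number of experts grows in proportion to the data set, the total grows linearly in n, which is consistent with the empirical observation reported in the abstract.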
ISBN: 1595932089 (print)
In this talk, I will describe a number of machine learning paradigms that are relevant to utility-based data mining, and review some key techniques and results in each. Copyright 2005 ACM.
Interpretable machine learning on complex data requires adequate, customizable, and scalable computational analysis methods. This paper presents a framework combining the paradigms of exceptional model mining wi...
ISBN: 3540665994 (print)
This paper concerns a novel application of machine learning to Magnetic Resonance Imaging (MRI) by considering Neural Network models for the problem of image estimation from sparsely sampled k-space. Effective solutions to this problem are indispensable, especially when dealing with MRI of dynamic phenomena, since rapid sampling in k-space is then required. The goal in such a case is to reduce the measurement time by omitting as many scanning trajectories as possible. This approach, however, entails underdetermined equations and leads to poor image reconstruction. It is proposed here that significant improvements in image reconstruction could be achieved if a machine learning based procedure for estimating the missing samples of complex k-space were introduced. To this end, the viability of using Supervised and Unsupervised Neural Network algorithms for such a problem is considered, and it is found that their image reconstruction results compare very favorably with those obtained by the trivial zero-filled k-space approach or by traditional, more sophisticated interpolation approaches.
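The "trivial zero-filled k-space approach" used as a baseline can be illustrated in one dimension. The sketch below is a toy under stated assumptions: a short real, non-negative signal stands in for an image, a naive O(n^2) discrete Fourier transform stands in for the k-space acquisition, and the sampling pattern (every other k-space sample) is hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (stand-in for acquiring k-space)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def zero_filled_reconstruction(signal, sampled):
    """Keep only the k-space samples whose index is in `sampled`, set the
    missing ones to zero, and invert. Returns the magnitude signal."""
    kspace = dft(signal)
    filled = [v if k in sampled else 0j for k, v in enumerate(kspace)]
    return [abs(v) for v in idft(filled)]

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
full = zero_filled_reconstruction(signal, set(range(8)))
half = zero_filled_reconstruction(signal, {0, 2, 4, 6})
```

With all samples kept, the signal is recovered up to rounding error; with half of them zeroed, the reconstruction shows aliasing. Estimating the missing samples instead of zeroing them is the gap the abstract proposes to fill with neural networks.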
ISBN: 9781510604315 (digital)
ISBN: 9781510604315 (print)
Epilepsy is a neurological disorder which can, if not controlled, potentially cause unexpected death. It is crucial to have accurate automatic pattern recognition and data mining techniques to detect the onset of seizures and inform caregivers so that they can help the patients. EEG signals are the preferred biosignals for diagnosis of epileptic patients. Most of the existing pattern recognition techniques used in EEG analysis rely on supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical, particularly when labeled data is not sufficiently available or when disease progression is rapid and the corresponding EEG footprint pattern is not robust. Furthermore, EEG pattern change is highly individual-dependent and requires experienced specialists to annotate the seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate between seizure and non-seizure events. We use the power spectral density of EEG signals in different frequency bands as informative features to accurately cluster seizure and non-seizure events. The experimental results obtained so far indicate more than 90% accuracy in clustering seizure and non-seizure events without any prior knowledge of the patient's history.
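The kind of feature described, signal power within a frequency band, might be computed along the following lines. This is a minimal sketch, not the authors' pipeline: it uses a naive periodogram summed over positive-frequency bins (without one-sided doubling, so only relative comparisons are meaningful), and the sampling rate, band edges, and test signal are assumptions.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Periodogram power of `signal` (sampled at `fs` Hz) summed over the
    DFT bins whose frequency lies in [f_lo, f_hi)."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))
            power += abs(coeff) ** 2 / (fs * n)
    return power

# Hypothetical one-second segment: a pure 10 Hz oscillation sampled at 128 Hz.
fs = 128
segment = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = [
    band_power(segment, fs, 8, 13),   # assumed "alpha-like" band
    band_power(segment, fs, 13, 30),  # assumed "beta-like" band
]
```

A feature vector like `features`, computed per EEG segment, is what an unsupervised method would then cluster into two groups; the clustering step itself is not shown here.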
ISBN: 9780769530901 (print)
The application of data mining to web log analysis has received significant attention for finding customers' behavioral patterns in e-commerce and learners' behavioral patterns in e-learning. While hit counts indicate customers' interest in a product or their purchasing behavior, a student's visits to a Learning Management System (LMS) do not necessarily involve transfer of learning. Addressing this complexity in e-learning, this study analyzed students' LMS logs for two subjects taught over six weeks at a university in Bangladesh. Data mining and statistical tools were used to find relationships between students' LMS access behavior and overall performance. Results show that students with 'Low' access obtained poor grades and that on-campus access was higher than access from home. The background of students is very important for effective usage of web resources. The majority of students considered the LMS to be quite a helpful tool as a teaching-learning method. Preparation and cleaning of the web log files, as well as the application of data mining algorithms, are important for analysis of learners' web usage.