ISBN: 9781424441990 (print)
Many machine learning algorithms can be applied only to data described by categorical attributes, so discretization of continuous attributes is an important preprocessing step in knowledge extraction. Traditional clustering-based discretization algorithms need a predetermined number of clusters k and are typically applied in an unsupervised learning framework. This paper describes SX-means (Supervised X-means), a new clustering-based algorithm for supervised discretization of continuous attributes. The algorithm dynamically modifies the clusters using knowledge of the class distribution, and the procedure does not stop until a proper k is found. Because the number of clusters k is not predetermined by the user and the class distribution is taken into account, the randomness of the result is greatly reduced. Experimental evaluation of several discretization algorithms on six artificial data sets shows that the proposed algorithm is more efficient and generates a better discretization scheme. When its output is fed to C4.5, the resulting tree is smaller, has fewer classification rules, and achieves higher classification accuracy.
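The abstract does not give the details of SX-means, so the following is not that algorithm. It is only a minimal sketch of the general idea it describes: cluster a continuous attribute and let the class distribution, rather than a user-supplied k, decide when to stop splitting. The function names, the 1-D two-means split, and the `min_purity` threshold are all assumptions made for illustration.

```python
from collections import Counter


def purity(labels):
    """Fraction of examples that belong to the majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def sse(values):
    """Sum of squared deviations from the mean (within-cluster cost)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_two_means_split(values):
    """Index i that splits sorted values into values[:i] and values[i:]
    with the lowest total within-cluster cost; None if no split exists."""
    best_i, best_cost = None, float("inf")
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # never cut between identical values
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i


def supervised_cuts(points, min_purity=0.9):
    """points: list of (value, class_label). Returns sorted cut points.
    An interval is split further only while its class purity is too low,
    so the number of intervals is driven by the class distribution."""
    points = sorted(points, key=lambda p: p[0])
    values = [v for v, _ in points]
    labels = [c for _, c in points]
    if purity(labels) >= min_purity:
        return []
    i = best_two_means_split(values)
    if i is None:
        return []
    cut = (values[i - 1] + values[i]) / 2
    left = supervised_cuts(points[:i], min_purity)
    right = supervised_cuts(points[i:], min_purity)
    return left + [cut] + right


data = [(1, "a"), (2, "a"), (3, "a"), (10, "b"), (11, "b"), (20, "a"), (21, "a")]
print(supervised_cuts(data))
```

On this toy data the class labels change twice along the attribute, so the sketch ends up with three intervals without being told k in advance.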
A challenge for statistical learning is to deal with large data sets, e.g. in data mining. The training time of ordinary Support Vector Machines is at least quadratic, which raises a serious research challenge if we want to deal with data sets of millions of examples. We propose a "hard parallelizable mixture" methodology which yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a "gater" model in such a way that it becomes easy to learn an "expert" model separately in each region of the partition. A probabilistic extension and the use of a set of generative models allow representing the gater so that all pieces of the model are locally trained. For SVMs, time complexity appears empirically to locally grow linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version of the algorithm, the iterative algorithm provably goes down in a cost function that is an upper bound on the negative log-likelihood.
ISBN: 9798400707032 (print)
Participants in the supply chain may have different information, leading to incomplete or inaccurate information when making decisions. To this end, a process- and machine learning-based collaborative scheduling algorithm for all materials is proposed. A health monitoring process for the material supply chain is designed based on an R-tree dynamic indexing algorithm. On this basis, artificial neural networks are applied to mine the data of the entire material supply chain. Through data mining, various data in the supply chain can be integrated and analyzed to improve information transparency and accuracy and to reduce information asymmetry. A dual-layer scheduling model is adopted to achieve dual-layer collaborative scheduling of materials. The experimental results show that the research method effectively improves the accuracy of data mining in the entire material supply chain, and the utilization rate of materials under this method is always higher than 95%.
ISBN: 1595932089 (print)
In this talk, I will describe a number of machine learning paradigms that are relevant to utility-based data mining, and review some key techniques and results in each. Copyright 2005 ACM.
We describe an approach to learning patterns in relational data represented as a graph. The approach, implemented in the Subdue system, searches for patterns that maximally compress the input graph. Subdue can be used for supervised learning, as well as unsupervised pattern discovery and clustering. We apply Subdue in domains related to homeland security and social network analysis.
Interpretable machine learning on complex data requires adequate customizable as well as scalable computational analysis methods. This paper presents a framework combining the paradigms of exceptional model mining wi...
ISBN: 9781510604315 (digital)
ISBN: 9781510604315 (print)
Epilepsy is a neurological disorder which, if not controlled, can cause unexpected death. Accurate automatic pattern recognition and data mining techniques are crucial for detecting the onset of seizures and informing caregivers so that they can help patients. EEG signals are the preferred biosignals for diagnosing epileptic patients. Most existing pattern recognition techniques used in EEG analysis rely on supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical, particularly when sufficient labeled data are not available or when disease progression is rapid and the corresponding EEG footprint pattern is not robust. Furthermore, EEG pattern change is highly individual-dependent and requires experienced specialists to annotate the seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate between seizure and non-seizure events. We use the power spectral density of EEG signals in different frequency bands as informative features to accurately cluster seizure and non-seizure events. Experimental results so far indicate more than 90% accuracy in clustering seizure and non-seizure events without any prior knowledge of the patient's history.
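The feature-extraction step mentioned above, power spectral density summed over frequency bands, can be sketched as follows. This is an illustration only, not the authors' pipeline: the abstract does not say which bands or which spectral estimator were used, so the band edges in `EEG_BANDS` (conventional delta, theta, alpha, beta ranges) and the plain periodogram are assumptions, and the clustering step is left out.

```python
import cmath
import math

# Conventional EEG band edges in Hz; an assumption, the abstract names none.
EEG_BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def periodogram(signal, fs):
    """Naive one-sided periodogram via a direct DFT (O(n^2), fine for a demo).
    Returns (frequencies in Hz, power at each frequency)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_powers(signal, fs, bands=EEG_BANDS):
    """Sum the periodogram over each band to get one feature per band."""
    freqs, power = periodogram(signal, fs)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }


# Synthetic 2-second "recording": a pure 10 Hz oscillation sampled at 100 Hz.
fs = 100
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
features = band_powers(signal, fs)
print(max(features, key=features.get))
```

Each EEG segment would yield one such feature vector, and those vectors could then be passed to any clustering method to separate segments into groups without labels.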
ISBN: 354044016X (print)
Typhoon analysis is based on manual pattern recognition of cloud patterns in meteorological satellite images by human experts, but this process may be unstable and unreliable, and we think it could be improved by taking advantage of both the large collection of past observations and state-of-the-art machine learning methods, among which kernel methods, such as support vector machines (SVM) and kernel PCA, are the focus of the paper. To apply the "learning-from-data" paradigm to typhoon analysis, we built a collection of more than 34,000 well-framed typhoon images to be used for spatio-temporal data mining of typhoon cloud patterns, with the aim of discovering hidden and unknown regularities contained in large image databases. In this paper, we deal with the problem of visualizing and classifying typhoon cloud patterns using kernel methods. We compare preliminary results with baseline algorithms, such as principal component analysis and a k-NN classifier, and discuss experimental results along with future directions of research.
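For readers unfamiliar with the k-NN baseline named above, a minimal sketch follows. It is a generic k-nearest-neighbour classifier on toy 2-D points, not the authors' setup: the Euclidean distance, the value of k, and the feature vectors are all assumptions, and real typhoon images would first have to be turned into feature vectors.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """training: list of (feature_vector, label) pairs.
    Returns the majority label among the k training points closest to query."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups standing in for two cloud-pattern classes.
training = [
    ((0.0, 0.0), "class_a"),
    ((0.0, 1.0), "class_a"),
    ((1.0, 0.0), "class_a"),
    ((5.0, 5.0), "class_b"),
    ((6.0, 5.0), "class_b"),
    ((5.0, 6.0), "class_b"),
]
print(knn_predict(training, (0.2, 0.1)))
print(knn_predict(training, (5.5, 5.2)))
```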
ISBN: 3540665994 (print)
This paper concerns a novel application of machine learning to Magnetic Resonance Imaging (MRI) by considering neural network models for the problem of image estimation from sparsely sampled k-space. Effective solutions to this problem are indispensable, especially when dealing with MRI of dynamic phenomena, since rapid sampling in k-space is then required. The goal in such a case is to reduce the measurement time by omitting as many scanning trajectories as possible. This approach, however, entails underdetermined equations and leads to poor image reconstruction. It is proposed here that significant improvements in image reconstruction could be achieved if a procedure based on machine learning were introduced for estimating the missing samples of complex k-space. To this end, the viability of supervised and unsupervised neural network algorithms for such a problem is considered, and it is found that their image reconstruction results compare very favorably with those obtained by the trivial zero-filled k-space approach or by traditional, more sophisticated interpolation approaches.
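The zero-filled baseline mentioned above can be illustrated with a minimal sketch. It uses a 1-D signal and a direct DFT in place of a 2-D MRI image, and the sampling mask is made up for the example; both are simplifying assumptions, not the paper's experimental setup.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct discrete Fourier transform (O(n^2)); the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def zero_filled_reconstruction(kspace, sampled):
    """Replace every unsampled k-space value with zero, then invert."""
    filled = [v if keep else 0j for v, keep in zip(kspace, sampled)]
    return dft(filled, inverse=True)


def max_abs_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


image = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]  # 1-D stand-in for an image
kspace = dft(image)

full_mask = [True] * len(image)
sparse_mask = [True, True, True, False, False, False, True, True]  # hypothetical

full = zero_filled_reconstruction(kspace, full_mask)
sparse = zero_filled_reconstruction(kspace, sparse_mask)

print(max_abs_error(full, image))    # essentially zero: nothing was omitted
print(max_abs_error(sparse, image))  # noticeably larger: information was lost
```

With every sample present the inverse transform recovers the signal; omitting samples and filling them with zeros leaves a reconstruction error, which is the gap that estimating the missing k-space samples aims to reduce.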