One of the tasks of data mining is classification, which provides a mapping from attributes (observations) to pre-specified classes. Classification models are built from underlying data. In principle, models built with more data are more accurate. However, the relationship between the amount of available data and performance is not well understood. The key questions are how much data to use and when to stop the learning process. In this paper we suggest when to stop the learning process.
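The abstract does not state the stopping criterion itself, so the following is only a minimal sketch of one common way to frame the question: train on growing samples and stop once the learning curve flattens. The function name, the `min_gain` and `patience` parameters, and the accuracy values are all hypothetical and are not taken from the paper.

```python
def stopping_point(sizes, accuracies, min_gain=0.005, patience=2):
    """Return the training-set size at which the learning curve is judged flat.

    Hypothetical rule (not the paper's): stop once `patience` consecutive
    steps each improve accuracy by less than `min_gain`.
    """
    flat_steps = 0
    for i in range(1, len(sizes)):
        gain = accuracies[i] - accuracies[i - 1]
        # Count consecutive low-gain steps; any real improvement resets the count.
        flat_steps = flat_steps + 1 if gain < min_gain else 0
        if flat_steps >= patience:
            return sizes[i]
    return None  # the curve has not flattened within the observed sizes


# Made-up illustrative learning curve: accuracy measured at doubling sample sizes.
sizes = [100, 200, 400, 800, 1600, 3200]
accuracies = [0.70, 0.78, 0.83, 0.85, 0.852, 0.853]

print(stopping_point(sizes, accuracies))
```

A stricter `min_gain` or larger `patience` trades extra training cost for more confidence that the plateau is real.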
In this paper some applications of intelligent systems are presented. In spite of all the knowledge-based technologies that have been proposed recently, the main driving force of "intelligent" approaches is still (and increasingly) the huge computational power of modern information technology, which is used to process vast amounts of data. The two main technologies of intelligent systems in medicine, both based on data processing and search, are thus optimization and machine learning. They are used for different kinds of medical problems: data mining, diagnosis, medical imaging and signal processing, planning and scheduling, etc. In the paper we summarize some of the most evident applications of this kind.
Recent advances in, and wide use of, high-throughput technologies for biological research are producing enormous biological datasets distributed worldwide. Data mining techniques and machine learning methods provide useful tools for knowledge discovery in this field. The goal of this paper is to present the design of a pattern classifier for mining distributed biological datasets. The proposed classifier is built around a special class of computing model termed Fuzzy Cellular Automata (FCA). A concrete example of the effectiveness of this approach is provided by demonstrating its success on the gene identification problem. Extensive experimental results confirm the scalability of the FCA in handling distributed biological datasets. Application of the proposed model to the gene identification problem establishes the FCA as a classifier ideally suited for biological data mining in a distributed environment.
Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naive Bayes in classification? In this paper, we propose a novel explanation for the good classification performance of Naive Bayes. We show that, essentially, dependence distribution plays a crucial role. Here dependence distribution means how the local dependence of an attribute distributes in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of Naive Bayes. Further, we investigate the optimality of Naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of Naive Bayes in which dependences among attributes exist. This provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence-based (or CI-based), and these algorithms are essentially based on dependence. We design a dependence distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence to support our new explanation.
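The claim that Naive Bayes can classify optimally despite strong dependence can be illustrated with a toy case. The sketch below is not the paper's construction: it assumes a hypothetical distribution with equal class priors in which the second attribute always copies the first, so the dependence is maximal but identical in both classes. All probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical toy distribution: a2 always equals a1, so the two attributes
# are perfectly dependent, and equally so in both classes.
PRIOR = {"+": 0.5, "-": 0.5}
P_A1_IS_1 = {"+": 0.8, "-": 0.3}


def joint(c, a1, a2):
    """True joint probability P(c, a1, a2)."""
    if a1 != a2:
        return 0.0
    p = P_A1_IS_1[c] if a1 == 1 else 1 - P_A1_IS_1[c]
    return PRIOR[c] * p


def marginal(c, index, value):
    """P(attribute[index] = value | c), derived from the true joint."""
    total = sum(joint(c, a1, a2)
                for a1, a2 in product((0, 1), repeat=2)
                if (a1, a2)[index] == value)
    return total / PRIOR[c]


def bayes_optimal(a1, a2):
    """Classify with the true joint distribution."""
    return max(PRIOR, key=lambda c: joint(c, a1, a2))


def naive_bayes(a1, a2):
    """Classify under the (here false) conditional independence assumption."""
    return max(PRIOR,
               key=lambda c: PRIOR[c] * marginal(c, 0, a1) * marginal(c, 1, a2))


# Only (0, 0) and (1, 1) occur with nonzero probability.
agreement = {x: (bayes_optimal(*x), naive_bayes(*x)) for x in [(0, 0), (1, 1)]}
print(agreement)
```

In this setup Naive Bayes double-counts the evidence, so its probability estimates are wrong, yet its class decisions match the Bayes-optimal ones on every observable input.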
The ability to discover the topic of a large set of text documents using relevant keyphrases is usually regarded as a very tedious task if done by hand. Automatic keyphrase extraction from multi-document data sets or ...
The patterns of ultrasonic echoes reflected from objects contain information about the geometric shape, size, orientation and surface material properties of the reflector. Accurate estimation of the ultrasonic echo signal pattern is essential for recognition of the target object. We propose a method to classify objects of specific geometric shapes (cylindrical, rectangular, spherical and conical) and of different sizes and materials. The continuous wavelet transform (CWT) is used for feature extraction. In the present work an attempt is made to classify the patterns inherent in the features extracted through the CWT of different echo signals with the help of two machine learning algorithms: the self-organizing feature map (SOFM) and the support vector machine (SVM). The CWT transforms a time-domain signal into the time-frequency domain, so that the frequency characteristics and the location of particular features in a time series can be highlighted simultaneously. It thus allows accurate extraction of features from non-stationary signals such as the ultrasonic echo envelope. The SOFM transforms an input of arbitrary dimension into a one- or two-dimensional discrete map subject to a topological (neighbourhood-preserving) constraint. In the present work the SOFM algorithm with Kohonen's learning and the SVM in regression mode are used to classify the patterns inherent in the features extracted through the CWT of different echo envelopes.
Data mining is the quest for knowledge in databases to uncover previously unimagined relationships in the data. This paper generalizes the Naive Bayes classification technique using fuzzy set theory for cases where the available numerical probabilistic information is incomplete or only partially correct. We consider a training dataset in which attribute values have certain similarities in nature. Though nothing can replace precise and complete probabilistic information, a useful classification system for data mining can be built even with imperfect data by introducing domain-dependent constraints. This observation is analyzed here based on fuzzy proximity relations for the domain of each attribute. The study shows that this approach is highly suitable for real-world applications, especially when databases contain uncertain information.
Summary form only given. Cluster analysis is an important tool in a variety of scientific areas such as pattern recognition, information retrieval, microarray analysis, data mining, and so forth. Although many clustering procedures, such as hierarchical clustering, k-means or self-organizing maps, aim to construct an optimal partition of the set of objects I or, sometimes, of the set of variables J, there are other methods, called block clustering methods, which consider the two sets simultaneously and organize the data into homogeneous blocks. These methods are fast and can process large data sets. They require far fewer computations than working on I and J separately. The mixture model is undoubtedly one of the greatest contributions to clustering. Recently we proposed a generalized EM algorithm (GEM) to maximize a variational approximation of the likelihood. The proposed algorithm is iterative, and its steps are carried out by applying the EM algorithm to intermediate mixture models. This paper focuses on the clustering context and compares block GEM with two-way EM, i.e. EM applied separately to I and J. Results on simulated data are given, confirming that block GEM performs much better than two-way EM.
This paper aims to take general tensors as inputs for supervised learning. A supervised tensor learning (STL) framework is established for convex optimization-based learning techniques such as support vector machines (SVM) and minimax probability machines (MPM). Within the STL framework, many conventional learning machines can be generalized to take nth-order tensors as inputs. We also study the applications of tensors to learning machine design and to feature extraction by linear discriminant analysis (LDA). Our method for tensor-based feature extraction is named tensor rank-one discriminant analysis (TR1DA). These generalized algorithms have several advantages: 1) they reduce the curse of dimensionality in machine learning and data mining; 2) they avoid the failure to converge; and 3) they achieve better separation between the different categories of samples. As an example, we generalize MPM to its STL version, which is named the tensor MPM (TMPM). TMPM learns a series of tensor projections iteratively. It is then evaluated against the original MPM. Our experiments on a binary classification problem show that TMPM significantly outperforms the original MPM.
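To make the idea of a tensor projection concrete, the sketch below shows only the projection step for a second-order tensor (a matrix), assuming a rank-one form in which one weight vector is applied per mode. It does not implement TMPM or TR1DA, and the function names and sample values are hypothetical.

```python
def rank_one_projection(X, u, v):
    """Project a second-order tensor (matrix) X to the scalar u^T X v."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


def parameter_counts(rows, cols):
    """Weights needed by a rank-one projection vs. a vectorized linear one."""
    return rows + cols, rows * cols


# Made-up 2x2 input and per-mode weight vectors, for illustration only.
X = [[1.0, 2.0],
     [3.0, 4.0]]
u = [1.0, 0.0]   # weights over rows (mode 1)
v = [0.5, 0.5]   # weights over columns (mode 2)

print(rank_one_projection(X, u, v))
print(parameter_counts(64, 64))
```

The parameter count is one way to read the abstract's first advantage: a per-mode projection of a 64 x 64 input needs 128 weights, whereas a linear model on the flattened input needs 4096.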
This paper describes a systems architecture for a hybrid centralised/swarm based multi-agent system. The issue of local goal assignment for agents is investigated through the use of a global agent which teaches the agents responses to given situations. We implement a test problem in the form of a pursuit game, where the multi-agent system is a set of captor agents. The agents learn solutions to certain board positions from the global agent if they are unable to find a solution themselves. The captor agents learn through the use of MLP neural networks. The global agent is able to solve board positions through the use of a genetic algorithm. The cooperation between agents and the results of the simulation are discussed here.