ISBN: 0769524958 (print)
Providing convenient access to scientific documents is a difficult problem because of the large and constantly increasing number of incoming documents and the extensive manual work associated with their storage, description and classification. Users therefore need intelligent search and classification capabilities to find the information they require. This is especially true for repositories of scientific medical articles because of their extensive use, large size, number of new documents, and well-maintained structure. This research aims to provide an automated method for classifying articles into the structure of medical document repositories, which would support the extensive manual work currently performed. The proposed method classifies articles from the largest medical repository, MEDLINE, using state-of-the-art data mining technology. The method is based on a novel associative classification technique that considers recurrent items and, most importantly, the multi-label characteristic of the MEDLINE data. Based on large-scale experiments that use 350,000 documents, several classification algorithms are compared, including both recurrent and non-recurrent associative classification. The algorithms can assign each medical document to several classes (multi-label classification) and are characterized by relatively high accuracy. We also investigate different measures of classification quality and point out the pros and cons of each. Based on the experimental results, we show that recurrent-item-based associative classification demonstrates superior performance, and we propose three alternative setups that allow the user to obtain different desired classification qualities.
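The abstract above mentions comparing different measures of multi-label classification quality without naming them. As a minimal sketch, assuming three commonly used measures (Hamming loss, subset accuracy and micro-averaged F1, none of which is confirmed to be among the paper's choices), the trade-offs can be illustrated as follows. All function names and the toy labels are hypothetical.

```python
from typing import List, Set


def hamming_loss(true: List[Set[str]], pred: List[Set[str]], n_labels: int) -> float:
    """Fraction of (document, label) decisions that are wrong."""
    # The symmetric difference counts both missed and spurious labels.
    wrong = sum(len(t ^ p) for t, p in zip(true, pred))
    return wrong / (len(true) * n_labels)


def subset_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of documents whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def micro_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 computed from label decisions pooled over all documents."""
    tp = sum(len(t & p) for t, p in zip(true, pred))
    fp = sum(len(p - t) for t, p in zip(true, pred))
    fn = sum(len(t - p) for t, p in zip(true, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed): three documents, label vocabulary {A, B, C}.
true_labels = [{"A", "B"}, {"C"}, {"A"}]
pred_labels = [{"A"}, {"C"}, {"B"}]

print(hamming_loss(true_labels, pred_labels, n_labels=3))
print(subset_accuracy(true_labels, pred_labels))
print(micro_f1(true_labels, pred_labels))
```

Subset accuracy gives no credit for the partially correct first document, whereas Hamming loss and micro F1 do, which is one reason a single measure rarely suffices for multi-label data.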
Much information has been hierarchically organized to facilitate information browsing, retrieval, and dissemination. In practice, much information may be entered at any time, but only a small subset of the information...
This work suggests an unsupervised fuzzy clustering algorithm based on the concept of participatory learning introduced by Yager in the nineties. The performance of the algorithm is verified with synthetic data sets and with the well-known Iris data. In both cases the participatory learning algorithm successfully determines the expected number of clusters and the corresponding cluster centers. Comparisons with Gustafson-Kessel (GK) and modified fuzzy k-means (MFKM) are included to show the effectiveness of the participatory approach in data clustering.
This paper aims to take general tensors as inputs for supervised learning. A supervised tensor learning (STL) framework is established for convex-optimization-based learning techniques such as support vector machines (SVM) and minimax probability machines (MPM). Within the STL framework, many conventional learning machines can be generalized to take nth-order tensors as inputs. We also study the applications of tensors to learning machine design and to feature extraction by linear discriminant analysis (LDA). Our method for tensor-based feature extraction is named tensor rank-one discriminant analysis (TR1DA). These generalized algorithms have several advantages: 1) they reduce the curse of dimensionality in machine learning and data mining; 2) they avoid the failure to converge; and 3) they achieve better separation between the different categories of samples. As an example, we generalize MPM to its STL version, which is named the tensor MPM (TMPM). TMPM learns a series of tensor projections iteratively. It is then evaluated against the original MPM. Our experiments on a binary classification problem show that TMPM significantly outperforms the original MPM.
ISBN: 0769521428 (print)
The support vector machine (SVM) is considered here in the context of pattern classification. The emphasis is on the soft margin classifier, which uses regularization to handle non-separable learning samples. We present an SVM parameter estimation algorithm that first identifies a subset of the learning samples, which we call the support set, and then determines on that basis not only the weights of the classifier but also the hyperparameter that controls the influence of the regularizing penalty term. We provide numerical results using several data sets from the public domain.
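To make the role of the regularization hyperparameter concrete, here is a minimal sketch of the standard linear soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). It only evaluates the objective for given parameters; it is not the estimation algorithm of the paper, and the symbol C, the function name and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def soft_margin_objective(
    w: Sequence[float],
    b: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float,
) -> float:
    """Evaluate the linear soft-margin SVM objective for labels in {-1, +1}."""
    # Margin term: small ||w|| corresponds to a wide margin.
    margin_term = 0.5 * sum(wi * wi for wi in w)
    # Penalty term: total hinge loss (slack) over the learning samples.
    total_slack = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total_slack += max(0.0, 1.0 - yi * score)
    # C (assumed name) weights the penalty against the margin term.
    return margin_term + C * total_slack


# Toy data (assumed): the second sample lies inside the margin.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0))
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=10.0))
```

Raising C makes the same margin violation more expensive, which is why the choice of this hyperparameter matters for non-separable samples.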
ISBN: 0769521428 (print)
Various definitions and frameworks for discovering frequent trees in forests have been developed recently. At the heart of these frameworks lies the notion of matching, which determines when a pattern tree matches a tree in a data set. We introduce a novel notion of tree matching for use in frequent tree mining and we show that it generalizes the framework of Zaki while still being more specific than that of Termier et al. Furthermore, we show how Zaki's TreeMinerV algorithm can be adapted towards our notion of tree matching. Experiments show the promise of the approach.
ISBN: 0769521428 (print)
The C4.5 Decision Tree and Naive Bayes learners are known to produce unreliable probability forecasts. We have used simple Binning [11] and Laplace Transform [2] techniques to improve the reliability of these learners and compare their effectiveness with that of the newly developed Venn Probability Machine (VPM) meta-learner [9]. We assess improvements in reliability using loss functions, Receiver Operating Characteristic (ROC) curves and Empirical Reliability Curves (ERC). The VPM improves reliability more than the simple techniques do, although at the cost of increased computational intensity and a slight increase in error rate. These trade-offs are discussed.
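The two simple techniques named above can be sketched briefly. The code below is an illustration under stated assumptions, not the procedure of the cited references: it assumes binning with equal-width bins on [0, 1], and it assumes that the "Laplace" technique refers to the usual Laplace correction (k + 1) / (n + number of classes) applied to class counts, for example at a decision tree leaf. All names are hypothetical.

```python
from typing import List, Optional, Sequence


def fit_bins(scores: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> List[Optional[float]]:
    """Per-bin fraction of positive training examples (None for empty bins)."""
    positives = [0] * n_bins
    totals = [0] * n_bins
    for s, label in zip(scores, labels):
        i = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        positives[i] += label
        totals[i] += 1
    return [p / t if t else None for p, t in zip(positives, totals)]


def calibrate(score: float, bin_probs: Sequence[Optional[float]]) -> float:
    """Replace a raw score by the positive rate observed in its bin."""
    i = min(int(score * len(bin_probs)), len(bin_probs) - 1)
    p = bin_probs[i]
    return score if p is None else p  # fall back to the raw score for empty bins


def laplace_correction(k: int, n: int, n_classes: int = 2) -> float:
    """Smoothed estimate for k examples of a class among n (assumed form)."""
    return (k + 1) / (n + n_classes)


# Toy data (assumed): raw classifier scores with binary labels.
train_scores = [0.05, 0.15, 0.9, 0.95, 0.92, 0.1]
train_labels = [0, 0, 1, 1, 0, 1]
bins = fit_bins(train_scores, train_labels, n_bins=2)
print(calibrate(0.99, bins))
print(laplace_correction(3, 3))
```

In this toy example an overconfident raw score of 0.99 is mapped to the positive rate actually observed among high-scoring training examples, and a pure leaf with three examples no longer yields a probability of exactly 1.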
ISBN: 0769521282 (print)
In this paper, a discriminative manifold learning method for face recognition is proposed that achieves a discriminative embedding of the high-dimensional face data into a low-dimensional hidden manifold. Unlike the recently proposed LLE, Isomap and Eigenmap algorithms, which are designed for reconstruction, our method uses the RCA algorithm to achieve nonlinear embedding and data discrimination at the same time. In addition, because the LLE and Isomap algorithms depend crucially on the appropriateness of the neighborhood construction rule, a CK-nearest neighborhood rule is proposed to achieve better neighborhood construction. Experimental results indicate the promising performance of the proposed method.
ISBN: 0769521282 (print)
MAP estimation of Gaussian mixtures through maximisation of penalised likelihoods was used to learn models of spatial context. This enabled prior beliefs about the scale, orientation and elongation of semantic regions to be encoded, encouraging one-to-one correspondences between mixture components and these regions. In conjunction with minimum description length, this enabled automatic learning of inactivity zones and entry zones from track data in a supportive home environment.
ISBN: 0769521282 (print)
Learning Vector Quantization (LVQ) networks are generally considered a powerful pattern recognition tool. Their main drawback, however, is the Competitive Learning algorithm they are based upon, which suffers from the so-called under-utilized or dead unit problem. To solve this problem, algorithms based largely on a modified distance calculation, such as Frequency Sensitive Competitive Learning (FSCL), have been proposed, but their attainable performance depends strongly on the selection of an appropriate number of neurons. This choice generally requires knowledge of the number of clusters in the feature space. In this paper we propose a new supervised training algorithm for LVQ neural networks, which provides the optimal number of neurons for each class by dynamically adding or removing neurons on the basis of a measure of their performance. The experimental results, obtained on different databases of synthetic data, confirm the effectiveness of our approach.
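For readers unfamiliar with the competitive update that LVQ training builds on, the following is a minimal sketch of one step of the basic LVQ1 rule. It is the textbook baseline, not the algorithm proposed in the paper: it does not add or remove neurons. The function names, the learning rate value and the toy prototypes are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[List[float], str]  # (weight vector, class label)


def nearest(prototypes: Sequence[Prototype], x: Sequence[float]) -> int:
    """Index of the prototype closest to x in squared Euclidean distance."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i][0], x)),
    )


def lvq1_step(prototypes: List[Prototype], x: Sequence[float], label: str, lr: float = 0.1) -> int:
    """Update only the winning prototype in place and return its index."""
    i = nearest(prototypes, x)
    vec, cls = prototypes[i]
    # Attract the winner if its class matches the sample, repel it otherwise.
    sign = 1.0 if cls == label else -1.0
    prototypes[i] = ([p + sign * lr * (xi - p) for p, xi in zip(vec, x)], cls)
    return i


# Toy data (assumed): one prototype per class.
protos: List[Prototype] = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
winner = lvq1_step(protos, [0.2, 0.0], "a", lr=0.5)
print(winner, protos[winner][0])
```

Because only the winner moves, a prototype that never wins is never updated, which is the dead unit problem described above. Counting how often each index is returned is one simple way to observe it.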