The task of extracting knowledge from text is an important research problem for information processing and document understanding. Approaches to capture the semantics of picture objects in documents constitute subject...
ISBN: 3540269231 (print)
We present how supervised machine learning techniques can be used to predict quality characteristics in an important chemical engineering application: the wine distillate maturation process. A number of experiments were conducted with six regression-based algorithms, of which the M5' algorithm proved the most appropriate for predicting the organoleptic properties of the matured wine distillates. The rules exported by the algorithm are as accurate as a human expert's decisions.
ISBN: 3540269231 (print)
This paper is concerned with time series of graphs and proposes a novel scheme that is able to predict the presence or absence of nodes in a graph. The proposed scheme is based on decision trees that are induced from a training set of sample graphs. The work is motivated by applications in computer network monitoring. However, the proposed prediction method is generic and can be used in other applications as well. Experimental results with graphs derived from real computer networks indicate that a correct prediction rate of up to 97% can be achieved.
Several cost-sensitive boosting algorithms have been reported as effective methods for dealing with the class imbalance problem. Misclassification costs, which reflect the different levels of class identification importance...
ISBN: 3540269231 (print)
Research in protein structure and function is one of the most important subjects in modern bioinformatics and computational biology. It often uses advanced data mining and machine learning methodologies to perform prediction or pattern recognition tasks. This paper describes a new method for prediction of protein secondary structure content based on feature selection and multiple linear regression. The method develops a novel representation of primary protein sequences based on a large set of 495 features. The feature selection task, performed using a very large set of nearly 6,000 proteins, and tests performed on standard non-homologous protein sets confirm the high quality of the developed solution. The application of feature selection and the novel representation resulted in a 14-15% error rate reduction compared to the results achieved with the standard representation. The prediction tests also show that a small set of 5-25 features is sufficient to achieve accurate prediction of both helix and strand content for non-homologous proteins.
This paper uses a set of 3D geometric measures with the purpose of characterizing lung nodules as malignant or benign. Based on a sample of 36 nodules, 29 benign and 7 malignant, these measures are analyzed with a tec...
An efficient low-level word image representation plays a crucial role in general cursive word recognition. This paper proposes a novel representation scheme, where a word image can be represented as two sequences of f...
ISBN: 3540269231 (print)
We propose an approach to embed time series data in a vector space based on the distances obtained from Dynamic Time Warping (DTW), and to classify them in the embedded space. Under the problem setting in which both labeled data and unlabeled data are given beforehand, we consider three embeddings: embedding in a Euclidean space by MDS, embedding in a pseudo-Euclidean space, and embedding in a Euclidean space by the Laplacian eigenmap technique. We have found through analysis and experiment that the embedding by the Laplacian eigenmap method leads to the best classification result. Furthermore, the proposed approach with Laplacian eigenmap embedding shows better performance than the k-nearest neighbor method.
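The DTW distances on which the embeddings above are built can be illustrated with a minimal sketch. It assumes one-dimensional numeric series and absolute difference as the local cost; the function names `dtw_distance` and `dtw_matrix` are hypothetical and not taken from the paper. The resulting pairwise matrix is the kind of input an MDS or Laplacian eigenmap step would consume; the embedding itself is not shown.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances for a list of series."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(series[i], series[j])
    return d
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, because warping lets the repeated 2 align with the single 2 at no cost.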
ISBN: 3540269231 (print)
In this paper we describe a new cluster model which is based on the concept of linear manifolds. The method identifies subsets of the data which are embedded in arbitrarily oriented lower-dimensional linear manifolds. Minimal subsets of points are repeatedly sampled to construct trial linear manifolds of various dimensions. Histograms of the distances of the points to each trial manifold are computed. The sampling corresponding to the histogram having the best separation between a mode near zero and the rest is selected, and the data points are partitioned on the basis of the best separation. The repeated sampling then continues recursively on each block of the partitioned data. A broad evaluation of some hundred experiments over real and synthetic data sets demonstrates the general superiority of this algorithm over any of the competing algorithms in terms of stability, accuracy, and computation time.
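The sampling step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: a trial manifold of dimension `dim` is assumed to be the affine subspace through `dim + 1` sampled points, its basis is obtained by Gram-Schmidt, and all function names are hypothetical. The histogram construction and the separation criterion are not shown.

```python
import random
from math import sqrt


def _sub(u, v):
    return [x - y for x, y in zip(u, v)]


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt orthonormalization; near-zero residuals are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = sqrt(_dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def distances_to_manifold(points, sample):
    """Distance of every point to the affine manifold through `sample`."""
    origin = sample[0]
    basis = orthonormal_basis([_sub(p, origin) for p in sample[1:]])
    distances = []
    for p in points:
        # Remove the component lying inside the manifold; the norm of
        # what remains is the orthogonal distance.
        r = _sub(p, origin)
        for b in basis:
            c = _dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        distances.append(sqrt(_dot(r, r)))
    return distances


def trial_manifold_distances(points, dim, rng=random):
    """Sample dim + 1 points and return the distances to their manifold."""
    sample = rng.sample(points, dim + 1)
    return distances_to_manifold(points, sample)
```

Points that belong to the sampled manifold produce distances near zero, which is what would give the histogram its mode near zero.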
ISBN: 0769524958 (print)
Whereas the early frequent pattern mining methods admitted only relatively simple data and pattern formats (e.g., sets, sequences, etc.), there is nowadays a clear push towards the integration of ever larger portions of domain knowledge in the mining process in order to increase the precision and the abstraction level of the retrieved patterns and hence ease their interpretation. We present here a practically motivated study of frequent pattern extraction from sequences of data objects that are described within a domain ontology. As the complexity of the descriptive structures is high, an entire framework for the pattern extraction process had to be defined. The key elements thereof are a pair of descriptive languages, one for individual data and another for generic patterns, a generality relation between patterns, and an Apriori-like method for pattern mining.
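For orientation, the classic level-wise Apriori scheme that the abstract alludes to can be sketched on plain itemsets. This is the textbook variant, not the ontology-based sequence method of the paper; the function name `apriori` and the absolute `min_support` count are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    Textbook Apriori on sets of items; `min_support` is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The prune step relies on the monotonicity of support: an itemset cannot be frequent if one of its subsets is not. A generality relation between patterns plays the analogous role when patterns are more structured than plain sets.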