ISBN (print): 3540269231
Research in protein structure and function is one of the most important subjects in modern bioinformatics and computational biology. It often uses advanced data mining and machine learning methodologies to perform prediction or pattern recognition tasks. This paper describes a new method for predicting protein secondary structure content based on feature selection and multiple linear regression. The method develops a novel representation of primary protein sequences based on a large set of 495 features. The feature selection task was performed using a very large set of nearly 6,000 proteins, and tests performed on standard non-homologous protein sets confirm the high quality of the developed solution. Applying feature selection together with the novel representation reduced the error rate by 14-15% compared with the results obtained using the standard representation. The prediction tests also show that a small set of 5-25 features is sufficient to achieve accurate prediction of both helix and strand content for non-homologous proteins.
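The regression step can be illustrated with a minimal sketch. It assumes a hypothetical representation made of residue-composition fractions for a small, already selected subset of residues (the 495-feature representation itself is not described here) and fits ordinary least squares with an intercept by solving the normal equations. The function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def composition(seq: str, residues: str) -> List[float]:
    """Fraction of the sequence made up by each residue in `residues` (hypothetical feature subset)."""
    n = len(seq)
    return [seq.count(r) / n for r in residues]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Multiple linear regression with intercept: solve (A^T A) beta = A^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column for the intercept
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted content (e.g. helix fraction) for one feature vector."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Toy data: helix content generated as 0.1 + 0.5*frac(A) + 0.3*frac(L) (made-up values).
seqs = ["AAGG", "LLGG", "GGGG", "ALGG"]
X = [composition(s, "AL") for s in seqs]
y = [0.1 + 0.5 * x[0] + 0.3 * x[1] for x in X]
beta = fit_mlr(X, y)
print([round(b, 6) for b in beta])  # [0.1, 0.5, 0.3]
```

With real data, X would hold the selected features for thousands of proteins, and the fitted coefficients would be evaluated on a separate non-homologous test set.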
ISBN (print): 3540269231
This paper is concerned with time series of graphs and proposes a novel scheme that is able to predict the presence or absence of nodes in a graph. The proposed scheme is based on decision trees that are induced from a training set of sample graphs. The work is motivated by applications in computer network monitoring. However, the proposed prediction method is generic and can be used in other applications as well. Experimental results with graphs derived from real computer networks indicate that a correct prediction rate of up to 97% can be achieved.
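To make the idea concrete, here is a minimal sketch of inducing a decision tree (ID3-style, splitting on information gain) that predicts whether a node will be present in the next graph. It assumes a hypothetical feature set, namely the node's presence or absence in the previous k graphs, and reduces each graph to its node set; the features actually used by the authors are not stated in this abstract.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple, Union

Example = Tuple[Tuple[int, ...], int]
Tree = Union[int, Tuple[int, dict]]


def make_examples(series: Sequence[Set[str]], node: str, k: int) -> List[Example]:
    """Features: presence of `node` in graphs t-k..t-1; label: its presence in graph t."""
    examples = []
    for t in range(k, len(series)):
        feats = tuple(int(node in series[s]) for s in range(t - k, t))
        examples.append((feats, int(node in series[t])))
    return examples


def entropy(labels: List[int]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(examples: List[Example], features: List[int]) -> Tree:
    """Recursively split on the binary feature with the highest information gain."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: predicted presence (1) or absence (0)

    def gain(f: int) -> float:
        parts = [[y for x, y in examples if x[f] == v] for v in (0, 1)]
        return entropy(labels) - sum(len(p) / len(labels) * entropy(p) for p in parts if p)

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    children = {}
    for v in (0, 1):
        subset = [(x, y) for x, y in examples if x[best] == v]
        children[v] = build_tree(subset, rest) if subset else majority
    return (best, children)


def predict(tree: Tree, x: Tuple[int, ...]) -> int:
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[x[feature]]
    return tree


# Node "a" appears in every other graph of this made-up series.
series = [{"a", "b"}, {"b"}, {"a", "b"}, {"b"}, {"a"}, {"b"}, {"a", "b"}, {"b"}]
tree = build_tree(make_examples(series, "a", k=1), features=[0])
print(predict(tree, (1,)), predict(tree, (0,)))  # 0 1
```

A larger window k or additional features (for example, the presence of neighbouring nodes) would simply add columns to each example; the tree induction stays the same.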
ISBN (print): 3540269231
In this paper we describe a new cluster model based on the concept of linear manifolds. The method identifies subsets of the data that are embedded in arbitrarily oriented lower-dimensional linear manifolds. Minimal subsets of points are repeatedly sampled to construct trial linear manifolds of various dimensions, and histograms of the distances of the points to each trial manifold are computed. The sampling whose histogram shows the best separation between a mode near zero and the rest is selected, and the data points are partitioned on the basis of this best separation. The repeated sampling then continues recursively on each block of the partitioned data. A broad evaluation of some hundred experiments over real and synthetic data sets demonstrates the general superiority of this algorithm over each of the competing algorithms in terms of stability, accuracy, and computation time.
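The core sampling step can be sketched for the simplest case, a one-dimensional manifold (a line) fitted through a minimal sample of two points. This is a minimal sketch under stated assumptions: the separation score used here (the number of points falling in the lowest histogram bin) is a hypothetical stand-in for the paper's criterion, and the function names, bin count, and trial count are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist_to_line(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the line (1-D linear manifold) through a and b."""
    u = [bi - ai for ai, bi in zip(a, b)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    v = [pi - ai for ai, pi in zip(a, p)]
    t = sum(vi * ui for vi, ui in zip(v, u))  # length of the projection onto the line
    return math.sqrt(sum((vi - t * ui) ** 2 for vi, ui in zip(v, u)))


def best_line_split(points: Sequence[Point], trials: int = 200, bins: int = 10,
                    seed: int = 0) -> Tuple[int, List[Point]]:
    """Sample point pairs, histogram distances to each trial line, keep the best near-zero mode."""
    rng = random.Random(seed)
    best: Tuple[int, List[Point]] = (-1, [])
    for _ in range(trials):
        a, b = rng.sample(list(points), 2)
        if a == b:
            continue  # duplicate points do not define a line
        d = [dist_to_line(p, a, b) for p in points]
        width = (max(d) or 1.0) / bins  # histogram bin width
        idx = [min(int(x / width), bins - 1) for x in d]  # histogram bin of each point
        hist = [idx.count(i) for i in range(bins)]
        score = hist[0]  # hypothetical separation score: size of the near-zero mode
        if score > best[0]:
            best = (score, [p for p, i in zip(points, idx) if i == 0])
    return best


line = [(float(i), float(i)) for i in range(8)]
score, near = best_line_split(line + [(0.0, 20.0), (20.0, 0.0), (10.0, -10.0)])
print(score)  # 8
```

In the full method, the selected near-zero block and the remaining points would each be processed again recursively, and trial manifolds of higher dimension would be built from correspondingly larger minimal samples.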
ISBN (print): 3540269231
We propose an approach to embed time series data in a vector space based on the distances obtained from Dynamic Time Warping (DTW), and to classify them in the embedded space. In a problem setting where both labeled and unlabeled data are given beforehand, we consider three embeddings: embedding in a Euclidean space by MDS, embedding in a pseudo-Euclidean space, and embedding in a Euclidean space by the Laplacian eigenmap technique. We have found through analysis and experiment that the embedding by the Laplacian eigenmap method leads to the best classification result. Furthermore, the proposed approach with Laplacian eigenmap embedding performs better than the k-nearest neighbor method.
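As a minimal sketch of the first steps, the code below computes the classic DTW distance (with an absolute-difference local cost and no warping window, both assumptions) and then double-centers the matrix of squared distances, which is the starting point of classical MDS. Because DTW is not a metric, the resulting matrix can have negative eigenvalues: classical MDS uses only the positive ones, whereas a pseudo-Euclidean embedding keeps both signs. The eigendecomposition itself is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic Time Warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def double_centered_gram(series: List[Sequence[float]]) -> List[List[float]]:
    """B = -1/2 * J S J, where S holds squared DTW distances and J is the centering matrix."""
    n = len(series)
    sq = [[dtw(s, t) ** 2 for t in series] for s in series]
    row = [sum(r) / n for r in sq]  # equals the column means, since DTW is symmetric
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]


print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The warping lets the repeated value 2 align with a single sample, so the two series above are at distance zero even though their lengths differ.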
ISBN (print): 3540269231
In recent years, computer vision tasks such as object recognition and localization have rapidly expanded from passive solution approaches to active ones, that is, to approaches that execute a viewpoint selection algorithm in order to acquire only the most significant views of an arbitrary object. Although the fusion of multiple views can already be done reliably, planning is still limited to gathering the next best view, normally the one providing the highest immediate gain in information. In this paper, we show how to perform a generally more intelligent, long-run optimized sequence of actions by linking them with costs. To this end, we introduce a non-empirical way of acquiring costs of an appropriate dimensionality while still leaving the determination of the system's basic behavior to the user. Since this planning process is accomplished by an underlying machine learning technique, we also point out how easily this technique can be adjusted to the expanded task and show why a multi-step approach should be used for doing so.
ISBN (print): 3540269231
The concept lattice, the core structure in Formal Concept Analysis, has been used in various fields such as software engineering and knowledge discovery. In this paper, we present the integration of association rules and classification rules using the concept lattice. This yields more accurate classifiers. The algorithm used is incremental in nature: any increase in the number of classes, attributes, or transactions does not require access to the previous database. This incremental behavior is very useful for finding classification rules for real-time data, such as in image processing. The algorithm requires just one pass through the entire database. Individual classes can have different support thresholds and pruning conditions, such as criteria for noise and the number of conditions in the classifier.
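The incremental property can be illustrated with a minimal sketch of how concept intents evolve when objects are added one at a time. It relies on the standard fact that every concept intent (apart from possibly the full attribute set) is an intersection of object intents, so adding an object only requires intersecting its attribute set with the intents found so far, without rereading earlier objects. Deriving association and classification rules from the lattice is not shown, and the helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def add_object(intents: Set[FrozenSet[str]], attrs: FrozenSet[str]) -> None:
    """Update the set of concept intents in place after one new object arrives."""
    new = {attrs} | {b & attrs for b in intents}
    intents.update(new)


def extent(intent: FrozenSet[str], context: Dict[str, FrozenSet[str]]) -> Set[str]:
    """All objects that have every attribute of the intent."""
    return {obj for obj, attrs in context.items() if intent <= attrs}


context = {
    "o1": frozenset({"a", "b"}),
    "o2": frozenset({"b", "c"}),
    "o3": frozenset({"a", "b", "c"}),
}
intents: Set[FrozenSet[str]] = set()
for attrs in context.values():  # a single pass over the objects
    add_object(intents, attrs)
for i in sorted(intents, key=lambda s: (len(s), sorted(s))):
    print(sorted(i), sorted(extent(i, context)))
```

Each printed pair (intent, extent) is one formal concept; a concept whose extent falls entirely within one class could then serve as the premise of a classification rule.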
ISBN (print): 0769524958
Whereas the early frequent pattern mining methods admitted only relatively simple data and pattern formats (e.g., sets, sequences, etc.), there is nowadays a clear push towards integrating ever larger portions of domain knowledge into the mining process in order to increase the precision and the abstraction level of the retrieved patterns and hence ease their interpretation. We present here a practically motivated study of frequent pattern extraction from sequences of data objects that are described within a domain ontology. As the complexity of the descriptive structures is high, an entire framework for the pattern extraction process had to be defined. Its key elements are a pair of descriptive languages, one for individual data and another for generic patterns, a generality relation between patterns, and an Apriori-like method for pattern mining.
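The Apriori principle the framework builds on can be sketched on plain itemsets. In this minimal sketch, subset inclusion plays the role of the generality relation and transactions are plain sets; the paper's ontology-based pattern language and its sequence handling are not modeled. A candidate of size k is counted only if all of its (k-1)-subsets are frequent, which is valid because support can only decrease as patterns become more specific.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise mining of all itemsets contained in at least `min_support` transactions."""

    def count(candidates: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: s for c, s in count(items).items() if s >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates having an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = {c: s for c, s in count(candidates).items() if s >= min_support}
        result.update(frequent)
        k += 1
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori(data, 3)[frozenset({"a", "b"})])  # 3
```

In the ontology-based setting, the join and prune steps would operate on the generality relation between patterns instead of on set inclusion.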
ISBN (print): 3540269231
In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute that keeps the information of the initial attribute and reduces the number of its values. When the number of values is very large, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods founded on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models, and the second extends this schema by managing a "garbage" group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high-quality groupings in terms of predictive quality, robustness, and small number of groups.
ISBN (print): 3540269231
Ranked transformations should preserve a priori given ranked relations (order) between some feature vectors. Designing ranked models includes feature selection tasks: components of feature vectors that are not important for preserving the order of the vectors should be neglected. In this way, the dimensionality of the feature space can be greatly reduced. This is particularly important in the case of "long" feature vectors, when a relatively small number of objects is represented in a high-dimensional feature space. In this paper, we describe the design of ranked models with feature selection based on the minimisation of convex and piecewise linear (CPL) functions.
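A criterion of the convex and piecewise linear kind can be sketched as follows. This is an assumption-laden illustration rather than the authors' exact formulation: each ranked pair (x_j ranked above x_k) contributes a hinge penalty max(0, delta - <w, x_j - x_k>), and a weighted L1 term lam * sum(gamma_i * |w_i|) charges for every feature used; the margin delta, the feature costs gamma_i, and lam are hypothetical parameters. Both parts are convex and piecewise linear, and the L1 term is what pushes unimportant components of w to zero. Minimizing the criterion (e.g. by linear programming) is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cpl_criterion(w: Vector, ranked_pairs: List[Tuple[Vector, Vector]], delta: float = 1.0,
                  lam: float = 0.1, costs: Optional[Vector] = None) -> float:
    """Hinge penalties on ranked pairs plus a weighted L1 feature cost (hypothetical settings)."""
    costs = costs if costs is not None else [1.0] * len(w)
    penalty = 0.0
    for higher, lower in ranked_pairs:
        r = [h - l for h, l in zip(higher, lower)]  # difference vector of the ranked pair
        margin = sum(wi * ri for wi, ri in zip(w, r))
        penalty += max(0.0, delta - margin)  # zero once the pair is ordered with margin delta
    feature_cost = lam * sum(c * abs(wi) for c, wi in zip(costs, w))
    return penalty + feature_cost


# The order of these two pairs is explained by feature 0 alone.
pairs = [((3.0, 5.0), (1.0, 5.0)), ((5.0, 0.0), (3.0, 9.0))]
print(cpl_criterion([0.5, 0.0], pairs))  # uses feature 0 only
print(cpl_criterion([0.5, 0.1], pairs))  # also uses feature 1
```

With the same weight on feature 0, putting weight on feature 1 only raises the criterion in this example, which is the mechanism by which such a criterion discards features that do not help preserve the order.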
ISBN (print): 3540269231
We propose an unsupervised, probabilistic method for learning visual feature hierarchies. Starting from local, low-level features computed at interest point locations, the method combines these primitives into high-level abstractions. Our appearance-based learning method uses local statistical analysis between features and Expectation-Maximization to identify and code spatial correlations. Spatial correlation is asserted when two features tend to occur at the same position relative to each other. This learning scheme results in a graphical model that constitutes a probabilistic representation of a flexible visual feature hierarchy. For feature detection, evidence is propagated using Belief Propagation. Each message is represented by a Gaussian mixture, where each component represents a possible location of the feature. In experiments, the proposed approach demonstrates efficient learning and robust detection of object models in the presence of clutter and occlusion and under viewpoint changes.