Protein fingerprints are groups of conserved motifs that can be used as diagnostic signatures to identify and characterize collections of protein sequences. These fingerprints are stored in the PRINTS database after ...
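The abstract is cut off before it says how fingerprints are built or scanned. As a toy illustration of a group of motifs acting as one diagnostic signature, the sketch below treats a fingerprint as an ordered list of regular expressions and counts how many of them occur in order along a sequence. The motif patterns and the function name `fingerprint_hits` are hypothetical, and real fingerprint scanning works differently from this regex stand-in.

```python
import re

def fingerprint_hits(sequence, motifs):
    """Count how many motifs of a toy fingerprint occur in order in `sequence`.

    `motifs` is a list of regular expressions (hypothetical stand-ins for
    conserved motifs). Each search starts where the previous hit ended, so
    hits must appear in the same order as the motifs; a missing motif is
    skipped rather than aborting the scan.
    """
    pos = 0
    hits = 0
    for motif in motifs:
        match = re.compile(motif).search(sequence, pos)
        if match is None:
            continue  # motif absent downstream: count it as a miss
        hits += 1
        pos = match.end()  # the next motif must start after this hit
    return hits


# Made-up two-motif fingerprint, not taken from PRINTS.
toy_fingerprint = [r"G.GK[ST]", r"DE.H"]
print(fingerprint_hits("MAGSGKSTLLDEAHQ", toy_fingerprint))  # 2: both motifs, in order
print(fingerprint_hits("MDEAHGSGKS", toy_fingerprint))       # 1: motifs in the wrong order
```

In this toy, a full match (hits equal to the number of motifs) is the strongest evidence of membership, while partial matches could be reported as weaker evidence.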
ISBN: 3540200851 (print)
In this paper, we address the characterization task and present a general framework for characterizing a target set of objects by means not only of their own properties but also of the properties of objects linked to them. Depending on the kind of objects, various links can be considered; for instance, in relational databases, associations are the natural links between pairs of tables. We propose Caracterix, a new algorithm for mining characterization rules, and show how it can be used on multi-relational and spatial databases.
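The abstract does not give Caracterix's rule language or search strategy. The sketch below only illustrates the underlying idea of characterizing a target set through its own properties and those of linked objects: each object receives the properties of the objects it is linked to (as an association between two tables would provide), and a property is kept when its relative frequency in the target set is at least `min_ratio` times its relative frequency overall. The frequency-ratio criterion, the `linked.` prefix, and all names and data are assumptions made for illustration.

```python
from collections import Counter

def with_linked(objects, links, linked_props):
    """Return a copy of `objects` in which each object also carries the
    properties of the objects linked to it, prefixed with 'linked.'.

    objects:      dict object id -> set of property strings
    links:        dict object id -> iterable of linked object ids
    linked_props: dict linked object id -> set of property strings
    """
    return {
        oid: props | {f"linked.{p}"
                      for other in links.get(oid, ())
                      for p in linked_props[other]}
        for oid, props in objects.items()
    }

def characterize(objects, target_ids, min_ratio=1.5):
    """Hypothetical criterion: keep properties whose relative frequency in
    the target set is at least `min_ratio` times their relative frequency
    in the whole collection. Returns {property: ratio}."""
    all_counts = Counter(p for props in objects.values() for p in props)
    tgt_counts = Counter(p for oid in target_ids for p in objects[oid])
    n_all, n_tgt = len(objects), len(target_ids)
    result = {}
    for prop, count in tgt_counts.items():
        ratio = (count / n_tgt) / (all_counts[prop] / n_all)
        if ratio >= min_ratio:
            result[prop] = ratio
    return result

# Toy "customers" table linked to a "stores" table (made-up data).
customers = {"c1": {"age=young"}, "c2": {"age=young"},
             "c3": {"age=old"}, "c4": {"age=young"}}
stores = {"s1": {"city=Lyon"}, "s2": {"city=Nice"}}
visits = {"c1": ["s1"], "c2": ["s1"], "c3": ["s2"], "c4": ["s2"]}
target = ["c1", "c2"]

print(characterize(customers, target))                               # {}
print(characterize(with_linked(customers, visits, stores), target))  # {'linked.city=Lyon': 2.0}
```

Here the target customers are not distinguished by their own age attribute, but they are by the city of the store they are linked to, which shows why looking at linked objects can help.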
ISBN: 3540200851 (print)
The problem of efficiently finding patterns in massive time series databases has attracted great interest and, at least for the Euclidean distance measure, may now be regarded as a solved problem. However, in recent years there has been an increasing awareness that Euclidean distance is inappropriate for many real-world applications. The limitation of Euclidean distance stems from the fact that it is very sensitive to distortions in the time axis. A partial solution to this problem, Dynamic Time Warping (DTW), aligns the time axis before calculating the Euclidean distance. However, DTW can only address the problem of local scaling. As we demonstrate in this work, uniform scaling (stretching or shrinking the whole pattern by a constant factor) may be just as important in many domains, including applications as diverse as bioinformatics, space telemetry monitoring and motion editing for computer animation. We present a novel technique to speed up similarity search under uniform scaling. As we will show, our technique is simple and intuitive, and it can achieve a speedup of 2 to 3 orders of magnitude under realistic settings.
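The abstract does not describe the speedup technique itself. For orientation, the sketch below computes a brute-force distance under uniform scaling, the quantity such a technique would have to find faster: every prefix of the candidate whose length lies within a maximum scaling factor of the query length is resampled to the query length and compared with the query by Euclidean distance, and the best match is kept. The nearest-index resampling, the function names, and the `max_scale` parameter are assumptions of this sketch; it is not the authors' method.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def resample(seq, new_len):
    """Uniformly rescale `seq` to `new_len` points by nearest-index sampling:
    output point j takes seq[floor(j * len(seq) / new_len)]."""
    n = len(seq)
    return [seq[(j * n) // new_len] for j in range(new_len)]

def uniform_scaling_distance(query, candidate, max_scale):
    """Brute-force search over scaling factors in [1/max_scale, max_scale].

    For each prefix length p of `candidate` in that range, rescale the
    prefix to len(query) points and measure its Euclidean distance to
    `query`. Returns (best distance, best prefix length).
    """
    m = len(query)
    shortest = max(1, math.ceil(m / max_scale))
    longest = min(len(candidate), math.floor(m * max_scale))
    best, best_p = math.inf, None
    for p in range(shortest, longest + 1):
        d = euclidean(query, resample(candidate[:p], m))
        if d < best:
            best, best_p = d, p
    return best, best_p

query = [0, 1, 2, 3]
candidate = [0, 0, 1, 1, 2, 2, 3, 3]  # the query stretched by a factor of 2
print(uniform_scaling_distance(query, candidate, max_scale=1.0))  # no scaling: distance > 0
print(uniform_scaling_distance(query, candidate, max_scale=2.0))  # (0.0, 8)
```

Every allowed length needs its own distance computation for every candidate, which is the kind of repeated work a speedup technique for uniform scaling would aim to avoid.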
ISBN: 3540440372 (print)
The proceedings contain 43 papers. The special focus of this conference is on principles of data mining and knowledge discovery. The topics include: optimized substructure discovery for semi-structured data; fast outlier detection in high dimensional spaces; fast algorithms for mining emerging patterns; on the discovery of weak periodicities in large time series; the need for low bias algorithms in classification learning from large data sets; mining all non-derivable frequent itemsets; iterative data squashing for boosting based on a distribution-sensitive distance; finding association rules with some very frequent attributes; self-aggregation in scaled principal component space; a classification approach for prediction of target events in temporal sequences; privacy-oriented data mining by proof checking; an empirical study of feature selection metrics for text classification; generating actionable knowledge by expert-guided subgroup discovery; multiscale comparison of temporal patterns in time-series medical databases; association rules for expressing gradual dependencies; support approximations using Bonferroni-type inequalities; comparing two-phase rule induction to cost-sensitive boosting; dependency detection in MobiMine and random matrices; long-term learning for web search engines; spatial subgroup mining integrated in an object-relational spatial database; involving aggregate functions in multi-relational search; information extraction in structured documents using tree automata induction; algebraic techniques for analysis of large discrete-valued datasets; geography of differences between two classes of data; rule induction for classification of gene expression array data; and iteratively selecting feature subsets for mining from high-dimensional databases.
1 Opening
PKDD 2001, the 5th European Conference on Principles of Knowledge Discovery in Databases (PKDD), was held in Freiburg, Baden-Württemberg, Germany, this year (Monday 3 to Friday 7 September) and was co-located with the 12th European Conference on Machine Learning (ECML 2001). The proceedings comprised two volumes, one for PKDD (De Raedt & Siebes, 2001) and one for ECML (De Raedt & Flach, 2001), and form part of Springer's Lecture Notes in Artificial Intelligence (LNAI) series. The conference was held in the university buildings in the centre of the old town. Freiburg and the surrounding area were for many years part of the Habsburg (Austrian) lands, and thus the university was described to us as being one of the oldest Austrian universities.
ISBN: 3540440372 (print)
The proceedings contain 41 papers. The topics discussed include: optimized substructure discovery for semi-structured data; fast outlier detection in high dimensional spaces; data mining in schizophrenia research: preliminary analysis; fast algorithms for mining emerging patterns; on the discovery of weak periodicities in large time series; the need for low bias algorithms in classification learning from large data sets; mining all non-derivable frequent itemsets; iterative data squashing for boosting based on a distribution-sensitive distance; finding association rules with some very frequent attributes; unsupervised learning: self-aggregation in scaled principal component space; a classification approach for prediction of target events in temporal sequences; privacy-oriented data mining by proof checking; and choose your words carefully: an empirical study of feature selection metrics for text classification.
Many data mining tasks can be seen as instances of the problem of finding the patterns in a large database that are most interesting according to some utility function. In recent years, significant progress has been achiev...
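The abstract breaks off before describing the authors' approach. To make the problem statement concrete, the sketch below solves it by brute force for itemsets: it scores every itemset up to a size limit with a caller-supplied utility function and keeps the k best in a min-heap. The utility used in the example (support times itemset size) and all names are hypothetical; exhaustive enumeration like this is exactly what becomes too expensive on large databases.

```python
import heapq
from itertools import combinations

def top_k_patterns(transactions, k, max_size, utility):
    """Score every itemset with up to `max_size` items and return the k
    highest-scoring (score, itemset) pairs, best first.

    transactions: list of sets of items
    utility:      function (itemset tuple, support fraction) -> score
    """
    items = sorted(set().union(*transactions))
    heap = []  # min-heap: heap[0] is the weakest of the current top k
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            covered = sum(1 for t in transactions if set(itemset) <= t)
            entry = (utility(itemset, covered / len(transactions)), itemset)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)  # evict the weakest pattern
    return sorted(heap, reverse=True)

def size_weighted_support(itemset, support):
    """Hypothetical utility: favour itemsets that are both frequent and large."""
    return support * len(itemset)

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
print(top_k_patterns(transactions, k=2, max_size=2, utility=size_weighted_support))
# [(1.0, ('b', 'c')), (1.0, ('a', 'b'))]
```

Swapping in a different `utility` changes what "most interesting" means without touching the search loop.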
This paper presents a method for analyzing time-series data on laboratory examinations based on phase-constraint multiscale matching and rough clustering. Multiscale matching compares two subsequences throughout vario...
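The abstract is truncated before the matching procedure is defined, so the following is only a simplified illustration of comparing two series at more than one scale: both series are smoothed with Gaussian kernels of several widths `sigmas` and their Euclidean distance is reported per scale. Phase constraints, segment matching, and rough clustering are not modelled; the σ values, the edge handling, and all names are assumptions of this sketch.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights on integer offsets -r..r, r = ceil(3*sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(seq, sigma):
    """Convolve `seq` with a Gaussian kernel, repeating end values at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(seq)
    return [
        sum(w * seq[min(max(i + k - radius, 0), n - 1)]
            for k, w in enumerate(kernel))
        for i in range(n)
    ]

def multiscale_distances(a, b, sigmas):
    """Euclidean distance between `a` and `b` after smoothing both at each sigma."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return {
        s: math.sqrt(sum((x - y) ** 2
                         for x, y in zip(smooth(a, s), smooth(b, s))))
        for s in sigmas
    }

# A flat series versus the same level with fast oscillations (made-up data).
flat = [0.5] * 100
wiggly = [0.0, 1.0] * 50
print(multiscale_distances(flat, wiggly, sigmas=[0.5, 4.0]))
```

With these inputs the distance at σ = 4 is smaller than at σ = 0.5: the coarse scale hides the oscillations and compares only the overall level, while the fine scale keeps them.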
Much of the existing work in machine learning and data mining has relied on devising efficient techniques to build accurate models from the data. Research on how the accuracy of a model changes as a function of dynamic...