ISBN: 1581131712 (print)
Owing to rapid technological progress, the amount of information stored in databases is increasing rapidly. In addition, new applications require the storage and retrieval of complex multimedia objects, which are often represented by high-dimensional feature vectors. Finding the valuable information hidden in those databases is a difficult task. Cluster analysis is one of the basic techniques often applied in analyzing large data sets. Originating from the area of statistics, most cluster analysis algorithms were originally developed for relatively small data sets. In recent years, clustering algorithms have been extended to work efficiently on large data sets, and some of them even allow the clustering of high-dimensional feature vectors. Many such methods use some kind of index structure for efficient retrieval of the required data; other approaches are based on preprocessing for more efficient clustering. The main goal of the tutorial is to provide an overview of the state of the art in cluster discovery methods for large databases, covering well-known clustering methods from related fields such as statistics, pattern recognition, and machine learning, as well as database techniques which allow them to work efficiently on large databases. The target audience of the tutorial is researchers and practitioners from statistics, databases, and machine learning who are interested in the state of the art of cluster discovery methods and their applications to large databases. The tutorial especially addresses people from academia who are interested in developing new cluster discovery algorithms, and people from industry who want to apply cluster discovery methods in analyzing large databases. The tutorial is structured as follows: First, we give a brief motivation for clustering from modern data mining applications. We discuss important design decisions and explain the interdependencies with the properties of data. We then intro
Feature subset selection is a data mining enhancement technique that aims to reduce the number of features to be used. This reduction is expected to improve the performance of the data mining algorithms in terms of speed, accuracy, and simplicity. Although there has been some work on feature subset selection, research into the theoretical computational complexity of this problem and into the optimal selection of fuzzy-valued feature subsets has not been carried out. This paper focuses on a problem called optimal fuzzy-valued feature subset selection (OFFSS), which is regarded as important but difficult in machine learning and pattern recognition. The quality of a set of features is measured by the overall overlapping degree between two classes of examples and the size of the feature subset. The main contributions of this paper are: (1) the concept of a fuzzy extension matrix is introduced; (2) the computational complexity of OFFSS is proved to be NP-hard; (3) a simple but powerful heuristic algorithm for OFFSS is given; and (4) the feasibility and simplicity of the proposed algorithm are demonstrated via applications of OFFSS to input selection of neuro-fuzzy systems and to fuzzy decision tree induction.
Learning to predict rare events from sequences of events with categorical features is an important, real-world problem that existing statistical and machine learning methods are not well suited to solve. This paper d...
In this paper, we explore the use of machine learning and data mining to improve the prediction of travel times in an automobile. We consider two formulations of this problem, one that involves predicting speeds at di...
Like model selection in statistics, the choice of appropriate data mining algorithms (DM-Algorithms) is a very important task in the process of Knowledge Discovery. Due to this fact it is necessary to have sophisticat...
Huge masses of digital data about products, customers and competitors have become available for companies in the services sector. In order to exploit its inherent (and often hidden) knowledge for improving business pr...
Business users and analysts commonly use spreadsheets and 2D plots to analyze and understand their data. On-line Analytical Processing (OLAP) provides these users with added flexibility in pivoting data around differe...
ISBN: 3540647767 (print)
The proceedings contain 23 papers. The special focus in this conference is on Grammatical Inference. The topics include: results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm; learning k-variable pattern languages efficiently stochastically finite on average from positive data; meaning helps learning syntax; a polynomial time incremental algorithm for learning DFA; the data driven approach applied to the OSTIA algorithm; grammar model and grammar induction in the system NL PAGE; approximate learning of random subsequential transducers; learning stochastic finite automata from experts; learning deterministic finite automaton with a recurrent neural network; applying grammatical inference in learning a language model for oral dialogue; real language learning; a stochastic search approach to grammar induction; transducer-learning experiments on language understanding; locally threshold testable languages in strict sense; learning a subclass of linear languages from positive structural information; grammatical inference in document recognition; stochastic inference of regular tree languages; how considering incompatible state mergings may reduce the DFA induction search tree; learning regular grammars to model musical style; learning a subclass of context-free languages; using symbol clustering to improve probabilistic automaton inference and a performance evaluation of automatic survey classifiers; pattern discovery in biosequences.
First, a short introduction to inductive logic programming and machine learning is presented, followed by an inductive database mining query language, RDM (Relational Database Mining language). RDM integrates concepts from...
The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogeni...