When the descriptions of data values in a database are too concrete or too detailed, the computational complexity of discovering useful knowledge from the database generally increases. Furthermore, the discovered knowledge tends to become complicated. The notion of data abstraction seems useful for resolving this kind of problem: abstraction yields a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we have to carefully select the one according to which the original database is generalized. An inadequate selection would reduce the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method of selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes necessary to perform our classification task is preserved within an acceptable error range given by the user. With the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. Therefore, we can expect to construct from the generalized database a decision tree that is much smaller than one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory. Therefore, we call our abstraction framework Information Theoretical Abstraction. We show some experimental results obtained by a system ITA that is an implementation of
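The selection criterion described in this abstract can be illustrated with a minimal sketch. It is not the ITA system: it assumes that "preserving the class distribution" is measured as the increase in conditional entropy H(C | A) caused by replacing attribute values with abstract values, and that among acceptable abstractions the coarsest one is preferred. The function names, the `candidates` dictionary layout, and the `epsilon` threshold are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(values, classes):
    """Return H(C | A) in bits for paired attribute values and class labels."""
    n = len(values)
    joint = Counter(zip(values, classes))
    marginal = Counter(values)
    h = 0.0
    for (v, _c), count in joint.items():
        # p(v, c) * log2 p(c | v), accumulated with a minus sign
        h -= (count / n) * log2(count / marginal[v])
    return h


def select_abstraction(values, classes, candidates, epsilon):
    """Pick the coarsest candidate whose information loss is at most epsilon.

    candidates maps a name to a dict {concrete value: abstract value}.
    Assumption: information loss is H(C | abstracted A) - H(C | A).
    Returns the chosen name, or None if no candidate is acceptable.
    """
    base = conditional_entropy(values, classes)
    acceptable = []
    for name, mapping in candidates.items():
        abstracted = [mapping[v] for v in values]
        loss = conditional_entropy(abstracted, classes) - base
        if loss <= epsilon:
            # Fewer distinct abstract values means a smaller generalized table.
            acceptable.append((len(set(abstracted)), loss, name))
    return min(acceptable)[2] if acceptable else None


# Toy data (invented for illustration only).
values = ["apple", "pear", "carrot", "leek"]
classes = ["sweet", "sweet", "savory", "savory"]
candidates = {
    "kind": {"apple": "fruit", "pear": "fruit", "carrot": "veg", "leek": "veg"},
    "any": {"apple": "food", "pear": "food", "carrot": "food", "leek": "food"},
}
print(select_abstraction(values, classes, candidates, epsilon=0.1))
```

In this toy example, a tight threshold rejects the abstraction that merges everything into one value, because that merge destroys all information about the class.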
Web Usage Mining (WUM) focuses on the interaction behavior between Web users and requested Web pages in order to identify navigation patterns. This work describes a case study aimed at investigating the potential of WUM as a framework for supporting the validation of learning site designs. The goal was to model the domain in terms of a WUM application, and to explore abstractions and types of patterns that can help site usage evaluation.
We describe a possible approach to the problem of extracting knowledge from the analysis of questionnaires through machine learning. The idea guiding our research was to investigate the existence of association rules among the topics covered in a course. The data used came from the questionnaires administered to the freshmen in electronic engineering attending the foundations of computer science course at our university. Each questionnaire was coded into feature vectors that were classified with respect to the grade obtained by the student and analysed with C4.5. Some statistical results and hints for further work are discussed.
We summarize our methods for the fusion of multisensor imagery based on concepts derived from neural models of visual processing and pattern learning and recognition. These methods have been applied to real-time fusion of night vision sensors in the field, airborne multispectral and hyperspectral imaging systems, and space-based multiplatform multimodality sensors. The methods enable color fused 3D visualization, as well as interactive exploitation and data mining in the form of human-guided machine learning and search for targets and cultural features. Over the last year we have developed a user-friendly system integrated into a COTS exploitation environment known as ERDAS Imagine. We demonstrate fusion and interactive mining of low-light Visible/SWIR/MWIR/LWIR night imagery, and IKONOS multispectral imagery. We also demonstrate how target learning and search can be enabled over extended operating conditions by allowing training over multiple scenes. This is illustrated for detecting small boats in coastal waters using fused Visible/MWIR/LWIR imagery.
ISBN: 1853129259 (print)
The construction industry is experiencing explosive growth in its capability to generate and collect data. Advances in data storage technology have allowed the transformation of an enormous amount of data into computerized database systems. Nowadays, there are many efforts to convert these large amounts of data into useful patterns or trends. Knowledge Discovery in Databases (KDD) is a process that combines data mining (DM) techniques from machine learning, pattern recognition, statistics, databases, and visualization to automatically extract concepts, interrelationships, and patterns of interest from a large database. By applying KDD and DM to the analysis of construction project data, this paper presents the results of research that uses the KDD process to discover knowledge and better identify recurring construction problems.
ISBN: 1853129259 (print)
One of the major problems of data mining systems is the identification of classes, categories, and concepts. We introduce a new framework for categorization which is based on the concept of "pattern conception" (a term that may be contrasted to "pattern recognition", "pattern matching", "pattern perception", etc.). There are important distinctions between pattern conception and the mainstream pattern recognition models; furthermore, these distinctions lead us to new categorization information-processing architectures. The first major distinction tells us that there is more than one correct conception for each individual pattern. Each pattern may have numerous segmentations and descriptions which are fundamentally distinct but equally correct in a deep sense. Another striking distinction of pattern conception is the capability to "see as", in which context will guide the interpretation of data such that one object may be seen as if it were another type of object, or as if it were occupying the position or role of other objects. A final and related distinction is that there should be a "relativity theory" view of concepts and categories, in which concepts are both defined by their relations to other concepts and activated from the spread of activation of other concepts. In this work, we analyze how these distinctions appear under three distinct application domains: (i) the notorious case of Bongard problems; (ii) letter-string analogies; and (iii) the game of chess (viewed as a pattern analysis problem). It may be concluded that data mining methods must be able to handle these distinctions if they are to be effective at pattern conception and, thus, at a wide class of information categorization problems.
ISBN: 1853129259 (print)
The term data mining refers to information elicitation. On the other hand, soft computing deals with information processing. If these two key properties can be combined in a constructive way, then the combination can effectively be used for knowledge discovery in large databases. Referring to this synergetic combination, the basic merits of the data mining and soft computing paradigms are pointed out, and a novel data mining implementation coupled to a soft computing approach for knowledge discovery is presented. Knowledge modeling by machine learning together with the computer experiments is described, and the effectiveness of the machine learning approach employed is demonstrated.
ISBN: 3540440259 (print)
The classification problem is one of the typical problems encountered in data mining and machine learning. In this paper, a rough genetic algorithm (RGA) is applied to the classification problem in an undetermined environment based on a fuzzy distance function by calculating attribute weights. The RGA, a genetic algorithm based on rough values, can complement the existing tools developed in rough computing. Computational experiments are conducted on benchmark problems downloaded from the UCI machine learning databases. Experimental results, compared with the usual GA [1] and C4.5 algorithms, verify the efficiency of the developed algorithm. Furthermore, the weights acquired by the proposed learning method are applicable not only to fuzzy similarity functions but also to any similarity functions. As an application, a new distance metric called weighted discretized value difference metric (WDVDM) is proposed. Experimental results show that WDVDM is an improvement on the discretized value difference metric (DVDM).
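The abstract does not define WDVDM, so the following is only a rough sketch of the general idea of attaching attribute weights to a value difference metric. It assumes the common per-attribute form sum over classes of |P(c | x) - P(c | y)|^q on already discretized values, combined as a weighted sum over attributes. The weights here are supplied by hand rather than learned by a genetic algorithm, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def class_conditionals(column, classes):
    """Estimate P(class | value) for one discrete (or discretized) attribute."""
    counts = defaultdict(Counter)
    for value, label in zip(column, classes):
        counts[value][label] += 1
    return {
        value: {label: k / sum(cnt.values()) for label, k in cnt.items()}
        for value, cnt in counts.items()
    }


def vdm(probs, x, y, labels, q=2):
    """Value difference between two values of one attribute.

    Assumes both x and y were seen when probs was estimated.
    """
    return sum(
        abs(probs[x].get(c, 0.0) - probs[y].get(c, 0.0)) ** q for c in labels
    )


def weighted_vdm(row_a, row_b, tables, weights, labels, q=2):
    """Weighted sum of per-attribute value differences (assumed combination)."""
    return sum(
        w * vdm(t, a, b, labels, q)
        for a, b, t, w in zip(row_a, row_b, tables, weights)
    )


# Toy data (invented): the first attribute predicts the class, the second does not.
classes = ["n", "n", "y", "y"]
tables = [
    class_conditionals(["low", "low", "high", "high"], classes),
    class_conditionals(["a", "b", "a", "b"], classes),
]
labels = sorted(set(classes))
print(weighted_vdm(("low", "a"), ("high", "b"), tables, [0.5, 2.0], labels))
```

Under this form, an attribute whose values carry no class information contributes zero distance regardless of its weight, while the weights rescale the contribution of informative attributes.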
ISBN: 9783540442745 (print)
The proceedings contain 83 papers. The special focus in this conference is on Granular, Neuro Computing, Probabilistic Reasoning, data mining, machine learning, and pattern recognition. The topics include: modelling biological phenomena with rough sets; a proposed evolutionary, self-organizing automaton for the control of dynamic systems; fuzzy sets, multi-valued mappings, and rough sets; a quantitative analysis of preclusivity vs. similarity based rough approximations; Heyting Wajsberg algebras as an abstract environment linking fuzzy and rough sets; dominance-based rough set approach using possibility and necessity measures; generalized decision algorithms, rough inference rules, and flow graphs; towards a mereological system for direct products and relations; reasoning about information granules based on rough logic; a rough set framework for learning in a directed acyclic graph; functional dependencies in relational expressions based on or-sets; about tolerance and similarity relations in information systems; collaborative query processing in DKS controlled by reducts; a new method for determining extensions and restrictions of information systems; an alternative to find meaningful clusters by using the reducts from a dataset; variable consistency monotonic decision trees; importance and interaction of conditions in decision rules; induction of decision rules and classification in the valued tolerance approach; time series model mining with similarity-based neuro-fuzzy networks and genetic algorithms; closeness of performance map information granules; measures of inclusion and closeness of information granules; and using granular objects in multi-source data fusion.