We discuss information granule calculi as a basis of granular computing. They are defined by constructs such as information granules, basic relations of inclusion and closeness between information granules, and operations on them. An exact interpretation between the granule languages of different information sources (agents) often does not exist. Hence (rough) inclusion and closeness of granules are considered instead of their equality. Examples of all the basic constructs of information granule calculi are presented. The construction of more complex information granules is described by expressions called terms. We discuss the synthesis problem of robust terms, i.e., descriptions of information granules that satisfy a given specification to a satisfactory degree. We also present a method for the synthesis of information granules represented by robust terms (approximate schemes of reasoning) by means of decomposition of specifications for such granules. The discussed problems of granular computing are of special importance for many applications, in particular those related to spatial reasoning as well as to knowledge discovery and data mining.
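To make the idea of graded inclusion and closeness concrete, here is a minimal sketch that assumes granules are finite sets of objects. The function names and the specific formulas (overlap ratio for inclusion, minimum of the two inclusion degrees for closeness) are illustrative assumptions, not definitions taken from the abstract.

```python
def inclusion_degree(a: set, b: set) -> float:
    """Degree (0..1) to which granule `a` is included in granule `b`.

    Assumption: granules are finite sets and the degree is the share
    of `a` that also lies in `b`. An empty granule is fully included.
    """
    if not a:
        return 1.0
    return len(a & b) / len(a)


def closeness(a: set, b: set) -> float:
    """Assumed closeness measure: the weaker of the two inclusion degrees."""
    return min(inclusion_degree(a, b), inclusion_degree(b, a))


# Two granules describing overlapping groups of objects.
g1 = {1, 2, 3, 4}
g2 = {2, 3, 4, 5}
print(inclusion_degree(g1, g2))  # share of g1 that lies in g2
print(closeness(g1, g2))
```

With such degrees, a requirement like "satisfies the specification to a satisfactory degree" can be read as a threshold test on the inclusion degree instead of an equality test.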
In recent years, knowledge has received significant attention in manufacturing as a means to build a competitive advantage in the sector. Knowledge induction from data is an important issue in manufacturing: it helps to find process failures and then to predict and improve future system performance. This research examines the improvement of a manufacturing process via data mining. Not only do we detect and isolate machine breakdowns in carpet manufacturing, but we also propose a C4.5 decision tree model. In addition, we use attribute relevance analysis to select the variables of the qualitative attributes. Consequently, the manufacturing process is redeveloped. (C) 2010 Elsevier B.V. All rights reserved.
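C4.5 chooses split attributes by gain ratio, which is also one common way to rank attribute relevance. The sketch below computes it for categorical attributes. The records and attribute names (`humidity`, `speed`, `breakdown`) are hypothetical and do not come from the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr, target):
    """Information gain of `attr` with respect to `target`, divided by split info."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    total = len(rows)
    # Expected entropy of the target after splitting on `attr`.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split info penalizes attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical machine records; not data from the paper.
rows = [
    {"humidity": "high", "speed": "fast", "breakdown": "yes"},
    {"humidity": "high", "speed": "slow", "breakdown": "yes"},
    {"humidity": "low", "speed": "fast", "breakdown": "no"},
    {"humidity": "low", "speed": "slow", "breakdown": "no"},
]
for attr in ("humidity", "speed"):
    print(attr, gain_ratio(rows, attr, "breakdown"))
```

In this toy data `humidity` separates the breakdown classes perfectly while `speed` carries no information, so a C4.5-style learner would split on `humidity` first.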
A method for object aggregation and cluster identification has been proposed for knowledge discovery in databases. By integrating conceptual clustering and machine learning (especially learning-from-examples) paradigms, the method classifies the data into different clusters, extracts the characteristics of each cluster, and discovers knowledge rules based on the relationships among different clusters. Different kinds of knowledge rules, including hierarchical, equivalence, and inheritance rules, can be discovered efficiently.
Multimedia data, including sound databases, require signal processing and parameterization to enable automatic searching for specific content. Indexing of musical audio material with high-level timbre information requires extraction of low-level sound parameters first. In this paper, we analyze regularities in musical sound description for data representing musical instrument sounds by means of spectral and time-domain features. We examined digital audio recordings of single sounds for 11 instruments of definite pitch. Woodwinds, brass, and strings used in contemporary orchestras were investigated for various fundamental frequencies of sound and articulation techniques. The general-purpose data mining system Forty-Niner was applied to investigate dependencies between the sound attributes, and the results of the experiments are presented and discussed. We also indicate a broad range of possible industry applications, which may influence directions of further research in this domain. We conclude the paper with observations on the representation of musical instrument sound and on the emerging issue of exploring audio databases.
The investigation of relations between protein tertiary structure and amino acid sequence is a topic of tremendous importance in molecular biology. The automated discovery of recurrent patterns of structure and sequence is an essential part of this investigation. These patterns, known as protein motifs, are abstractions of fragments drawn from proteins of known sequence and tertiary structure. This paper has two objectives. The first is to introduce and define protein motifs, and provide a survey of previous research on protein motif discovery. The second is to present and apply a novel approach to protein motif representation and discovery, which is based on a spatial description logic and the symbolic machine learning paradigm of structured concept formation. A large database of protein fragments is processed using this approach, and several interesting and significant protein motifs are discovered.
Data mining has become an important technique with tremendous potential in many commercial and industrial applications. Attribute-oriented induction is a powerful mining technique and has been successfully implemented in the data mining system DBMiner (Han et al. Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96), Portland, Oregon, 1996). However, its induction capability is limited by unconditional concept generalization. In this paper, we extend concept generalization to rule-based concept hierarchies, which greatly enhances its induction power. When a previously proposed induction algorithm is applied to the more general rule-based case, a problem of induction anomaly occurs, which impacts its efficiency. We have developed an efficient algorithm to facilitate induction in the rule-based case that avoids the anomaly. Performance studies have shown that the algorithm is superior to a previously proposed algorithm based on backtracking.
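The difference between unconditional and rule-based generalization can be illustrated with a small sketch. It is not the algorithm from the paper and does not address the induction anomaly. The attributes, hierarchy, and GPA thresholds below are hypothetical.

```python
from collections import Counter

# Unconditional hierarchy: every value always maps to the same parent concept.
CITY_TO_REGION = {"Vancouver": "West", "Victoria": "West", "Toronto": "East"}


def generalize_gpa(row):
    """Rule-based generalization: the parent concept of a GPA value
    depends on another attribute (here, the student's status)."""
    threshold = 3.5 if row["status"] == "graduate" else 3.0  # assumed rule
    return "good" if row["gpa"] >= threshold else "average"


def induce(rows):
    """Generalize each tuple, then merge identical tuples and count them."""
    generalized = Counter()
    for row in rows:
        key = (row["status"], CITY_TO_REGION[row["city"]], generalize_gpa(row))
        generalized[key] += 1
    return generalized


# Hypothetical student records.
rows = [
    {"status": "graduate", "city": "Vancouver", "gpa": 3.2},
    {"status": "undergraduate", "city": "Victoria", "gpa": 3.2},
    {"status": "undergraduate", "city": "Vancouver", "gpa": 3.4},
    {"status": "graduate", "city": "Toronto", "gpa": 3.8},
]
for concept, count in induce(rows).items():
    print(concept, count)
```

Note that the same GPA of 3.2 generalizes to "average" for a graduate student and to "good" for an undergraduate, which a single unconditional hierarchy cannot express.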
As the computing world moves from the information age into the knowledge-based age, it is beneficial to induce knowledge from the information superhighway formed from the Internet and intranets. The knowledge acquired can be expressed in different knowledge representations such as computer programs, first-order logical relations, or fuzzy Petri nets (FPNs). In this paper, we present a flexible knowledge discovery system called generic genetic programming (GGP) that applies genetic programming (GP) and logic grammars to learn knowledge in various knowledge representation formalisms. An experiment is performed to demonstrate that GGP can discover knowledge represented in FPNs that support fuzzy and approximate reasoning. To evaluate the performance of GGP in producing good FPNs, the classification accuracy of the FPN induced by GGP is compared with that of the decision tree generated by C4.5. Moreover, the performance of GGP in inducing logic programs from noisy examples is evaluated. A detailed comparison with FOIL, a system that induces logic programs, has been conducted. These experiments demonstrate that GGP is a promising alternative to other knowledge discovery systems and is sometimes superior for handling noisy and inexact data. (C) 2001 Elsevier Science B.V. All rights reserved.
In this paper we compare the usability of ESOM and MDS as text exploration instruments in police investigations. We combine them with traditional classification instruments such as the SVM and Naive Bayes. We carry out a real-life data mining case study using a dataset consisting of police reports describing a wide range of violent incidents that occurred during the year 2007 in the Amsterdam-Amstelland police region (The Netherlands). We compare the possibilities offered by the ESOM and MDS for iteratively enriching our feature set, discovering confusing situations and faulty case labelings, and significantly improving the classification accuracy. The results of our research are currently operational in the Amsterdam-Amstelland police region for upgrading the employed domestic violence definition, for improving the training of police officers, and for developing a highly accurate and comprehensible case triage model. (C) 2011 Elsevier B.V. All rights reserved.
Effective project management is a key factor for successful knowledge discovery in databases (KDD) projects. The systematic documentation of previous knowledge, experiments, data and results is a helpful means of keeping track of the project's current status. Despite its value, documentation is most often perceived as an overhead. We propose a documentation infrastructure composed of a documentation model and a supporting environment that allows the capture, storage and retrieval of KDD process-related information and artifacts. The paper describes this infrastructure and reports preliminary experiences with its use. Preliminary results reveal general satisfaction with the infrastructure's expressiveness and functionality, and highlight the contributions of the documentation produced to improving project management, project execution and team communication. The role of documentation in learning and reuse was also identified. (C) 2004 Elsevier B.V. All rights reserved.
In this study, we are building a prototype of a machine-learning system that uses an inductive supervised approach to predict the logistical performance of a company. The focus lies on the learning phase, the handling of different types of data, and the creation of new concepts in order to provide better measurable information. In this system, numeric financial data are combined with categorical data to create symbolic data, and the phase of model generation from examples is distinguished from the phase of model classification and interpretation. The system has been implemented in vector spaces. Our data are benchmarking surveys on concurrent engineering (CE), measuring the usage of a total of 302 best practices in Belgian manufacturing companies. The general rationale for implementing a best practice is that the company will improve its product processing and in this way secure its economic existence in the market. Our model processes a limited number of predefined steps, generating value factors for the 302 best practices. The best practices are grouped into 30 subjects, and the value factors are combined in linear combinations. These value factors and their linear combinations are then subject to pattern interpretation, relating CE performance to the past financial state of the company and also to the economic well-being of the company in the longer term, i.e., to the sustainability of the company in the market.