Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in the large number of existing databases, and because the amount of data generated in electronic form is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationships among the attributes in the database are analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
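The attribute-elimination step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a decision table held as a list of dictionaries, and it uses the standard rough set notion that a condition attribute is dispensable when dropping it leaves the positive region (the rows whose decision is fully determined by the remaining attributes) unchanged. The function names and the toy table are hypothetical.

```python
def positive_region_size(rows, condition_attrs, decision_attr):
    """Count rows whose condition values determine the decision unambiguously."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        groups.setdefault(key, []).append(row[decision_attr])
    # A group belongs to the positive region only if all its decisions agree.
    return sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)


def dispensable_attributes(rows, condition_attrs, decision_attr):
    """Return attributes whose individual removal keeps the positive region intact."""
    full = positive_region_size(rows, condition_attrs, decision_attr)
    result = []
    for attr in condition_attrs:
        remaining = [a for a in condition_attrs if a != attr]
        if positive_region_size(rows, remaining, decision_attr) == full:
            result.append(attr)
    return result


# Hypothetical toy table: "size" alone determines "label"; "color" adds nothing.
table = [
    {"size": "small", "color": "red", "label": "no"},
    {"size": "small", "color": "blue", "label": "no"},
    {"size": "large", "color": "red", "label": "yes"},
    {"size": "large", "color": "blue", "label": "yes"},
]
print(dispensable_attributes(table, ["size", "color"], "label"))
```

Each attribute is tested on its own here; computing a full reduct would require removing attributes one after another and re-checking, which this sketch leaves out.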
Large collections of genomic information have been accumulated in recent years, and embedded latently in them is potentially significant knowledge for exploitation in medicine and in the pharmaceutical industry. The approach taken here to the distillation of such knowledge is to detect strings in DNA sequences which appear frequently, either within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients sharing a particular medical diagnosis). Motifs are strings that occur very frequently. We present basic theory and algorithms for finding very frequent and common strings. Strings which are maximally frequent are of particular interest and, having discovered such motifs, we show briefly how to mine association rules by an existing rough sets based technique. Further work and applications are in progress.
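The two notions of frequency mentioned in this abstract (within one sequence and across sequences) can be sketched as follows. This is a naive fixed-length count, not the algorithms of the paper; the function names, the `length` parameter, and the thresholds are assumptions made for illustration.

```python
from collections import Counter


def frequent_substrings(sequence, length, min_count):
    """Substrings of a fixed length occurring at least min_count times in one sequence.

    Overlapping occurrences are counted.
    """
    counts = Counter(
        sequence[i:i + length] for i in range(len(sequence) - length + 1)
    )
    return {s: c for s, c in counts.items() if c >= min_count}


def common_substrings(sequences, length, min_sequences):
    """Substrings of a fixed length present in at least min_sequences sequences."""
    support = Counter()
    for seq in sequences:
        # Use a set so each sequence contributes at most once per substring.
        present = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        support.update(present)
    return {s: n for s, n in support.items() if n >= min_sequences}


print(frequent_substrings("ACGACGTACG", 3, 2))
print(common_substrings(["ACGT", "TACG", "GGGG"], 3, 2))
```

Finding maximally frequent strings, as the abstract describes, would additionally require comparing results across lengths; the sketch covers only a single length.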
This paper proposes and evaluates a method for extracting interesting patterns from numerical time-series data which takes account of user subjectivity. The proposed method samples the data irregularly, preserving the subjectively noteworthy features by means of a user-specified gradient. It also quantizes the data irregularly, preserving the intrinsically objective characteristics of the data by means of statistical distributions. It then extracts representative patterns from the discretized data using group average clustering. Experimental results using benchmark datasets indicate that the proposed method does not destroy the intrinsically objective features, since it has the same performance as basic subsequence clustering using the K-Means algorithm. Results using a dataset from a clinical hepatitis study indicate that it extracts patterns that are interesting to a medical expert.
ISBN: (print) 9783030161866; 9783030161873
Low Birth Weight (LBW) babies have a high risk of developing certain health conditions throughout their lives that negatively affect their quality of life. Therefore, a Decision Support System (DSS) that predicts whether a baby will be born with LBW would be of great interest. In this study, six different Data Mining (DM) algorithms are tested in five different scenarios. The scenarios combine information about the mother's physical characteristics and habits, and about the gestation. Results are promising: the best model achieved a sensitivity of 91.4% and a specificity of 99%. Good results were also achieved without considering the gestational age, which suggests that DM might be a good alternative to traditional medical imaging exams for predicting LBW early in the pregnancy.
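For reference, the two metrics reported in this abstract are computed from a confusion matrix as sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below uses hypothetical counts chosen only for illustration; they are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives (e.g. LBW births) that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives (e.g. normal-weight births) correctly cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts, not taken from the paper.
tp, fn, tn, fp = 8, 2, 45, 5
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```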
ISBN: (print) 9781509019939
The massive use of Information and Communication Technology in education has made it possible to collect and store a huge amount of varied data about all aspects of education. The analysis of these raw data could lead to new, unexpected but valuable knowledge, useful for both teachers and students, and also for faculty and university managers. This paper presents a knowledge discovery in databases process applied to data collected mainly from a Learning Management System implemented at the "Stefan cel Mare" University of Suceava.
ISBN: (print) 9783642315992
This paper applies the preprocessing phases of knowledge discovery in databases (KDD) to automated blood cell counter data and creates discrete ranges of the data that can be used for grouping by classification, clustering and association rule generation. The functions of an automated blood cell counter from a clinical pathology laboratory and the phases of knowledge discovery in databases are explained briefly. Twelve thousand records are taken from a clinical laboratory for processing. The preprocessing steps of the KDD process are applied to the blood cell counter data. The ChiMerge algorithm is then applied to the data to generate discretized data representing ranges of values.
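The discretization step named in this abstract can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes class-labelled numeric values, stops at a fixed number of intervals (`max_intervals`, a hypothetical parameter) rather than at a chi-square significance threshold, and skips cells whose expected count is zero.

```python
from collections import Counter


def chi2_pair(a, b, classes):
    """Chi-square statistic for the class counts of two adjacent intervals."""
    total = sum(a.values()) + sum(b.values())
    chi2 = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a[c] + b[c]) / total
            if expected > 0:  # simplification: ignore empty columns
                chi2 += (row[c] - expected) ** 2 / expected
    return chi2


def chimerge(values, labels, max_intervals):
    """Return the lower bounds of the intervals left after bottom-up merging."""
    classes = sorted(set(labels))
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    bounds = sorted(counts)  # start with one interval per distinct value
    intervals = [counts[v] for v in bounds]
    while len(intervals) > max_intervals:
        scores = [
            chi2_pair(intervals[i], intervals[i + 1], classes)
            for i in range(len(intervals) - 1)
        ]
        i = scores.index(min(scores))  # most similar adjacent pair
        intervals[i] = intervals[i] + intervals[i + 1]
        del intervals[i + 1]
        del bounds[i + 1]
    return bounds


# Hypothetical readings with two classes that separate cleanly between 3 and 10.
print(chimerge([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"], 2))
```

Merging the pair with the lowest chi-square first means that neighbouring intervals with similar class distributions are combined before dissimilar ones, so the surviving boundaries fall where the class mix changes.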
ISBN: (print) 9783319210247; 9783319210230
The knowledge discovery process is intended to provide valid, novel, potentially useful and ultimately understandable patterns from data. An interesting research area concerns the identification and use of interestingness measures to rank or filter results and provide what might be called better knowledge. For association rule mining, some research has focused on how to filter itemsets and rules, both to guide knowledge acquisition from the user's point of view and to improve the efficiency of the process. In this paper, we explain MOGACAR, an approach for ranking and filtering association rules when there are multiple technical and business interestingness measures. MOGACAR uses a multi-objective optimization method based on a genetic algorithm for classification association rules, with the intention of finding the most interesting, and still valid, itemsets and rules.
ISBN: (print) 354065271X
Commercial databases often contain critical business information concerning past performance which could be used to predict the future. However, the huge amounts of data can make the extraction of this business information almost impossible by manual methods or standard software techniques. Data mining techniques can analyze, understand and visualize the huge amounts of stored data gathered from business applications and thus help companies stay competitive in today's marketplace. Currently, a number of data mining applications and prototypes have been developed for a variety of business domains. Most of these applications are targeted at predictive modeling, which finds patterns in data to help predict the future trends and behaviors of some entities. Apart from predictive modeling, other data mining tasks such as summarization, association, classification and clustering could also be applied to business databases. In this paper, we illustrate the different data mining tasks applied to a real-life business database for risk analysis and targeted marketing.
ISBN: (print) 9781424417391
Predictive Toxicology (PT) is one of the newest targets of the knowledge discovery in databases (KDD) domain. Its goal is to describe the relationships between the chemical structure of chemical compounds and biological and toxicological processes. In real PT problems there is a very important topic to be considered: the huge number of chemical descriptors. Irrelevant, redundant, noisy and unreliable data have a negative impact; therefore one of the main goals in KDD is to detect these undesirable properties and to eliminate or correct them. This involves data cleaning, noise reduction and feature selection, because the performance of the applied Machine Learning algorithms is strongly related to the quality of the data used. In this paper, we present some of the issues that can be taken into account when preparing data before the actual knowledge discovery is performed.
ISBN: (print) 9781479960750
This article presents a knowledge discovery in databases (KDD) process to analyze data obtained by monitoring a fleet of eight electric vehicles with ZEBRA batteries during the years 2012 and 2013. Over 4,000 journeys and 2,000 charging events have been detected. The analysis of such events shows the consumption of the battery and its aging, and how the electric vehicles are used.