Search Results - Inner Mongolia University Library

14th IEEE International Conference on Data Mining (IEEE ICDM)

Authors: Leung, Carson Kai-Sang; MacKinnon, Richard Kyle; Tanbeer, Syed K. (Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada)

ISBN: (print) 9781479943036

The majority of existing data mining algorithms mine frequent itemsets from precise data. A well-known algorithm is FP-growth, which builds a compact FP-tree structure to capture important contents of precise data and mines frequent itemsets from the FP-tree. However, there are situations in which data are uncertain. To capture important contents (e.g., existential probabilities) of uncertain data for mining frequent itemsets, the UF-growth algorithm uses a UF-tree structure. However, the UF-tree can be large. Other tree structures for handling uncertain data may achieve compactness at the expense of looser upper bounds on expected supports. To solve this problem, we propose fast algorithms that use compact tree structures for capturing uncertain data with tightened upper bounds to expected support (tube) for frequent itemset mining from uncertain data. Experimental results show the tightness of tube provided by our algorithms and the compactness of our tree structures.

Keywords: Association analysis; data mining algorithms; expected support; frequent patterns; tree structures; uncertain data
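The abstract above relies on the notion of expected support over uncertain data. As a minimal sketch, assuming the usual definition from the uncertain frequent-pattern literature (items within a transaction are treated as independent, and the expected support of an itemset is the sum, over transactions, of the product of its items' existential probabilities), it can be computed as follows. The function name `expected_support` and the toy `transactions` data are illustrative assumptions, not taken from the paper, and this is not the paper's tree-based algorithm.

```python
from math import prod

def expected_support(itemset, transactions):
    """Expected support of `itemset` over uncertain transactions.

    Each transaction maps an item to its existential probability.
    Assumption: items in a transaction are independent, so the
    probability that the whole itemset occurs is the product of the
    individual probabilities.
    """
    total = 0.0
    for t in transactions:
        # A transaction contributes only if it contains every item.
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total

# Hypothetical toy data: three uncertain transactions.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "c": 0.4},
    {"a": 0.5, "b": 0.8, "c": 1.0},
]

# {a, b} occurs in the first and third transactions:
# 0.9 * 0.5 + 0.5 * 0.8, i.e. approximately 0.85
print(expected_support({"a", "b"}, transactions))
```

An itemset is then called frequent when this value meets a user-chosen minimum support threshold. Tree-based methods such as those described in the abstract aim to avoid this brute-force scan over all transactions.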

Constructing Parallel Association Algorithms from Function Blocks

15th Industrial Conference on Data Mining (ICDM)

Authors: Kholod, Ivan; Kuprianov, Mikhail; Shorov, Andrey (St Petersburg Electrotech Univ LETI, St Petersburg, Russia)

ISBN: (print) 9783319209104; 9783319209098

The article describes a method for constructing association rule retrieval algorithms from function blocks that have a unified interface and purely functional properties. Using function blocks to build association rule algorithms allows existing algorithms to be modified and new algorithms to be built with minimum effort. In addition, the properties of the function blocks allow the algorithms to be transformed into parallel form, thus improving their efficiency.

Keywords: Data mining; Parallel data mining; Data mining algorithms; Parallel algorithms

Discovering Complex Incomplete Periodic Patterns through Logical Derivations

2015 Workshop 11

Authors: Janusz R. Getta; Marcin Zimniak (School of Computing and Information Technology, University of Wollongong; Faculty of Computer Science, TU Chemnitz)

Discovering complex and incomplete periodic patterns in the logs of events is a complicated and time-consuming *** work shows that it is possible to discover complex and incomplete periodic patterns through finding simple patterns first and through logical derivations of complex and incomplete patterns later *** paper defines a syntax and semantics of a class of periodic patterns that frequently occur in the logs of events. A system of derivation rules proposed in the paper can be used to transform a set of periodic patterns into a logically equivalent set of *** rules are used in the algorithms that derive complex and incomplete periodic patterns. A prototype implementation of the algorithms that discover complex and incomplete periodic patterns in the logs of events is presented.

Keywords: Periodic Pattern; Complex Periodic Pattern; Incomplete Periodic Pattern; Derivation Rules; Data mining algorithms

Knowledge discovery in medicine: Current issue and future trend

EXPERT SYSTEMS WITH APPLICATIONS, 2014, Vol. 41, No. 9, pp. 4434-4463

Authors: Esfandiari, Nura; Babavalian, Mohammad Reza; Moghadam, Amir-Masoud Eftekhari; Tabar, Vahid Kashani (Islamic Azad Univ, Fac Comp & Informat Technol, Qazvin Branch, Qazvin, Iran; Kashan Univ Med Sci, Trauma Res Ctr, Kashan, Iran)

Data mining is a powerful method to extract knowledge from data. Raw data faces various challenges that make traditional method improper for knowledge extraction. Data mining is supposed to be able to handle various data types in all formats. Relevance of this paper is emphasized by the fact that data mining is an object of research in different areas. In this paper, we review previous works in the context of knowledge extraction from medical data. The main idea in this paper is to describe key papers and provide some guidelines to help medical practitioners. Medical data mining is a multidisciplinary field with contribution of medicine and data mining. Due to this fact, previous works should be classified to cover all users' requirements from various fields. Because of this, we have studied papers with the aim of extracting knowledge from structural medical data published between 1999 and 2013. We clarify medical data mining and its main goals. Therefore, each paper is studied based on the six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. In each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. At the end of each task, a brief summarization and discussion are stated. A standard framework according to CRISP-DM is additionally adapted to manage all activities. As a discussion, current issue and future trend are mentioned. The amount of the works published in this scope is substantial and it is impossible to discuss all of them on a single work. We hope this paper will make it possible to explore previous works and identify interesting areas for future research. (C) 2014 Elsevier Ltd. All rights reserved.

Keywords: Data mining application; Medical data mining; Medicine; Disease; Data mining algorithms

A statistical significance testing approach to mining the most informative set of patterns

DATA MINING AND KNOWLEDGE DISCOVERY, 2014, Vol. 28, No. 1, pp. 238-263

Authors: Lijffijt, Jefrey; Papapetrou, Panagiotis; Puolamaki, Kai (Aalto Univ, Dept Informat & Comp Sci, Aalto 00076, Finland; Univ London, Dept Comp Sci & Informat Syst, London WCIE 7HX, England; Finnish Inst Occupat Hlth, FI-00025 Helsinki, Finland)

Hypothesis testing using constrained null models can be used to compute the significance of data mining results given what is already known about the data. We study the novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p value. The resulting set of patterns, such as frequent patterns or clusterings, is the smallest set that statistically explains the data. We show that the newly formulated problem is, in its general form, NP-hard and there exists no efficient algorithm with finite approximation ratio. However, we show that in a special case a solution can be computed efficiently with a provable approximation ratio. We find that a greedy algorithm gives good results on real data and that, using our approach, we can formulate and solve many known data-mining tasks. We demonstrate our method on several data mining tasks. We conclude that our framework is able to identify in various settings a small set of patterns that statistically explains the data and to formulate data mining problems in the terms of statistical significance.

Keywords: Data mining algorithms; Pattern mining; Statistical significance testing

Mining Students' Learning Behavior in Moodle System

JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2014, Vol. 7, No. 4, pp. 12-26

Authors: Touya, K.; Fakir, Mohamed (Sultan Moulay Slimane Univ, Fac Sci & Technol, Beni Mellal, Morocco)

In the last few years, Educational data mining has become an interesting area exploited to discover and extract hidden knowledge of students from educational environment data. During the establishment of this work an attempt was made to manage the extracted information using mining techniques. These methods took place in order to get groups of students with similar characteristics. The application of classification, clustering and association rules mining algorithms on the data stored on the e-learning (Moodle system) database allowed to extract knowledges that help to understand students' behaviors and patterns. Additionally, the development of a Web application for the educators is a tool to monitor their students learning behavior by monitoring the number of assignments taken, the number of quizzes taken, the number of forum post and read by students, etc. The knowledge obtained can help the instructors to make decision about their students' interacting with the courses activities in Moodle system, and to create an efficient educational environment. In this research, a data mining tool called RapidMiner was used for mining the data from the Moodle system database, and a web application written in PHP was established to aid teachers with statistics.

Keywords: Association Rules; Classification; Clustering; Data mining algorithms; Educational data mining (EDM); Moodle System; RapidMiner; SMoodle System; Student Behavior

A Business Intelligence Solution for Frequent Pattern Mining on Social Networks

14th IEEE International Conference on Data Mining (IEEE ICDM)

Authors: Jiang, Fan; Leung, Carson Kai-Sang (Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada)

ISBN: (print) 9781479942749

Frequent pattern mining is an important data mining task. Since its introduction, it has drawn attention from many researchers. Consequently, many frequent pattern mining algorithms have been proposed to mine large varieties of high-value data such as high volumes of shopper market basket data. In this paper, we propose a business intelligence (BI) solution for frequent pattern mining on social network data. Evaluation results show that our proposed BI solution is both space- and time-efficient. Moreover, we also discuss the benefits and practicality of our BI solution, which reveals frequent social patterns, in real-life business applications.

Keywords: Association analysis; business applications; business intelligence; data mining algorithms; frequent patterns

DISC: Efficient Uncertain Frequent Pattern Mining with Tightened Upper Bounds

14th IEEE International Conference on Data Mining (IEEE ICDM)

Authors: MacKinnon, Richard Kyle; Strauss, Teagan D.; Leung, Carson Kai-Sang (Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada)

ISBN: (print) 9781479942749

UF-growth is a tree-based exact algorithm for mining frequent patterns from uncertain data. While it directly calculates the expected support of a pattern, it requires a significant amount of storage space to capture all existential probability values among the items. To eliminate the extra space requirement of UF-growth, the CUF-growth algorithm combines nodes with the same item by storing an upper bound on expected support. In this paper, we (i) introduce a new concept of domain item-specific capping (DISC) and (ii) propose three new scalable data analytics algorithms that use this concept to achieve a tighter upper bound than CUF-growth. Experimental results show the effectiveness of uncertain frequent pattern mining with tightened upper bounds provided by using the concept of DISC.

Keywords: Association analysis; data mining algorithms; expected support; frequent itemsets; tree structures; uncertain data

Data mining model adjustment control charts for cascade processes

EUROPEAN JOURNAL OF INDUSTRIAL ENGINEERING, 2013, Vol. 7, No. 4, pp. 442-455

Authors: Kim, Seoung Bum; Jitpitaklert, Weerawat; Chen, Victoria C. P.; Lee, Jinpyo; Park, Sun-Kyoung (Korea Univ, Sch Ind Management Engn, Seoul 136713, South Korea; Univ Texas Arlington, Dept Ind & Mfg Syst Engn, Arlington, TX 76019, USA; Hongik Univ, Sch Business, Seoul 121791, South Korea; Hanyang Cyber Univ, Sch Business Adm, Seoul 133791, South Korea)

Control charts have been widely recognised as important tools in system monitoring of abnormal behaviour and quality improvement. Traditional control charts have a major assumption that successive observations are uncorrelated and normally distributed. When this assumption is violated, the traditional control charts do not perform well, but instead show increased false alarm rates. In this study, we propose a data mining model adjustment control chart to address autocorrelation problems for cascade processes. The basic idea of the proposed control chart is to monitor the residuals obtained by data mining models. The data mining models used in this study include support vector regression and artificial neural networks. A simulation study was conducted to evaluate the performance of the proposed control chart and compare it with the standard regression adjustment control chart and the observations-based control chart in terms of average run length performance. The results showed that the proposed data mining model adjustment control charts yielded better performance than the two other methods considered in this study.
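The core idea in the abstract above, monitoring the residuals of a fitted model rather than the raw observations, can be sketched roughly as follows. This is only an illustration under stated assumptions: an ordinary least-squares line stands in for the support vector regression and neural network models used in the study, the control limits are plain k-sigma limits on the training residuals, and the names `fit_line` and `residual_chart_alarms` as well as the data are hypothetical.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def residual_chart_alarms(train_x, train_y, new_x, new_y, k=3.0):
    """Return indices of new observations whose residual falls
    outside center +/- k * sigma, estimated from training residuals."""
    slope, intercept = fit_line(train_x, train_y)
    train_res = [y - (slope * x + intercept) for x, y in zip(train_x, train_y)]
    center, sigma = mean(train_res), stdev(train_res)
    lcl, ucl = center - k * sigma, center + k * sigma
    alarms = []
    for i, (x, y) in enumerate(zip(new_x, new_y)):
        residual = y - (slope * x + intercept)
        if not lcl <= residual <= ucl:
            alarms.append(i)
    return alarms

# Hypothetical in-control data: output roughly 2 * input.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# The last new observation departs strongly from the fitted relationship.
alarms = residual_chart_alarms(train_x, train_y, [7, 8, 9], [14.0, 16.1, 25.0])
print(alarms)  # prints [2]
```

Because the chart is applied to residuals, variation that the model explains through the upstream input no longer triggers alarms; only departures from the learned relationship do.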

A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, Vol. 25, No. 4, pp. 721-733

Authors: Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Mansour, Ayman; Massanari, R. Michael (Gonzaga Univ, Dept Elect & Comp Engn, Spokane, WA 99258, USA; Wayne State Univ, Dept Elect & Comp Engn, Detroit, MI 48202, USA; Spokane Mental Hlth, Spokane, WA 99202, USA; St Mary Mercy Hosp, Trinity Hlth, Dept Med, Livonia, MI 48154, USA; Crit Junctures Inst, Bellingham, WA 98225, USA)

In many real-world applications, it is important to mine causal relationships where an event or event pattern causes certain outcomes with low probability. Discovering this kind of causal relationships can help us prevent or correct negative outcomes caused by their antecedents. In this paper, we propose an innovative data mining framework and apply it to mine potential causal associations in electronic patient data sets where the drug-related events of interest occur infrequently. Specifically, we created a novel interestingness measure, exclusive causal-leverage, based on a computational, fuzzy recognition-primed decision (RPD) model that we previously developed. On the basis of this new measure, a data mining algorithm was developed to mine the causal relationship between drugs and their associated adverse drug reactions (ADRs). The algorithm was tested on real patient data retrieved from the Veterans Affairs Medical Center in Detroit, Michigan. The retrieved data included 16,206 patients (15,605 male, 601 female). The exclusive causal-leverage was employed to rank the potential causal associations between each of the three selected drugs (i.e., enalapril, pravastatin, and rosuvastatin) and 3,954 recorded symptoms, each of which corresponded to a potential ADR. The top 10 drug-symptom pairs for each drug were evaluated by the physicians on our project team. The numbers of symptoms considered as likely real ADRs for enalapril, pravastatin, and rosuvastatin were 8, 7, and 6, respectively. These preliminary results indicate the usefulness of our method in finding potential ADR signal pairs for further analysis (e. g., epidemiology study) and investigation (e. g., case review) by drug safety professionals.

Keywords: Adverse drug reactions; association rules; data mining algorithms; interestingness measure; recognition primed decision model
