Search Results - Inner Mongolia University Library

3rd International Conference on Advanced Network Technologies and Intelligent Computing (ANTIC)

Authors: Sharma, Shivani. Affiliations: Thapar Inst Engn & Technol, Dept Comp Sci & Engn, Patiala, India

ISBN: (print) 9783031640759; 9783031640766

This paper introduces a fast Spark-based solution for privacy-preserving frequent pattern mining on big data. The Spark Resilient Distributed Dataset (RDD) framework is used to implement the Mask approach, which uses probabilistic distortion to maintain data privacy while mining frequent patterns. The masking technique shows very promising results in terms of both privacy and utility. However, its sequential nature limits its application to small or medium-sized data. The proposed Spark-based technique introduces two-level parallelization, i.e., at the data level and at the algorithmic level, which in turn paves the way to faster analytical results within a bounded amount of time when dealing with large volumes of data. This makes the application feasible given the current growth in data size. A number of experiments have been conducted to compare the performance of the proposed scheme with benchmark parallel approaches in terms of privacy, utility, and time complexity over real and simulated data sets. It has been observed that the proposed scheme preserves the privacy of sensitive data while maintaining utility within a reasonable time bound. Experiments show that the proposed Spark-based scheme, i.e., S-Mask, gains a 16-fold speedup on average over different benchmark data sets and maintains a desired ratio between privacy and utility of the data.
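The probabilistic distortion idea mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly described Mask-style scheme in which each bit of a boolean transaction is kept with probability p and flipped with probability 1 - p, and in which the true support of a single item is estimated from the distorted data. It is plain sequential Python, not the Spark implementation (S-Mask) described above, and the names `distort` and `estimate_support` are hypothetical.

```python
import random


def distort(transactions, p, rng):
    """Keep each 0/1 bit with probability p, flip it with probability 1 - p."""
    return [
        [bit if rng.random() < p else 1 - bit for bit in row]
        for row in transactions
    ]


def estimate_support(distorted, item, p):
    """Estimate the true support of one item from distorted data.

    If s is the true support, the expected observed support is
    p * s + (1 - p) * (1 - s), which is inverted here for s.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; cannot invert")
    observed = sum(row[item] for row in distorted) / len(distorted)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 10,000 transactions over 2 items; item 0 has support 0.3.
    data = [[1 if rng.random() < 0.3 else 0, 0] for _ in range(10_000)]
    noisy = distort(data, p=0.9, rng=rng)
    print(round(estimate_support(noisy, item=0, p=0.9), 3))
```

The estimate is only accurate in aggregate: individual rows stay distorted, which is where the privacy comes from, while the inversion recovers approximate supports for mining.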

Keywords: Spark; Distributed Computing; Parallelization; constraint-based data mining; Sensitive Pattern Hiding; Privacy Preservation


Frequent itemset hiding revisited: pushing hiding constraints into mining

APPLIED INTELLIGENCE, 2022, Vol. 52, No. 3, pp. 2539-2555

Authors: Verykios, Vassilios S.; Stavropoulos, Elias C.; Krasadakis, Panteleimon; Sakkopoulos, Evangelos. Affiliations: Hellen Open Univ, Sch Sci & Technol, Patras, Greece; Univ Piraeus, Dept Informat, Piraeus, Greece

This paper introduces a new theoretical scheme for the solution of the frequent itemset hiding problem. We propose an algorithmic approach that consists of a novel constraint-based hiding model which incorporates hiding into one-pass mining, along with a solution methodology that relies on Linear Programming. The patterns induced by the constraint-based mining algorithm are, in this way, utilized to build a minimal linear program whose solution dictates the construction of a database extension that delivers the sought-for hiding. This extension should be appended to the original database and released as a whole for mining, with the resulting extended database hiding the sensitive knowledge that we want to protect. Our proposed theory outdoes, in both space complexity and accuracy, all the existing approaches proposed so far in this domain, and we demonstrate that superiority with a series of experiments against other existing approaches. Our proposal sheds new light on the exploration of new algorithmic techniques which can be readily applied to model hiding problems by providing solutions that computationally outperform all existing modeling approaches for hiding.

Keywords: Privacy preserving data mining; Knowledge hiding; Frequent itemset hiding; constraint-based data mining; Linear programming


Sequence Mining under Multiple Constraints

30th ACM Symposium on Applied Computing (SAC)

Authors: Bechet, Nicolas; Cellier, Peggy; Charnois, Thierry; Cremilleux, Bruno. Affiliations: Univ Bretagne Sud, IRISA, Campus Tohann, F-56017 Vannes, France; IRISA, INSA Rennes, F-35042 Rennes, France; Univ Paris 13, Sorbonne Paris Cite, LIPN, F-93430 Villetaneuse, France; Univ Caen Basse Normandie, GREYC, F-14032 Caen 5, France

ISBN: (print) 9781450331968

In this paper, we address the problem of mining sequential patterns under multiple constraints. Unlike classical algorithms, our approach handles various types of constraints, which are not only numeric but also symbolic and syntactic. These multiple constraints enable us to express a wide range of knowledge in order to focus on interesting patterns. We illustrate our approach with the detection of gene-rare disease relationships from biomedical texts for the documentation of rare diseases.
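As a rough illustration of what combining several constraint types on a sequential pattern can look like, the sketch below checks one candidate pattern against a numeric constraint (minimum support), a gap constraint, and a symbolic constraint (the pattern must contain at least one item from a given set). It is not the authors' algorithm: the functions `occurs`, `support`, and `satisfies`, the parameter names, and the toy sentences are all assumptions made for the example, and syntactic constraints are not covered.

```python
def occurs(pattern, sequence, max_gap=None):
    """True if `pattern` is a subsequence of `sequence`.

    `max_gap` bounds how many items may be skipped between two
    consecutive matched items (None means no bound). Backtracking is
    needed because the first possible match is not always the right one.
    """
    def match(pi, start, limit):
        if pi == len(pattern):
            return True
        end = len(sequence) if limit is None else min(len(sequence), limit)
        for pos in range(start, end):
            if sequence[pos] == pattern[pi]:
                next_limit = None if max_gap is None else pos + max_gap + 2
                if match(pi + 1, pos + 1, next_limit):
                    return True
        return False

    return match(0, 0, None)


def support(pattern, database, max_gap=None):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(occurs(pattern, seq, max_gap) for seq in database)


def satisfies(pattern, database, min_support, max_gap=None, required_items=None):
    """Check the numeric, gap, and symbolic constraints together."""
    if required_items is not None and not set(pattern) & set(required_items):
        return False
    return support(pattern, database, max_gap) >= min_support


if __name__ == "__main__":
    # Hypothetical tokenized sentences.
    db = [
        ["gene", "is", "associated", "with", "disease"],
        ["gene", "causes", "disease"],
        ["disease", "and", "gene"],
    ]
    print(support(["gene", "disease"], db))
    print(support(["gene", "disease"], db, max_gap=1))
```

A real miner would use such constraints to prune the search space during enumeration instead of testing candidates one at a time.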

Keywords: Sequential data mining; constraint-based data mining; pattern discovery; information extraction; natural language processing


Constraint-based concept mining and its application to microarray data analysis

INTELLIGENT DATA ANALYSIS, 2005, Vol. 9, No. 1, pp. 59-82

Authors: Besson, Jeremy; Robardet, Celine; Boulicaut, Jean-Francois; Rome, Sophie. Affiliations: Inst Natl Sci Appl, LIRIS, CNRS FRE 2672, F-69621 Villeurbanne, France; INRA, INSERM UMR 1235, F-69372 Lyon 08, France; Inst Natl Sci Appl, PRISMA, F-69621 Villeurbanne, France

We are designing new data mining techniques on boolean contexts to identify a priori interesting bi-sets, i.e., sets of objects (or transactions) and associated sets of attributes (or items). This improves the state of the art in many application domains where transactional/boolean data are to be mined (e.g., basket analysis, WWW usage mining, gene expression data analysis). The so-called (formal) concepts are important special cases of a priori interesting bi-sets that associate closed sets on both dimensions thanks to the Galois operators. Concept mining in boolean data is tractable provided that at least one of the dimensions (number of objects or attributes) is small enough and the data is not too dense. The task is extremely hard otherwise. Furthermore, it is important to enable user-defined constraints on the desired bi-sets and to use them during the extraction to increase both the efficiency and the a priori interestingness of the extracted patterns. This leads us to the design of a new algorithm, called D-Miner, for mining concepts under constraints. We provide an experimental validation on benchmark data sets. Moreover, we introduce an original data mining technique for microarray data analysis. Not only are boolean expression properties of genes recorded, but we also add biological information about transcription factors. In such a context, D-Miner can be used for concept mining under constraints and outperforms the other studied algorithms. We also show that data enrichment is useful for evaluating the biological relevance of the extracted concepts.
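To make the notion of a formal concept concrete, the following sketch enumerates the concepts of a tiny boolean context by brute force, using the two Galois operators (objects to shared attributes, attributes to matching objects), and then keeps only those meeting minimum-size constraints. This is deliberately not D-Miner: it is exponential in the number of objects and applies the constraints after the closure computation instead of pushing them into the extraction. The function names and the toy context are assumptions made for the example.

```python
from itertools import combinations


def intent(objects, context, attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(obj for obj, own in context.items() if attrs <= own)


def concepts(context, min_objects=0, min_attributes=0):
    """Brute-force enumeration of formal concepts under size constraints.

    `context` maps each object to the set of attributes it has.
    Returns a set of (extent, intent) pairs of frozensets.
    """
    attributes = set().union(*context.values())
    objects = list(context)
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            closed_attrs = intent(subset, context, attributes)
            closed_objs = extent(closed_attrs, context)
            if len(closed_objs) >= min_objects and len(closed_attrs) >= min_attributes:
                found.add((closed_objs, closed_attrs))
    return found


if __name__ == "__main__":
    # Hypothetical context: three objects over the attributes a, b, c.
    ctx = {"o1": {"a", "b"}, "o2": {"a", "b", "c"}, "o3": {"c"}}
    for objs, attrs in sorted(concepts(ctx, min_objects=2, min_attributes=1),
                              key=lambda c: sorted(c[0])):
        print(sorted(objs), sorted(attrs))
```

Each returned pair is closed on both dimensions: no object can be added without losing an attribute, and no attribute can be added without losing an object.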

Keywords: pattern discovery; constraint-based data mining; closed sets; formal concepts; microarray data analysis


Generating closed frequent gensets under constraints based on FP-Tree structure

IMACS Multiconference on Computational Engineering in Systems Applications (CESA 2006)

Authors: Trabelsi, C.; Latiri, C.; Ghedira, K. Affiliations: Tunisian High Sch Management, Res Grp SOIE, 24 Ave Libert, Tunis 2000, Tunisia; Fac Sci Tunis, Res Grp URPAH, Comp Sci Dept, Tunis, Tunisia

ISBN: (print) 9787302139225

The mechanism of gene regulation is of great interest to biologists, especially in the genomic field. One part of the mechanisms controlling gene expression is provided by the transcription factors, which are proteins that can either repress or stimulate the transcription of a gene. In this paper, we propose a new data mining algorithm, based on boolean contexts, in order to extract a priori relevant frequent closed gensets, i.e., sets of tissues and associated sets of genes and transcription factors which are useful for the biologist. The key feature of our algorithm is a better compromise between the size of the search space and the discovered knowledge conveyed in bioinformatics. For this, the proposed algorithm, called MC(2)G for Mining Constraint Closed Gensets, uses the Frequent Pattern Tree (FP-Tree) structure, which is an extended Prefix-Tree structure, to prime the search space. Moreover, MC(2)G enables the definition of statistical and syntactic constraints on the desired frequent closed gensets and uses them during the extraction process. Experimental comparisons with other algorithms are carried out on real-world datasets. http://***/stamp/***?arnumber=4281879

Keywords: gene expression; transcription factor; closed frequent genset; pattern discovery; constraint-based data mining; FP-Tree structure; formal concepts

