Search results - Inner Mongolia University Library

Fast detection of XML structural similarity

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, vol. 17, no. 2, pp. 160-175

Authors: Flesca, S; Manco, G; Masciari, E; Pontieri, L; Pugliese, A. Affiliations: CNR ICAR, Inst High Performance Comp & Networks, I-87036 Arcavacata Di Rende CS, Italy; Univ Calabria, I-87036 Arcavacata Di Rende CS, Italy

Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to supporting the storage and retrieval of large collections of such documents. XML documents can be compared as to their structural similarity, in order to group them into clusters so that different storage, retrieval, and processing techniques can be effectively exploited. In this scenario, an efficient and effective similarity function is the key to a successful data management process. We present an approach for detecting structural similarity between XML documents which differs significantly from standard methods based on graph-matching algorithms, and allows a significant reduction of the required computation costs. Our proposal roughly consists of linearizing the structure of each XML document by representing it as a numerical sequence and then comparing such sequences through the analysis of their frequencies. First, some basic strategies for encoding a document are proposed, which can focus on diverse structural facets. Moreover, the theory of the Discrete Fourier Transform is exploited to effectively and efficiently compare the encoded documents (i.e., signals) in the frequency domain. Experimental results reveal the effectiveness of the approach, also in comparison with standard methods.

Keywords: Web mining; mining methods and algorithms; XML/XSL/RDF; text mining; similarity measures
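
The idea in the abstract above (encode each document's tag structure as a numerical sequence, then compare the sequences in the frequency domain) can be illustrated with a minimal sketch. It is not the authors' method: the tag encoding (a positive integer per start tag, its negation per end tag), the zero-padding to a common length, the Euclidean distance between magnitude spectra, and the names `encode_structure`, `dft_magnitudes`, and `structural_distance` are all assumptions made for illustration. A naive DFT is used to keep the example dependency-free; an FFT would normally replace it.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def encode_structure(xml_text, tag_codes):
    """Linearize a document: +code at each start tag, -code at each end tag.

    tag_codes is a dict shared across documents so that the same tag
    always maps to the same integer (an assumed encoding, not the paper's).
    """
    signal = []

    def visit(element):
        # A tag seen for the first time gets the next unused positive integer.
        code = tag_codes.setdefault(element.tag, len(tag_codes) + 1)
        signal.append(code)
        for child in element:
            visit(child)
        signal.append(-code)

    visit(ET.fromstring(xml_text))
    return signal


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of the signal, zero-padded to length n."""
    padded = signal + [0] * (n - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(padded)))
        for k in range(n)
    ]


def structural_distance(xml_a, xml_b):
    """Euclidean distance between the magnitude spectra of two documents."""
    tag_codes = {}
    a = encode_structure(xml_a, tag_codes)
    b = encode_structure(xml_b, tag_codes)
    n = max(len(a), len(b))
    spec_a = dft_magnitudes(a, n)
    spec_b = dft_magnitudes(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))
```

Because only tags are encoded, two documents with the same element structure but different text content are at distance zero under this sketch, while documents with different nesting produce different spectra.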

Indexing useful structural patterns for XML query processing

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, vol. 17, no. 7, pp. 997-1009

Authors: Lian, W; Mamoulis, N; Cheung, DWL; Yiu, SM. Affiliations: Macao Univ Sci & Technol, Fac Informat Technol, Taipa, Peoples R China; Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China

Queries on semistructured data are hard to process due to the complex nature of the data and call for specialized techniques. Existing path-based indexes and query processing algorithms are not efficient for searching complex structures beyond simple paths, even when the queries are highly selective. We introduce the definition of minimal infrequent structures (MIS), which are structures that 1) exist in the data, 2) are not frequent with respect to a support threshold, and 3) have only frequent substructures. By indexing the occurrences of MIS, we can efficiently locate the highly selective substructures of a query, improving search performance significantly. An efficient data mining algorithm is proposed, which finds the minimal infrequent structures. Their occurrences in the XML data are then indexed by a lightweight data structure and used as a fast filter step in query evaluation. We validate the efficiency and applicability of our methods through experimentation on both synthetic and real data.

Keywords: query processing; XML/XSL/RDF; mining methods and algorithms; document indexing
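
The three-part definition of a minimal infrequent structure in the abstract above can be made concrete with a small sketch. As a loudly stated simplification, structures are modelled here as plain sets of labels rather than XML tree patterns, and the search is brute force; the paper's mining algorithm is not reproduced. The names `support` and `minimal_infrequent_sets` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every label in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def minimal_infrequent_sets(transactions, min_support):
    """Brute-force search for label sets that satisfy the three conditions:
    1) the set occurs in the data, 2) its support is below min_support,
    3) every proper non-empty subset is frequent.
    """
    transactions = [frozenset(t) for t in transactions]
    found = set()
    for t in transactions:
        # Every candidate is drawn from a transaction, so condition 1 holds.
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                candidate = frozenset(combo)
                if candidate in found:
                    continue
                if support(candidate, transactions) >= min_support:
                    continue  # frequent, so condition 2 fails
                # Checking the subsets one label smaller is enough: if they
                # are all frequent, so is every smaller subset.
                if size == 1 or all(
                    support(candidate - {x}, transactions) >= min_support
                    for x in candidate
                ):
                    found.add(candidate)
    return found
```

With the transactions `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}` and a support threshold of 2, every pair is frequent and only the full triple is infrequent, so the triple is the single minimal infrequent set.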

ClosedPROWL: Efficient mining of Closed Frequent Continuities by Projected Window List Technology

5th SIAM International Conference on Data Mining

Authors: Huang, Kuo-Yu; Chang, Chia-Hui; Lin, Kuo-Zui. Affiliation: Natl Cent Univ, Dept Comp Sci & Informat Engn, Chungli, Taiwan

ISBN: (print) 9780898715934

Mining frequent patterns in databases is a fundamental and essential problem in data mining research. A continuity is a kind of causal relationship which describes a definite temporal factor with exact position between the records. Since continuities break the boundaries of records, the number of potential patterns will increase drastically. An alternative approach is to mine closed frequent continuities. Mining closed frequent patterns has the same power as mining the complete set of frequent patterns, while substantially reducing the number of redundant rules generated and increasing the effectiveness of mining. In this paper, we propose a method called projected window list technology for the mining of frequent continuities. We present a closed frequent continuity mining algorithm, ClosedPROWL. Experimental results show that our algorithm is more efficient than previously proposed algorithms.

Keywords: Temporal databases; association rules; mining methods and algorithms

Mining Block Correlations to Improve Storage Performance

ACM Transactions on Storage, 2005, vol. 1, no. 2, pp. 213-245

Authors: Li, Zhenmin; Chen, Zhifeng; Zhou, Yuanyuan. Affiliation: University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, IL 61801, United States

Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems. In this article, we propose two algorithms, C-Miner and C-Miner*, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirements, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner, C-Miner* is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner* can discover 7-109% more correlation rules in 2-15 times less time than C-Miner. Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12-30% compared to the base case, and 7-25% compared to the commonly used sequential prefetching scheme for most workloads. © 2005, ACM. All rights reserved.

Keywords: algorithms; block correlations; file system management; Management; mining methods and algorithms; Performance; Storage management
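
To give a feel for frequent sequence mining over a block access trace, as mentioned in the abstract above, here is a deliberately small sketch. It is not C-Miner or C-Miner*: it only finds frequent ordered pairs (subsequences of length 2), and the choice to cut the trace into fixed-length, non-overlapping segments, as well as the name `frequent_block_pairs` and its parameters, are assumptions made for illustration.

```python
from collections import Counter


def frequent_block_pairs(trace, segment_len, min_support):
    """Find ordered block pairs (a, b), with a accessed before b inside the
    same segment, that occur in at least min_support segments.
    """
    counts = Counter()
    for start in range(0, len(trace), segment_len):
        segment = trace[start:start + segment_len]
        seen = set()  # count each pair at most once per segment
        for i, a in enumerate(segment):
            for b in segment[i + 1:]:
                if a != b:
                    seen.add((a, b))
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical trace of block numbers: block 2 repeatedly follows block 1.
trace = [1, 2, 9, 1, 2, 7, 1, 2, 5]
rules = frequent_block_pairs(trace, segment_len=3, min_support=3)
```

A pair such as `(1, 2)` that survives the threshold could then be read as a candidate rule of the form "after block 1, block 2 is likely to be accessed", which is the kind of information a prefetcher could act on.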

HARP: A practical projected clustering algorithm

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, vol. 16, no. 11, pp. 1387-1397

Authors: Yip, KY; Cheung, DW; Ng, MK. Affiliations: Univ Hong Kong, Dept Comp Sci & Informat Syst, Hong Kong, Hong Kong, Peoples R China; Univ Hong Kong, Dept Math, Hong Kong, Hong Kong, Peoples R China

In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely possible for users to supply the parameter values accurately, which causes practical difficulties in applying these algorithms to real data. In this paper, we analyze the major challenges of projected clustering and suggest why these algorithms need to depend heavily on user parameters. Based on the analysis, we propose a new algorithm that exploits the clustering status to adjust the internal thresholds dynamically without the assistance of user parameters. According to the results of extensive experiments on real and synthetic data, the new method has excellent accuracy and usability. It outperformed the other algorithms even when correct parameter values were artificially supplied to them. The encouraging results suggest that projected clustering can be a practical tool for various kinds of real applications.

Keywords: data mining; mining methods and algorithms; clustering; bioinformatics

The combined technique for detection of artifacts in clinical electroencephalograms of sleeping newborns

IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, 2004, vol. 8, no. 1, pp. 28-35

Authors: Schetinin, V; Schult, J. Affiliations: Univ Exeter, Dept Comp Sci, Exeter EX4 4QF, Devon, England; Univ Jena, TheorieLabor, D-07740 Jena, Germany

In this paper, we describe a new method combining the polynomial neural network and decision tree techniques in order to derive comprehensible classification rules from clinical electroencephalograms (EEGs) recorded from sleeping newborns. These EEGs are heavily corrupted by cardiac, eye movement, muscle, and noise artifacts and, as a consequence, some EEG features are irrelevant to classification problems. Combining the polynomial network and decision tree techniques, we discover comprehensible classification rules while also attempting to keep their classification error down. This technique is shown to outperform a number of commonly used machine learning techniques applied to the automatic recognition of artifacts in sleep EEGs.

Keywords: feature evaluation and selection; mining methods and algorithms; neural nets
