ISBN: 9781424451661 (print)
Data warehousing continues to play an important role in global information systems for businesses. Meanwhile, applications of data warehousing have evolved from reporting and decision support systems to mission-critical decision-making systems. This requires data warehouses to combine both historical and current data from operational systems. Since a join operation is one of the most expensive operations in query processing, it is vital to develop effective and efficient join techniques for a distributed warehouse environment. In this paper, we propose an agent-based adaptive join algorithm called Ajoin for effective and efficient online join operations in distributed data warehouses. Ajoin utilises intelligent agents for dynamic optimisation and coordination of join processing at run time. Key aspects of the Ajoin algorithm have been implemented and evaluated against other modern adaptive join algorithms. In our study, Ajoin exhibits better performance under various distributed and dynamic data warehouse environments.
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-) hash join. The relative performance of these two join approaches has been a topic of discussion for a long time. With the advent of modern multicore architectures, it has been argued that sort-merge join is now a better choice than radix-hash join. This claim is justified based on the width of SIMD instructions (sort-merge outperforms radix-hash join once SIMD is sufficiently wide) and NUMA awareness (sort-merge is superior to hash join in NUMA architectures). We conduct extensive experiments on the original and optimized versions of these algorithms. The experiments show that, contrary to these claims, radix-hash join is still clearly superior, and sort-merge approaches the performance of radix-hash join only when very large amounts of data are involved. The paper also provides the fastest implementations of these algorithms, and covers many aspects of modern hardware architectures relevant not only for joins but for any parallel data processing operator.
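For readers unfamiliar with the radix-hash join named in this abstract, the following is a minimal single-threaded sketch of the general idea: partition both inputs on a few bits of the key hash, then build and probe a small hash table per partition. It is not the paper's implementation; the function name `radix_hash_join`, the `bits` parameter, and the `(key, payload)` tuple layout are assumptions made for illustration, and nothing here models SIMD, NUMA, or multi-core execution.

```python
from collections import defaultdict


def radix_hash_join(r, s, bits=4):
    """Equi-join two lists of (key, payload) tuples (illustrative sketch only)."""
    mask = (1 << bits) - 1

    # Partition phase: split both inputs on the low `bits` bits of the key hash,
    # so that matching keys always land in the same partition number.
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    for key, payload in r:
        r_parts[hash(key) & mask].append((key, payload))
    for key, payload in s:
        s_parts[hash(key) & mask].append((key, payload))

    out = []
    # Join phase: build a small hash table per partition of r, probe with s.
    for part_id, r_part in r_parts.items():
        s_part = s_parts.get(part_id)
        if not s_part:
            continue
        table = defaultdict(list)
        for key, payload in r_part:
            table[key].append(payload)
        for key, s_payload in s_part:
            for r_payload in table.get(key, ()):
                out.append((key, r_payload, s_payload))
    return out
```

The point of the partition phase in real implementations is that each per-partition hash table is small enough to stay cache-resident; this sketch only mirrors the control flow, not that performance effect.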
Existing workflow management systems assume that scientists have a well-specified workflow design before the execution. In reality, a lot of scientific discoveries are made as a result of a dynamic process, where scientists keep proposing new hypotheses and verifying them through multiple tries of various experiments before achieving successful experimental results. Consequently, not all the experiments in a workflow execution have necessarily contributed to the final result. In this paper, we investigate the problem of effectively reproducing the results of previous scientific workflow executions by discovering the critical experiments leading to the success and the logical constraints on their execution order. Relational schema and SQL queries have been designed for effectively recording the workflow execution log, efficiently identifying the critical experiments from the log, and recommending experiment reproduction strategies to users. Furthermore, we propose optimization techniques for evaluating such SQL queries according to the unique characteristics of the log data. Experimental evaluations demonstrate the performance speedup of our approach. (C) 2008 Elsevier B.V. All rights reserved.
ISBN: 9788955191356 (print)
P2P systems are highly dynamic in nature. Nodes may join or leave the P2P system at any moment. Frequent joins and departures greatly increase the maintenance overhead in DHT-based P2P systems. The main source of this cost is the lookups that nodes perform to build their fingers. In this paper we introduce an iterative join algorithm for Chord that is suitable for highly dynamic environments. The iterative join algorithm builds the fingers of a node by iterative lookup, with the help of the finger information of the nodes on the lookup path. Theoretical analysis and simulation show that the iterative join algorithm efficiently decreases the maintenance overhead and improves lookup performance.
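To make the term "fingers" concrete, here is a small sketch of the standard Chord finger-table definition, in which entry i of node n points to the first node whose identifier follows (n + 2^i) mod 2^m on the ring. It computes the table from a globally known, sorted list of node identifiers, which is an assumption made purely for illustration; it does not implement the iterative join algorithm of this abstract, whose details are not given here. The names `successor` and `build_fingers` are hypothetical.

```python
from bisect import bisect_left


def successor(ring, ident):
    """Return the first node id >= ident on the sorted ring, wrapping around."""
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


def build_fingers(ring, node, m):
    """Finger i of `node` is successor((node + 2**i) mod 2**m), for i in 0..m-1.

    `ring` is a sorted list of all node ids; a real node would not have this
    global view and would have to discover each finger through lookups.
    """
    return [successor(ring, (node + (1 << i)) % (1 << m)) for i in range(m)]
```

In a running system each of the m entries costs a lookup when a node joins, which is the maintenance overhead the abstract refers to.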
When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. Three algorithms for processing this type of join are presented and their I/O costs are analyzed in this paper. Since such joins often involve very large document collections, it is very important to find efficient algorithms to process them. The three algorithms differ in whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and the simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collections with which the algorithm is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.
We propose a new partition-based join algorithm, called select-partitioned join, which performs better than the sort-merge and hash-partitioned joins. The proposed select-partitioned join algorithm consists of three major steps. The first step is to determine a partitioning pattern by which the total join cost can be minimized and to choose the bound values of the buckets by using a selection algorithm. The second step is to partition both relations into ranged buckets according to the partitioning pattern and the bound values chosen in the previous step. The third step is to apply the nested-block join to the partitioned bucket pairs. The selection algorithm is based on the cumulative distribution function and is performed in a single scan of the smaller relation. The performance of the select-partitioned join is analyzed in terms of the number of I/Os and compared with the sort-merge and hash-partitioned join algorithms. Our join algorithm is better than the hash-partitioned join algorithm, which is generally known to perform better for the join operation. Simulation experiments are conducted for the join algorithms.
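The three steps described in this abstract can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the bound values are taken from evenly spaced ranks of the smaller relation's sorted keys (a stand-in for the single-scan, CDF-based selection algorithm), the cost-minimizing choice of partitioning pattern is replaced by a fixed `num_buckets` parameter, and the nested-block join is reduced to a plain nested loop. The function name and the `(key, payload)` tuple layout are hypothetical.

```python
from bisect import bisect_right


def select_partitioned_join(r, s, num_buckets=4):
    """Equi-join two lists of (key, payload) tuples via ranged buckets (sketch)."""
    # Step 1 (stand-in): pick bucket bounds from the smaller relation's keys.
    small = r if len(r) <= len(s) else s
    keys = sorted(key for key, _ in small)
    if not keys:
        return []
    n = len(keys)
    bounds = [keys[(i * n) // num_buckets] for i in range(1, num_buckets)]

    # Step 2: partition both relations into ranged buckets using the same bounds.
    def partition(relation):
        buckets = [[] for _ in range(num_buckets)]
        for key, payload in relation:
            buckets[bisect_right(bounds, key)].append((key, payload))
        return buckets

    r_buckets, s_buckets = partition(r), partition(s)

    # Step 3: nested-loop join, restricted to bucket pairs with the same range.
    out = []
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        for r_key, r_payload in r_bucket:
            for s_key, s_payload in s_bucket:
                if r_key == s_key:
                    out.append((r_key, r_payload, s_payload))
    return out
```

Because both relations are split on the same bound values, tuples with equal keys can only meet inside the same bucket pair, so no cross-bucket comparisons are needed.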
This paper proposes an algorithm that improves the sort-based join method. Unlike the sort-based join, it employs both sorting and partitioning to avoid completely sorting both relations, and is therefore referred to as the hybrid join. The algorithm consists of completely sorting only the smaller relation and partitioning the other one into ranged buckets according to the order statistics of the sorted relation. The final join is performed on the sorted relation and the ranged buckets.
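A minimal in-memory sketch of the hybrid join outlined in this abstract is given below. It sorts only the smaller relation, derives bucket bounds from evenly spaced ranks (order statistics) of the sorted keys, range-partitions the other relation without sorting it, and joins each bucket against the matching slice of the sorted relation by binary search. The choice of evenly spaced ranks, the binary-search probe, the function name `hybrid_join`, and the `num_buckets` parameter are assumptions for illustration; the abstract does not specify these details.

```python
from bisect import bisect_left, bisect_right


def hybrid_join(small, large, num_buckets=4):
    """Equi-join lists of (key, payload) tuples; only `small` is sorted (sketch)."""
    small_sorted = sorted(small, key=lambda t: t[0])
    keys = [key for key, _ in small_sorted]
    if not keys:
        return []
    n = len(keys)

    # Bucket bounds taken from order statistics of the sorted relation.
    bounds = [keys[(i * n) // num_buckets] for i in range(1, num_buckets)]

    # Partition the larger relation into ranged buckets (no sorting).
    buckets = [[] for _ in range(num_buckets)]
    for key, payload in large:
        buckets[bisect_right(bounds, key)].append((key, payload))

    out = []
    for b, bucket in enumerate(buckets):
        # Slice of the sorted relation covering this bucket's key range.
        start = bisect_left(keys, bounds[b - 1]) if b > 0 else 0
        end = bisect_left(keys, bounds[b]) if b < num_buckets - 1 else n
        for key, large_payload in bucket:
            lo = bisect_left(keys, key, start, end)
            hi = bisect_right(keys, key, start, end)
            for _, small_payload in small_sorted[lo:hi]:
                out.append((key, small_payload, large_payload))
    return out
```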
A join operation consists of selecting, from the set of all pairs of records of two files, those pairs that possess some matching property. Because the join operation is so important in database applications, several algorithms have been proposed for performing it efficiently. A sort-based join is performed by completely sorting both files, then joining the two sorted files in one pass. The one-way join-during-merge algorithm consists of completely sorting one file and only partially sorting the other file and then performing the join. Two join-during-merge algorithms are described and analyzed and their superiority with respect to the traditional sort-based algorithm is shown. It is suggested that the join-during-merge algorithm be used in all cases in which the sort-based algorithm was considered convenient. Since it is possible to choose the optimal algorithm before performing the join, the stated gains can be achieved in reality.
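The traditional sort-based join that this abstract uses as its baseline (sort both files completely, then join them in one pass) can be sketched as below for an equality match on a key. This illustrates only the baseline, not the join-during-merge algorithms, whose details are not given in the abstract. The function name and the `(key, payload)` record layout are assumptions.

```python
def sort_merge_join(r, s):
    """Equi-join two lists of (key, payload) records by sorting both (sketch)."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out = []
    i = j = 0
    # Single merge pass over the two sorted inputs.
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            for _, r_payload in r[i:i_end]:
                for _, s_payload in s[j:j_end]:
                    out.append((key, r_payload, s_payload))
            i, j = i_end, j_end
    return out
```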