This paper investigates techniques for efficiently executing multiquery workloads from data- and computation-intensive applications in parallel and/or distributed computing environments. In this context, we describe a database optimization framework that supports data and computation reuse, query scheduling, and active semantic caching to speed up the evaluation of multiquery workloads. Its most striking feature is the ability to optimize the execution of queries in the presence of application-specific constructs by employing a customizable data and computation reuse model. Furthermore, we discuss how the proposed optimization model is flexible enough to work efficiently irrespective of the underlying parallel/distributed environment. To evaluate the proposed optimization techniques, we present experimental evidence using real data analysis applications. For this purpose, a common implementation of the queries under study was built on the database optimization framework and deployed on three distinct experimental configurations: a shared-memory multiprocessor, a cluster of workstations, and a distributed, Grid-like computational environment.
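The reuse idea behind semantic caching can be illustrated with a small sketch. It assumes (hypothetically) that queries are closed integer range predicates [lo, hi]; when a new query's range is contained in a cached one, the answer is filtered from the cached result instead of rescanning the base data. The class and method names are illustrative and are not the framework's API; the paper's reuse model is customizable and application-specific, which this does not capture.

```python
class RangeSemanticCache:
    """Hypothetical semantic cache for range queries [lo, hi] over a list of values."""

    def __init__(self):
        self.entries = {}   # (lo, hi) -> cached list of matching values
        self.hits = 0
        self.misses = 0

    def query(self, data, lo, hi):
        # Reuse: if a cached range contains [lo, hi], answer by filtering its result.
        for (c_lo, c_hi), rows in self.entries.items():
            if c_lo <= lo and hi <= c_hi:
                self.hits += 1
                return [v for v in rows if lo <= v <= hi]
        # Miss: evaluate against the base data and remember the result.
        self.misses += 1
        rows = [v for v in data if lo <= v <= hi]
        self.entries[(lo, hi)] = rows
        return rows


data = list(range(100))
cache = RangeSemanticCache()
cache.query(data, 10, 50)        # computed from the base data
cache.query(data, 20, 30)        # answered from the cached [10, 50] result
print(cache.hits, cache.misses)  # prints "1 1"
```

A real system would also handle partial overlaps (computing only the missing remainder) and cache eviction; the sketch covers only full containment.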
ISBN (print): 0769522165
With the rapid development of modern real-time applications, the need for highly scalable and predictable real-time transaction processing technology is becoming increasingly urgent. In this paper, we focus on real-time transaction scheduling algorithms in shared-nothing parallel database systems. We propose and evaluate a new timestamp-based scheduling protocol that uses priority-based timestamps to synchronize parallel sub-transactions. The experimental results show that our new protocol better resolves the conflict between synchronization control and communication overhead. Consequently, the protocol performs well when the system load is heavy or data skew is severe.
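One way to see why a priority-based timestamp helps synchronize sub-transactions is sketched below. It assumes (hypothetically) that a timestamp is the tuple (priority, arrival, txn_id), with a smaller priority value meaning a more urgent transaction, and that every sub-transaction of a global transaction carries the same timestamp. Because each node orders its local queue by this shared total order, all nodes serialize the same global transactions identically without exchanging extra messages. This is a simplified illustration, not the paper's protocol: it ignores late arrivals, aborts, restarts, and deadlines.

```python
import heapq
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    # Hypothetical priority-based timestamp: compared field by field, so
    # priority dominates and arrival time / id break ties (a total order).
    priority: int
    arrival: int
    txn_id: str


class NodeScheduler:
    """Illustrative local scheduler on one shared-nothing node."""

    def __init__(self, name):
        self.name = name
        self._queue = []  # min-heap of (Timestamp, sub-transaction label)

    def submit(self, ts, sub_txn):
        heapq.heappush(self._queue, (ts, sub_txn))

    def run(self):
        # Execute queued sub-transactions in timestamp order; return the txn ids.
        order = []
        while self._queue:
            ts, _sub_txn = heapq.heappop(self._queue)
            order.append(ts.txn_id)
        return order


t1 = Timestamp(priority=2, arrival=1, txn_id="T1")
t2 = Timestamp(priority=1, arrival=2, txn_id="T2")  # arrived later, but more urgent
nodes = [NodeScheduler("A"), NodeScheduler("B")]
for node in nodes:
    node.submit(t1, f"T1@{node.name}")
    node.submit(t2, f"T2@{node.name}")
print([node.run() for node in nodes])  # [['T2', 'T1'], ['T2', 'T1']]
```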
We propose a new declustering scheme for allocating uniform multidimensional data among parallel disks. The scheme, aimed at reducing disk access time for range queries, is based on Golden Ratio Sequences for two dimensions and Kronecker Sequences for higher dimensions. Using exhaustive simulation, we show that, in two dimensions, the worst-case (additive) deviation of the scheme from the optimal response time for any range query is one when the number of disks (M) is at most 22; its worst-case deviation is two when M is at most 94; and its worst-case deviation is four when M is at most 550. In two dimensions, we prove that whenever M is a Fibonacci number, the average performance of the scheme is within 14 percent of the (generally unachievable) strictly optimal scheme, and its worst-case response time is within a multiplicative factor of three of the optimal response time for any query and within a factor of 1.5 of the optimal for large queries. We also present comprehensive simulation results, on two-dimensional as well as higher-dimensional data, that compare our scheme with some recently proposed schemes in the literature and demonstrate its advantages.
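The "worst-case additive deviation" measured above can be made concrete with a small exhaustive check. The sketch assumes a simplified golden-ratio lattice mapping, cell (i, j) goes to disk (i + j·s) mod M with s = round(M/φ) mod M (for Fibonacci M this gives the Fibonacci lattice, e.g. s = 3 for M = 5). This is a stand-in and not necessarily the paper's exact GRS construction. For each range query, the response time is the load on the busiest disk, the optimum is ⌈area/M⌉, and the deviation is their difference.

```python
import math
from collections import Counter

PHI_INV = (math.sqrt(5) - 1) / 2  # 1/phi, approximately 0.618


def disk_of(i, j, m):
    """Hypothetical golden-ratio lattice mapping of grid cell (i, j) to one of m disks."""
    step = round(m * PHI_INV) % m
    return (i + j * step) % m


def worst_case_deviation(m, n):
    """Check every range query on an n x n grid and return the largest
    additive gap between the actual and the optimal response time."""
    worst = 0
    for r0 in range(n):
        for r1 in range(r0, n):
            for c0 in range(n):
                for c1 in range(c0, n):
                    load = Counter(disk_of(i, j, m)
                                   for i in range(r0, r1 + 1)
                                   for j in range(c0, c1 + 1))
                    area = (r1 - r0 + 1) * (c1 - c0 + 1)
                    response = max(load.values())   # blocks read from the busiest disk
                    optimal = math.ceil(area / m)   # perfectly balanced retrieval
                    worst = max(worst, response - optimal)
    return worst


print(worst_case_deviation(5, 10))
```

The brute-force loop is only practical for small grids; the paper's simulations up to M = 550 would need an incremental evaluation.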
Spatial database systems have been introduced to support non-traditional data types and more complex queries. Although bulk-loading techniques for access methods have been studied in the spatial database literature, parallel bulk-loading has not been addressed in a parallel spatial database context. Therefore, we study the problem of parallel bulk-loading, assuming that an R-tree-like access method needs to be constructed from a spatial relation distributed across a number of processors. Analytical cost models and experimental evaluation based on real-life and synthetic datasets demonstrate that index construction time can be reduced considerably by exploiting parallelism. I/O costs, CPU time, and communication costs are taken into consideration to investigate the efficiency of the proposed algorithm. (C) 2003 Elsevier B.V. All rights reserved.
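A minimal sketch of the general shape of parallel R-tree bulk-loading follows. Each processor packs the leaves of its own partition, and a coordinator then packs the collected leaf bounding rectangles (MBRs) level by level up to a single root. The packing method here is Sort-Tile-Recursive (STR), swapped in as a well-known bulk-loading technique; the paper's algorithm and its cost model may differ. Threads stand in for processors (under CPython's GIL they give no real speedup), and the communication of leaf MBRs to the coordinator is not modelled.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# A rectangle or MBR is a tuple (xmin, ymin, xmax, ymax).


def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def str_pack(rects, cap):
    """Sort-Tile-Recursive packing: group rects into nodes of at most `cap`
    entries and return the list of node MBRs."""
    if not rects:
        return []
    n_nodes = math.ceil(len(rects) / cap)
    n_slices = math.ceil(math.sqrt(n_nodes))
    per_slice = n_slices * cap
    by_x = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)         # sort by x-center
    nodes = []
    for s in range(0, len(by_x), per_slice):
        strip = sorted(by_x[s:s + per_slice], key=lambda r: (r[1] + r[3]) / 2)
        for k in range(0, len(strip), cap):                        # tile the strip by y
            nodes.append(mbr(strip[k:k + cap]))
    return nodes


def parallel_bulk_load(partitions, cap=4):
    """Return (root MBR, number of levels) for an R-tree built bottom-up."""
    assert cap >= 2, "fan-out must be at least 2 for the tree to shrink"
    with ThreadPoolExecutor() as pool:                             # one task per "processor"
        leaf_lists = list(pool.map(lambda part: str_pack(part, cap), partitions))
    level = [node for nodes in leaf_lists for node in nodes]       # gather at coordinator
    if not level:
        raise ValueError("no rectangles to index")
    height = 1
    while len(level) > 1:                                          # build upper levels
        level = str_pack(level, cap)
        height += 1
    return level[0], height
```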
This paper focuses on parallel query optimization. We consider the operator ordering problem and introduce a new class of execution strategies called Linear-oriented Bushy Trees (LBT). Compared with the related approach of General Bushy Trees (GBT), LBTs yield a significant reduction in the complexity of the operator ordering problem, which can be derived theoretically and is demonstrated experimentally (e.g., compared with GBTs, LBTs allow optimization-time improvements of up to 49%) without loss of quality. Finally, we demonstrate that existing commercial parallel query optimizers need only minor extensions to handle LBTs. (C) 2000 Elsevier Science B.V. All rights reserved.
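Why restricting the tree shape reduces the complexity of operator ordering can be seen from the standard counts of join trees over n relations (children ordered, no cross-product restrictions): n! left-deep (linear) trees versus (2n − 2)!/(n − 1)! bushy trees, which is n! times the Catalan number C(n − 1). The sketch below only computes these two well-known bounds; it does not reproduce the paper's definition or count of LBTs, which lie between the linear and fully bushy classes.

```python
from math import factorial


def left_deep_trees(n):
    """Number of left-deep (linear) join trees over n relations: n!."""
    return factorial(n)


def bushy_trees(n):
    """Number of bushy join trees over n relations with ordered children:
    (2n - 2)! / (n - 1)!, i.e. n! times the Catalan number C(n - 1)."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (3, 5, 8, 10):
    ratio = bushy_trees(n) // left_deep_trees(n)   # equals Catalan(n - 1)
    print(n, left_deep_trees(n), bushy_trees(n), ratio)
```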
The LOGFLOW parallel Prolog system is similar to recent parallel database systems in its dataflow execution model and its ability to run on shared-nothing architectures. In this paper, the abstract execution and abstract machine models of LOGFLOW are examined from a database point of view. Transformations of relational operators into the Logicflow Graph representation of Prolog programs are explained. Thus, LOGFLOW can operate as a relational database machine. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
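The clause-level correspondence between relational operators and Prolog can be sketched as follows. Relations are fact tables; a rule whose body goals share a variable, such as `works_in(N, C) :- emp(N, D), dept(D, C).`, behaves as an equi-join on D, and binding an argument acts as a selection. Nested Python generators mimic Prolog's left-to-right goal order and backtracking. The relation and rule names are hypothetical, and the sketch says nothing about the Logicflow Graph representation itself.

```python
# Hypothetical fact tables: emp(Name, Dept) and dept(Dept, City).
EMP = [("ann", "db"), ("bob", "os"), ("cy", "db")]
DEPT = [("db", "paris"), ("os", "rome")]


def emp(name=None, d=None):
    """Yield emp facts matching the bound arguments (None = unbound variable)."""
    for n, dd in EMP:
        if name in (None, n) and d in (None, dd):
            yield n, dd


def dept(d=None, city=None):
    """Yield dept facts matching the bound arguments."""
    for dd, c in DEPT:
        if d in (None, dd) and city in (None, c):
            yield dd, c


def works_in(city=None):
    """works_in(N, C) :- emp(N, D), dept(D, C).  Shared D = equi-join."""
    for name, d in emp():             # first goal enumerates candidate bindings
        for _, c in dept(d, city):    # second goal is called with D already bound
            yield name, c


def member_of(d):
    """member_of(N) :- emp(N, d).  Bound Dept = selection; dropping it = projection."""
    for name, _ in emp(d=d):
        yield name


print(list(works_in("paris")))
```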
To readjust the parallel execution of SQL queries in case of metric estimation or discretization errors, we propose an incremental parallelization method that carries out scheduling and mapping simultaneously, in cooperation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation), in a dynamic multi-user context. The two heuristics are integrated into the mapping method and attempt to avoid time-consuming multi-bucket join execution, which generates numerous additional I/Os. A performance evaluation of the ParAd heuristic shows (i) significant savings in join response time (from 16.11% to 35.62%) and (ii) with many complex queries, a more significant gain in response time (from 29% to 54%). (C) 2002 Elsevier Science B.V. All rights reserved.
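The memory reasoning behind avoiding multi-bucket joins can be sketched with simple arithmetic. Assuming (hypothetically) that the build relation is hash-partitioned evenly across p processors, each with a fixed memory budget, a single-bucket hash join is possible only if each processor's share fits in memory; otherwise every processor must spill its share into several buckets, adding I/O. The function below picks the smallest degree that avoids spilling, falling back to the maximum degree and reporting the bucket count. It is an illustration of the idea of parallelism-degree adjustment, not the ParAd heuristic itself, which also accounts for the dynamic multi-user context.

```python
import math


def choose_join_parallelism(build_pages, mem_pages_per_proc, max_procs):
    """Hypothetical parallelism-degree adjustment for a parallel hash join.

    Returns (degree, buckets_per_processor). buckets == 1 means each
    processor's share of the build relation fits in memory (single-bucket
    join); buckets > 1 means a multi-bucket join with extra spill I/O.
    """
    needed = math.ceil(build_pages / mem_pages_per_proc)  # processors for a single bucket
    if needed <= max_procs:
        return max(1, needed), 1
    share = math.ceil(build_pages / max_procs)            # pages per processor at max degree
    return max_procs, math.ceil(share / mem_pages_per_proc)


print(choose_join_parallelism(1000, 100, 16))  # (10, 1)
print(choose_join_parallelism(1000, 100, 4))   # (4, 3)
```

A real optimizer would also trade the degree of parallelism against speedup and contention from concurrent queries; the sketch optimizes only for avoiding spills.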
Applications that explore, query, analyze, visualize, and, in general, process very large-scale data sets are known as Data Intensive Applications. Large-scale data-intensive computing plays an increasingly important role in many scientific activities and commercial applications, whether it involves data mining of commercial transactions, experimental data analysis and visualization, or intensive simulation such as climate modeling. By combining high-performance computation, very large data storage, high-bandwidth access, and high-speed local and wide area networking, data-intensive computing enhances the technical capabilities and usefulness of most systems. The integration of parallel and distributed computational environments will produce major improvements in performance for both computing-intensive and data-intensive applications in the future. The purpose of this introductory article is to provide an overview of the main issues in parallel data-intensive computing in scientific and commercial applications and to encourage the reader to explore the more in-depth articles later in this special issue. (C) 2002 Elsevier Science B.V. All rights reserved.
Sorting is frequently required in database processing through the use of ORDER BY and DISTINCT clauses in SQL. Sorting is also well known in the computer science community at large. In general, sorting covers internal and external sorting. Past published work has focused extensively on external sorting on uniprocessors (serial external sorting) and internal sorting on multiprocessors (parallel internal sorting). External sorting on multiprocessors (parallel external sorting) has received surprisingly little attention; furthermore, the way current parallel database systems sort is far from optimal in many scenarios. In this paper, we present a taxonomy for parallel sorting in parallel database systems that covers five sorting methods: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort. The first two methods are previously proposed approaches to parallel external sorting that have been adopted as the status quo for parallel database sorting, whereas the latter three methods, which are based on redistribution and repartitioning, are new and have not been discussed in the literature on parallel external sorting. The performance of these five methods is investigated, and the results are reported. (C) 2002 Elsevier Science Inc. All rights reserved.
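Two ends of the taxonomy can be contrasted in a short sketch, under simplifying assumptions: the data fit in memory and processors are simulated by Python lists. In a merge-all sort, every processor sorts its local partition and a single host performs one k-way merge, which can become a bottleneck. In a partitioned sort, records are first redistributed by range (the splitter values here are hypothetical; in practice they might come from sampling), each range is sorted independently, and the results are simply concatenated because the ranges are disjoint and ordered. The paper's methods additionally deal with external (disk-based) runs, which this sketch omits.

```python
import heapq
from bisect import bisect_right


def parallel_merge_all_sort(partitions):
    """Local sort on each processor, then a single k-way merge at one host."""
    runs = [sorted(p) for p in partitions]   # local sorts (parallel in a real system)
    return list(heapq.merge(*runs))          # merge-all step at a single node


def parallel_partitioned_sort(partitions, splitters):
    """Range-redistribute by sorted splitter values, sort each range locally,
    and concatenate; no final merge is needed."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for p in partitions:
        for x in p:
            buckets[bisect_right(splitters, x)].append(x)  # redistribution step
    out = []
    for b in buckets:                         # ranges are disjoint and in order
        out.extend(sorted(b))
    return out


parts = [[5, 1, 9], [3, 8, 2], [7, 4, 6, 0]]
print(parallel_merge_all_sort(parts))
print(parallel_partitioned_sort(parts, [3, 7]))
```

Skewed splitters would leave some processors with much larger ranges than others, which is one reason the choice of redistribution strategy matters.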
In this paper, we present a taxonomy of indexing schemes in parallel database systems. Index partitioning is not yet widely recognized. One reason is that most index structures are trees rather than flat structures like tables; consequently, partitioning an index is more complex than the common practice of partitioning table data. We present three parallel indexing schemes and discuss their maintenance strategies. We also analyze their storage requirements.
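A basic form of index partitioning is sketched below: the key space is split into ranges by boundary keys, and processor k holds a sorted list of (key, rid) entries for its range. A lookup or an insert touches only the processor that owns the key, which keeps maintenance local. Sorted lists stand in for per-processor B+-trees, and the class and its boundaries are hypothetical; the paper's three schemes are not reproduced here.

```python
from bisect import bisect_left, bisect_right, insort


class RangePartitionedIndex:
    """Hypothetical range-partitioned index: processor k holds sorted
    (key, rid) entries for keys in its range."""

    def __init__(self, boundaries):
        self.boundaries = boundaries                        # e.g. [100, 200] -> 3 processors
        self.parts = [[] for _ in range(len(boundaries) + 1)]

    def _owner(self, key):
        # Processor k owns keys in [boundaries[k-1], boundaries[k]).
        return bisect_right(self.boundaries, key)

    def insert(self, key, rid):
        insort(self.parts[self._owner(key)], (key, rid))    # maintenance touches one processor

    def lookup(self, key):
        part = self.parts[self._owner(key)]                 # search only the owning processor
        i = bisect_left(part, (key,))                       # first entry with this key
        rids = []
        while i < len(part) and part[i][0] == key:
            rids.append(part[i][1])
            i += 1
        return rids


idx = RangePartitionedIndex([100, 200])
idx.insert(150, "r1")
idx.insert(50, "r2")
idx.insert(150, "r3")
print(idx.lookup(150))  # ['r1', 'r3']
```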