Search Results - Inner Mongolia University Library

Optimizing the execution of multiple data analysis queries on parallel and distributed environments

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2004, Vol. 15, No. 6, pp. 520-532

Authors: Andrade, H; Kurc, T; Sussman, A; Saltz, J. Affiliations: Univ Maryland, Dept Comp Sci, College Pk, MD 20742, USA; Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210, USA

This paper investigates techniques for efficiently executing multiquery workloads from data and computation-intensive applications in parallel and/or distributed computing environments. In this context, we describe a database optimization framework that supports data and computation reuse, query scheduling, and active semantic caching to speed up the evaluation of multiquery workloads. Its most striking feature is the ability to optimize the execution of queries in the presence of application-specific constructs by employing a customizable data and computation reuse model. Furthermore, we discuss how the proposed optimization model is flexible enough to work efficiently irrespective of the underlying parallel/distributed environment. In order to evaluate the proposed optimization techniques, we present experimental evidence using real data analysis applications. For this purpose, a common implementation for the queries under study was provided according to the database optimization framework and deployed on top of three distinct experimental configurations: a shared memory multiprocessor, a cluster of workstations, and a distributed computational Grid-like environment.

Keywords: multiquery optimization; parallel databases; data analysis applications; symmetric multiprocessing; cluster computing; grid computing

A study on parallel real-time transaction scheduling

4th International Conference on Computer and Information Technology

Authors: Pan, Y; Lu, YS. Affiliation: Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China

ISBN: (print) 0769522165

With the rapid development of modern real-time applications, the need for highly scalable and predictable real-time transaction processing technology is becoming increasingly urgent. In this paper we focus on real-time transaction scheduling algorithms in shared-nothing parallel database systems. We propose and evaluate a new time-stamp based scheduling protocol, which uses priority-based time-stamps to synchronize parallel sub-transactions. The experimental results show that our new protocol better resolves the conflict between synchronization control and communication overhead. The protocol therefore performs well when the system is heavily overloaded or the skew problem is serious.

Keywords: transaction processing; real-time systems; parallel databases; protocols; processor scheduling; parallel transaction scheduling; real-time transaction scheduling; shared-nothing parallel database systems; time-stamp based scheduling protocol; priority-based time-stamp; parallel sub-transactions synchronization; communication overhead; system overload; skew problem

Multidimensional declustering schemes using Golden Ratio and Kronecker sequences

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, Vol. 15, No. 3, pp. 659-670

Authors: Chen, CM; Bhatia, R; Sinha, RK. Affiliations: Telcordia Technol, Morristown, NJ 07960, USA; Bell Labs, Murray Hill, NJ 07974, USA; AT&T Labs Res, Middletown, NJ 07748, USA

We propose a new declustering scheme for allocating uniform multidimensional data among parallel disks. The scheme, aimed at reducing disk access time for range queries, is based on Golden Ratio Sequences for two dimensions and Kronecker Sequences for higher dimensions. Using exhaustive simulation, we show that, in two dimensions, the worst-case (additive) deviation of the scheme from the optimal response time for any range query is one when the number of disks (M) is at most 22; its worst-case deviation is two when M ≤ 94; and its worst-case deviation is four when M ≤ 550. In two dimensions, we prove that whenever M is a Fibonacci number, the average performance of the scheme is within 14 percent of the (generally unachievable) strictly optimal scheme, and its worst-case response time is within a multiplicative factor of three of the optimal response time for any query, and within a factor of 1.5 of the optimal for large queries. We also present comprehensive simulation results, on two-dimensional as well as higher-dimensional data, that compare our scheme with some recently proposed schemes in the literature and demonstrate its advantages.
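To make the idea of a golden-ratio-based declustering concrete, the following is a minimal sketch and not the paper's exact construction. It assumes a simplified allocation rule: a permutation of the M disks is derived by ranking the fractional parts of i/φ, and tile (x, y) of a 2-D grid is assigned to disk (x + perm[y mod M]) mod M. The function names and the allocation rule are illustrative assumptions. The response time of a range query is taken to be the largest number of tiles that any single disk must serve, which can never be smaller than ⌈area / M⌉.

```python
import math


def golden_ratio_permutation(num_disks: int) -> list[int]:
    """Rank the fractional parts of i / phi to obtain a permutation of 0..M-1.

    Assumption: this ranking is an illustrative stand-in for the paper's
    Golden Ratio Sequence, not a reproduction of it.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / phi, about 0.618
    fractions = [(i * inv_phi) % 1.0 for i in range(num_disks)]
    order = sorted(range(num_disks), key=lambda i: fractions[i])
    perm = [0] * num_disks
    for rank, i in enumerate(order):
        perm[i] = rank
    return perm


def assign_disk(x: int, y: int, num_disks: int, perm: list[int]) -> int:
    """Hypothetical allocation rule: shift each row cyclically by perm[y mod M]."""
    return (x + perm[y % num_disks]) % num_disks


def response_time(x0: int, y0: int, width: int, height: int,
                  num_disks: int, perm: list[int]) -> int:
    """Largest number of tiles of the query rectangle stored on one disk."""
    load = [0] * num_disks
    for x in range(x0, x0 + width):
        for y in range(y0, y0 + height):
            load[assign_disk(x, y, num_disks, perm)] += 1
    return max(load)


def optimal_response_time(width: int, height: int, num_disks: int) -> int:
    """Lower bound: tiles spread perfectly evenly over all disks."""
    return math.ceil(width * height / num_disks)
```

A query's additive deviation, in the sense used in the abstract, would be `response_time(...) - optimal_response_time(...)` for that query; the bounds quoted in the abstract apply to the authors' scheme and are not claimed for this simplified rule.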

Keywords: declustering; disk allocation; parallel databases

Parallel bulk-loading of spatial data

PARALLEL COMPUTING, 2003, Vol. 29, No. 10, pp. 1419-1444

Authors: Papadopoulos, A; Manolopoulos, Y. Affiliation: Aristotle Univ Thessaloniki, Dept Informat, GR-54006 Thessaloniki, Greece

Spatial database systems have been introduced in order to support non-traditional data types and more complex queries. Although bulk-loading techniques for access methods have been studied in the spatial database literature, parallel bulk-loading has not been addressed in a parallel spatial database context. Therefore, we study the problem of parallel bulk-loading, assuming that an R-tree-like access method needs to be constructed from a spatial relation that is distributed across a number of processors. Analytical cost models and experimental evaluation based on real-life and synthetic datasets demonstrate that the index construction time can be reduced considerably by exploiting parallelism. I/O costs, CPU time and communication costs are taken into consideration in order to investigate the efficiency of the proposed algorithm. (C) 2003 Elsevier B.V. All rights reserved.

Keywords: parallel databases; spatial access methods; bulk-loading; query processing

Managing the operator ordering problem in parallel databases

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2000, Vol. 16, No. 6, pp. 665-676

Author: Kosch, H. Affiliation: Univ Klagenfurt, Dept Informat Technol, A-9020 Klagenfurt, Austria

This paper focuses on parallel query optimization. We consider the operator ordering problem and introduce a new class of execution strategies called Linear-oriented Bushy Trees (LBT). Compared to the related approach of General Bushy Trees (GBT), a significant reduction in the complexity of the operator ordering problem can be derived theoretically and demonstrated experimentally without losing quality (e.g., compared with GBTs, LBTs allow optimization time improvements of up to 49%). Finally, we demonstrate that existing commercial parallel query optimizers need only minor extensions in order to handle LBTs. (C) 2000 Elsevier Science B.V. All rights reserved.

Keywords: parallel databases; parallel query optimization; Linear-oriented Bushy Trees; extending existing optimizers

Logicflow execution model for parallel databases

FUTURE GENERATION COMPUTER SYSTEMS, 2000, Vol. 16, No. 6, pp. 677-692

Authors: Kacsuk, P; Podhorszki, N. Affiliation: Hungarian Acad Sci, MTA SZTAKI, Comp & Automat Res Inst, H-1132 Budapest, Hungary

The LOGFLOW parallel Prolog system resembles recent parallel database systems in its dataflow execution model and its capability of running on shared-nothing architectures. In this paper the abstract execution and abstract machine models of LOGFLOW are examined from a database point of view. Transformations of relational operators into the Logicflow Graph representation of Prolog programs are explained. Thus, LOGFLOW can operate as a relational database machine. (C) 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: parallel databases; dataflow execution model; shared-nothing architecture

CPU and incremental memory allocation in dynamic parallelization of SQL queries

PARALLEL COMPUTING, 2002, Vol. 28, No. 4, pp. 525-556

Authors: Hameurlain, A; Morvan, F. Affiliation: Univ Toulouse 3, IRIT, F-31062 Toulouse, France

In order to re-adjust the parallel execution of SQL queries in case of metric estimation or discretization errors, we propose an incremental parallelization method which carries out scheduling and mapping simultaneously, in co-operation with two incremental memory allocation heuristics (ParAd: parallelism degree adjustment, and MaCRelax: mapping clues relaxation), in a dynamic multi-user context. The two incremental memory allocation heuristics are integrated in the mapping method, which attempts to avoid time-consuming multi-bucket join execution that generates numerous additional I/Os. A performance evaluation of the ParAd heuristic shows: (i) significant savings in join response time (from 16.11% to 35.62%), and (ii) with many complex queries, a more significant gain in response time (from 29% to 54%). (C) 2002 Elsevier Science B.V. All rights reserved.

Keywords: parallel databases; dynamic query optimization; scheduling; mapping; memory allocation

Parallel data intensive computing in scientific and commercial applications

PARALLEL COMPUTING, 2002, Vol. 28, No. 5, pp. 673-704

Authors: Cannataro, M; Talia, D; Srimani, PK. Affiliations: Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy; Univ Calabria, ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy; Clemson Univ, Dept Comp Sci, Clemson, SC 29634, USA

Applications that explore, query, analyze, visualize, and, in general, process very large scale data sets are known as Data Intensive Applications. Large scale data intensive computing plays an increasingly important role in many scientific activities and commercial applications, whether it involves data mining of commercial transactions, experimental data analysis and visualization, or intensive simulation such as climate modeling. By combining high performance computation, very large data storage, high bandwidth access, and high-speed local and wide area networking, data intensive computing enhances the technical capabilities and usefulness of most systems. The integration of parallel and distributed computational environments will produce major improvements in performance for both computing intensive and data intensive applications in the future. The purpose of this introductory article is to provide an overview of the main issues in parallel data intensive computing in scientific and commercial applications and to encourage the reader to go into the more in-depth articles later in this special issue. (C) 2002 Elsevier Science B.V. All rights reserved.

Keywords: data intensive algorithms; parallel computing; parallel databases; parallel I/O

Parallel database sorting

INFORMATION SCIENCES, 2002, Vol. 146, No. 1-4, pp. 171-219

Authors: Taniar, D; Rahayu, JW. Affiliations: Monash Univ, Sch Business Syst, Clayton, Vic 3800, Australia; La Trobe Univ, Dept Comp Sci & Comp Engn, Bundoora, Vic 3083, Australia

Sorting in database processing is frequently required through the use of Order By and Distinct clauses in SQL. Sorting is also widely known in the computer science community at large. Sorting in general covers internal and external sorting. Past published work has focused extensively on external sorting on uni-processors (serial external sorting) and internal sorting on multi-processors (parallel internal sorting). External sorting on multi-processors (parallel external sorting) has received surprisingly little attention; furthermore, the way current parallel database systems do sorting is far from optimal in many scenarios. In this paper, we present a taxonomy for parallel sorting in parallel database systems, which covers five sorting methods: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort. The first two methods are previously proposed approaches to parallel external sorting which have been adopted as the status quo of parallel database sorting, whereas the latter three methods, which are based on redistribution and repartitioning, are new and have not been discussed in the literature on parallel external sorting. The performance of these five methods is investigated and the results are reported. (C) 2002 Elsevier Science Inc. All rights reserved.
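Two of the method names in the taxonomy above can be illustrated with a short sketch. It is a sequential, in-memory simulation that reflects one plausible reading of the names, not the authors' algorithms: "merge-all" is taken to mean that every processor sorts its local partition and a single host merges all sorted runs, and "partitioned" is taken to mean that records are first redistributed by key range so that the locally sorted ranges can simply be concatenated. The function names and the `boundaries` parameter are hypothetical, and real external sorting with disk-resident runs is not modelled.

```python
import heapq
from bisect import bisect_right


def parallel_merge_all_sort(partitions: list[list[int]]) -> list[int]:
    """Local sort on each simulated processor, then one host merges all runs."""
    local_runs = [sorted(p) for p in partitions]  # local sort phase
    return list(heapq.merge(*local_runs))         # final merge on a single host


def parallel_partitioned_sort(partitions: list[list[int]],
                              boundaries: list[int]) -> list[int]:
    """Redistribute keys by range, sort each range locally, then concatenate.

    `boundaries` must be sorted; keys below boundaries[0] go to bucket 0,
    keys in [boundaries[i-1], boundaries[i]) go to bucket i, and so on.
    """
    buckets: list[list[int]] = [[] for _ in range(len(boundaries) + 1)]
    for part in partitions:                        # redistribution phase
        for key in part:
            buckets[bisect_right(boundaries, key)].append(key)
    sorted_buckets = [sorted(b) for b in buckets]  # local sort phase
    # Ranges are disjoint and ordered, so no final merge is required.
    return [key for bucket in sorted_buckets for key in bucket]
```

In this reading, the merge-all variant concentrates the final merge on one node, while the partitioned variant pays a redistribution cost up front and is sensitive to how evenly `boundaries` split the keys.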

Keywords: external sorting; internal sorting; sorting in database queries; parallel sorting; parallel databases

A taxonomy of indexing schemes for parallel database systems

DISTRIBUTED AND PARALLEL DATABASES, 2002, Vol. 12, No. 1, pp. 73-106

Authors: Taniar, D; Rahayu, JW. Affiliations: Monash Univ, Sch Business Syst, Clayton, Vic 3800, Australia; La Trobe Univ, Dept Comp Sci & Comp Engn, Bundoora, Vic 3083, Australia

In this paper, we present a taxonomy of indexing schemes in parallel database systems. Index partitioning is not yet widely recognized. One of the reasons is that most index structures are trees, not flat structures like tables, and consequently index partitioning is more complex than common data partitioning for tables. We present three parallel indexing schemes and discuss their maintenance strategies. We also analyze their storage requirements.

Keywords: parallel databases; indexing; B+ trees; data partitioning
