Accurate peak power evaluation of query processing is fundamental to a power-aware DBMS running in large data centers. To estimate the peak power of the core join operator in query processing, the concept of CPU-boundedness was introduced, i.e., the ratio of CPU-intensive operations per unit time. Power prediction models were constructed with piecewise and continuous fitting methods; the multivariate model was further developed by incorporating both CPU-boundedness and CPU frequency into the model via surface fitting. A non-runtime peak power estimation method is proposed for the four most commonly used join algorithms in DBMSs. To the best of our knowledge, our work is the first attempt at modeling and estimating the peak power of query processing. Extensive experiments have demonstrated the effectiveness of our proposed methods with acceptable mean relative errors. (C) 2014 Elsevier Inc. All rights reserved.
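The CPU-boundedness measure and a piecewise fit of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's method: the breakpoint and coefficients are made-up values, and the real models are fitted from measurements.

```python
def cpu_boundedness(cpu_ops, total_ops):
    """Ratio of CPU-intensive operations to all operations in a time window."""
    return cpu_ops / total_ops if total_ops else 0.0

def peak_power(boundedness, base=40.0, slope_low=20.0, slope_high=55.0, knee=0.6):
    """Piecewise-linear power model (illustrative coefficients, in watts):
    power grows slowly with CPU-boundedness until a knee, then steeply."""
    if boundedness <= knee:
        return base + slope_low * boundedness
    return base + slope_low * knee + slope_high * (boundedness - knee)
```

A multivariate variant would fit a surface over (boundedness, CPU frequency) pairs instead of a single curve.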
Large-scale datasets collected from heterogeneous sources often require a join operation to extract valuable information. MapReduce is an efficient programming model for processing large-scale data. However, it has some limitations in processing heterogeneous datasets because of the large amount of redundant intermediate records transferred through the network. Several filtering techniques have been developed to improve join performance, but they require multiple MapReduce jobs to process the input datasets. To address this issue, adaptive filter-based join algorithms are presented in this paper. Specifically, three join algorithms are introduced to perform filter creation and redundant-record elimination within a single MapReduce job. A cost analysis of the introduced join algorithms shows that the I/O cost is reduced compared to the state-of-the-art filter-based join algorithms. The performance of the join algorithms was evaluated in terms of the total execution time and the total amount of I/O data transferred. The experimental results show that the adaptive Bloom join, semi-adaptive intersection Bloom join, and adaptive intersection Bloom join decrease the total execution time by 30%, 25%, and 35%, respectively; and reduce the total amount of I/O data transferred by 18%, 25%, and 50%, respectively.
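The core idea of a Bloom join, filtering one input by a compact summary of the other's join keys so that non-joining records never reach the shuffle, can be sketched in a few lines. This is a single-process illustration under assumed parameters (filter size, hash count), not the paper's MapReduce implementation:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter; m bits, k hash functions (illustrative defaults)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # May return a false positive, but never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

def bloom_join(left, right):
    """Join two (key, value) datasets; right-side records whose keys
    definitely do not occur on the left are dropped before the join
    (standing in for records dropped before the MapReduce shuffle)."""
    bf = BloomFilter()
    for k, _ in left:
        bf.add(k)
    surviving = [(k, v) for k, v in right if bf.might_contain(k)]
    index = {}
    for k, v in left:
        index.setdefault(k, []).append(v)
    # The exact lookup removes any Bloom false positives from the output.
    return [(k, lv, rv) for k, rv in surviving for lv in index.get(k, [])]
```

An intersection Bloom join would build filters on both inputs and keep only records whose keys pass both filters.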
The performance of query execution in distributed database systems depends mainly on the efficiency of the distributed algorithms used for executing query operations and the efficiency of the query processing strategies. In this paper, we present and evaluate a query processing strategy based on pipelining and data-flow techniques. We develop timing equations for calculating the performance of four join algorithms: 1) nested block, 2) hash, 3) sort-merge, and 4) pipelined sort-merge. They are used to execute the join operation in a query in both distributed and pipelined fashion. Based on these equations and similar sets of equations developed for other relational algebraic operations, we evaluated the performance of query execution using the different join algorithms. The effects of varying the values of 1) processing time, 2) I/O time, 3) communication time, 4) buffer size, and 5) join selectivity on the performance of the pipelined join algorithms are investigated. The results are compared to those obtained by employing the same algorithms to execute queries using the distributed processing approach, which does not exploit the vertical concurrency of the pipelining approach. These results clearly establish the benefits of pipelining.
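A timing equation of the kind developed above can be illustrated for the first of the four algorithms, the nested-block join. This is a textbook-style I/O cost formula under assumed parameters, not the paper's actual equations, which also account for processing and communication time:

```python
def nested_block_cost(pages_r, pages_s, buffer_pages, io_time):
    """I/O time of a nested-block join: the outer relation R is read once,
    and the inner relation S is scanned once per buffer-sized chunk of R
    (one buffer page is reserved for reading S)."""
    chunks = -(-pages_r // (buffer_pages - 1))  # ceiling division
    return (pages_r + chunks * pages_s) * io_time
```

With 100 pages of R, 500 pages of S, and an 11-page buffer, R splits into 10 chunks, giving 100 + 10 x 500 = 5100 page reads.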
Large-scale datasets collected from heterogeneous sources often require a join operation to extract valuable information. MapReduce is an efficient programming model for large-scale data processing. However, it has some limitations in processing heterogeneous datasets. In this study, we review the state-of-the-art strategies for joining two datasets based on an equi-join condition and provide a detailed implementation for each strategy. We also present an in-depth analysis of the join strategies and discuss their feasibility and limitations to assist the reader in selecting the most efficient strategy. Finally, we outline interesting directions for future join processing systems. (c) 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
ISBN:
(Print) 9798400701276
We present a dynamic index structure for join sampling. Built for an (equi-)join Q (let IN be the total number of tuples in the input relations of Q), the structure uses Õ(IN) space, supports a tuple update of any relation in Õ(1) time, and returns a uniform sample from the join result in Õ(IN^ρ* / max{1, OUT}) time with high probability (w.h.p.), where OUT and ρ* are the join's output size and fractional edge covering number, respectively; the notation Õ(·) hides a factor polylogarithmic in IN. We further show how our result justifies the Õ(IN^ρ*) running time of existing worst-case optimal join algorithms (for full result reporting) even when OUT ≪ IN^ρ*. Specifically, unless the combinatorial k-clique hypothesis is false, no combinatorial algorithm (i.e., an algorithm not relying on fast matrix multiplication) can compute the join result in O(IN^(ρ* − ε)) time w.h.p. even if OUT = IN^ε, regardless of how small the constant ε > 0 is.
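For the special case of a two-relation equi-join, uniform sampling from the join result can be sketched directly: weight each left tuple by its join degree, then pick a matching right tuple uniformly. This toy sketch conveys only the sampling invariant; the paper's structure handles general multiway joins with dynamic updates and the stated Õ(·) guarantees, which this does not attempt:

```python
import random

def uniform_join_sample(left, right_index, rng=random):
    """Draw one uniform sample from the equi-join of `left`, a list of
    (key, value) pairs, with a right relation given as key -> list of
    values. Weighting each left tuple by its join degree makes every
    join-result tuple equally likely."""
    weights = [len(right_index.get(k, [])) for k, _ in left]
    total = sum(weights)
    if total == 0:
        return None  # the join result is empty
    k, lv = rng.choices(left, weights=weights, k=1)[0]
    return (k, lv, rng.choice(right_index[k]))
```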
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a natural join query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality, which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose runtimes achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this article that any project-join style plans, such as ones typically employed in a relational database management system, are asymptotically slower than the optimal for some queries. We present an algorithm whose runtime is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. This bound implies two famous inequalities in geometry: the Loomis-Whitney inequality and its generalization, the Bollobás-Thomason inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to evaluate full conjunctive queries optimally, to compute a relaxed notion of joins and to optimally (in the worst-case) enumerate all induced copies of a fixed subgraph inside of a given large graph.
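The key departure from project-join plans is joining one attribute at a time, intersecting the candidate values from every relation that mentions the attribute. The sketch below shows that attribute-at-a-time shape on the classic triangle query; it is only illustrative and omits the sorted-index and intersection engineering required for the worst-case optimal bounds:

```python
def generic_join(relations, attrs, binding=None):
    """Attribute-at-a-time join sketch. `relations` is a list of
    (schema, rows) pairs, where schema is a tuple of attribute names and
    rows is a set of tuples; `attrs` fixes the global attribute order."""
    binding = binding or {}
    if len(binding) == len(attrs):
        return [tuple(binding[a] for a in attrs)]
    a = attrs[len(binding)]
    # Intersect candidate values for `a` across all relations mentioning it,
    # restricted to rows consistent with the current partial binding.
    candidates = None
    for schema, rows in relations:
        if a not in schema:
            continue
        idx = schema.index(a)
        vals = {row[idx] for row in rows
                if all(row[schema.index(b)] == binding[b]
                       for b in schema if b in binding)}
        candidates = vals if candidates is None else candidates & vals
    out = []
    for v in sorted(candidates or []):
        out.extend(generic_join(relations, attrs, {**binding, a: v}))
    return out
```

On the triangle query R(A,B) ⋈ S(B,C) ⋈ T(A,C), no binary join plan is worst-case optimal, but the attribute-at-a-time order A, B, C is.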
Distributed query processing algorithms usually perform data reduction by using a semijoin program, but the problem with these approaches is that they still require an explicit join of the reduced relations in the final phase. We introduce an efficient algorithm for join processing in distributed database systems that makes use of bipartite graphs in order to reduce data communication costs and local processing costs. The bipartite graphs represent the tuples that can be joined in two relations, also taking into account the reduction state of the relations. This algorithm fully reduces the relations at each site. We then present an adaptive algorithm for response time optimization that takes into account the system configuration, i.e., the additional resources available and the data characteristics, in order to select the best strategy for response time minimization. We also report on the results of a set of experiments which show that our algorithms outperform a number of the recently proposed methods for total processing time and response time minimization.
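The semijoin reduction that these algorithms build on can be stated in a few lines: ship only the join-key projection of one relation to the other site and discard tuples that cannot join. A minimal single-process sketch (the paper's bipartite-graph method goes further, tracking which tuple pairs actually join):

```python
def semijoin_reduce(r, s, key_r, key_s):
    """Semijoin R ltimes S: keep only the R-tuples whose join key appears
    in S. In a distributed setting, only the key projection of S (the set
    below) would be shipped to R's site, not S itself."""
    s_keys = {t[key_s] for t in s}
    return [t for t in r if t[key_r] in s_keys]
```

The remaining cost, as noted above, is that the reduced relations still have to be joined explicitly afterwards.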
In the past decade, the exponential growth in commodity CPU speed has far outpaced advances in memory latency. A second trend is that CPU performance advances are brought not only by increased clock rates but also by increasing parallelism inside the CPU. Current database systems have not yet adapted to these trends and show poor utilization of both CPU and memory resources on current hardware. In this paper, we show how these resources can be optimized for large joins and translate these insights into guidelines for future database architectures, encompassing data structures, algorithms, cost modeling, and implementation. In particular, we discuss how vertically fragmented data structures optimize cache performance on sequential data access. On the algorithmic side, we refine the partitioned hash-join with a new partitioning algorithm called radix-cluster, which is specifically designed to optimize memory access. The performance of this algorithm is quantified using a detailed analytical model that incorporates memory access costs in terms of a limited number of parameters, such as cache sizes and miss penalties. We also present a calibration tool that extracts such parameters automatically from any computer hardware. The accuracy of our models is proven by exhaustive experiments conducted with the Monet database system on three different hardware platforms. Finally, we investigate the effect of implementation techniques that optimize CPU resource usage. Our experiments show that large joins can be accelerated by almost an order of magnitude on modern RISC hardware when both memory and CPU resources are optimized.
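The radix-cluster idea is to partition in multiple passes, each fanning out on only a few key bits, so the number of clusters written simultaneously in any pass stays within cache and TLB limits. A minimal sketch of that multi-pass structure (Python used only to show the control flow; the real algorithm operates on column arrays in memory):

```python
def radix_partition(rows, key_of, bits, shift):
    """One pass: fan rows out into 2**bits clusters, keyed on `bits` bits
    of the (integer) join key starting at bit position `shift`."""
    clusters = [[] for _ in range(1 << bits)]
    for row in rows:
        clusters[(key_of(row) >> shift) & ((1 << bits) - 1)].append(row)
    return clusters

def radix_cluster(rows, key_of, total_bits, bits_per_pass):
    """Multi-pass radix-cluster: reach 2**total_bits final clusters while
    limiting the per-pass fan-out to 2**bits_per_pass, so each pass stays
    TLB- and cache-friendly."""
    clusters, shift = [rows], 0
    while shift < total_bits:
        step = min(bits_per_pass, total_bits - shift)
        clusters = [sub for c in clusters
                    for sub in radix_partition(c, key_of, step, shift)]
        shift += step
    return clusters
```

After clustering both inputs on the same bits, matching keys land in corresponding clusters, and each cluster pair can be hash-joined within the cache.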
Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on MapReduce jobs. We first provide solutions that are based on specialized indices, which may themselves be accessed using either MapReduce or coordinator-based strategies. The first index-based solution is based on inverted indices, which are accessed with MapReduce jobs. The second index-based solution adapts a popular centralized rank-join algorithm. We further contribute a novel statistical structure comprising histograms and Bloom filters, which forms the basis for the third index-based solution. We provide (i) MapReduce algorithms showing how to build these indices and statistical structures, (ii) algorithms to allow for online updates to these indices, and (iii) query processing algorithms utilizing them. We implemented all algorithms in Hadoop (HDFS) and HBase and tested them on TPC-H datasets of various scales, utilizing different queries on tables of various sizes and different score-attribute distributions. We ported our implementations to Amazon EC2 and "in-house" lab clusters of various scales. We provide performance results for three metrics: query execution time, network bandwidth consumption, and dollar-cost for query execution.
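The centralized rank-join algorithm that the second index-based solution adapts can be sketched in the HRJN style: consume both score-sorted inputs incrementally and stop once k joined results beat the best combined score any unseen pair could still achieve. This is a single-process sketch of that threshold logic, not the distributed HBase implementation:

```python
def rank_join(left, right, k):
    """HRJN-style rank join sketch. `left` and `right` are lists of
    (key, score) pairs sorted by score descending; returns the top-k
    (combined_score, key) join results without necessarily reading
    either input to the end."""
    seen_l, seen_r = {}, {}
    results, i, j = [], 0, 0
    top_l, top_r = left[0][1], right[0][1]
    while i < len(left) or j < len(right):
        pull_left = j >= len(right) or (i < len(left) and i <= j)
        if pull_left:
            key, s = left[i]; i += 1
            seen_l.setdefault(key, []).append(s)
            results += [(s + s2, key) for s2 in seen_r.get(key, [])]
        else:
            key, s = right[j]; j += 1
            seen_r.setdefault(key, []).append(s)
            results += [(s2 + s, key) for s2 in seen_l.get(key, [])]
        # Best combined score still reachable by an unseen join pair.
        last_l = left[i - 1][1] if i else top_l
        last_r = right[j - 1][1] if j else top_r
        threshold = max(top_l + last_r, last_l + top_r)
        results.sort(reverse=True)
        if len(results) >= k and results[k - 1][0] >= threshold:
            break  # early termination: top-k is final
    return results[:k]
```

The histogram-and-Bloom-filter structure described above serves the same purpose at a coarser grain: it prunes data partitions that cannot contribute a top-k result.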
We present a new algorithm for relational equijoin. The algorithm is a modification of merge join, but promises superior performance for medium-size inputs. In many cases it even compares favorably with both merge join and hybrid hash join, which is shown using analytic cost functions.
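The abstract does not specify the modification itself, so the sketch below shows only the baseline merge join the new algorithm starts from: advance two sorted inputs in lockstep and emit matching pairs. A minimal version over (key, value) lists:

```python
def sort_merge_join(left, right):
    """Baseline sort-merge equijoin over (key, value) lists: sort both
    inputs, then merge, emitting every matching (key, lval, rval) triple."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = left[i][0], right[j][0]
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit all right-side matches for this left tuple; `j` stays
            # put so later left tuples with the same key rescan the run.
            j2 = j
            while j2 < len(right) and right[j2][0] == kl:
                out.append((kl, left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out
```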