We propose a new partition-based join algorithm, called the select-partitioned join, which performs better than the sort-merge and hash-partitioned joins. The proposed algorithm consists of three major steps. The first step determines a partitioning pattern that minimizes the total join cost and chooses the bound values of the buckets using a selection algorithm. The second step partitions both relations into ranged buckets according to the partitioning pattern and the bound values chosen in the previous step. The last step applies a nested-block join to the partitioned bucket pairs. The selection algorithm is based on the cumulative distribution function and requires only a single scan of the smaller relation. The performance of the select-partitioned join is analyzed in terms of the number of I/Os and compared with the sort-merge and hash-partitioned join algorithms. Our join algorithm outperforms the hash-partitioned join, which is generally regarded as the better choice for the join operation. Simulation experiments are conducted for the join algorithms.
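The three steps above can be sketched as a minimal in-memory version. All names are hypothetical, tuples are plain dicts, and simple order statistics stand in for the paper's CDF-based selection algorithm:

```python
from bisect import bisect_right

def select_partitioned_join(r, s, key, num_buckets):
    """Sketch: range-partition both relations on bound values drawn from
    the smaller relation, then nested-block join each bucket pair."""
    small = r if len(r) <= len(s) else s
    # Step 1: choose bucket bound values from a single scan of the smaller
    # relation (order statistics here, not the paper's CDF-based selection).
    keys = sorted(t[key] for t in small)
    bounds = sorted({keys[len(keys) * i // num_buckets]
                     for i in range(1, num_buckets)})

    # Step 2: partition both relations into ranged buckets on those bounds.
    def partition(rel):
        buckets = [[] for _ in range(len(bounds) + 1)]
        for t in rel:
            buckets[bisect_right(bounds, t[key])].append(t)
        return buckets

    # Step 3: nested-block join each corresponding bucket pair.
    out = []
    for br, bs in zip(partition(r), partition(s)):
        out.extend((tr, ts) for tr in br for ts in bs if tr[key] == ts[key])
    return out
```

Because both relations are partitioned with the same bounds, matching keys always land in the same bucket pair, so only aligned buckets need to be joined.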
This paper proposes an algorithm that improves on the sort-based join method. Unlike the sort-based join, it employs both sorting and partitioning to avoid completely sorting both relations, and is therefore referred to as the hybrid join. The algorithm completely sorts only the smaller relation and partitions the other into ranged buckets according to the order statistics of the sorted relation. The final join is performed on the sorted relation and the ranged buckets.
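The sort-one-side, partition-the-other idea reads roughly like the following in-memory sketch. The function name, the binary-search probe within each bucket's sorted range, and the default bucket count are assumptions, not the paper's exact procedure:

```python
from bisect import bisect_left, bisect_right

def hybrid_join(small, large, key, num_buckets=4):
    """Sketch: sort only the smaller relation, range-partition the larger
    one on its order statistics, then probe each bucket into the matching
    sorted range."""
    srt = sorted(small, key=lambda t: t[key])    # the only complete sort
    keys = [t[key] for t in srt]
    n = len(keys)
    # Bucket bounds from order statistics of the sorted relation.
    bounds = sorted({keys[n * i // num_buckets] for i in range(1, num_buckets)})
    buckets = [[] for _ in range(len(bounds) + 1)]
    for t in large:                              # one scan; never sorted
        buckets[bisect_right(bounds, t[key])].append(t)
    # Slice of the sorted relation whose keys fall in each bucket's range.
    cuts = [0] + [bisect_left(keys, b) for b in bounds] + [n]
    out = []
    for i, bucket in enumerate(buckets):
        run, rk = srt[cuts[i]:cuts[i + 1]], keys[cuts[i]:cuts[i + 1]]
        for t in bucket:
            lo, hi = bisect_left(rk, t[key]), bisect_right(rk, t[key])
            out.extend((m, t) for m in run[lo:hi])
    return out
```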
ISBN (print): 9781424451661
Data warehousing continues to play an important role in global information systems for businesses. Meanwhile, applications of data warehousing have evolved from reporting and decision support systems to mission-critical decision-making systems. This requires data warehouses to combine both historical and current data from operational systems. Since the join is one of the most expensive operations in query processing, it is vital to develop effective and efficient join techniques for a distributed warehouse environment. In this paper, we propose an agent-based adaptive join algorithm called Ajoin for effective and efficient online join operations in distributed data warehouses. Ajoin utilises intelligent agents for dynamic optimisation and coordination of join processing at run time. Key aspects of the Ajoin algorithm have been implemented and evaluated against other modern adaptive join algorithms, and in our study Ajoin exhibits better performance under various distributed and dynamic data warehouse environments.
Time intervals are often associated with tuples to represent their valid time in temporal relations, where the overlap join is crucial for many kinds of queries. Many existing overlap join algorithms use indices based on tree structures such as the quad-tree, B+-tree, and interval tree. These algorithms usually incur high CPU cost because deep path traversals are unavoidable, which makes them less competitive than data-partition or plane-sweep based algorithms. This paper proposes an efficient overlap join algorithm based on a new two-layer flat index named the Overlap Interval Inverted Index (i.e., O2i Index). In the first layer, it uses an array to record the end points of intervals and approximates the nesting structures of intervals via two functions; in the second layer, it uses inverted lists to trace all intervals satisfying the approximated nesting structures. With the help of the new index, the join algorithm visits only the lists that must be scanned and skips all others. Analyses and experiments on both real and synthetic datasets show that the proposed algorithm is competitive with the state-of-the-art algorithms.
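The abstract does not specify O2i in enough detail to reproduce it, but the plane-sweep family it is compared against can be sketched. Intervals are assumed to be `(start, end)` tuples with `start <= end`; the function name is hypothetical:

```python
def overlap_join(r, s):
    """Plane-sweep sketch: report every pair of intervals, one from each
    list, whose [start, end] ranges overlap."""
    # Sweep both interval lists in start order, keeping "active" intervals
    # from the other side whose ends have not yet been passed.
    r, s = sorted(r), sorted(s)
    out, i, j, active_r, active_s = [], 0, 0, [], []
    while i < len(r) or j < len(s):
        if j >= len(s) or (i < len(r) and r[i][0] <= s[j][0]):
            a = r[i]; i += 1
            active_s = [b for b in active_s if b[1] >= a[0]]  # drop expired
            out.extend((a, b) for b in active_s)
            active_r.append(a)
        else:
            b = s[j]; j += 1
            active_r = [a for a in active_r if a[1] >= b[0]]  # drop expired
            out.extend((a, b) for a in active_r)
            active_s.append(b)
    return out
```

Each pair is emitted exactly once, when the later-starting interval of the pair enters the sweep; the tree-based alternatives the abstract criticizes would instead probe each interval into an index.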
A join operation selects, from the set of all pairs of records of two files, those pairs that possess some matching property. Because the join operation is so important in database applications, several algorithms have been proposed for performing it efficiently. A sort-based join completely sorts both files, then joins the two sorted files in one pass. The one-way join-during-merge algorithm completely sorts one file, only partially sorts the other, and then performs the join. Two join-during-merge algorithms are described and analyzed, and their superiority over the traditional sort-based algorithm is shown. It is suggested that the join-during-merge algorithm be used in all cases where the sort-based algorithm was considered convenient. Since the optimal algorithm can be chosen before performing the join, the stated gains can be achieved in practice.
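The one-way variant can be sketched in memory, with `heapq.merge` standing in for the final merge of the partially sorted file's runs. The function name, the fixed run size, and the tuple layout are assumptions:

```python
import heapq

def one_way_join_during_merge(sorted_file, other, run_size,
                              key=lambda t: t[0]):
    """Sketch: `other` is only sorted into runs of `run_size`; the merge
    of those runs is interleaved with the merge-join against the file
    that is already completely sorted."""
    runs = [sorted(other[i:i + run_size], key=key)
            for i in range(0, len(other), run_size)]
    merged = heapq.merge(*runs, key=key)  # the join happens during this merge
    out, i, n = [], 0, len(sorted_file)
    for t in merged:                       # keys arrive in nondecreasing order
        while i < n and key(sorted_file[i]) < key(t):
            i += 1                         # skip past smaller keys
        j = i
        while j < n and key(sorted_file[j]) == key(t):
            out.append((sorted_file[j], t))
            j += 1
    return out
```

The second file is never materialized in fully sorted form, which is the source of the savings over the traditional sort-based join.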
ISBN (print): 9781450371810
Today we deal with large-scale applications that generate massive amounts of data and information. Processing such data often requires algorithms with massive parallelism, where the main performance metric is communication cost. Apache Spark is highly scalable and fault-tolerant and can run across many computers. The join is one of the most widely used operations in database systems, but it is also heavily time-consuming. In this work, we survey and critique several implementations of Spark join algorithms, discuss their strengths and weaknesses, present a detailed comparison of these algorithms, and introduce optimization approaches to enhance and tune their performance.
When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. Three algorithms for processing this type of join are presented and their I/O costs are analyzed in this paper. Since such joins often involve document collections of very large size, it is very important to find efficient algorithms for processing them. The three algorithms differ in whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collection with which it is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.
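The inverted-file variant of such a textual join can be sketched as follows. The share-a-term match predicate, the whitespace tokenization, and all names are assumptions; the paper's actual join predicate on textual attributes may differ:

```python
from collections import defaultdict

def inverted_file_join(docs_a, docs_b):
    """Sketch: pair documents from two collections that share at least
    one term, probing an inverted file built over the first collection."""
    inv = defaultdict(list)
    for i, doc in enumerate(docs_a):
        for term in set(doc.split()):          # one posting per distinct term
            inv[term].append(i)
    out = set()
    for j, doc in enumerate(docs_b):
        for term in set(doc.split()):
            for i in inv.get(term, []):        # only matching postings read
                out.add((i, j))
    return sorted(out)
```

The document-based alternatives the abstract mentions would instead compare the document texts pairwise, which is why the relative performance depends on collection sizes and term distributions.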
The join is an essential tool for analyzing data collected from different sources. MapReduce has emerged as a prominent programming model for processing massive data. However, traditional MapReduce-based join algorithms are not efficient when handling skewed data: skew in the input leads to considerable load imbalance and performance degradation. This paper proposes a new skew-insensitive method, called fine-grained partitioning for skewed data (FGSD), which improves load balancing for reduce tasks. The proposed method considers the properties of both input and output data through a proposed stream-sampling algorithm. FGSD introduces a new approach to distributing input data that efficiently handles both redistribution skew and join product skew. The experimental results confirm that our solution not only achieves better load balancing but also reduces job execution time under varying degrees of data skew. Furthermore, FGSD does not require any modification to the MapReduce environment and is applicable to complex joins.
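The core idea of splitting a skewed key across several reduce tasks, guided by a sample, can be sketched like this. The frequency threshold, the round-robin splitting, and all names are assumptions, not FGSD's actual rules:

```python
from collections import Counter
from itertools import cycle

def skew_aware_partitioner(sample_keys, num_reducers):
    """Sketch: keys that look heavy in the sample are spread round-robin
    over all reducers; the other join side must then replicate its
    matching tuples to each of them."""
    freq, total = Counter(sample_keys), len(sample_keys)
    # A key is "heavy" if its sampled frequency exceeds one reducer's
    # fair share of the input.
    heavy = {k for k, c in freq.items() if c / total > 1.0 / num_reducers}
    rr = {k: cycle(range(num_reducers)) for k in heavy}

    def partition(key):
        if key in heavy:
            return next(rr[key])            # split a skewed key
        return hash(key) % num_reducers     # default hash partitioning
    return partition, heavy
```

Light keys keep ordinary hash partitioning, so the scheme only pays the replication cost where the sample says skew exists.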
Growing data volumes have put tremendous pressure on query processing and storage, so many studies focus on using the GPU to accelerate the join, one of the most important operations in modern database systems. However, existing research on GPU-accelerated joins is not well suited to joins over big data. This paper therefore speeds up the nested-loop join, hash join, and theta join by combining Hadoop with the GPU, and is the first to use the GPU to accelerate the theta join. After data pre-filtering and pre-processing, and by using MapReduce and HDFS in Hadoop as proposed in this paper, larger data tables can be handled than with existing GPU acceleration methods. With MapReduce in Hadoop, the proposed algorithm can also estimate the number of results more accurately and allocate appropriate storage space without unnecessary cost, making it more efficient. Experimental results show that, compared with a GPU-based approach without Hadoop, our approach increases speed by 1.5-2 times, and compared with Hadoop-based approaches without the GPU, by 1.3-2 times.
One of the major obstacles hindering effective join processing on MapReduce is data skew. Since MapReduce's basic hash-based partitioning cannot solve the problem properly, two alternatives have been proposed: range-based and randomized methods. However, both still have drawbacks: the range-based method does not handle join product skew, and the randomized method performs worse than basic hash-based partitioning when the input relations are not skewed. In this paper, we present a new skew-handling method called multi-dimensional range partitioning (MDRP). The proposed method overcomes the limitations of traditional algorithms in two ways: 1) the number of output records expected at each machine is considered, which leads to better handling of join product skew, and 2) a small number of input records are sampled before the actual join begins, so that an efficient execution plan reflecting the degree of data skew can be created. As a result, in a scalar-skew experiment, the proposed join algorithm is about 6.76 times faster than the range-based algorithm when join product skew exists, and about 5.14 times faster than the randomized algorithm when the input relations are not skewed. Moreover, through worst-case analysis, we show that the input and output imbalances are each at most 2. The proposed algorithm does not require any modification to the original MapReduce environment and is applicable to complex join operations such as theta joins and multi-way joins. (C) 2016 Elsevier Ltd. All rights reserved.
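A loose sketch of the planning step described in points 1) and 2): sample the input, estimate each range's join output from the sampled cell sizes, and balance the estimated output across workers. The splitting rule, the product-based output estimate, the greedy assignment, and all names are assumptions, not the MDRP algorithm itself:

```python
from bisect import bisect_right
from collections import Counter

def skew_aware_plan(sample_r, sample_s, num_splits, num_workers):
    """Sketch: build range splits from the combined sample, estimate each
    range's equi-join output as |R-cell| * |S-cell|, and assign ranges to
    workers greedily, largest estimated output first."""
    keys = sorted(sample_r + sample_s)
    bounds = sorted({keys[len(keys) * i // num_splits]
                     for i in range(1, num_splits)})

    def cell(k):
        return bisect_right(bounds, k)

    r_cnt, s_cnt = Counter(map(cell, sample_r)), Counter(map(cell, sample_s))
    load, assign = [0] * num_workers, {}
    for c in sorted(set(r_cnt) | set(s_cnt),
                    key=lambda c: -(r_cnt[c] * s_cnt[c])):
        w = min(range(num_workers), key=load.__getitem__)  # least-loaded worker
        assign[c] = w
        load[w] += r_cnt[c] * s_cnt[c]          # expected output, not input size
    return bounds, assign
```

Balancing on estimated output rather than input size is what distinguishes this family of plans from the plain range-based method the abstract criticizes.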