检索结果-内蒙古大学图书馆

Cabin: A Compressed Adaptive Binned Scan Index

Proceedings of the ACM on Management of Data 2024年第1期2卷 1-26页

作者： Yiyuan Chen Shimin Chen SKLP and Center for Advanced Computer Systems Institute of Computing Technology CAS & University of Chinese Academy of Sciences Beijing China

Scan is a fundamental operation widely used in main-memory analytical database systems. To accelerate scans, previous studies build either record-order or sort-order structures known as scan indices. While achieving good performance, scan indices often incur significant space overhead, limiting their use in main-memory databases. For example, the most recent and best performing scan index, BinDex, consists of a sort-order position array, which is an array of rowIDs in the value order, and a set of record-order bit vectors, representing records in pre-defined value intervals. The structures can be much larger than the base data column *** this paper, we propose a novel scan index, Cabin, that exploits the following three techniques for better time-space tradeoff. 1) filter sketches that represent every 2^w-2 value intervals with a w-bit sketched vector, thereby exponentially reducing the space for the bit vectors; 2) selective position array that removes the rowID array for a fraction of intervals in order to lower the space overhead for the position array; and 3) data-aware intervals that judiciously select interval boundaries based on the data characteristics to better support popular values in skewed data distributions or categorical attributes. Experimental results show that compared with state-of-the-art scan solutions, Cabin achieves better time-space tradeoff, and attains 1.70 -- 4.48x improvement for average scan performance given the same space budget.

关键词： data awareness filter sketches scan index time-space tradeoff

来源：评论

学校读者我要写书评

暂无评论

CPU and GPU Hash Joins on Skewed Data

CPU and GPU Hash Joins on Skewed Data

引用

International Conference on Data Engineering Workshops

作者： Yuzhou Cai Shimin Chen SKLP and Center for Advanced Computer Systems ICT CAS University of Chinese Academy of Sciences

ISBN: (数字)9798350384031

ISBN: (纸本)9798350384048

Hash join is one of the most important and widely used query processing operations. In many real-world applications, such as graph data analysis, join keys can be highly skewed. However, data skew is often only a secondary design consideration in existing CPU and GPU hash join algorithms. We find that the data structures and algorithm steps in existing hash joins are less efficient for handling highly skewed join keys, leading to significant performance degradation. In this paper, we optimize hash joins for skewed data, and propose a CPU Skewconscious Hash join (CSH) and a GPU Skew-conscious Hash join (GSH). Our main idea is to detect skewed join keys and handle skewed vs. normal keys in separate routines optimized for their target cases. Our preliminary experiments show that compared to state-of-the-art CPU and GPU hash joins with existing skew-handling techniques, our proposed CSH and GSH achieve up to 8.0x and 13.5x improvement, respectively.

关键词： Degradation Data analysis Databases Query processing Conferences Graphics processing units Data structures

来源：评论

学校读者我要写书评

暂无评论

Pea Hash: A Performant Extendible Adaptive Hashing Index

引用

Proceedings of the ACM on Management of Data 2023年第1期1卷 1-25页

作者： Zhuoxuan Liu Shimin Chen SKLP and Center for Advanced Computer Systems Institute of Computing Technology CAS University of Chinese Academy of Sciences Beijing China

Hashing index is widely used to support efficient point operations. We observe that there is a conflict between performance and memory utilization goals. Existing hashing indices often have to trade off hash table access latency for better memory utilization. Moreover, many designs support only unique keys, and their performance is often suboptimal with skew *** this paper, we propose Pea Hash with two techniques to address the above two problems: (i) adaptive hashing strategy that holistically optimizes both access latency and memory utilization, and (ii) data-aware adaptive buckets that accommodate unique keys, and keys with various numbers of duplicates. We develop both an NVM-optimized Pea Hash and a DRAM-based Pea Hash index. Experiments on a machine equipped with Intel Optane DC Persistent memory show that compared to state-of-the-art NVM-optimized hashing indices, the NVM-optimized Pea Hash achieves up to 13.8x performance improvements with similar memory utilization. The DRAM-based Pea Hash outperforms existing in-DRAM hashing index designs, showing the generality of the proposed techniques.

关键词： duplicate keys hash index hashing strategy persistent memory

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：