检索结果-内蒙古大学图书馆

Split-bucket partition (SBP): a novel execution model for top-k and selection algorithms on GPUs

JOURNAL OF SUPERCOMPUTING 2024年第11期80卷 15122-15160页

作者： Yang, Yiqing Zhang, Guoyin Wu, Yanxia Zhao, Zhixiang Fu, Yan Harbin Engn Univ Dept Comp Sci Harbin 150001 Heilongjiang Peoples R China

top-k and selection operations are critical in data processing and analysis, and their efficient implementation on GPUs is increasingly important due to the growing demands of data analysis. Existing methods, primarily relying on the bucket partition execution model, encounter challenges such as uneven bucket distribution and latency in merging processes. To address these issues, we introduce a novel Split-Bucket Partition (SBP) execution model that specifically addresses these challenges. Additionally, we propose task and control flow optimizations targeted at top-k and selection algorithms, which further contribute to performance improvements. Our optimized algorithms significantly outperform existing approaches, delivering performance gains of up to 2.3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2.3$$\end{document} times and 1.6 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.6$$\end{document} times for different bucket partitioning rules. Our algorithms show robust performance improvements in non-uniform data scenarios, with gains ranging from 1.9 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.9$$\end{document} times to 15.5 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$15.5$$\end{document} times. However, it should be noted that the SB

关键词： Bucket partition top-k algorithms Selection algorithms GPU parallel computing

来源：评论

学校读者我要写书评

暂无评论

A generic framework for efficient computation of top-k diverse results

引用

VLDB JOURNAL 2023年第4期32卷 737-761页

作者： Islam, Md Mouinul Asadi, Mahsa Amer-Yahia, Sihem Roy, Senjuti Basu New Jersey Inst Technol Newark NJ 07102 USA Univ Grenoble Alpes CNRS Grenoble France

Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of diversity scores of records by pairwise comparisons of "aggregate" diversity scores of a group of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive inside three representative diversity algorithms and prove that the augmented algorithms leveraging the access primitive preserve original results. We analyze the worst and expected case running times of these algorithms. We propose a computational framework to design this access primitive that has a pre-computed index structure I-tree that is agnostic to the specific details of diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings, while ensuring up to a 24x speedup.

关键词： Diversification top-k algorithms Query processing

来源：评论

学校读者我要写书评

暂无评论

Fast Best-Effort Search on Graphs with Multiple Attributes

引用

IEEE TRANSACTIONS ON kNOWLEDGE AND DATA ENGINEERING 2015年第3期27卷 755-768页

作者： Roy, Senjuti Basu Eliassi-Rad, Tina Papadimitriou, Spiros Univ Washington Inst Technol Tacoma WA 98402 USA Rutgers State Univ Dept Comp Sci Piscataway NJ 08854 USA Rutgers State Univ Dept Business Piscataway NJ 08854 USA

We address the problem of search on graphs with multiple nodal attributes. We call such graphs weighted attribute graphs (WAGs). Nodes of a WAG exhibit multiple attributes with varying, non-negative weights. WAGs are ubiquitous in real-world applications. For example, in a co-authorship WAG, each author is a node;each attribute corresponds to a particular topic (e.g., databases, data mining, and machine learning);and the amount of expertise in a particular topic is represented by a non-negative weight on that attribute. A typical search in this setting specifies both connectivity between nodes and constraints on weights of nodal attributes. For example, a user's search may be: find three coauthors (i.e., a triangle) where each author's expertise is greater than 50 percent in at least one topic area (i.e., attribute). We propose a ranking function which unifies ranking between the graph structure and attribute weights of nodes. We prove that the problem of retrieving the optimal answer for graph search on WAGs is NP-complete. Moreover, we propose a fast and effective top-k graph search algorithm for WAGs. In an extensive experimental study with multiple real-world graphs, our proposed algorithm exhibits significant speed-up over competing approaches. On average, our proposed method is more than 7x faster in query processing than the best competitor.

关键词： Weighted attribute graph graph search top-k algorithms

来源：评论

学校读者我要写书评

暂无评论

top-k algorithms and Applications

Top-k Algorithms and Applications

引用

14th International Conference on Database Systems for Advanced Applications

作者： Das, Gautam Univ Texas Arlington Dept Comp Sci & Engn Arlington TX 75063 USA

ISBN: (纸本)9783642008863

In recent years, there has been a great deal of interest in developing effective techniques for ad-hoc search and retrieval in relational databases, document and multimedia databases, scientific information systems, and so on. A popular paradigm for tackling this problem is top-k querying, i.e., the rankinf of the results and returning the k results with the highest scores. Numerous variants of the top-k retrieval problem and several algorithms have been introduced in recent years. In this tutorial we shall discuss the top-k problem in detail, especially the fundamental algorithms such as FA and TA, important variants such as algorithms operating under restricted sorted/random access, deterministic and probabilistic approximations, as well as distributed and streaming top-k computations. A significant portion of the tutorial will be focused on applications of these top-k algorithms, especially in the context of the Web services and online databases, multimedia, documents and relational databases.

关键词： relational databases top-k algorithms search engines keyword queries

来源：评论

学校读者我要写书评

暂无评论

MULTI-CRITERIA SEARCH ALGORITHM: AN EFFICIENT APPROXIMATE k-NN ALGORITHM FOR IMAGE RETRIEVAL

MULTI-CRITERIA SEARCH ALGORITHM: AN EFFICIENT APPROXIMATE K-...

引用

20th IEEE International Conference on Image Processing (ICIP)

作者： Badr, Mehdi Vodislav, Dan Picard, David Yin, Shaoyi Gosselin, Philippe-Henri Univ Cergy Pontoise ENSEA CNRS Cergy Pontoise France INRIA Rennes TexMex Team Rennes France

ISBN: (纸本)9781479923410

We propose a new method for approximate k-NN search in large scale image databases, based on top-k multi-criteria search techniques. The method defines a simple index structure based on sorted lists, which provides a good compromise between fast retrieval, storage requirements and update cost. The search algorithm delivers approximate results with guarantees about false negatives, with fast emergence of good approximations, monotonically improved and leading if necessary to an exact result. Experiments with the on-disk implementation show that our method produces very good approximate results several times faster than the Baseline method.

关键词： CBIR k-NN search image databases multi-criteria search top-k algorithms

来源：评论

学校读者我要写书评

暂无评论

MULTI-CRITERIA SEARCH ALGORITHM: AN EFFICIENT APPROXIMATE k-NN ALGORITHM FOR IMAGE RETRIEVAL

MULTI-CRITERIA SEARCH ALGORITHM: AN EFFICIENT APPROXIMATE K-...

引用

IEEE International Conference on Image Processing

作者： Mehdi Badr Dan Vodislav David Picard Shaoyi Yin Philippe-Henri Gosselin ETIS University of Cergy-Pontoise ENSEA CNRS France TexMex team INRIA Rennes France

ISBN: (纸本)9781479923427

关键词： CBIR k-NN search image databases multi-criteria search top-k algorithms visual databases content-based image retrieval SEARCHING ALGORITHM image retrieval retrieval algorithms False Negative nearest neighbor algorithm images Approximate

来源：评论

学校读者我要写书评

暂无评论

Finding top-k elements in a time-sliding window

引用

EVOLVING SYSTEMS 2011年第1期2卷 51-70页

作者： Homem, Nuno Carvalho, Joao Paulo Univ Tecn Lisboa Inst Super Tecn INESC ID R Alves Redol 9 P-1000029 Lisbon Portugal

Identifying the top-k most frequent elements is one of the many problems associated with data streams analysis. It is a well-known and difficult problem, especially if the analysis is to be performed and maintained up to date in near real time. Analyzing data streams in time sliding window model is of particular interest as only the most recent, more relevant events are considered. Approximate answers are usually adequate when dealing with this problem. This paper presents a new and innovative algorithm, the Filtered Space-Saving with Sliding Window Algorithm (FSW) that addresses this problem by introducing in the Filtered Space Saving ( FSS) algorithm an approximated time sliding window counter. The algorithm provides the top-k list of elements, their frequency and an error estimate for each frequency value within the sliding window. It provides strong guarantees on the results, depending on the elements real frequencies. Experimental results detail performance on real life cases.

关键词： Approximate algorithms top-k algorithms Most frequent Estimation Data stream frequencies Sliding window

来源：评论

学校读者我要写书评

暂无评论

Finding top-k elements in data streams

引用

INFORMATION SCIENCES 2010年第24期180卷 4958-4974页

作者： Homem, Nuno Carvalho, Joao Paulo INESC ID TULisbon Inst Super Tecn P-1000029 Lisbon Portugal

Identifying the most frequent elements in a data stream is a well known and difficult problem. Identifying the most frequent elements for each individual, especially in very large populations, is even harder. The use of fast and small memory footprint algorithms is paramount when the number of individuals is very large. In many situations such analysis needs to be performed and kept up to date in near real time. Fortunately, approximate answers are usually adequate when dealing with this problem. This paper presents a new and innovative algorithm that addresses this problem by merging the commonly used counter-based and sketch-based techniques for top-k identification. The algorithm provides the top-k list of elements, their frequency and an error estimate for each frequency value. It also provides strong guarantees on the error estimate, order of elements and inclusion of elements in the list depending on their real frequency. Additionally the algorithm provides stochastic bounds on the error and expected error estimates. Telecommunications customer's behavior and voice call data is used to present concrete results obtained with this algorithm and to illustrate improvements over previously existing algorithms. (C) 2010 Elsevier Inc. All rights reserved.

关键词： Approximate algorithms top-k algorithms Most frequent Estimation Data stream frequencies

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：