Search Results - Inner Mongolia University Library

3rd International Conference on Advanced Computing and Intelligent Engineering (ICACIE)

Authors: Pandey, Sriniwas; Samal, Mamata; Mohanty, Sraban Kumar (PDPM Indian Inst Informat Technol Design & Mfg, Comp Sci & Engn, Jabalpur 482005, Madhya Pradesh, India)

ISBN: (print) 9789811510816; 9789811510809

Clustering is a technique to partition data into different groups in such a way that data items in a group are more similar to each other than the data points in any other group. The assumption of infinite main memory is very usual while designing most of the clustering algorithms but this assumption fails when the size of data set starts increasing. In this scenario, data needs to be stored in the secondary memory and time spent in the input/outputs (I/O) dominates the actual computational time. Therefore by reducing the I/O, the efficiency of the clustering techniques can be improved. In this paper, one shared near neighbor based algorithm is devised by minimizing its I/O complexity to make it suitable for the Big Data in external memory model proposed by Aggarwal and Vitter. There is no change in the computational steps, hence cluster quality remains the same. We implement the algorithm in the STXXL library to show its efficacy for Big Data sets.

Keywords: Clustering; SNN clustering; external memory algorithms; Big data clustering
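
The shared near neighbor (SNN) similarity that the abstract above relies on can be illustrated with a small in-memory sketch. It assumes the common definition in which two points are similar only if each appears in the other's k-nearest-neighbor list, with similarity equal to the size of the overlap of those lists. The function names `knn_lists` and `snn_similarity` are hypothetical, and the sketch does not reproduce the paper's I/O-efficient external memory layout or its STXXL implementation.

```python
from math import dist


def knn_lists(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only if i and j list each other."""
    if i in neighbors[j] and j in neighbors[i]:
        return len(neighbors[i] & neighbors[j])
    return 0


# Two well-separated groups of four points each (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
nn = knn_lists(points, k=3)
print(snn_similarity(nn, 0, 1), snn_similarity(nn, 0, 4))
```

The quadratic neighbor search is only there to keep the sketch short; it is exactly the kind of step that becomes I/O-bound once the data no longer fits in main memory.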

A Unified Approach for Computing Top-k Pairs in Multidimensional Space

IEEE 27th International Conference on Data Engineering (ICDE 2011)

Authors: Cheema, Muhammad Aamir; Lin, Xuemin; Wang, Haixun; Wang, Jianmin; Zhang, Wenjie (Univ New South Wales, Sydney, NSW 2052, Australia; Microsoft Res Asia, Beijing, Peoples R China; Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Sch Software, Beijing, Peoples R China)

ISBN: (print) 9781424489589

Top-k pairs queries have many real applications. k closest pairs queries, k furthest pairs queries and their bichromatic variants are some of the examples of the top-k pairs queries that rank the pairs on distance functions. While these queries have received significant research attention, there does not exist a unified approach that can efficiently answer all these queries. Moreover, there is no existing work that supports top-k pairs queries based on generic scoring functions. In this paper, we present a unified approach that supports a broad class of top-k pairs queries including the queries mentioned above. Our proposed approach allows the users to define a local scoring function for each attribute involved in the query and a global scoring function that computes the final score of each pair by combining its scores on different attributes. We propose efficient internal and external memory algorithms and our theoretical analysis shows that the expected performance of the algorithms is optimal when two or less attributes are involved. Our approach does not require any pre-built indexes, is easy to implement and has low memory requirement. We conduct extensive experiments to demonstrate the efficiency of our proposed approach.

Keywords: Algorithm design and analysis; Color; Complexity theory; Image color analysis; Indexes; memory management; Spatial databases; bichromatic variants; distance functions; external memory algorithms; generic scoring functions; multidimensional space; query processing; set theory; top-k pair queries; unified approach; visual databases
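
The query model described in the abstract above, with one local scoring function per attribute and a global scoring function that combines them, can be sketched as follows. This is a brute-force illustration that examines every pair, not the paper's internal or external memory algorithm; the name `top_k_pairs` and the sample data are assumptions made for the example.

```python
import heapq
from itertools import combinations


def top_k_pairs(objects, local_scores, global_score, k):
    """Return the k pairs with the smallest global score (brute force over all pairs)."""

    def score(pair):
        a, b = pair
        # One local score per attribute, then combined by the global function.
        per_attribute = [f(a[i], b[i]) for i, f in enumerate(local_scores)]
        return global_score(per_attribute)

    return heapq.nsmallest(k, combinations(objects, 2), key=score)


def abs_diff(x, y):
    return abs(x - y)


objects = [(0, 0), (1, 1), (5, 5), (6, 5)]

# k closest pairs under Manhattan distance: sum of per-attribute differences.
closest = top_k_pairs(objects, [abs_diff, abs_diff], sum, k=2)

# k furthest pairs: negate the global score so the largest distance ranks first.
furthest = top_k_pairs(objects, [abs_diff, abs_diff], lambda s: -sum(s), k=1)
print(closest, furthest)
```

Swapping the local or global function changes the query type without touching the ranking code, which is the point of the unified formulation.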

External memory BWT and LCP computation for sequence collections with applications

Algorithms for Molecular Biology, 2019, vol. 14, no. 1, p. 6

Authors: Egidi, Lavinia; Louza, Felipe A.; Manzini, Giovanni; Telles, Guilherme P. (Univ Piemonte Orientale, DiSIT, Viale Michel 11, I-15121 Alessandria, Italy; Univ Sao Paulo, Dept Comp & Math, Av Bandeirantes 3900, BR-14040901 Ribeirao Preto, Brazil; CNR, IIT, Via Moruzzi 1, I-56124 Pisa, Italy; Univ Estadual Campinas, Inst Comp, Av Albert Einstein 1251, BR-13083852 Campinas, SP, Brazil)

Background: Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using in the best possible way the available *** propose a space-efficient algorithm to compute the BWT and LCP array for a collection of sequences in the external or semi-external memory setting. Our algorithm splits the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external or semi-external memory and in the process it also computes the LCP values. Our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP array, provide simple, scan-based, external memory algorithms for three well known problems in bioinformatics: the computation of maximal repeats, the all pairs suffix-prefix overlaps, and the construction of succinct de Bruijn *** prove that our algorithm performs O(n · maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm is only slightly slower than the state of the art for short sequences but it is up to 40 times faster for longer sequences or when the available RAM is at least equal to the size of the input.

Keywords: Burrows-Wheeler Transform; Longest common prefix array; Maximal repeats; All pairs suffix-prefix overlaps; Succinct de Bruijn graph; external memory algorithms
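
To make the two data structures in the abstract above concrete, here is a naive in-memory sketch that builds the BWT and LCP array of a small collection by sorting all suffixes. It assumes each sequence is terminated by a `$` marker that sorts before every other character, that markers of earlier sequences sort first, and that markers never match each other when LCP values are counted. The function name `bwt_lcp` is hypothetical; the paper's split-and-merge external memory algorithm is not reproduced here.

```python
def bwt_lcp(sequences):
    """Naive BWT and LCP array for a collection of sequences (none may contain '$')."""
    suffixes = []
    for idx, seq in enumerate(sequences):
        text = seq + "$"
        for pos in range(len(text)):
            # Store the suffix, its sequence index, and the character preceding it
            # (text[-1] is '$' when the suffix starts at position 0).
            suffixes.append((text[pos:], idx, text[pos - 1]))
    # Equal suffixes are ordered by sequence index, i.e. by their end markers.
    suffixes.sort(key=lambda s: (s[0], s[1]))

    bwt = "".join(prev for _, _, prev in suffixes)

    lcp = [0]
    for (a, _, _), (b, _, _) in zip(suffixes, suffixes[1:]):
        n = 0
        # End markers of different sequences are treated as distinct symbols.
        while n < min(len(a), len(b)) and a[n] == b[n] and a[n] != "$":
            n += 1
        lcp.append(n)
    return bwt, lcp


bwt, lcp = bwt_lcp(["banana"])
print(bwt, lcp)
```

Sorting materialized suffixes takes quadratic space in the worst case, which is why this approach only works for toy inputs.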

An I/O Efficient Algorithm for Minimum Spanning Trees

9th Annual International Conference on Combinatorial Optimization and Applications (COCOA)

Authors: Bhushan, Alka; Sajith, Gopalan (Indian Inst Technol Guwahati, Dept Comp Sci & Engn, Gauhati 781039, India; Indian Inst Technol, Dept Comp Sci & Engn, GISE Lab, Bombay 400076, Maharashtra, India)

ISBN: (print) 9783319266268; 9783319266251

An $O(\mathrm{Sort}(E) \cdot \log\log_{E/V} B)$ I/Os algorithm for computing a minimum spanning tree of a graph $G = (V, E)$ is presented, where $\mathrm{Sort}(E) = (E/B) \log_{M/B}(E/B)$, $M$ is the main memory size, and $B$ is the block size. This improves on the previous bound of $O(\mathrm{Sort}(E) \cdot \log\log(VB/E))$ I/Os by Arge et al. for all values of $V$, $E$ and $B$, for which I/O optimality is still open. In particular, our algorithm matches the lower bound Omega(E/***(V)), when $E/V \geq B^{\epsilon}$ for a constant $\epsilon > 0$, an $O(\log\log B)$ factor improvement over the algorithm of Arge et al. Our algorithm can compute the connected components too, for the same number of I/Os, which is an improvement on the best known upper bound.

Keywords: external memory algorithms; Minimum spanning trees; Graph algorithms
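
A short derivation shows why the extra factor in the bound of the abstract above becomes constant for dense enough graphs. It assumes the subscripted reading $\log\log_{E/V} B$ of the bound, the reading $E/V \geq B^{\epsilon}$ of the density condition, $B > 1$, and $0 < \epsilon \leq 1$; it is a sketch of the arithmetic, not the paper's proof.

```latex
% Assumption: the condition is read as E/V >= B^epsilon with 0 < epsilon <= 1 and B > 1.
\begin{align*}
\frac{E}{V} \ge B^{\epsilon}
  &\;\Longrightarrow\; \log\frac{E}{V} \ge \epsilon \log B \\
  &\;\Longrightarrow\; \log_{E/V} B = \frac{\log B}{\log (E/V)} \le \frac{1}{\epsilon} \\
  &\;\Longrightarrow\; \log\log_{E/V} B \le \log\frac{1}{\epsilon} = O(1) \\
  &\;\Longrightarrow\; O\!\left(\mathrm{Sort}(E)\cdot\log\log_{E/V} B\right) = O(\mathrm{Sort}(E)).
\end{align*}
```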

Spatial join techniques

ACM Transactions on Database Systems, 2007, vol. 32, no. 1, pp. 7.1-7.44

Authors: Jacox, Edwin H.; Samet, Hanan (Univ Maryland, Dept Comp Sci, Ctr Automat Res, Inst Adv Comp Studies, College Pk, MD 20742, USA)

A variety of techniques for performing a spatial join are reviewed. Instead of just summarizing the literature and presenting each technique in its entirety, distinct components of the different techniques are described and each is decomposed into an overall framework for performing a spatial join. A typical spatial join technique consists of the following components: partitioning the data, performing internal-memory spatial joins on subsets of the data, and checking if the full polygons intersect. Each technique is decomposed into these components and each component addressed in a separate section so as to compare and contrast similar aspects of each technique. The goal of this survey is to describe the algorithms within each component in detail, comparing and contrasting competing methods, thereby enabling further analysis and experimentation with each component and allowing the best algorithms for a particular situation to be built piecemeal, or, even better, enabling an optimizer to choose which algorithms to use.

Keywords: algorithms; design; external memory algorithms; plane-sweep; spatial join
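
The internal-memory join component mentioned in the abstract above is often realized with a plane sweep, which also appears among the keywords. The sketch below joins two sets of axis-aligned bounding rectangles and reports the intersecting pairs. It assumes rectangles are given as `(xmin, ymin, xmax, ymax)` tuples and that touching edges count as intersecting. The name `plane_sweep_join` is hypothetical, and neither the partitioning step nor the final check on the full polygons is included.

```python
def plane_sweep_join(rects_a, rects_b):
    """Report all pairs (a, b) of intersecting rectangles, a from rects_a, b from rects_b."""
    # One event per rectangle at its left edge; side 0 is set A, side 1 is set B.
    events = [(r[0], 0, r) for r in rects_a] + [(r[0], 1, r) for r in rects_b]
    events.sort(key=lambda e: (e[0], e[1]))

    active = ([], [])  # rectangles of A and of B still cut by the sweep line
    result = []
    for x, side, rect in events:
        other = 1 - side
        # Drop rectangles of the other set that end left of the sweep line.
        active[other][:] = [r for r in active[other] if r[2] >= x]
        for cand in active[other]:
            # The x-ranges overlap by construction; only the y-ranges need checking.
            if cand[1] <= rect[3] and rect[1] <= cand[3]:
                result.append((rect, cand) if side == 0 else (cand, rect))
        active[side].append(rect)
    return result


a = [(0, 0, 2, 2), (5, 5, 6, 6)]
b = [(1, 1, 3, 3), (7, 7, 8, 8), (2, 0, 4, 1)]
print(plane_sweep_join(a, b))
```

Each pair is reported once, when the rectangle with the larger left edge is reached. The active lists here are plain Python lists; other structures can be substituted for them without changing the sweep itself.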

I/O-Efficient Generation of Massive Graphs Following the LFR Benchmark

ACM Journal of Experimental Algorithmics, 2018, vol. 23, pp. 1–33

Authors: Michael Hamann; Ulrich Meyer; Manuel Penschuck; Hung Tran; Dorothea Wagner (Karlsruhe Institute of Technology, Karlsruhe, Germany; Goethe University Frankfurt, Frankfurt, Germany)

LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation, we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massively parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFR’s implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyze the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.

Keywords: LFR benchmark; community detection; complex network; external memory algorithms; random graph generator
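
The deterministic first step named in the abstract above, materializing a graph with a prescribed degree sequence via the Havel-Hakimi algorithm, can be sketched in main memory as follows. This is the textbook in-memory procedure built on a heap, not the paper's I/O-efficient EM-HH variant, and the function name `havel_hakimi` is an assumption made for the example. The subsequent randomization step is not shown.

```python
import heapq


def havel_hakimi(degrees):
    """Return an edge list of a simple graph realizing `degrees`, or None if none exists."""
    # Max-heap of (remaining degree, node), implemented with negated degrees.
    heap = [(-d, v) for v, d in enumerate(degrees) if d > 0]
    heapq.heapify(heap)
    edges = []
    while heap:
        neg_d, v = heapq.heappop(heap)
        d = -neg_d
        if d > len(heap):
            return None  # not enough remaining nodes to connect to
        # Connect v to the d nodes with the largest remaining degrees.
        partners = [heapq.heappop(heap) for _ in range(d)]
        for neg_du, u in partners:
            edges.append((v, u))
            if neg_du + 1 < 0:
                heapq.heappush(heap, (neg_du + 1, u))
    return edges


print(havel_hakimi([3, 3, 2, 2, 2]))
```

The result is deterministic for a given input, which is why a separate randomization step is needed before the graph can serve as a random sample.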
