Binary partition hierarchies (BPH) and minimum spanning trees are essential data structures for hierarchical analysis, such as quasi-flat zones and watershed segmentation. Traditional BPH construction algorithms are l...
详细信息
Binary partition hierarchies (BPH) and minimum spanning trees are essential data structures for hierarchical analysis, such as quasi-flat zones and watershed segmentation. Traditional BPH construction algorithms are limited by their requirement to load the data entirely into memory, making them impractical for processing large images whose processing exceeds the capacity of the computer's main memory. To overcome this limitation, an algebraic framework was introduced, enabling the out-of-core computation of BPH leveraging three key operations: select, join, and insert. In this publication, we present two distinct calculi based on these operations: one designed for general spatial partitions and another optimized for causal partitioning. The second calculus is specifically tailored to meet out-of-core constraints, ensuring efficient processing of large-scale data. We provide detailed algorithms, including pseudo-code and complexity analysis, and conduct experimental comparisons between the two approaches.
The Burrows-Wheeler transform (BWT) and the suffix array (SA) of an input string are important data structures widely used in modern bioinformatics researches such as full-text search, alignment etc. In this paper, we...
详细信息
ISBN:
(纸本)9789811064425;9789811064418
The Burrows-Wheeler transform (BWT) and the suffix array (SA) of an input string are important data structures widely used in modern bioinformatics researches such as full-text search, alignment etc. In this paper, we present a lightweight external memory algorithm for computing the BWT from a given suffix array and the input string. The algorithm has a linear I/O complexity O(n) and a workspace of at most n/2 integers. An experiment study is conducted to evaluate the time and space performance of the proposed algorithm on a number of realistic datasets. The experimental results are consistent with the theoretical complexities of the algorithm.
Recently there has been significant development and innovation in both frontiers of Heterogeneous memory and Heterogeneous Compute domains. This paper summarizes the challenges, surveys related work, and proposes poss...
详细信息
ISBN:
(纸本)9781450343053
Recently there has been significant development and innovation in both frontiers of Heterogeneous memory and Heterogeneous Compute domains. This paper summarizes the challenges, surveys related work, and proposes possible research directions to exploit both heterogeneous memory and compute resources in a computer system. We focus our discussion on the memory system and also touch issues related to heterogeneous compute.
The problem of finding the minimum cut of an undirected unweighted graph is studied on the externalmemory model. First, a lower bound of Omega((E/V)Sort(V)) on the number of I/Os is shown for the problem, where V is ...
详细信息
The problem of finding the minimum cut of an undirected unweighted graph is studied on the externalmemory model. First, a lower bound of Omega((E/V)Sort(V)) on the number of I/Os is shown for the problem, where V is the number of vertices and E is the number of edges. Then the following are presented, (1) a deterministic minimum cut algorithm that uses O(min{c(7) (log(3+is an element of) E)(MST(V, E)), c(log E)(MST(V, E)) + (V /root M) Sort(V)}) I/Os;here is an element of > 0 is a small constant, MST(V, E) is the number of I/Os needed to compute a minimum spanning tree of the graph, and c is the value of the minimum cut. The algorithm performs better on dense graphs than the algorithm of [1], which requires O(E + c(2)V log(V/c)) I/Os, when executed on the externalmemory model. For a delta-fat graph (for delta > 0, the maximum tree packing of the graph is at least (I + delta)c/2), our algorithm computes a minimum cut in O(c(log E)MST(V, E)) I/Os. (2) A randomized algorithm that computes minimum cut with high probability in O(c(log E).MST(V, E) Sort(E) log(2) V + V/B Sort(V) log V) I/Os. (3) A (2 + is an element of)-minimum cut algorithm that requires O((E/V) MST(V, E)) I/Os. (C) 2014 Elsevier B.V. All rights reserved.
Maxima-finding problems are the fundamental problem in computational geometry with a great deal of application in many areas, and it has resurfaced with the advent of Skyline Queries for relational databases and data ...
详细信息
ISBN:
(纸本)9780769535579
Maxima-finding problems are the fundamental problem in computational geometry with a great deal of application in many areas, and it has resurfaced with the advent of Skyline Queries for relational databases and data mining recently. The existed algorithms in the field of Maxima-finding problem (Skyline Queries) have been summarized in this paper. But for the massive data sets, there has no I/O linear algorithm yet A new kind of external memory algorithm of Maxima-finding problem (EMMF) has been presented, the I/O complexity of algorithm is linear, the corresponding reliability has been validated from experiments, and the status of 2-dimensional space has been proved in theory too.
In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates ...
详细信息
ISBN:
(纸本)9781450322638
In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph which share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k >= 0. The I/O cost of our partition construction algorithm is bounded by O(k . sort(vertical bar E-t vertical bar) + k . scan(vertical bar N-t vertical bar) + sort(vertical bar N-t vertical bar)), while our maintenance algorithms are bounded by O(k . sort(vertical bar E-t vertical bar) + k . sort vertical bar N-t vertical bar)). The space complexity bounds are O(vertical bar N-t vertical bar + vertical bar E-t vertical bar) and O(k . vertical bar N-t vertical bar + k vertical bar E-t vertical bar) resp. Here, vertical bar E-t vertical bar and vertical bar N-t vertical bar are the number of disk pages occupied by the input graph's edge set and node set, resp., and sort(n) and scan(n) are the cost of sorting and scanning, resp., a file occupying n pages in externalmemory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
In this paper we present external memory algorithms for some string problems. external memory algorithms have been developed in many research areas, as the speed gap between fast internal memory and slow external memo...
详细信息
In this paper we present external memory algorithms for some string problems. external memory algorithms have been developed in many research areas, as the speed gap between fast internal memory and slow externalmemory continues to grow. The goal of external memory algorithms is to minimize the number of input/output operations between internal memory and externalmemory. These years the sizes of strings such as DNA sequences are rapidly increasing. However, external memory algorithms have been developed for only a few string problems. In this paper we consider five string problems and present external memory algorithms for them. They are the problems of finding the maximum suffix, string matching, period finding, Lyndon decomposition, and finding the minimum of a circular string. Every algorithm that we present here runs in a linear number of I/Os in the externalmemory model with one disk, and they run in an optimal number of disk I/Os in the externalmemory model with multiple disks.
The verification of VLSI layouts is an important and expensive step in physical design process and has significant contribution in overall design cycle time. Design rule checking, connectivity extraction and device ex...
详细信息
The verification of VLSI layouts is an important and expensive step in physical design process and has significant contribution in overall design cycle time. Design rule checking, connectivity extraction and device extraction are important steps in layout analysis. Efficient incremental algorithms for these steps are crucial for fast development as well as small time-to-market of the design. If the size of layout is so large that it cannot fit entirely into available main memory, the main performance bottleneck is communication between internal memory and the externalmemory due to the slow access speed of externalmemory.
We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies whic...
详细信息
We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequence is ubiquitous in algorithms designed for hierarchical memory. In the externalmemory model of computation with block size B, a memory consisting of m blocks, and at most in sequences the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct mapped cache (a = 1) two sequences suffice. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O (N/B) provided that k = O (m/B-l/u), i.e., the number of sequences that can be supported is decreased by a factor B-l/a. We also show a corresponding lower bound. Our result leads to a general method for converting sequence based algorithms designed for the externalmemory model of computation to cache memory even for caches with small associativity.
暂无评论