The purpose of this Special Issue of Algorithms was to attract papers presenting original research in the area of algorithm engineering. In particular, we encouraged submissions concerning the design, analysis, implementation, tuning, and experimental evaluation of discrete algorithms and data structures, and/or addressing methodological issues and standards in algorithmic experimentation. Papers dealing with advanced models of computing, including memory hierarchies, cloud architectures, and parallel processing, were also welcome. In this regard, we solicited contributions from all the most prominent areas of applied algorithmic research, including but not limited to graphs, databases, computational geometry, big data, networking, combinatorial aspects of scientific computing, and computational problems in the natural sciences and engineering.
Partitioning the vertices of a (hyper)graph into k roughly balanced blocks such that few (hyper)edges run between blocks is a key problem for large-scale distributed processing. A current trend for partitioning huge (hyper)graphs using low computational resources is the use of streaming algorithms. In this work, we propose FREIGHT: a Fast stREamInG Hypergraph parTitioning algorithm, an adaptation of the widely known graph-based algorithm Fennel. By using an efficient data structure, we make the overall running time of FREIGHT linear in the pin count of the hypergraph and its memory consumption linear in the number of nets and blocks. The results of our extensive experimentation showcase the promising performance of FREIGHT as a highly efficient and effective solution for streaming hypergraph partitioning. Our algorithm demonstrates competitive running time, with a geometric-mean runtime within a factor of four of the Hashing algorithm. Significantly, our findings highlight the superiority of FREIGHT over all existing (buffered) streaming algorithms, and even over the in-memory algorithm HYPE, with respect to both the cut-net and connectivity measures. This indicates that our proposed algorithm is a promising hypergraph partitioning tool for tackling the challenges posed by large-scale and dynamic data processing.
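The Fennel objective that FREIGHT adapts can be sketched for plain graphs as follows: each arriving vertex is assigned to the block that maximizes neighborhood affinity minus a penalty on block size. This is only an illustrative sketch, not FREIGHT itself; the parameter values (gamma = 1.5, nu = 1.1) and the simplified alpha (using the count of placed vertices in place of the number of streamed edges) are assumptions for demonstration.

```python
import math

def fennel_assign(neighbors, blocks, n, k, gamma=1.5, nu=1.1):
    """Assign one streaming vertex to a block using a Fennel-style score.

    neighbors: set of already-placed neighbor vertex ids
    blocks:    list of k sets holding the vertices assigned so far
    Score(b) = |N(v) & S_b| - alpha * gamma * |S_b|**(gamma - 1).
    """
    # Illustrative simplification: placed-vertex count stands in for the
    # number of streamed edges in Fennel's alpha.
    m = max(1, sum(len(b) for b in blocks))
    alpha = math.sqrt(k) * m / (n ** gamma)
    cap = nu * n / k  # balance constraint: no block may exceed nu * n / k
    best_b, best_score = None, -math.inf
    for i, S in enumerate(blocks):
        if len(S) >= cap:
            continue  # block is full; skip to preserve balance
        score = len(neighbors & S) - alpha * gamma * len(S) ** (gamma - 1)
        if score > best_score:
            best_b, best_score = i, score
    return best_b
```

The affinity term pulls a vertex toward blocks containing its neighbors (reducing cut edges), while the size penalty keeps blocks balanced; streaming hypergraph partitioners like FREIGHT rework this trade-off for nets and pins.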
We study the problem of engineering space-time efficient data structures that support membership and rank queries on very large static dictionaries of strings. Our solution is based on a very simple approach that decouples string storage and string indexing by means of a block-wise compression of the sorted dictionary strings (to be stored in external memory) and a succinct implementation of a Patricia trie (to be stored in internal memory) built on the first string of each block. On top of this, we design an in-memory cache that, given a sample of the query workload, augments the Patricia trie with additional information to reduce the number of I/Os of future queries. Our experimental evaluation on two new datasets, which are at least one order of magnitude larger than the ones used in the literature, shows that (i) the state-of-the-art compressed string dictionaries, compared to Patricia tries, do not provide significant benefits when used in a large-scale indexing setting, and (ii) our two-level approach enables the indexing and storage of 3.5 billion strings taking 273 GB in less than 200 MB of internal memory and 83 GB of compressed disk space, while still guaranteeing comparable or faster query performance than that offered by array-based solutions used in modern storage systems, such as RocksDB, thus possibly influencing their future design.
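The two-level design described above can be sketched compactly. This is a simplified stand-in, not the paper's structure: a sorted array of block-head strings replaces the succinct Patricia trie, and zlib stands in for the paper's block-wise compressor; the class and method names are invented for illustration.

```python
import bisect
import zlib

class TwoLevelDict:
    """Sketch of the two-level design: compressed blocks of sorted
    strings (standing in for external memory) plus an in-memory index
    over the first string of each block."""

    def __init__(self, strings, block_size=4):
        strings = sorted(strings)
        self.heads, self.blocks = [], []
        for i in range(0, len(strings), block_size):
            block = strings[i:i + block_size]
            self.heads.append(block[0])          # in-memory index entry
            # Block-wise compression of the sorted strings.
            self.blocks.append(zlib.compress("\n".join(block).encode()))
        self.block_size = block_size

    def rank(self, s):
        """Return (number of dictionary strings < s, membership flag)."""
        # Step 1: the in-memory index routes the query to one block.
        j = bisect.bisect_right(self.heads, s) - 1
        if j < 0:
            return 0, False
        # Step 2: decompress only that block and search inside it.
        block = zlib.decompress(self.blocks[j]).decode().split("\n")
        r = bisect.bisect_left(block, s)
        found = r < len(block) and block[r] == s
        return j * self.block_size + r, found
```

The key point the sketch illustrates: only block heads live in memory, so memory use scales with the number of blocks rather than the number of strings, and each query touches one compressed block.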
Lua (Ierusalimschy et al., 1996) is a well-known scripting language, popular among many programmers, most notably in the gaming industry. Remarkably, the only data-structuring mechanism in Lua is the associative array, called a table. With Lua 5.0, the reference implementation of Lua introduced hybrid tables, which implement tables by combining a hashmap with a dynamically growing array: the values associated with integer keys are stored in the array part, when suitable; everything else is stored in the hashmap. All this is transparent to the user, who gets a single simple interface for handling tables. In this paper we carry out a theoretical analysis of the performance of Lua's tables, considering various worst-case and probabilistic scenarios. In particular, we uncover some problematic situations in the simple probabilistic model where we add a new key with some fixed probability p > 1/2 and delete a key with probability 1 - p: the cost of performing T such operations is proved to be Θ(T log T) with high probability, whereas linear complexity would be expected for such a data structure. If there are no deletions, we prove that the tables behave better overall. In particular, we establish that inserting the integers from 1 to T in a random order takes essentially linear time.
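The array-part/hash-part split can be illustrated with a toy model. This is a deliberately simplified sketch, not Lua's implementation: the real Lua resize rule (picking the largest power of two such that more than half of the array slots are used) is replaced here by a simple grow-by-one-and-migrate policy, and all names are invented for illustration.

```python
class HybridTable:
    """Toy model of Lua's hybrid tables: dense integer keys 1..n live
    in an array part, everything else in a hash part."""

    def __init__(self):
        self.array = []   # holds values for keys 1 .. len(self.array)
        self.hash = {}    # holds all other keys

    def set(self, key, value):
        if isinstance(key, int) and 1 <= key <= len(self.array):
            self.array[key - 1] = value
        elif isinstance(key, int) and key == len(self.array) + 1:
            self.array.append(value)
            # Migrate now-contiguous successor keys out of the hash part.
            nxt = len(self.array) + 1
            while nxt in self.hash:
                self.array.append(self.hash.pop(nxt))
                nxt += 1
        else:
            self.hash[key] = value

    def get(self, key):
        if isinstance(key, int) and 1 <= key <= len(self.array):
            return self.array[key - 1]
        return self.hash.get(key)
```

Even this toy version shows the behavior the paper analyzes: the cost of an operation depends on where the key lands and on migrations between the two parts, which is exactly what makes the mixed insert/delete workload expensive.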
ISBN (digital): 9783031439803
ISBN (print): 9783031439797; 9783031439803
We study the problem of engineering space-time efficient indexes that support membership and lexicographic (rank) queries on very large static dictionaries of strings. Our solution is based on a very simple approach that consists of decoupling string storage and string indexing by means of a blockwise compression of the sorted dictionary strings (to be stored in external memory) and a succinct implementation of a Patricia trie (to be stored in internal memory) built on the first string of each block. Our experimental evaluation on two new datasets, which are at least one order of magnitude larger than the ones used in the literature, shows that (i) the state-of-the-art compressed string dictionaries (such as FST, PDT, CoCo-trie) do not provide significant benefits if used in an indexing setting compared to Patricia tries, and (ii) our two-level approach enables the indexing of 3.5 billion strings taking 273 GB in less than 200 MB of internal memory, which is available on any commodity machine, while still guaranteeing comparable or faster query performance than those offered by array-based solutions used in modern storage systems, such as RocksDB, thus possibly influencing their future designs.
ISBN (print): 9783031785405; 9783031785412
Mining graphs, upon query, for k shortest paths between vertex pairs is a prominent primitive to support several analytics tasks on complex networked datasets. The state-of-the-art method to implement this primitive is KPLL, a framework that provides very fast query answering, even for large inputs and volumes of queries, by pre-computing and exploiting an appropriate index of the graph. However, if the graph's topology undergoes changes over time, such an index might become obsolete and thus yield incorrect query results. Re-building the index from scratch, upon every modification, induces unsustainable time overheads, incompatible with applications using k shortest paths for analytics purposes. Motivated by this limitation, in this paper we introduce DECKPLL, the first dynamic algorithm to maintain a KPLL index under decremental modifications. We assess the effectiveness and scalability of our algorithm through extensive experimentation and show that it updates KPLL indices orders of magnitude faster than re-computation from scratch, while preserving their compactness and query performance. We also combine DECKPLL with INCKPLL, the only known dynamic algorithm to maintain a KPLL index under incremental modifications, and hence showcase, on real-world datasets, the first method to support fast extraction of k shortest paths from graphs that evolve by arbitrary topological changes.
Fast extraction of top-k distances from graph data is a primitive of paramount importance in the fields of data mining, network analytics and machine learning, where ranked distances are exploited for several purposes (e.g., link prediction or network classification). While investigation on computational methods to address this retrieval task for regularly sized, static inputs has been extensive, much less is known when managed graphs are massive, i.e., having millions of vertices/edges, and time-evolving, i.e., when their structure can grow over time, a scenario that introduces a number of scalability and effectiveness issues otherwise not arising. Since, nowadays, most real-world applications exploiting top-k distances have to handle inherently dynamic and rapidly growing graphs, in this paper we present the first dynamic indexing scheme that supports very fast queries on top-k distances when graphs are massive and incrementally time-evolving. We assess the scalability and effectiveness of our method through extensive experimentation on both real-world and artificial graph datasets.
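The top-k distance primitive itself (independent of any index) can be illustrated with a textbook baseline: a Dijkstra variant that settles each vertex up to k times yields the k smallest path lengths between a pair. This is a query-time sketch for small graphs, not the paper's indexing scheme; the adjacency-dict format and function name are assumptions for illustration.

```python
import heapq
from collections import defaultdict

def top_k_distances(adj, s, t, k):
    """Return the k smallest path lengths from s to t (walks, i.e. not
    necessarily simple paths). adj maps u -> list of (v, weight)."""
    count = defaultdict(int)   # how many times each vertex was settled
    dists = []
    heap = [(0, s)]
    while heap and count[t] < k:
        d, u = heapq.heappop(heap)
        if count[u] >= k:
            continue           # u already settled k times; prune
        count[u] += 1
        if u == t:
            dists.append(d)    # the i-th settling of t gives the i-th distance
        for v, w in adj.get(u, []):
            heapq.heappush(heap, (d + w, v))
    return dists
```

Each vertex enters the heap at most k times per in-edge, so the baseline runs in roughly O(k * m * log(k * m)) time; index-based schemes like the one in the paper exist precisely because this is far too slow for massive graphs and high query volumes.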
We give an overview of the 2022 Computational Geometry Challenge targeting the problem Minimum Partition into Plane Subsets, which consists of partitioning a given set of line segments into a minimum number of non-crossing subsets.
ISBN (print): 9783031700705; 9783031700712
Tournament selection is a popular parent selection mechanism in evolutionary algorithms. Bian and Qian (PPSN 2022) proved that choosing the tournament size uniformly at random, called stochastic tournament selection, in combination with crossover significantly improves the performance of NSGA-II on some benchmark functions. We show that this selection mechanism is asymptotically equivalent to the power-law ranking selection proposed in Covantes Osuna et al. (Theor. Comput. Sci. 832, 2020) with exponent 2. Thus, asymptotic runtime bounds proven for one operator also hold when it is replaced with the other. We also investigate how to implement these operators efficiently for NSGA-II on the problems considered in the previous papers. We propose to implement the stochastic tournament with a pre-computed selection distribution to save on random numbers. Experiments on high-dimensional problems demonstrate the superiority of this method compared to the standard implementation. Overall, the power-law ranking selection is the most efficient selection mechanism for the studied problems. Remarkably, we also find that the way ties are broken between equally fit solutions can make the difference between the best and the worst approach, especially when crossover is involved.
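Power-law ranking selection is simple to state in code: rank the population by fitness (best = rank 1) and draw rank i with probability proportional to i^(-beta); with beta = 2 this matches the distribution induced by a uniformly random tournament size, per the abstract. The sketch below is illustrative (function name and the flat-list fitness representation are assumptions); the weights could be pre-computed once per population size, in the spirit of the paper's pre-computed distribution.

```python
import random

def power_law_rank_select(fitnesses, beta=2.0, rng=random):
    """Return the index of the selected individual.

    Individuals are ranked by fitness (best = rank 1) and rank i is
    drawn with probability proportional to i ** (-beta).
    """
    # Sort indices so order[0] is the fittest individual (rank 1).
    order = sorted(range(len(fitnesses)), key=lambda i: -fitnesses[i])
    # Power-law weights over ranks; these depend only on the population
    # size and could be pre-computed and reused across selections.
    weights = [(r + 1) ** (-beta) for r in range(len(order))]
    pick = rng.choices(range(len(order)), weights=weights, k=1)[0]
    return order[pick]
```

Note that the sort above breaks ties between equally fit individuals by index; as the abstract stresses, the tie-breaking rule is not a cosmetic detail and can change the algorithm's performance substantially.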
ISBN (print): 9798400704161
We present a thorough investigation of the All Nearest Smaller Values (ANSV) problem from a practical perspective. The ANSV problem is defined as follows: given an array A consisting of n values, for each entry A(i) compute the largest index l < i and the smallest index r > i such that A(i) > A(l) and A(i) > A(r), i.e., the indices of the nearest smaller values to the left and to the right of A(i). The ANSV problem was solved by Berkman, Schieber, and Vishkin [J. Algorithms, 1993] in the PRAM model. Their solution in the CREW PRAM model, which we will refer to as the BSV algorithm, achieves optimal O(n) work and O(log n) span. Until now, the BSV algorithm has been perceived as too complicated for practical use, and we are not aware of any publicly available implementations. Instead, the best existing practical solution to the ANSV problem is the implementation by Shun and Zhao presented at DCC'13. They implemented a simpler O(n log n)-work algorithm with an additional heuristic first proposed by Blelloch and Shun at ALENEX'11. We refer to this implementation as the BSZ algorithm. In this paper, we implement the original BSV algorithm and demonstrate its practical efficiency. Despite its perceived complexity, our results show that its performance is comparable to the BSZ algorithm. We also present the first theoretical analysis of the heuristic implemented in the BSZ algorithm and show that it provides a tunable trade-off between optimal work and optimal span. In particular, we show that it achieves O(n(1 + log n/k)) work and O(k(1 + log n/k)) span, for any integer parameter 1 <= k <= n. Thus, for k = Theta(log n), the BSZ algorithm can be made to be work-optimal, albeit at the expense of increased span compared to BSV. Our discussion includes a detailed examination of different input types, particularly highlighting that for random inputs, the low expected distance between values and their nearest smaller values renders simple algorithms efficient. Finally, we analy
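For reference, the sequential version of ANSV (not the parallel BSV or BSZ algorithms discussed above) admits a classic O(n) stack-based solution, which makes the problem definition concrete:

```python
def ansv(A):
    """Sequential O(n) All Nearest Smaller Values.

    For each i, return (left[i], right[i]): the nearest index to the
    left / right whose value is strictly smaller than A[i], or -1 /
    len(A) when no such index exists.
    """
    n = len(A)
    left, right = [-1] * n, [n] * n
    stack = []  # indices whose values are strictly increasing
    for i in range(n):
        # Pop entries >= A[i]; what remains on top is the nearest
        # strictly smaller value to the left.
        while stack and A[stack[-1]] >= A[i]:
            stack.pop()
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    stack.clear()
    for i in range(n - 1, -1, -1):  # symmetric right-to-left pass
        while stack and A[stack[-1]] >= A[i]:
            stack.pop()
        right[i] = stack[-1] if stack else n
        stack.append(i)
    return left, right
```

Each index is pushed and popped at most once per pass, so the total work is linear; the parallel algorithms in the paper exist because this scan is inherently sequential, motivating the PRAM constructions with O(log n) span.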