ISBN (print): 9781450391467
The cover tree is the canonical data structure for efficiently maintaining a dynamic set of points in a metric space while supporting nearest and k-nearest neighbor searches. For most real-world datasets with reasonable distributions (mathematically, constant expansion rate and bounded aspect ratio), single-point insertion, single-point deletion, and nearest neighbor search (NNS) each cost only logarithmic work in the size of the point set. Unfortunately, due to the intricacy of the cover tree algorithms and their reliance on a depth-first traversal order, we are unaware of any prior parallel versions of these algorithms. This paper presents highly parallel and work-efficient cover tree algorithms that handle batch insertions (and thus construction) and batch deletions. Assuming constant expansion rate and bounded aspect ratio, inserting or deleting m points into a cover tree with n points takes O(m log n) expected work and polylogarithmic span with high probability. Our algorithms rely on several novel algorithmic insights. We model the insertion and deletion process as a graph and use a maximal independent set (MIS) to generate tree nodes without conflicts. We use three key ideas to guarantee work-efficiency: a prefix-doubling scheme, a careful design that limits the size of the graph on which we apply MIS, and a strategy to propagate information among the different levels of the cover tree. We also use path-copying to make our parallel cover tree a persistent data structure, which is useful in several applications. Using our parallel cover trees, we show work-efficient (or near-work-efficient) and highly parallel solutions for a list of problems in computational geometry and machine learning, including Euclidean minimum spanning tree (EMST), single-linkage clustering, bichromatic closest pair (BCP), density-based clustering and its hierarchical version, and others. To the best of our knowledge, many of them are the first solutions to achieve work-efficiency and polylogarithmic span assuming constant expansion rate and bounded aspect ratio.
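The MIS step in the batch insertion can be illustrated with a small sequential sketch: candidate points at one cover-tree level conflict when they lie within that level's covering radius, and a maximal independent set of the conflict graph yields nodes that can be created without violating the separation invariant. The function name and greedy processing order below are illustrative assumptions, not the paper's parallel construction.

```python
import math

def maximal_independent_set(points, radius):
    """Greedy MIS under the conflict relation "two points lie within
    `radius` of each other" (Euclidean). Accepted points can become
    cover-tree nodes at this level; every rejected point is within
    `radius` of some accepted point, so it is covered."""
    accepted = []
    for p in points:
        if all(math.dist(p, q) > radius for q in accepted):
            accepted.append(p)
    return accepted
```

The accepted set is pairwise separated by more than `radius` (independence) while every other candidate has an accepted neighbor (maximality), which is exactly the invariant a cover-tree level needs.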
ISBN (print): 9798350364613; 9798350364606
The ubiquitous use of social media and other internet platforms has led to the widespread proliferation of naturally occurring social networks. This has caused a revolution in marketing and information dissemination due to the rise of influencers and their innate ability to broadcast information to large audiences. Influence maximization, the key problem of identifying optimally influential individuals, can be used to find influencers in a social network, or can be superimposed on other settings that exhibit relational information. Popular methods, such as greedy algorithms, face scalability issues: they perform poorly on larger networks and incur a large memory overhead. To address these challenges, this paper introduces a novel methodology, RIMR, which leverages statistical modeling and parallel programming to circumvent the need for storing large collections of data, achieving scalability and efficiency. Experimental results demonstrate the effectiveness and scalability of the proposed approach compared to state-of-the-art methods, showcasing significant speedups. This paper provides insights into the development of the proposed statistical process, the heterogeneous algorithm implementation, and a prototype demonstrating its efficacy.
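The memory pressure the abstract attributes to greedy methods can be seen in the classic baseline: greedy influence maximization needs per-node influence estimates (here precomputed reachable sets) held in memory, which is exactly the kind of large stored collection RIMR is said to avoid. This is a simplified sketch of the standard greedy baseline under that assumption, not the paper's RIMR method; `reach` is a hypothetical precomputed input.

```python
def greedy_influence_max(reach, k):
    """Greedy seed selection: `reach[v]` is the set of nodes v is
    estimated to influence (e.g. from Monte-Carlo simulation).
    Repeatedly pick the node with the largest marginal coverage gain.
    Storing all of `reach` is the memory bottleneck at scale."""
    covered, seeds = set(), []
    for _ in range(k):
        best = max((v for v in reach if v not in seeds),
                   key=lambda v: len(reach[v] - covered))
        seeds.append(best)
        covered |= reach[best]
    return seeds
```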
ISBN (print): 9798350386813; 9798350386820
Genome Informatics (GI), executed over a multi-stage data analytics pipeline, has evolved into an interdisciplinary approach to discovering hidden structured information in unstructured genomic raw data. As massive genomic data keeps growing exponentially, HPC platforms are increasingly sought after to handle the associated complex computations. GI is characterized by heavy data dependency that triggers irregular computation on the fly, thereby demanding heavy fine-grain synchronization. Modern computing architectures, built on deep memory hierarchies that assume a temporal and spatial mapping of data, fail to support GI well. In this paper, we present the irregular genomic computing capabilities of ReneGENE-GI, our novel GI pipeline, evaluated across scalable accelerator platforms ranging from silicon clusters to quantum emulators. The pipeline exploits the intrinsic parallel performance of the hardware, coupled with dynamic data structures and scalable algorithms. Seamless data streaming techniques help keep the heavy memory footprint at bay while enabling run-time analytics with minimal input/output bottlenecks. High-precision computing ensures the genetic reliability of the inferences.
ISBN (print): 9781577358763
We present a novel parallel algorithm for drawing balanced samples from large populations. When auxiliary variables about the population units are known, balanced sampling improves the quality of the estimations obtained from the sample. Available algorithms, e.g., the cube method, are inherently sequential, and do not scale to large populations. Our parallel algorithm is based on a variant of the cube method for stratified populations. It has the same sample quality as sequential algorithms, and almost ideal parallel speedup.
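The structural property the parallel variant exploits can be shown with a much simpler stand-in: in a stratified population, strata can be sampled independently of one another, so the work parallelizes across strata. The sketch below is plain stratified simple random sampling under that assumption, not the cube method or the paper's algorithm; the function name is illustrative.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a fixed-size simple random sample from each stratum.
    `strata` maps stratum name -> list of population units. Each
    stratum's draw is independent of the others, which is the
    property that makes a per-stratum parallel split possible."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}
```

Balanced sampling additionally constrains the sample so that estimates of known auxiliary totals match the population totals; the cube method enforces that constraint, which this sketch deliberately omits.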
ISBN (print): 9798350387117; 9798350387124
Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines the communication costs between processing elements. This presents considerable algorithmic challenges, particularly when managing sparse data, a pivotal component of progress in data science. The spatial computer model quantifies communication locality by weighting processor communication costs by distance, introducing a cost measure called energy. It also integrates depth, a widely used metric, to promote high parallelism. We propose and analyze a framework for efficient spatial tree algorithms within the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of neighbors in the compute grid, thereby enabling locality-optimized messaging within the tree. Our layout achieves a polynomial-factor improvement in energy compared to a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, both fundamental building blocks for other graph algorithms. With high probability, our algorithms exhibit near-linear energy and polylogarithmic depth. Our contributions augment a growing body of work demonstrating that computations can have both high spatial locality and low depth, and constitute an advancement in the spatial layout of irregular and sparse computations.
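For readers unfamiliar with the treefix-sum primitive mentioned above, a sequential reference makes the specification concrete: each node receives the sum of values on the root-to-node path. The parallel, energy-efficient versions are the paper's contribution; this sketch assumes, for brevity, a parent array in which every parent index precedes its children.

```python
def treefix_sum(parent, values):
    """Root-to-node path sums over a rooted tree.
    parent[i] is the parent of node i, with parent[root] == -1.
    Assumes nodes are topologically ordered (parent index < child
    index), so a single forward pass suffices."""
    out = [0] * len(values)
    for i, p in enumerate(parent):
        out[i] = values[i] + (out[p] if p != -1 else 0)
    return out
```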
This study proposes a novel factorization method for the DCT-IV algorithm that allows it to be broken into four or eight sections that can be run in parallel, while also significantly reducing the arithmetic complexity. Based on the proposed new algorithm for DCT-IV, speed performance is improved substantially. The performance of the algorithm was verified on two different NVIDIA GPU systems. The experimental results show that the proposed DCT algorithm achieves an impressive reduction in total processing time. The method is very efficient, improving the algorithm's speed by more than the 4 times that would be expected from segmenting the DCT algorithm into four sections running in parallel. Compared with the classical (sequential) implementation of DCT-IV, the speedups are about five times or higher: at least 5.41 times on the Jetson AGX Xavier and 10.11 times on the Jetson Orin Nano. Using a parallel formulation with eight sections running in parallel, the speed improvement is even higher: at least 8.08 times on the Jetson AGX Xavier and 11.81 times on the Jetson Orin Nano.
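The baseline transform and the section idea can be made concrete with a direct implementation. DCT-IV is defined as X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2)); because the outer sum is linear in n, it can be split into contiguous chunks computed independently and added. This chunked split is only an illustration of the parallel decomposition, not the paper's actual factorization (which also reduces arithmetic complexity).

```python
import math

def dct_iv(x):
    """Direct O(N^2) DCT-IV: X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N)) for k in range(N)]

def dct_iv_sections(x, sections=4):
    """Evaluate the same transform as `sections` independent partial
    sums over contiguous input chunks (one per worker in a parallel
    setting), then add the partial results. Assumes N % sections == 0."""
    N = len(x)
    step = N // sections
    parts = []
    for s in range(sections):
        lo = s * step
        parts.append([sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                          for n in range(lo, lo + step)) for k in range(N)])
    return [sum(p[k] for p in parts) for k in range(N)]
```

A useful sanity check is that DCT-IV is its own inverse up to a factor of N/2: applying it twice returns (N/2) times the input.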
Monte-Carlo Tree Search (MCTS) is an adaptive, heuristic tree-search algorithm designed to uncover near-optimal actions at each decision-making point. The method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied in gaming, MCTS has exhibited exceptional achievements; it has also shown promising results when applied to NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges of distributed-memory parallel MCTS are the substantial communication overhead and the need to balance the computational load among processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design aims to significantly reduce the communication overhead while maintaining, or even slightly improving, performance on combinatorial optimization problems. To address the communication overhead, we propose transmitting an additional backpropagation message; this avoids attaching an information table to the messages exchanged by the processes, reducing the communication overhead, and also improves decision-making accuracy during the selection phase. The load-balancing issue is addressed by a transposition table shared among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states in distributed-memory parallel MCTS, drawing on techniques used for duplicate states in sequential MCTS; duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of the proposed parallel algorithm, we conduct extensive experiments.
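The selection phase whose accuracy the backpropagation scheme affects typically uses the UCB1 rule: pick the child maximizing mean reward plus an exploration bonus. The sketch below is that standard MCTS machinery, not the paper's partial-backpropagation scheme; the function name and the (reward, visits) tuple encoding are assumptions made here.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """UCB1 child selection for MCTS. `children` is a list of
    (total_reward, visit_count) pairs; returns the index maximizing
    reward/visits + c * sqrt(ln(parent_visits) / visits).
    Unvisited children are taken first."""
    parent_visits = sum(v for _, v in children)
    best, best_score = 0, float("-inf")
    for i, (reward, visits) in enumerate(children):
        if visits == 0:
            return i  # expand unvisited children before exploiting
        score = reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

In a distributed setting, stale visit counts at remote processes skew exactly this computation, which is why timelier backpropagation information improves selection.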
Computer-aided design (CAD) models in industrial design are often plagued by a series of defects, including minute features, gaps, self-intersections, and misalignments. Schemes for automatic defeaturing, gap repair, and intersection removal usually require a discrete representation of the geometry. However, existing surface meshing methods cannot effectively handle sliver surfaces and assemblies with complex contact conditions, including multiple misaligned curves/surfaces and degenerate or free-form contact interfaces. A surface meshing method based on a meshing-and-synchronizing strategy is proposed, which tackles mesh generation for sliver surfaces and the cleanup of misaligned assemblies by means of mesh alignment. Mesh generation is performed hierarchically, with curves and surfaces meshed in sequence. By incorporating the synchronization strategy into the curve/surface meshing process, curve/surface mesh alignment is achieved automatically without compromising mesh quality. Thanks to this alignment, the curve mesh generated by aligned curve meshing (ACM) is free of intersections, which enhances the meshing of sliver surfaces. By enforcing mesh alignment, aligned surface meshing (ASM) can handle misaligned assemblies with complex contact conditions. ASM is parallelized using OpenMP. Various assemblies characterized by difficult misaligned features and contacts are successfully processed by parallel ASM.
An important combinatorial problem is subgraph isomorphism, which formalizes the task of searching for occurrences of a known substructure within a larger structure represented by a graph; applications lie in the fields of chemistry, biology, medicine, databases, and social network analysis. Subgraph isomorphism has been proven NP-complete in the general case, but several algorithms use heuristics to achieve an affordable run time on common classes of graphs. The need to work with larger and larger graphs makes the idea of parallelizing this task attractive; however, no consensus has yet been reached on the best strategy for doing so. In this paper, we present two versions of a new parallel algorithm based on a redesign of the well-known VF3 algorithm. We discuss the changes made to efficiently distribute the work among multiple processors. The algorithms have been evaluated with comprehensive experimentation, using several publicly available graph datasets, to demonstrate their effectiveness in exploiting parallelism. (c) 2021 Elsevier B.V. All rights reserved.
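To fix intuition for the problem being parallelized, here is a naive backtracking search for a (non-induced) subgraph isomorphism: it maps every pattern vertex to a distinct target vertex so that every pattern edge lands on a target edge. This is a baseline to contrast with VF3's ordering and pruning heuristics, not VF3 itself; the function name and the dict-of-sets adjacency encoding are assumptions made for this sketch.

```python
def find_subgraph(pattern_adj, target_adj):
    """Backtracking subgraph-isomorphism search.
    Adjacency is a dict: vertex -> set of neighbours. Returns a
    mapping (dict) from pattern vertices to target vertices, or None
    if no occurrence of the pattern exists in the target."""
    p_nodes = list(pattern_adj)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        u = p_nodes[len(mapping)]  # next pattern vertex to place
        for v in target_adj:
            if v in mapping.values():
                continue  # target vertices must be distinct
            # every already-mapped pattern neighbour of u must map
            # to a target neighbour of the candidate v
            if all(mapping[w] in target_adj[v]
                   for w in pattern_adj[u] if w in mapping):
                mapping[u] = v
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[u]
        return None

    return extend({})
```

Each branch of the backtracking tree is independent of its siblings, which is what makes distributing the search across processors natural, and balancing those wildly uneven branches hard.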
Herein, a parallel implementation of discrete orthogonal moments on block-represented images is investigated. Moments and moment functions have been widely used as features for image analysis and pattern recognition tasks. The main disadvantage of all moment sets is the high computational cost, which increases as higher-order moments are involved in the computations. In image block representation (IBR), the image is represented by homogeneous areas called blocks. IBR allows moment computation with zero computational error for binary images, low computational error for gray-level images, and low computational complexity, while achieving high processing rates. Results from a parallel implementation on a multicore computer using OpenMP exhibit significant performance gains.
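Why IBR makes moments cheap and parallel is easy to see with geometric moments on a binary image: m_pq = sum over foreground pixels of x^p * y^q, and over a rectangular block the double sum factors into two independent one-dimensional sums. Blocks are mutually independent, so an OpenMP-style loop can process them concurrently. This sketch uses geometric moments for simplicity; the paper's discrete orthogonal moments follow the same per-block decomposition but with orthogonal polynomial kernels.

```python
def block_moment(blocks, p, q):
    """Geometric moment m_pq of a binary image given as IBR blocks.
    Each block is (x1, x2, y1, y2): inclusive pixel ranges of a
    homogeneous foreground rectangle. The separable kernel x^p * y^q
    lets each block contribute the product of two 1-D sums; blocks
    are independent, which is the parallelization axis."""
    total = 0
    for x1, x2, y1, y2 in blocks:
        sx = sum(x ** p for x in range(x1, x2 + 1))
        sy = sum(y ** q for y in range(y1, y2 + 1))
        total += sx * sy
    return total
```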