ISBN (print): 9781450391467
The cover tree is the canonical data structure for efficiently maintaining a dynamic set of points in a metric space while supporting nearest and k-nearest neighbor searches. For most real-world datasets with reasonable distributions (mathematically, constant expansion rate and bounded aspect ratio), single-point insertion, single-point deletion, and nearest neighbor search (NNS) each cost only logarithmic work in the size of the point set. Unfortunately, due to the intricacy of the cover tree algorithms and their reliance on a depth-first traversal order, we are unaware of any prior parallel versions of these algorithms. This paper presents highly parallel and work-efficient cover tree algorithms that handle batch insertions (and thus construction) and batch deletions. Assuming constant expansion rate and bounded aspect ratio, inserting or deleting m points into a cover tree with n points takes O(m log n) expected work and polylogarithmic span with high probability. Our algorithms rely on several novel algorithmic insights. We model the insertion and deletion process as a graph and use a maximal independent set (MIS) to generate tree nodes without conflicts. We use three key ideas to guarantee work-efficiency: a prefix-doubling scheme, a careful design that limits the size of the graph on which we apply MIS, and a strategy to propagate information among the different levels of the cover tree. We also use path-copying to make our parallel cover tree a persistent data structure, which is useful in several applications. Using our parallel cover trees, we show work-efficient (or near-work-efficient) and highly parallel solutions for a list of problems in computational geometry and machine learning, including Euclidean minimum spanning tree (EMST), single-linkage clustering, bichromatic closest pair (BCP), density-based clustering and its hierarchical version, and others. To the best of our knowledge, many of them are the first solutions to achieve work-efficiency and polylogarithmic span assuming constant expansion rate and bounded aspect ratio.
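The MIS step in the batch insertion can be illustrated with a small sequential sketch: candidate points at one cover-tree level conflict when they lie within that level's covering radius, and a maximal independent set of the conflict graph yields nodes that can be created without violating the separation invariant. The function name and greedy processing order below are illustrative assumptions, not the paper's parallel construction.

```python
import math

def maximal_independent_set(points, radius):
    """Greedy MIS under the conflict relation "two points lie within
    `radius` of each other" (Euclidean). Accepted points can become
    cover-tree nodes at this level; every rejected point is within
    `radius` of some accepted point, so it is covered."""
    accepted = []
    for p in points:
        if all(math.dist(p, q) > radius for q in accepted):
            accepted.append(p)
    return accepted
```

The accepted set is pairwise separated by more than `radius` (independence) while every other candidate has an accepted neighbor (maximality), which is exactly the invariant a cover-tree level needs.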
ISBN (print): 9798350364613; 9798350364606
The ubiquitous use of social media and other internet platforms has led to the widespread proliferation of naturally occurring social networks. This has caused a revolution in marketing and information dissemination due to the rise of influencers and their innate ability to broadcast information to large audiences. Influence maximization, the key problem of identifying optimally influential individuals, can be used to find influencers in a social network, or can be superimposed on other settings that exhibit relational information. Popular methods, such as greedy algorithms, face scalability issues: they perform poorly on larger networks and incur a large memory overhead. To address these challenges, this paper introduces a novel methodology, RIMR, which leverages statistical modeling and parallel programming to circumvent the need for storing large collections of data, achieving scalability and efficiency. Experimental results demonstrate the effectiveness and scalability of the proposed approach compared to state-of-the-art methods, showcasing significant speedups. This paper provides insights into the development of the proposed statistical process, the heterogeneous algorithm implementation, and a prototype demonstrating its efficacy.
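The memory pressure the abstract attributes to greedy methods can be seen in the classic baseline: greedy influence maximization needs per-node influence estimates (here precomputed reachable sets) held in memory, which is exactly the kind of large stored collection RIMR is said to avoid. This is a simplified sketch of the standard greedy baseline under that assumption, not the paper's RIMR method; `reach` is a hypothetical precomputed input.

```python
def greedy_influence_max(reach, k):
    """Greedy seed selection: `reach[v]` is the set of nodes v is
    estimated to influence (e.g. from Monte-Carlo simulation).
    Repeatedly pick the node with the largest marginal coverage gain.
    Storing all of `reach` is the memory bottleneck at scale."""
    covered, seeds = set(), []
    for _ in range(k):
        best = max((v for v in reach if v not in seeds),
                   key=lambda v: len(reach[v] - covered))
        seeds.append(best)
        covered |= reach[best]
    return seeds
```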
ISBN (print): 9798350386813; 9798350386820
Genome Informatics (GI), executed over a multi-stage data analytics pipeline, has evolved into an interdisciplinary approach to discovering hidden structured information in unstructured genomic raw data. As massive genomic data keeps growing exponentially, HPC platforms are increasingly sought after to handle the associated complex computations. GI is characterized by heavy data dependency that triggers irregular computation on the fly, thereby demanding heavy fine-grain synchronization. Modern computing architectures, built on deep memory hierarchies that assume a temporal and spatial mapping of data, fail to support GI well. In this paper, we present the irregular genomic computing capabilities of ReneGENE-GI, our novel GI pipeline, evaluated across scalable accelerator platforms ranging from silicon clusters to quantum emulators. The pipeline exploits the intrinsic parallel performance of the hardware, coupled with dynamic data structures and scalable algorithms. Seamless data streaming techniques help keep the heavy memory footprint at bay while enabling run-time analytics with minimal input/output bottlenecks. High-precision computing ensures the genetic reliability of the inferences.
ISBN (print): 9781577358763
We present a novel parallel algorithm for drawing balanced samples from large populations. When auxiliary variables about the population units are known, balanced sampling improves the quality of the estimations obtained from the sample. Available algorithms, e.g., the cube method, are inherently sequential, and do not scale to large populations. Our parallel algorithm is based on a variant of the cube method for stratified populations. It has the same sample quality as sequential algorithms, and almost ideal parallel speedup.
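The structural property the parallel variant exploits can be shown with a much simpler stand-in: in a stratified population, strata can be sampled independently of one another, so the work parallelizes across strata. The sketch below is plain stratified simple random sampling under that assumption, not the cube method or the paper's algorithm; the function name is illustrative.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a fixed-size simple random sample from each stratum.
    `strata` maps stratum name -> list of population units. Each
    stratum's draw is independent of the others, which is the
    property that makes a per-stratum parallel split possible."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}
```

Balanced sampling additionally constrains the sample so that estimates of known auxiliary totals match the population totals; the cube method enforces that constraint, which this sketch deliberately omits.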
ISBN (print): 9798350387117; 9798350387124
Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines the communication costs between processing elements. This presents considerable algorithmic challenges, particularly when managing sparse data, a pivotal component of progress in data science. The spatial computer model quantifies communication locality by weighting processor communication costs by distance, introducing a cost measure called energy. It also integrates depth, a widely used metric, to promote high parallelism. We propose and analyze a framework for efficient spatial tree algorithms within the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of neighbors in the compute grid, thereby enabling locality-optimized messaging within the tree. Our layout achieves a polynomial-factor improvement in energy compared to a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, both fundamental building blocks for other graph algorithms. With high probability, our algorithms exhibit near-linear energy and polylogarithmic depth. Our contributions augment a growing body of work demonstrating that computations can have both high spatial locality and low depth, and constitute an advancement in the spatial layout of irregular and sparse computations.
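For readers unfamiliar with the treefix-sum primitive mentioned above, a sequential reference makes the specification concrete: each node receives the sum of values on the root-to-node path. The parallel, energy-efficient versions are the paper's contribution; this sketch assumes, for brevity, a parent array in which every parent index precedes its children.

```python
def treefix_sum(parent, values):
    """Root-to-node path sums over a rooted tree.
    parent[i] is the parent of node i, with parent[root] == -1.
    Assumes nodes are topologically ordered (parent index < child
    index), so a single forward pass suffices."""
    out = [0] * len(values)
    for i, p in enumerate(parent):
        out[i] = values[i] + (out[p] if p != -1 else 0)
    return out
```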
This study proposes a novel factorization method for the DCT-IV algorithm that allows it to be broken into four or eight sections that can be run in parallel, while also significantly reducing the arithmetic complexity. Based on the proposed new algorithm for DCT-IV, speed performance is improved substantially. The performance of the algorithm was verified on two different NVIDIA GPU systems. The experimental results show that the proposed DCT algorithm achieves an impressive reduction in total processing time. The method is very efficient, improving the algorithm's speed by more than the 4 times that would be expected from segmenting the DCT algorithm into four sections running in parallel. Compared with the classical (sequential) implementation of DCT-IV, the speedups are about five times or higher: at least 5.41 times on the Jetson AGX Xavier and 10.11 times on the Jetson Orin Nano. Using a parallel formulation with eight sections running in parallel, the speed improvement is even higher: at least 8.08 times on the Jetson AGX Xavier and 11.81 times on the Jetson Orin Nano.
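The baseline transform and the section idea can be made concrete with a direct implementation. DCT-IV is defined as X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2)); because the outer sum is linear in n, it can be split into contiguous chunks computed independently and added. This chunked split is only an illustration of the parallel decomposition, not the paper's actual factorization (which also reduces arithmetic complexity).

```python
import math

def dct_iv(x):
    """Direct O(N^2) DCT-IV: X[k] = sum_n x[n] cos(pi/N (n+1/2)(k+1/2))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N)) for k in range(N)]

def dct_iv_sections(x, sections=4):
    """Evaluate the same transform as `sections` independent partial
    sums over contiguous input chunks (one per worker in a parallel
    setting), then add the partial results. Assumes N % sections == 0."""
    N = len(x)
    step = N // sections
    parts = []
    for s in range(sections):
        lo = s * step
        parts.append([sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                          for n in range(lo, lo + step)) for k in range(N)])
    return [sum(p[k] for p in parts) for k in range(N)]
```

A useful sanity check is that DCT-IV is its own inverse up to a factor of N/2: applying it twice returns (N/2) times the input.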
Monte-Carlo Tree Search (MCTS) is an adaptive, heuristic tree-search algorithm designed to uncover near-optimal actions at each decision-making point. The method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied in gaming, MCTS has exhibited exceptional achievements; it has also shown promising results when applied to NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges of distributed-memory parallel MCTS are the substantial communication overhead and the need to balance the computational load among processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design aims to significantly reduce the communication overhead while maintaining, or even slightly improving, performance on combinatorial optimization problems. To address the communication overhead, we propose transmitting an additional backpropagation message; this avoids attaching an information table to the messages exchanged by the processes, reducing the communication overhead, and also improves decision-making accuracy during the selection phase. The load-balancing issue is addressed by a transposition table shared among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states in distributed-memory parallel MCTS, drawing on techniques used for duplicate states in sequential MCTS; duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of the proposed parallel algorithm, we conduct extensive experiments.
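The selection phase whose accuracy the backpropagation scheme affects typically uses the UCB1 rule: pick the child maximizing mean reward plus an exploration bonus. The sketch below is that standard MCTS machinery, not the paper's partial-backpropagation scheme; the function name and the (reward, visits) tuple encoding are assumptions made here.

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """UCB1 child selection for MCTS. `children` is a list of
    (total_reward, visit_count) pairs; returns the index maximizing
    reward/visits + c * sqrt(ln(parent_visits) / visits).
    Unvisited children are taken first."""
    parent_visits = sum(v for _, v in children)
    best, best_score = 0, float("-inf")
    for i, (reward, visits) in enumerate(children):
        if visits == 0:
            return i  # expand unvisited children before exploiting
        score = reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

In a distributed setting, stale visit counts at remote processes skew exactly this computation, which is why timelier backpropagation information improves selection.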
Computer-aided design (CAD) models in industrial design are often plagued by a series of defects, including minute features, gaps, self-intersections, and misalignments. Schemes for automatic defeaturing, gap repair, and intersection removal usually require a discrete representation of the geometry. However, existing surface meshing methods cannot effectively handle sliver surfaces and assemblies with complex contact conditions, including multiple misaligned curves/surfaces and degenerate or free-form contact interfaces. A surface meshing method based on a meshing-and-synchronizing strategy is proposed, which tackles mesh generation for sliver surfaces and the cleanup of misaligned assemblies by means of mesh alignment. Mesh generation is performed hierarchically, with curves and surfaces meshed in sequence. By incorporating the synchronization strategy into the curve/surface meshing process, curve/surface mesh alignment is achieved automatically without compromising mesh quality. Thanks to this alignment, the curve mesh generated by aligned curve meshing (ACM) is free of intersections, which enhances the meshing of sliver surfaces. By enforcing mesh alignment, aligned surface meshing (ASM) can handle misaligned assemblies with complex contact conditions. ASM is parallelized using OpenMP. Various assemblies characterized by difficult misaligned features and contacts are successfully processed by parallel ASM.
An important combinatorial problem is subgraph isomorphism, which formalizes the task of searching for occurrences of a known substructure within a larger structure represented by a graph; applications lie in the fields of chemistry, biology, medicine, databases, and social network analysis. Subgraph isomorphism has been proven NP-complete in the general case, but several algorithms use heuristics to achieve an affordable run time on common classes of graphs. The need to work with larger and larger graphs makes the idea of parallelizing this task attractive; however, no consensus has yet been reached on the best strategy for doing so. In this paper, we present two versions of a new parallel algorithm based on a redesign of the well-known VF3 algorithm. We discuss the changes made to efficiently distribute the work among multiple processors. The algorithms have been evaluated with comprehensive experimentation, using several publicly available graph datasets, to demonstrate their effectiveness in exploiting parallelism. (c) 2021 Elsevier B.V. All rights reserved.
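To fix intuition for the problem being parallelized, here is a naive backtracking search for a (non-induced) subgraph isomorphism: it maps every pattern vertex to a distinct target vertex so that every pattern edge lands on a target edge. This is a baseline to contrast with VF3's ordering and pruning heuristics, not VF3 itself; the function name and the dict-of-sets adjacency encoding are assumptions made for this sketch.

```python
def find_subgraph(pattern_adj, target_adj):
    """Backtracking subgraph-isomorphism search.
    Adjacency is a dict: vertex -> set of neighbours. Returns a
    mapping (dict) from pattern vertices to target vertices, or None
    if no occurrence of the pattern exists in the target."""
    p_nodes = list(pattern_adj)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        u = p_nodes[len(mapping)]  # next pattern vertex to place
        for v in target_adj:
            if v in mapping.values():
                continue  # target vertices must be distinct
            # every already-mapped pattern neighbour of u must map
            # to a target neighbour of the candidate v
            if all(mapping[w] in target_adj[v]
                   for w in pattern_adj[u] if w in mapping):
                mapping[u] = v
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[u]
        return None

    return extend({})
```

Each branch of the backtracking tree is independent of its siblings, which is what makes distributing the search across processors natural, and balancing those wildly uneven branches hard.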
Herein, a parallel implementation of discrete orthogonal moments on block-represented images is investigated. Moments and moment functions have been widely used as features for image analysis and pattern recognition tasks. The main disadvantage of all moment sets is the high computational cost, which increases as higher-order moments are involved in the computations. In image block representation (IBR), the image is represented by homogeneous areas called blocks. IBR allows moment computation with zero computational error for binary images, low computational error for gray-level images, and low computational complexity, while achieving high processing rates. Results from a parallel implementation on a multicore computer using OpenMP exhibit significant performance gains.
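Why IBR makes moments cheap and parallel is easy to see with geometric moments on a binary image: m_pq = sum over foreground pixels of x^p * y^q, and over a rectangular block the double sum factors into two independent one-dimensional sums. Blocks are mutually independent, so an OpenMP-style loop can process them concurrently. This sketch uses geometric moments for simplicity; the paper's discrete orthogonal moments follow the same per-block decomposition but with orthogonal polynomial kernels.

```python
def block_moment(blocks, p, q):
    """Geometric moment m_pq of a binary image given as IBR blocks.
    Each block is (x1, x2, y1, y2): inclusive pixel ranges of a
    homogeneous foreground rectangle. The separable kernel x^p * y^q
    lets each block contribute the product of two 1-D sums; blocks
    are independent, which is the parallelization axis."""
    total = 0
    for x1, x2, y1, y2 in blocks:
        sx = sum(x ** p for x in range(x1, x2 + 1))
        sy = sum(y ** q for y in range(y1, y2 + 1))
        total += sx * sy
    return total
```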