We present new parallel algorithms for testing pattern involvement for all length-4 permutations. Our algorithms run in O(log n) time with n/log n processors on the CREW PRAM model, and in O(log log log n) time with n/log log log n processors or in constant time with n log^3 n processors on the CRCW PRAM model. Parallel algorithms had not previously been designed for some of these patterns, and for the other patterns the previous best algorithms require O(log n) time and n processors on the CREW PRAM model.
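For reference, the sketch below is a minimal sequential brute-force test of pattern involvement: does a permutation contain a subsequence order-isomorphic to a given pattern? It is only an O(n^4) baseline for length-4 patterns, not one of the PRAM algorithms described in the abstract.

```python
from itertools import combinations

# Brute-force pattern involvement check: `perm` contains `pattern` if some
# subsequence of `perm` has the same relative order as `pattern`.
def contains_pattern(perm, pattern):
    k = len(pattern)
    # indices of the pattern sorted by value encode its relative order
    pat_order = tuple(sorted(range(k), key=lambda i: pattern[i]))
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if tuple(sorted(range(k), key=lambda i: sub[i])) == pat_order:
            return True
    return False

print(contains_pattern([2, 5, 3, 6, 1, 4], [1, 3, 2, 4]))  # True, e.g. subsequence 2, 5, 3, 6
print(contains_pattern([1, 2, 3, 4, 5], [4, 3, 2, 1]))     # False: no decreasing subsequence of length 4
```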
ISBN (print): 9781634391313
Multi-agent systems represent a powerful tool to model several interesting real-world problems. Unfortunately, the limited scalability of many state-of-the-art algorithms hinders their applicability in practical situations: in fact, complex dynamics and interactions among a large number of agents often make the search for an optimal solution an unfeasible task. Against this background, the study and design of new, highly parallel computational models could greatly improve solution techniques in the above-mentioned fields. In particular, I will introduce two parallel approaches to the coalition formation problem in the context of multi-agent systems, detailing how their performance can benefit from the use of modern parallel architectures.
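As an illustration of the coalition formation problem mentioned above, the sketch below exhaustively enumerates coalition structures (partitions of the agent set) and picks the one maximizing a toy, hypothetical characteristic function. The value function and agent names are invented; real solvers, including the parallel approaches discussed here, prune and distribute this search rather than enumerating it.

```python
# Brute-force coalition structure generation over a toy characteristic function.
def partitions(items):
    """Yield every partition of `items` into coalitions."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i, coalition in enumerate(smaller):              # join an existing coalition
            yield smaller[:i] + [[first] + coalition] + smaller[i + 1:]
        yield [[first]] + smaller                            # or start a new one

def v(coalition):
    # hypothetical characteristic function: pairs are worth the most
    return len(coalition) ** 2 if len(coalition) <= 2 else 0

agents = ["a1", "a2", "a3", "a4"]
best = max(partitions(agents), key=lambda cs: sum(v(c) for c in cs))
print(best, sum(v(c) for c in best))   # two pairs, total value 8
```

The branches of this search are independent, which is what makes it amenable to parallel exploration.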
In this article, some local and parallel finite element methods are proposed and investigated for the time-dependent convection-diffusion problem. With the backward Euler scheme for the temporal discretization, the basic idea of the present methods is that, for a solution to the considered equations, low-frequency components can be approximated well on a relatively coarse grid, while high-frequency components can be computed on a fine grid by a local and parallel procedure at each time step. A partition of unity is used to collect the local high-frequency components and assemble a global continuous approximation. Theoretical results are obtained and numerical tests are reported to support the theoretical findings.
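As a deliberately simplified illustration of the temporal discretization mentioned here, the snippet below takes backward Euler steps for a 1D convection-diffusion equation with a central-difference spatial discretization. It is a finite-difference stand-in under assumed coefficients, not the local and parallel finite element method of the paper.

```python
import numpy as np

# One backward-Euler step for u_t + a*u_x = nu*u_xx on a uniform 1D grid
# with homogeneous Dirichlet boundaries (boundary rows stay identity).
def backward_euler_step(u, dt, dx, a=1.0, nu=0.1):
    n = u.size
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        # central differences for convection and diffusion
        A[i, i - 1] = -a / (2 * dx) - nu / dx**2
        A[i, i]     = 2 * nu / dx**2
        A[i, i + 1] =  a / (2 * dx) - nu / dx**2
    # implicit step: (I + dt*A) u_new = u_old
    return np.linalg.solve(np.eye(n) + dt * A, u)

x = np.linspace(0.0, 1.0, 101)
u = np.exp(-200 * (x - 0.3) ** 2)        # initial pulse
for _ in range(50):
    u = backward_euler_step(u, dt=1e-3, dx=x[1] - x[0])
```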
Contour tracing is an important pre-processing step in many image-processing applications such as feature recognition, biomedical imaging, security and surveillance. As single-processor architectures reach their performance limits, parallel processing architectures offer energy-efficient and high-performance solutions for real-time applications, and are therefore used in several real-time image-processing applications. Among the many interconnection schemes available, Cayley graph-based interconnections offer easy routing and symmetric implementation capabilities. For parallel processing systems with a Cayley graph-based interconnection scheme, the torus, we developed three accelerated algorithms corresponding to three existing families of contour tracing algorithms. We simulated these algorithms on a parallel processing framework to quantify the normalized speed-up achievable in any torus-connected parallel processing system, and compared our best-performing algorithm with existing parallel implementations for Nvidia GPUs. We observed a speed-up of up to 468 times with our algorithms on a parallel processing architecture compared with the corresponding algorithm on a single-processor architecture, and measured speed-ups of 194 and 47 compared with existing parallel contour tracing implementations on Tesla K40c and Quadro RTX 5000 GPU hardware, respectively. For torus-connected parallel processing architectures used for image processing, our algorithms thus speed up contour tracing without any hardware modification.
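The snippet below is a minimal sequential baseline for contour extraction: a foreground pixel is kept if any of its 4-neighbours is background. It only illustrates the per-pixel test (which torus-connected processors could apply to their local image tiles); it is not one of the accelerated algorithms from the paper.

```python
import numpy as np

# Mark foreground pixels that touch the background as contour pixels.
def contour_mask(img):
    img = np.asarray(img, dtype=bool)
    padded = np.pad(img, 1, constant_values=False)
    up    = padded[:-2, 1:-1]
    down  = padded[2:,  1:-1]
    left  = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    has_bg_neighbour = ~(up & down & left & right)
    return img & has_bg_neighbour

blob = np.zeros((7, 7), dtype=bool)
blob[2:5, 2:6] = True
print(contour_mask(blob).astype(int))   # ones only on the rectangle's border
```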
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-li...
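As a small point of reference for the entry above, average-linkage HAC can be run sequentially with SciPy as shown below. The data and cluster count are arbitrary; this is only a usage example of the standard routine, not the large-scale parallel method the paper studies.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Average-linkage hierarchical agglomerative clustering on two random blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                    rng.normal(3.0, 0.3, (20, 2))])
Z = linkage(points, method="average")            # average-linkage dendrogram
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
print(labels)
```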
One type of renewable energy is geothermal energy, obtained from the depths of the Earth by using the heat of water pumped from underground reservoirs. As a rule, the produced water is highly saline and contains chemical compounds that can be hazardous to the environment. Mathematical models that describe the propagation of temperature fields in a geothermal reservoir are therefore of practical importance. The paper presents one such mathematical model of an open geothermal system consisting of production and injection wells. The novelty of the proposed model lies in accounting for the most significant technical features of the wells and the specific thermophysical parameters of the geothermal reservoir. On the basis of an implicit finite-difference scheme, a numerical algorithm for modeling non-stationary heat transfer processes in a three-dimensional aquifer region has been developed, and a justification of the applicability of the numerical method is given. Since a complete numerical simulation requires significant computer time, a parallel version of the computational algorithm and a program for multicore processors using OpenMP technology were also developed. The results of a series of numerical experiments and an evaluation of the efficiency of the parallel algorithm are presented. Article Highlights: A mathematical model is proposed to study how the location of the wells in a geothermal system affects its effectiveness. A parallel numerical algorithm for solving the systems of difference equations for heat and mass transfer is constructed and implemented on multicore CPUs. The efficiency and speedup of the parallel algorithm are investigated.
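The abstract reports efficiency and speedup measurements for the OpenMP version; the snippet below simply spells out how those two metrics are computed from wall-clock times. The timing numbers are invented placeholders, not results from the paper.

```python
# Speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p
timings = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}   # processors -> wall time (s), hypothetical
t1 = timings[1]
for p, tp in sorted(timings.items()):
    speedup = t1 / tp
    efficiency = speedup / p
    print(f"p={p:2d}  speedup={speedup:5.2f}  efficiency={efficiency:4.2f}")
```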
Numerous research efforts have been proposed for efficient processing of range queries in high-dimensional space, either by redesigning the R-tree access structure to exploit massive parallelism on a single GPU or by exploring distributed R-tree frameworks. However, none of the existing efforts integrates the parallelization of the R-tree on a single GPU with a distributed R-tree framework, and designing an efficient multi-GPU indexing method that effectively combines parallelism maximization with distributed processing of the R-tree remains an open challenge. In this article, we present a novel, efficient parallel and distributed multi-GPU indexing method called the LBPG-tree. The rationale of the LBPG-tree is to combine the advantages of the CPU instruction pipeline with the massive parallel processing potential of multiple GPUs by introducing two new optimization strategies: first, we exploit the GPU L2 cache to accelerate both index search and index node access on GPUs; second, we further improve L2 cache utilization on GPUs by compacting and sorting candidate nodes, a step called Compact-and-Sort. Our experimental results show that the LBPG-tree outperforms the G-tree, the previous representative GPU index method, and effectively supports multiple GPUs in providing an efficient high-dimensional range query service.
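The following CPU-side sketch conveys the gist of the Compact-and-Sort idea described above: candidate node identifiers that survive a filtering pass are compacted and then sorted so that subsequent accesses touch memory (or cache lines) in order. The array contents are invented, and the real operation runs on GPUs with very different machinery.

```python
import numpy as np

# Compact (drop pruned entries, marked -1) and sort candidate node ids
# so that the next access pass is cache-friendly.
candidate_ids = np.array([907, -1, 13, -1, 512, 14, -1, 511])   # -1 = pruned
compacted = candidate_ids[candidate_ids >= 0]   # compact
ordered = np.sort(compacted)                    # sort for locality
print(ordered)                                  # [ 13  14 511 512 907]
```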
Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale to emerging large-scale data, and scalable parallel algorithms are necessary to process large graph datasets. In this work, we present a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup when scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and obtain better or comparable performance. We identify the challenges in developing distributed-memory algorithms and provide an optimized solution, DPLAL, with a performance analysis of the algorithm on large-scale real-world networks from different domains.
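For context, the sketch below implements the greedy "local moving" passes at the heart of the Louvain method on a tiny toy graph. Full Louvain (and its parallel variants such as DPLAL) additionally aggregates communities into super-nodes and iterates, so treat this only as an illustration of the sequential kernel that the parallel versions distribute.

```python
from collections import defaultdict

# Louvain-style local moving on a small undirected graph (adjacency sets).
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
m = sum(len(nbrs) for nbrs in adj.values()) / 2               # number of edges
degree = {v: len(nbrs) for v, nbrs in adj.items()}
community = {v: v for v in adj}                               # start as singletons
tot = defaultdict(float, {v: float(degree[v]) for v in adj})  # degree sum per community

moved = True
while moved:
    moved = False
    for v in adj:
        tot[community[v]] -= degree[v]                        # take v out of its community
        links = defaultdict(int)                              # v's edges into each community
        for u in adj[v]:
            links[community[u]] += 1
        best_c, best_gain = community[v], 0.0
        for c, k_in in links.items():                         # pick the best modularity gain
            gain = k_in - degree[v] * tot[c] / (2 * m)
            if gain > best_gain:
                best_c, best_gain = c, gain
        moved |= best_c != community[v]
        community[v] = best_c
        tot[best_c] += degree[v]

print(community)   # nodes 0-2 end up in one community, nodes 3-5 in another
```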
Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound reached by the existing parallel algorithms is Omega(mn + 2rq/p), where m and n are the dimensions of the theoretical database matrix, q and r are the dimensions of the spectra, and p is the number of processors. We further prove that a communication-optimal strategy with fast memory sqrt(M) = mn + 2qr/p can achieve Omega(2mnq/p), which has not been achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results, and show that the sub-optimal speedups with increasing numbers of processors are a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments that demonstrate the communication bounds proved in this paper. Consequently, we assert that a next generation of provably and demonstrably superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems. (C) 2021 Elsevier Inc. All rights reserved.
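Reading the stated lower bound literally as printed, Omega(mn + 2rq/p), the toy calculation below evaluates the expression for a few processor counts. All problem sizes are invented solely to show how the expression behaves; they are not taken from the paper.

```python
# Evaluate the printed communication lower bound mn + 2rq/p for several p.
m, n = 10_000, 5_000      # theoretical database matrix dimensions (hypothetical)
q, r = 2_000, 1_000       # spectra dimensions (hypothetical)
for p in (1, 16, 256, 1024):
    bound = m * n + 2 * r * q / p
    print(f"p={p:5d}  communication lower bound ~ {bound:.3e} words")
```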
Large-scale graphs with billions and trillions of vertices and edges require efficient parallel algorithms for common graph problems, one of which is single-source shortest paths (SSSP). Bulk-synchronous parallel algo...
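Although only the preview of this entry is shown, the sequential reference point for SSSP is repeated edge relaxation. The sketch below runs Bellman-Ford-style relaxation rounds on a small made-up weighted digraph; each full round corresponds roughly to one superstep of a bulk-synchronous parallel SSSP computation.

```python
import math

# Bellman-Ford-style relaxation rounds from source "s" on a tiny digraph.
edges = [("s", "a", 2.0), ("s", "b", 7.0), ("a", "b", 3.0),
         ("b", "c", 1.0), ("a", "c", 8.0)]
dist = {v: math.inf for e in edges for v in e[:2]}
dist["s"] = 0.0
for _ in range(len(dist) - 1):          # at most |V|-1 rounds are needed
    changed = False
    for u, v, w in edges:               # relax every edge once per round
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            changed = True
    if not changed:
        break
print(dist)   # {'s': 0.0, 'a': 2.0, 'b': 5.0, 'c': 6.0}
```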