In this work, two parallel techniques based on shared-memory programming are presented. These models are especially suitable for application to evolutionary algorithms. To study their performance, the algorithm UEGO (Universal Evolutionary Global Optimizer) has been chosen.
Given a target protein structure, the prime objective of protein design is to find amino acid sequences that will fold into the given three-dimensional structure. The protein design problem belongs to the non-deterministic polynomial-time-hard class, as the sequence search space increases exponentially with protein length. To ensure better search space exploration and faster convergence, we propose a protein modularity-based parallel protein design algorithm. The modular architecture of the protein structure is exploited by considering an intermediate structural organization between secondary structure and domain, defined as the protein unit (PU). Here, we have incorporated a divide-and-conquer approach in which a protein is split into PUs and each PU region is explored in parallel. Further analysis shows that our shared-memory implementation of modularity-based parallel sequence search leads to better search space exploration than traditional full-protein design. Sequence-based analysis of the designed sequences shows an average of 39.7% sequence similarity on the benchmark data set. Structure-based comparison of the modeled structures of the designed proteins with the target structures exhibited an average root-mean-square deviation of 1.17 angstrom and an average template modeling score of 0.89. The selected modeled structures of the designed protein sequences are validated using 100 ns molecular dynamics simulations, in which 80% of the proteins have shown better or similar stability to the respective target proteins. Our study indicates that our modularity-based protein design algorithm can be extended to protein interaction design as well.
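The divide-and-conquer idea in this abstract can be sketched in a few lines: split the target at unit boundaries and explore each region concurrently. This is a toy illustration, not the paper's algorithm; the alphabet, the greedy per-unit search, and the function names are all invented here for demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy setup: a target "structure" is a string, split into
# protein-unit (PU) regions, each designed independently and in parallel.
ALPHABET = "ACDEFG"  # toy amino-acid alphabet (illustrative only)

def design_unit(target_unit):
    """Stand-in per-PU search: greedily pick the residue matching the target."""
    return "".join(
        max(ALPHABET, key=lambda aa: 1 if aa == pos else 0) for pos in target_unit
    )

def parallel_design(target, boundaries):
    """Split the target at PU boundaries and explore each region concurrently."""
    units = [target[a:b] for a, b in zip([0] + boundaries, boundaries + [len(target)])]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves order, so the designed PUs concatenate correctly
        return "".join(pool.map(design_unit, units))
```

Because each PU is searched independently, the per-unit search spaces add rather than multiply, which is the source of the claimed speed-up over full-protein design.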
Finding a good graph coloring quickly is often a crucial phase in the development of efficient, parallel algorithms for many scientific and engineering applications. In this paper we consider the problem of solving the graph coloring problem itself in parallel. We present a simple and fast parallel graph coloring heuristic that is well suited for shared-memory programming and yields an almost linear speedup on the PRAM model. We also present a second heuristic that improves on the number of colors used. The heuristics have been implemented using OpenMP. Experiments conducted on an SGI Cray Origin 2000 supercomputer using very large graphs from finite element methods and eigenvalue computations validate the theoretical run-time analysis. Copyright (C) 2000 John Wiley & Sons, Ltd.
In recent years, high-performance computing and powerful supercomputers have become a staple in many areas of academia and industry. The author introduces the concepts of shared-memory programming in the context of solving the heat equation, which allows the exploration of several finite difference and parallelization schemes.
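The classic explicit finite-difference scheme behind this kind of exercise updates each grid point from its two neighbours, u_i^(n+1) = u_i^n + r (u_{i-1} - 2 u_i + u_{i+1}) with r = alpha*dt/dx^2 <= 1/2 for stability; the loop over i is the natural candidate for a shared-memory parallel loop. A minimal serial sketch (not from the article):

```python
def heat_step(u, r):
    """One explicit time step of the 1-D heat equation,
    with fixed (Dirichlet) boundary values. In OpenMP-style shared
    memory, this loop over interior points is the parallel region."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def solve(u0, r, steps):
    """March the explicit scheme forward `steps` time steps."""
    u = u0[:]
    for _ in range(steps):
        u = heat_step(u, r)
    return u
```

Writing into a fresh array `new` rather than updating `u` in place is what makes the iterations independent, and hence safely parallelizable.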
We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming. (c) 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license.
In the era of Exascale computing, writing efficient parallel programs is indispensable and, at the same time, writing sound parallel programs is very difficult. Specifying parallelism with frameworks such as OpenMP is relatively easy, but data races in these programs are an important source of bugs. In this article, we propose LLOV, a fast, lightweight, language-agnostic, and static data race checker for OpenMP programs based on the LLVM compiler framework. We compare LLOV with other state-of-the-art data race checkers on a variety of well-established benchmarks. We show that the precision, accuracy, and F1 score of LLOV are comparable to those of other checkers while being orders of magnitude faster. To the best of our knowledge, LLOV is the only tool among the state-of-the-art data race checkers that can verify a C/C++ or FORTRAN program to be data race free.
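The kind of bug such checkers flag is the unsynchronised read-modify-write on shared state. A minimal Python analogue (using `threading` in place of OpenMP; the lock plays the role of an OpenMP critical/atomic region):

```python
import threading

counter = 0
lock = threading.Lock()

def racy_add(n):
    """Data race: `counter += 1` is a read-modify-write, so concurrent
    unsynchronised updates from several threads can be lost."""
    global counter
    for _ in range(n):
        counter += 1

def safe_add(n):
    """Guarding the update with a lock removes the race."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(worker, n_threads=4, n=10_000):
    """Reset the counter, run `worker` on n_threads threads, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With `safe_add` the result is deterministic (`n_threads * n`); with `racy_add` lost updates are possible, which is exactly the nondeterminism that makes races hard to find by testing and motivates static checking.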
ISBN: (Print) 9798350305487
Modern classification problems tackled by using Decision Tree (DT) models often require demanding constraints in terms of accuracy and scalability. This is often hard to achieve due to the ever-increasing volume of data used for training and testing. Bayesian approaches to DTs using Markov Chain Monte Carlo (MCMC) methods have demonstrated great accuracy in a wide range of applications. However, the inherently sequential nature of MCMC makes it unsuitable to meet both accuracy and scaling constraints. One could run multiple MCMC chains in an embarrassingly parallel fashion. Despite the improved runtime, this approach sacrifices accuracy in exchange for strong scaling. Sequential Monte Carlo (SMC) samplers are another class of Bayesian inference methods that also have the appealing property of being parallelizable without trading off accuracy. Nevertheless, finding an effective parallelization for the SMC sampler is difficult, due to the challenges in parallelizing its bottleneck, redistribution, in such a way that the workload is equally divided across the processing elements, especially when dealing with variable-size models such as DTs. This study presents a parallel SMC sampler for DTs on shared-memory (SM) architectures, with an O(log2 N) parallel redistribution for variable-size samples. On an SM machine with 32 cores, the experimental results show that our proposed method scales up to a factor of 16 compared to its serial implementation, and provides comparable accuracy to MCMC, but 51 times faster.
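The redistribution step named as the bottleneck here is, in essence: after resampling assigns each sample an integer copy count, expand the samples into a new, equally weighted population. A serial sketch (not the paper's O(log2 N) algorithm, though the write offsets below are exactly the prefix sums that such parallel schemes compute):

```python
from itertools import accumulate

def redistribute(samples, counts):
    """Expand each samples[i] into counts[i] copies, preserving order.
    The prefix sums over `counts` give each sample a disjoint output
    slice, so in a parallel version every i can write independently."""
    assert len(samples) == len(counts)
    offsets = [0] + list(accumulate(counts))  # exclusive prefix sums
    out = [None] * offsets[-1]
    for i, s in enumerate(samples):
        out[offsets[i]:offsets[i + 1]] = [s] * counts[i]
    return out
```

Because the copy counts vary per sample (and, for variable-size models like decision trees, so do the samples themselves), naive static loop partitioning leaves the workload unbalanced, which is why the paper's balanced parallel redistribution is nontrivial.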
ISBN: (Print) 9781450375887
Due to the slowdown of Moore's Law, systems designers have begun integrating non-cache-coherent heterogeneous computing elements in order to continue scaling performance. Programming such systems has traditionally been difficult: developers were forced to use programming models that exposed multiple memory regions, requiring developers to manually maintain memory consistency. Previous works proposed distributed shared memory (DSM) as a way to achieve high programmability in such systems. However, past DSM systems were plagued by low-bandwidth networking and utilized complex memory consistency protocols, which limited their adoption. Recently, new networking technologies have begun to change the assumptions about which components are bottlenecks in the system. Additionally, many popular shared-memory programming models utilize memory consistency semantics similar to those proposed for DSM, leading to widespread adoption in mainstream programming. In this work, we argue that it is time to revive DSM as a means for achieving good programmability and performance on non-cache-coherent systems. We explore optimizing an existing DSM protocol by relaxing memory consistency semantics and exposing new cross-node barrier primitives. We integrate the new mechanisms into an existing OpenMP runtime, allowing developers to leverage cross-node execution without changing a single line of code. When evaluated on an x86 server connected to an ARMv8 server via InfiniBand, the DSM optimizations achieve an average of 11% (up to 33%) improvement versus the baseline DSM implementation.
ISBN: (Print) 9781605584133
The engagement of cluster and grid computing, two popular trends of today's high-performance computation, has formed an imperative need for efficient utilization of the afforded resources. In this paper we present the concept, design and implementation of the Pleiad platform. Having its origin in the proposition of distributed shared memory (DSM), Pleiad is a cluster middleware that provides a shared-memory abstraction enabling transparent multithreaded execution across the cluster nodes. It belongs to the new generation of cluster middleware that, aside from providing a proof of concept for unifying the cluster memory resources, aims to achieve satisfactory levels of performance and scalability for a broad range of multithreaded applications. First results from the performance evaluation of Pleiad appear encouraging, and they are presented in comparison with an efficient implementation of MPI for the Java platform.
ISBN: (Print) 9781728127941
Various partitioned global address space (PGAS) languages capable of providing global-view programming environments on multi-node computer systems have been proposed to improve programming productivity in high-performance computing. However, several PGAS languages often require a detailed description of the remote data access, similar to descriptions used in message passing interface one-sided communications. Some PGAS languages have limitations pertaining to remote data access and recommend their local-view programming models, rather than the global-view ones, for performance reasons. In this study, we propose SMint, an application programming interface that provides a global-view programming model with a software distributed shared memory, mSMS, as the runtime. Using stencil computation as a typical processing method, the performance and programmability of SMint have been compared with those of XcalableMP and Unified Parallel C, which are well-known examples of PGAS languages based on the C language. It was found that SMint achieved the best performance under the ideal global-view programming model.
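The global-view versus local-view contrast in this abstract can be illustrated with a 1-D three-point stencil: in a global view the programmer indexes one shared array directly, whereas a local view partitions the array into chunks with halo cells and stitches the results. This is a toy sketch with invented names, not SMint's API:

```python
def stencil_global(u):
    """Global-view style: index the whole shared array directly."""
    return [u[i - 1] + u[i] + u[i + 1] for i in range(1, len(u) - 1)]

def stencil_local(u, nparts=2):
    """Local-view style: split the interior into chunks, give each chunk
    one halo cell per side, compute locally, then concatenate."""
    n = len(u) - 2                       # number of interior points
    step = (n + nparts - 1) // nparts    # chunk size (last chunk may be short)
    out = []
    for p in range(nparts):
        lo = 1 + p * step
        hi = min(1 + (p + 1) * step, n + 1)
        local = u[lo - 1:hi + 1]         # chunk interior plus halo cells
        out += [local[i - 1] + local[i] + local[i + 1]
                for i in range(1, len(local) - 1)]
    return out
```

Both produce identical results; the difference the paper measures is in how much of this chunk-and-halo bookkeeping the language makes the programmer write, and at what performance cost the runtime can hide it.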