The authors present and evaluate an unplugged activity to introduce parallel computing concepts to undergraduate students. Students in five CS classrooms used a deck of playing cards in small groups to consider how pa...
Numerical simulation plays a key role in industrial design because it reduces the time and cost of developing new products. In the face of international competition, it is important to have a complete chain of simulation tools with which to perform virtual prototyping efficiently. In this paper, we describe two components of large aeronautic numerical simulation chains that are extremely demanding of computer resources. The first is used in computational fluid dynamics for aerodynamic studies. The second is used to study wave propagation phenomena in acoustics. Because these software packages are used to analyze large and complex case studies in a limited amount of time, they are implemented on parallel distributed computers. We describe the physical problems addressed by these codes and the main characteristics of their implementation. For the sake of re-usability and interoperability, these packages are developed using object-oriented technologies. We illustrate their parallel performance on clusters of symmetric multiprocessors. Finally, we discuss some challenges for future generations of parallel distributed numerical software, which will have to enable the simulation of multi-physics phenomena in the context of virtual organizations, also known as the extended enterprise. (c) 2005 Elsevier Inc. All rights reserved.
ISBN:
(Print) 9783030602451; 9783030602444
In this article, we use a Kullback-Leibler random sample partition data model to generate a set of disjoint data blocks, where each block is a good representation of the entire data set. Every random sample partition (RSP) block has a sample distribution function similar to that of the entire data set. To obtain a statistical measure between them, Kernel Density Estimation (KDE) with a dual-tree recursion data structure is first applied to quickly estimate the probability density of each block. Then, based on the Kullback-Leibler (KL) divergence measure, we obtain the statistical similarity between a randomly selected RSP data block and the other RSP data blocks. We rank the RSP data blocks by their divergence values in descending order and choose the first ten for ensemble classification learning. The classification models are built in parallel for the selected RSP data blocks, and the final ensemble classification model is obtained by a weighted voting ensemble strategy. The experiments were conducted by building XGBoost models on those ten blocks in parallel and incrementally ensembling them according to their KL values. The test classification results show that our method can increase the generalization capability of the ensemble classification model. It can reduce the model building time in a parallel computation environment by using less than 15% of the entire data, which also alleviates the memory constraints of big data analysis.
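The abstract's ranking step (estimate each block's density, compare it to a reference block with KL divergence, sort) can be sketched as follows. This is a toy reconstruction, not the paper's code: it uses plain grid-based Gaussian KDE instead of the dual-tree recursion, one-dimensional synthetic data, and invented block/grid sizes.

```python
import math
import random

def gaussian_kde(sample, grid, bandwidth=0.3):
    """Estimate the density of `sample` at each grid point (plain O(n*m) KDE;
    the paper's dual-tree recursion serves the same purpose, only faster)."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2)
                       for x in sample) for g in grid]

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q) over densities sampled on a grid."""
    ps, qs = sum(p), sum(q)
    return sum((pi / ps) * math.log((pi / ps + eps) / (qi / qs + eps))
               for pi, qi in zip(p, q))

random.seed(0)
# Toy stand-ins for RSP blocks: disjoint random samples of one data set.
data = [random.gauss(0.0, 1.0) for _ in range(3000)]
random.shuffle(data)
blocks = [data[i::6] for i in range(6)]          # six disjoint "RSP blocks"
grid = [x / 10.0 for x in range(-40, 41)]        # density evaluation grid

reference = gaussian_kde(blocks[0], grid)        # randomly selected block
scores = [(i, kl_divergence(reference, gaussian_kde(b, grid)))
          for i, b in enumerate(blocks[1:], start=1)]
# Rank candidate blocks by divergence; the top-ranked ones feed the ensemble.
ranking = sorted(scores, key=lambda s: s[1], reverse=True)
print([i for i, _ in ranking])
```

In the paper the models trained on the selected blocks are then combined by weighted voting; here only the selection stage is shown.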
Big Data is an extremely massive amount of heterogeneous and multi-source data which often requires fast processing and real-time analysis. Solving big data analytics problems requires powerful platforms to handle this enormous mass of data and efficient machine learning algorithms to exploit big data's full potential. Hidden Markov models are rich statistical models, widely used in various fields, especially for modeling and analyzing time-varying data sequences. They owe their success to the existence of many efficient and reliable algorithms. In this paper, we present ParaDist-HMM, a parallel distributed implementation of the hidden Markov model for modeling and solving big data analytics problems. We describe the development and implementation of the improved algorithms, and we propose a Spark-based approach consisting of a parallel distributed big data architecture in a cloud computing environment to put the proposed algorithms into practice. We evaluated the model on synthetic and real financial data in terms of running time, speedup, and prediction quality, the latter measured by accuracy and root mean square error. Experimental results demonstrate that the ParaDist-HMM algorithms outperform other implementations of hidden Markov models in processing speed and accuracy, and therefore in efficiency and effectiveness.
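The abstract does not spell out ParaDist-HMM's internals, but the "efficient and reliable algorithms" it refers to include the classic forward recurrence for sequence likelihood, which is the kind of kernel a Spark-based implementation distributes across sequence chunks or observation batches. A minimal serial sketch, with all model numbers invented:

```python
# Classic HMM forward algorithm (sequence likelihood). This is only the
# standard serial recurrence, not the paper's distributed variant.
def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete-emission HMM.
    start_p[i]: P(state i at t=0); trans_p[i][j]: P(j | i);
    emit_p[i][o]: P(symbol o | state i)."""
    n_states = len(start_p)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans_p[i][j] for i in range(n_states))
                 * emit_p[j][o] for j in range(n_states)]
    return sum(alpha)

# Two hidden states, two observable symbols (all probabilities hypothetical).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward([0, 1, 0], start, trans, emit)
print(round(likelihood, 6))  # -> 0.10893
```

Because each time step only needs the previous alpha vector, long sequences can be cut into chunks whose boundary vectors are exchanged between workers, which is the natural parallelization point.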
Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatic asynchronous MPI communication in memory-bound parallel programs on multicore clusters and how it can be facilitated. For instance, slowing down MPI processes by deliberate injection of delays can improve performance if certain conditions are met. This leads to the counter-intuitive conclusion that noise, independent of its source, is not always detrimental but can be leveraged for performance improvements. We employ phase-space graphs as a new tool to visualize parallel program dynamics. They are useful for spotting patterns in parallel execution that easily go unnoticed with traditional tracing tools. We investigate five different microbenchmarks and applications on different supercomputer platforms: an MPI-augmented STREAM Triad, two implementations of Lattice-Boltzmann fluid solvers (D3Q19 and SPEChpc D2Q37), and the LULESH and HPCG proxy applications. (c) 2023 Elsevier B.V. All rights reserved.
Because of their effectiveness and flexibility in finding useful solutions, Genetic Algorithms (GAs) are very popular search techniques for solving complex optimization problems in scientific and industrial fields. Parallel GAs (PGAs), and especially distributed ones, have usually been presented as the way to overcome the time-consuming shortcoming of sequential GAs. When applying PGAs, we can expect better performance because knowledge is exchanged during the parallel search process. The resulting distributed search differs from what sequential panmictic GAs do, and therefore deserves additional study. This article presents a performance study of three different PGAs. Moreover, we investigate the effect of synchronizing communications on modern shared-memory multiprocessors. We consider the master-slave model along with synchronous and asynchronous distributed GAs (dGAs), presenting their different designs and expected similarities when running on one to 32 cores. The master-slave model showed competitive numerical effort versus the other dGAs and was shown to scale up well on multiprocessors. We describe how the speedup and parallel performance of the dGAs change as the number of cores grows. Results for the island model show that synchronous and asynchronous dGAs have different numerical performance on a multiprocessor, with the asynchronous algorithm executing faster and therefore being more attractive for time-demanding applications. Our results and statistical analyses help develop a novel body of knowledge on PGAs running on shared-memory multiprocessors (versus the overwhelming literature oriented to distributed-memory clusters), something useful for researchers, beginners, and end users of these techniques.
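The island (distributed) model the abstract compares can be illustrated with a toy: two populations evolve independently and periodically exchange their best individuals. This is a single-process sketch on the OneMax problem with invented parameters; in a real dGA each island runs on its own core and migration is an (a)synchronous message.

```python
import random

# Minimal island-model distributed GA on the OneMax toy problem
# (maximize the number of 1-bits in a fixed-length bit string).
def evolve(pop, n_bits, rng):
    """One generation: tournament selection, one-point crossover, mutation."""
    def pick():
        return max(rng.sample(pop, 3), key=sum)  # tournament of size 3
    nxt = []
    while len(nxt) < len(pop):
        a, b = pick(), pick()
        cut = rng.randrange(1, n_bits)
        child = a[:cut] + b[cut:]                # one-point crossover
        if rng.random() < 0.1:                   # occasional bit flip
            i = rng.randrange(n_bits)
            child = child[:i] + [1 - child[i]] + child[i + 1:]
        nxt.append(child)
    return nxt

rng = random.Random(42)
n_bits, pop_size, generations = 20, 30, 40
islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
            for _ in range(pop_size)] for _ in range(2)]

for gen in range(generations):
    islands = [evolve(p, n_bits, rng) for p in islands]
    if gen % 5 == 0:       # migration: best of each island replaces a
        for src, dst in ((0, 1), (1, 0)):  # random individual of the other
            best = max(islands[src], key=sum)
            islands[dst][rng.randrange(pop_size)] = best[:]

best = max((max(p, key=sum) for p in islands), key=sum)
print(sum(best))
```

The synchronous/asynchronous distinction studied in the article is about whether an island blocks while waiting for migrants; in this sequential sketch migration is trivially synchronous.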
Currently, many interconnection networks and parallel algorithms exist for message-passing computers. Users of these machines wish to determine which message-passing computer is best for a given job, and how it will scale with the number of processors and the algorithm size. The paper describes a general-purpose simulator for message-passing multiprocessors (Parsim), which facilitates system modelling. A structured method for simulator design has been used, which gives Parsim the ability to simulate different topology and algorithm combinations easily. This is illustrated by applying Parsim to a number of algorithms on a variety of topologies. Parsim is then used to predict the performance of the new IBM SP2 parallel computer, with topologies ranging up to 1024 processors.
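The key design idea the abstract mentions, simulating different topology/algorithm combinations interchangeably, can be sketched as follows. This is not Parsim's design, just a toy illustration of the separation: topologies expose a hop-count metric, and an algorithm's communication pattern is costed against whichever topology is plugged in. The latency figure is invented.

```python
# Two interchangeable topologies exposing the same hops() interface.
class Ring:
    def __init__(self, n): self.n = n
    def hops(self, src, dst):
        d = abs(src - dst)
        return min(d, self.n - d)        # shortest way around the ring

class Hypercube:
    def __init__(self, n): self.n = n    # n must be a power of two
    def hops(self, src, dst):
        return bin(src ^ dst).count("1")  # Hamming distance

def simulate(topology, messages, hop_latency_us=1.0):
    """Cost a list of (src, dst) messages, sent one after another."""
    return sum(topology.hops(s, d) * hop_latency_us for s, d in messages)

# "Algorithm": every rank sends its result to rank 0 (a naive gather).
n = 16
gather = [(r, 0) for r in range(1, n)]
print(simulate(Ring(n), gather), simulate(Hypercube(n), gather))
# -> 64.0 32.0: the same algorithm costed on two different topologies.
```

A real simulator would add contention, link bandwidth, and overlapping transfers, but the plug-in structure is the point here.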
With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support for concurrent applications engaging in nontrivial interaction has become an important problem. Two principal barriers to harnessing parallelism are: (1) efficient mechanisms that achieve transparent dependency maintenance while preserving semantic correctness, and (2) scheduling algorithms that match coupled processes to distributed resources while explicitly incorporating their communication costs. This paper describes a set of performance features and their properties and implementation in a system support environment called DUNES that achieves transparent dependency maintenance (IPC, file access, memory access, process creation/termination, process relationships) under dynamic load balancing. The two principal performance features are push/pull-based active and passive end-point caching and communication-sensitive load balancing. Collectively, they mitigate the overhead introduced by the transparent dependency maintenance mechanisms. Communication-sensitive load balancing, in addition, affects the scheduling of distributed resources to application processes, where both communication and computation costs are explicitly taken into account. DUNES' architecture endows commodity operating systems with distributed operating system functionality while achieving transparency with respect to their existing application base. DUNES also preserves semantic correctness with respect to single-processor semantics. We show performance measurements of a UNIX-based implementation on SPARC and x86 architectures over high-speed LAN environments. We show that significant performance gains in terms of system throughput and parallel application speedup are achievable. (C) 1999 Academic Press.
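Communication-sensitive load balancing, as described above, charges a placement both for the computation it adds to a node and for the traffic it generates to peers on other nodes. A minimal greedy sketch of that idea (DUNES' actual policies are richer and dynamic; all names and numbers here are invented):

```python
# Greedy communication-sensitive placement: each process goes to the node
# minimizing (node load + own demand + penalty for remote communication
# with already-placed peers).
def place(procs, comm, n_nodes, remote_cost=2.0):
    """procs: {name: cpu_demand}; comm: {(a, b): traffic volume}."""
    load = [0.0] * n_nodes
    where = {}
    for p, demand in procs.items():
        def score(node):
            remote = sum(v for (a, b), v in comm.items()
                         if p in (a, b)
                         and where.get(b if a == p else a, node) != node)
            return load[node] + demand + remote_cost * remote
        node = min(range(n_nodes), key=score)
        where[p] = node
        load[node] += demand
    return where

procs = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
comm = {("a", "b"): 5.0, ("c", "d"): 5.0}   # two tightly coupled pairs
placement = place(procs, comm, n_nodes=2)
print(placement)
```

With the communication term, each coupled pair lands on one node while the two pairs still spread across nodes; with `remote_cost=0` the same code degenerates to plain least-loaded placement and may split the pairs.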
The solution of elliptic problems is challenging on parallel distributed memory computers since their Green's functions are global. To address this issue, we present a set of preconditioners for the Schur complement domain decomposition method. They implement a global coupling mechanism, through coarse-space components, similar to the one proposed in [Bramble, Pasciak, and Schatz, Math. Comp., 47 (1986), pp. 103-134]. The definition of the coarse-space components is algebraic; they are defined using the mesh partitioning information and simple interpolation operators. These preconditioners are implemented on distributed memory computers without introducing any new global synchronization in the preconditioned conjugate gradient iteration. The numerical and parallel scalability of these preconditioners is illustrated on two-dimensional model examples that exhibit anisotropy and/or discontinuity phenomena.
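For readers unfamiliar with the Schur complement method the preconditioners target, here is the smallest concrete instance I can think of: the 1D Poisson matrix tridiag(-1, 2, -1) split into two subdomains joined by one interface unknown. Interior unknowns are eliminated independently (in parallel, in practice) and only the reduced interface system is solved globally. Sizes and right-hand side are toy choices, not from the paper.

```python
# Schur complement on a 7-point 1D Poisson problem; node 3 is the interface.
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (teaching-sized)."""
    n = len(b)
    A = [row[:] for row in A]; b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p], b[k], b[p] = A[p], A[k], b[p], b[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            b[i] -= f * b[k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

n = 7
lap = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
        for j in range(n)] for i in range(n)]
rhs = [1.0] * n
interior = [0, 1, 2, 4, 5, 6]; gamma = 3   # G = the single interface node

# Block pieces A_II, A_IG, A_GI of the 2x2 partitioned system.
A_II = [[lap[i][j] for j in interior] for i in interior]
A_IG = [lap[i][gamma] for i in interior]
A_GI = [lap[gamma][j] for j in interior]
f_I = [rhs[i] for i in interior]

w = solve(A_II, A_IG)                  # A_II^{-1} A_IG (two independent
v = solve(A_II, f_I)                   # subdomain solves in disguise)
S = lap[gamma][gamma] - sum(a * b for a, b in zip(A_GI, w))
g = rhs[gamma] - sum(a * b for a, b in zip(A_GI, v))
x_gamma = g / S                        # 1x1 interface (Schur) system

x_direct = solve(lap, rhs)
print(round(x_gamma, 10), round(x_direct[gamma], 10))  # both 8.0
```

In 2D decompositions the interface holds many unknowns, S is never formed explicitly, and the paper's coarse-space preconditioners exist precisely to keep the conjugate gradient iteration on S well conditioned.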
The performance of highly parallel applications on distributed-memory systems is influenced by many factors. Analytic performance modeling techniques aim to provide insight into performance limitations and are often the starting point of optimization efforts. However, coupling analytic models across the system hierarchy (socket, node, network) fails to encompass the intricate interplay between the program code and the hardware, especially when execution and communication bottlenecks are involved. In this paper we investigate the effect of bottleneck evasion and how it can lead to automatic overlap of communication overhead with computation. Bottleneck evasion leads to a gradual loss of the initial bulk-synchronous behavior of a parallel code, so that its processes become desynchronized. This occurs most prominently in memory-bound programs, which is why we choose memory-bound benchmark and application codes, specifically an MPI-augmented STREAM Triad, sparse matrix-vector multiplication, and a collective-avoiding Chebyshev filter diagonalization code, to demonstrate the consequences of desynchronization on two different supercomputing platforms. We investigate the role of idle waves as possible triggers of desynchronization and show the impact of automatic asynchronous communication for a spectrum of code properties and parameters, such as saturation point, matrix structure, domain decomposition, and communication concurrency. Our findings reveal how eliminating synchronization points (such as collective communication or barriers) precipitates performance improvements that go beyond what can be expected by simply subtracting the overhead of the collective from the overall runtime.
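The core claim, that removing global synchronization lets noisy per-rank compute times overlap instead of accumulating, can be made tangible with a toy timeline model. This is not the paper's methodology, just an illustrative simulation with invented timings: with a barrier every rank pays the per-iteration maximum across all ranks; with only a nearest-neighbor exchange, a rank waits just for its two neighbors.

```python
import random

# Toy model: ranks alternate a noisy compute phase and a synchronization.
def makespan(n_ranks, n_iters, rng, barrier):
    t = [0.0] * n_ranks          # per-rank clock
    for _ in range(n_iters):
        # memory-bound compute phase with natural per-rank noise
        t = [ti + rng.uniform(0.8, 1.2) for ti in t]
        if barrier:
            t = [max(t)] * n_ranks           # bulk-synchronous: all wait
        else:                                 # nearest-neighbor exchange only
            t = [max(t[max(r - 1, 0)], t[r], t[min(r + 1, n_ranks - 1)])
                 for r in range(n_ranks)]
    return max(t)

sync = makespan(64, 50, random.Random(1), barrier=True)
desync = makespan(64, 50, random.Random(1), barrier=False)
print(round(sync, 2), round(desync, 2))   # neighbor-only sync finishes first
```

With the barrier, the makespan is the sum of 50 per-iteration maxima over 64 ranks; without it, delays can only propagate one neighbor per iteration (the "idle wave" of the abstract), so much of the noise averages out, which mirrors the paper's point that the gain exceeds the bare cost of the removed collective.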