Search results - Inner Mongolia University Library

26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021

Authors: Voss, Caleb; Sarkar, Vivek. Affiliation: Georgia Institute of Technology, United States

ISBN: (print) 9781450382946

Task-parallel programs often enjoy deadlock freedom under certain restrictions, such as the use of structured join operations, as in Cilk and X10, or the use of asynchronous task futures together with deadlock-avoiding policies such as Known Joins or Transitive Joins. However, the promise, a popular synchronization primitive for parallel tasks, does not enjoy deadlock-freedom guarantees. Promises can exhibit deadlock-like bugs; however, the concept of a deadlock is not currently well-defined for promises. To address these challenges, we propose an ownership semantics in which each promise is associated with the task which currently intends to fulfill it. Ownership immediately enables the identification of bugs in which a task fails to fulfill a promise for which it is responsible. Ownership further enables the discussion of deadlock cycles among tasks and promises and allows us to introduce a robust definition of deadlock-like bugs for promises. Cycle detection in this context is non-trivial because it is concurrent with changes in promise ownership. We provide a lock-free algorithm for precise runtime deadlock detection. We show how to obtain the memory consistency criteria required for the correctness of our algorithm under TSO and the Java and C++ memory models. An evaluation compares the execution time and memory usage overheads of our detection algorithm on benchmark programs relative to an unverified baseline. Our detector exhibits a 12% (1.12×) geometric mean time overhead and a 6% (1.06×) geometric mean memory overhead, which are smaller overheads than in past approaches to deadlock cycle detection. © 2021 ACM.

Keywords: parallel programming
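The ownership idea in the abstract above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: the names `Task`, `Promise`, `new_promise`, `transfer`, `fulfill`, `omitted_on_exit`, and `would_deadlock` are hypothetical, and the sequential chain walk below is a simplification, not the lock-free concurrent algorithm the paper describes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing so tasks can live in sets
class Task:
    name: str
    waiting_on: Optional["Promise"] = None
    owned: set = field(default_factory=set)


@dataclass(eq=False)
class Promise:
    name: str
    owner: Optional[Task] = None
    fulfilled: bool = False


def new_promise(name, owner):
    """Create a promise whose fulfillment is the responsibility of `owner`."""
    p = Promise(name, owner)
    owner.owned.add(p)
    return p


def transfer(p, new_owner):
    """Hand responsibility for fulfilling `p` to another task."""
    p.owner.owned.discard(p)
    p.owner = new_owner
    new_owner.owned.add(p)


def fulfill(p):
    """Mark `p` fulfilled and release its owner from responsibility."""
    p.fulfilled = True
    if p.owner is not None:
        p.owner.owned.discard(p)
        p.owner = None


def omitted_on_exit(task):
    """Names of promises the task still owes if it terminated right now."""
    return sorted(p.name for p in task.owned if not p.fulfilled)


def would_deadlock(task, promise):
    """True if `task` waiting on `promise` would close a cycle of waits.

    Walks promise -> owner -> promise the owner waits on, and so on.
    Sequential simplification: assumes ownership does not change mid-walk.
    """
    seen = set()
    current = promise
    while current is not None and not current.fulfilled:
        owner = current.owner
        if owner is None or owner in seen:
            return False
        if owner is task:
            return True
        seen.add(owner)
        current = owner.waiting_on
    return False
```

In this model, a task that terminates with a non-empty `omitted_on_exit` result corresponds to the "fails to fulfill a promise for which it is responsible" bug, and `would_deadlock` corresponds to checking for a task/promise cycle before blocking.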

High performance computers: From parallel computing to quantum computers and biocomputers

2nd International Scientific Conference on Metrological Support of Innovative Technologies, ICMSIT II-2021

Authors: Yerlanova, G.; Serik, M.; Kopyltsov, A. Affiliations: Department of Computer Science, Eurasian National University, 2 Satpayeva Street, Nur-Sultan, Kazakhstan; Physics Department, Saint Petersburg State University of Aerospace Instrumentation, 67 Bolshaya Morskaya Street, St. Petersburg, Russia

Various programming methods are considered. Particular attention is paid to parallel programming, quantum computers and biocomputers, because high-performance computing has been developing intensively in recent years. One of the main ideas for increasing the speed of information processing is to carry out calculations in parallel. For classical programming methods this is achieved thanks to the advent of multiprocessor computers. Such computers allow computational tasks to be parallelized by introducing parallelization elements into classical programming languages. Another approach to speeding up computation is based on the idea of a quantum computer. The use of qubits in quantum computers means that all possible states of the system are processed simultaneously. A further approach to increasing computing performance is based on the development of biocomputers. This approach uses DNA chains consisting of a sequence of four nitrogenous bases (adenine, guanine, thymine, and cytosine). The information is stored and processed as a sequence of these nitrogenous bases. Calculations speed up because biochemical reactions can take place simultaneously on different parts of the DNA chains. © Published under licence by IOP Publishing Ltd.

Keywords: parallel programming

Scaling implicit parallelism via dynamic control replication

26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021

Authors: Bauer, Michael; Lee, Wonchan; Slaughter, Elliott; Jia, Zhihao; Di Renzo, Mario; Papadakis, Manolis; Shipman, Galen; McCormick, Patrick; Garland, Michael; Aiken, Alex. Affiliations: NVIDIA, United States; SLAC National Accelerator Laboratory, United States; Carnegie Mellon University, United States; Sapienza University of Rome, Italy; Los Alamos National Laboratory, United States; Stanford University, United States

ISBN: (print) 9781450382946

We present dynamic control replication, a run-time program analysis that enables scalable execution of implicitly parallel programs on large machines through a distributed and efficient dynamic dependence analysis. Dynamic control replication distributes dependence analysis by executing multiple copies of an implicitly parallel program while ensuring that they still collectively behave as a single execution. By distributing and parallelizing the dependence analysis, dynamic control replication supports efficient, on-the-fly computation of dependences for programs with arbitrary control flow at scale. We describe an asymptotically scalable algorithm for implementing dynamic control replication that maintains the sequential semantics of implicitly parallel programs. An implementation of dynamic control replication in the Legion runtime delivers the same programmer productivity as writing in other implicitly parallel programming models, such as Dask or TensorFlow, while providing better performance (11.4X and 14.9X respectively in our experiments), and scalability to hundreds of nodes. We also show that dynamic control replication provides good absolute performance and scaling for HPC applications, competitive in many cases with explicitly parallel programming systems. © 2021 ACM.

Keywords: parallel programming

Study of the Algorithms Information Structure as the Basis of a Training Workshop

7th Russian Supercomputing Days Conference, RuSCDays 2021

Authors: Antonov, Alexander; Volkov, Nikita. Affiliations: Lomonosov Moscow State University, Moscow, Russia; Moscow Center of Fundamental and Applied Mathematics, Moscow, Russia; "TESIS" Company, Moscow, Russia

ISBN: (print) 9783030928636

The study of the parallel structure of algorithms is becoming increasingly important for specialists dealing with high-performance computing. Theoretical information on this topic is included in various training courses at many higher educational institutions. However, the usual form of practical training is limited to tasks on the parallel implementation of specific algorithms. In the course "Supercomputing Simulation and Technologies" at the Faculty of Computational Mathematics and Cybernetics at Lomonosov Moscow State University, we have proposed a new type of practical task related to the study, description and visualization of the parallel structure of algorithms. Using the AlgoView visualization system developed by the authors, the study of the information structure can be carried out without access to high-performance computing systems. The same approach is planned to be used in the AlgoWiki Open encyclopedia of parallel algorithmic features. © 2021, Springer Nature Switzerland AG.

Keywords: parallel programming

Parallel Implementation of Luhn’s Algorithm for Credit Card Validation Using MPI and CUDA

8th International Conference on Frontiers of Intelligent Computing: Theory and Applications, FICTA 2020

Authors: Kudva, P. Karthik G. Shreyas, M.L. Rao, B. Ashwath Rai, Shwetha Kini, N. Gopalakrishna. Affiliation: Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India

ISBN: (print) 9789811557873

Luhn’s algorithm is the first line of defense on various e-commerce sites and is used to validate credit card numbers. As credit card usage increases, the validation process also needs to be faster. This faster processing is achievable through parallel processing. This paper uses MPI and CUDA programming to reduce the computation time of Luhn’s algorithm when validating multiple credit cards in parallel. © 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keywords: parallel programming
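For readers unfamiliar with the checksum this record refers to, the following is a minimal Python sketch of the standard Luhn check, plus a batch validator that treats each card number as an independent work item. The function names `luhn_valid` and `validate_many` are hypothetical, and the thread pool is only a stand-in to show the per-card independence that makes the problem data-parallel; it is not the MPI or CUDA implementation from the paper and makes no claim about speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def luhn_valid(number: str) -> bool:
    """Return True if `number` passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not cleaned.isdigit():
        return False
    total = 0
    # Walk digits from the right; double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0


def validate_many(numbers, workers=4):
    """Validate a batch; each number is checked independently of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(luhn_valid, numbers))
```

Because no card's result depends on another's, the same per-item function could in principle be distributed across MPI ranks or GPU threads, which is the structure the abstract relies on.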

Contextual contracts for component-oriented resource abstraction in a cloud of high performance computing services

Authors: de Carvalho Junior, Francisco Heron; Al-Alam, Wagner Guimarães; de Oliveira Dantas, Allberson B. Affiliations: Pós-Graduação em Ciência da Computação, Universidade Federal do Ceará, Fortaleza, Brazil; Campus de Quixadá, Universidade Federal do Ceará, Quixadá, Brazil; Instituto de Educação a Distância, Universidade da Integração Internacional da Lusofonia Afro-Brasileira, Redenção, Brazil

Efforts to support high performance computing (HPC) applications' requirements in the context of cloud computing have motivated us to design HPC Shelf, a cloud computing services platform to build and deploy large-scale parallel computing systems. We introduce Alite, the contextual contract system of HPC Shelf, to select component implementations according to requirements of the host application, target parallel computing platform characteristics (e.g., clusters and MPPs), quality of service (QoS) properties, and cost restrictions. It is evaluated through a small-scale case study employing two complementary component-based frameworks. The first one aims to represent components that implement linear algebra computations based on the BLAS interface. In turn, the second one aims to represent parallel computing platforms on the IaaS cloud offered by Amazon EC2 Service. © 2021 John Wiley & Sons, Ltd.

Keywords: parallel programming

The Future of High-Performance Computing

17th International Computer Engineering Conference, ICENCO 2021

Authors: Zahran, Mohamed. Affiliation: New York University, Computer Science Department, New York, NY, United States

ISBN: (print) 9781728164489

We are witnessing several factors in computing that offer as many opportunities as challenges. We are witnessing the end of Moore's law, almost two decades after the death of Dennard scaling. Exascale computing is within reach, but the last few steps are the most difficult. For instance, what is the best programming model in the exascale era? How can we make progress toward higher performance without Moore's law? In this article, we discuss the future of high-performance computing in the post-Moore era from hardware and software perspectives. This article is a glimpse into the near future and raises many research questions. © 2021 IEEE.

Keywords: parallel programming

Improvement of load balancing in shared-memory multiprocessor systems

1st International Conference on Intelligent and Cloud Computing, ICICC 2019

Authors: Deeb, Hasan; Sarangi, Archana; Sarangi, Shubhendu Kumar. Affiliations: Department of Computer Science and Engineering, ITER, Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India; Department of Electronics and Instrumentation Engineering, ITER, Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India

ISBN: (print) 9789811562013

Parallel programming is one of the most effective approaches to handling computationally complex problems: it reduces computation time by making the most of the capacity of the processors in shared-memory or distributed systems. One of the main ingredients of parallel programming is loops, especially DOALL loops. All loop scheduling algorithms try to achieve load balancing by using chunk resizing techniques (decreasing, increasing, and so on). In this paper, prior loop scheduling algorithms are evaluated on Mandelbrot. A new algorithm is obtained by merging the increasing and decreasing chunk size techniques in order to acquire the advantages of both approaches. The experimental results show that in the decreasing approach, a large number of small chunks in the last stages increases the scheduling overhead because of the increase in inter-processor communication. Also, the large chunk size of the initial stages increases load imbalance. On the other hand, for the increasing approach, the large chunks assigned to processors at the last stages might increase load imbalance, especially when iterations of the last stages are more time consuming than others. This work introduces a new approach that tries to minimize the load imbalance and communication overhead caused by a decreasing chunk size approach, and the load imbalance and scheduling overhead caused by an increasing chunk size approach, in order to provide better performance for many workload patterns. © Springer Nature Singapore Pte Ltd 2021.

Keywords: parallel programming
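The decreasing and increasing chunk-size strategies mentioned in the abstract above can be made concrete with a short Python sketch. The abstract does not name specific schemes, so this is an illustration under assumptions: `decreasing_chunks` follows the well-known guided self-scheduling rule (each chunk is the remaining iterations divided by the processor count, rounded up), and `increasing_chunks` uses a simple geometric growth with hypothetical parameters `first` and `factor`. Neither function is the merged algorithm the paper proposes.

```python
import math


def decreasing_chunks(n_iters, n_procs):
    """Guided self-scheduling style: chunk = ceil(remaining / n_procs)."""
    chunks = []
    remaining = n_iters
    while remaining > 0:
        size = math.ceil(remaining / n_procs)
        chunks.append(size)
        remaining -= size
    return chunks


def increasing_chunks(n_iters, first=1, factor=2):
    """Chunks grow geometrically; the last chunk takes whatever is left."""
    chunks = []
    remaining = n_iters
    size = first
    while remaining > 0:
        take = min(size, remaining)
        chunks.append(take)
        remaining -= take
        size *= factor
    return chunks


# Example: 100 loop iterations, 4 processors.
dec = decreasing_chunks(100, 4)
inc = increasing_chunks(100)
```

The decreasing schedule starts with large chunks and ends with many chunks of size 1, while the increasing schedule ends with its largest chunk, which matches the overhead and imbalance trade-offs the abstract describes for each approach.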

High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System

16th International Conference on Parallel Computing Technologies, PaCT 2021

Authors: Belyaev, Nikolay; Perepelkin, Vladislav. Affiliations: Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia

ISBN: (print) 9783030863586

Automatic synthesis of efficient scientific parallel programs for supercomputers is, in general, a complex problem of system parallel programming. Therefore, various specialized synthesis algorithms and heuristics are of use. The LuNA system for automatic construction of distributed parallel programs provides a basis for accumulating such algorithms in order to generate high-quality parallel programs in particular subject domains. If no specialized support is available in LuNA for a given input, the general synthesis algorithm is used, which does construct the required program, but its efficiency may be unsatisfactory. This paper presents a specialized run-time system for LuNA that supports the implementation of dense linear algebra operations on distributed-memory multicomputers. Experimental results demonstrate that automatically generated parallel programs of this class outperform the corresponding ScaLAPACK library subroutines, which makes the LuNA system practically applicable for generating high-performance distributed parallel programs for supercomputers in the dense linear algebra application class. © 2021, Springer Nature Switzerland AG.

Keywords: parallel programming

Towards High-Performance Code Generation for Multi-GPU Clusters Based on a Domain-Specific Language for Algorithmic Skeletons

International Journal of Parallel Programming, 2020, vol. 48, no. 4, pp. 713-728

Authors: Wrede, Fabian; Kuchen, Herbert. Affiliation: Univ Munster, European Res Ctr Informat Syst (ERCIS), Dept Informat Syst, Leonardo Campus 3, D-48149 Munster, Germany

In earlier work, we defined a domain-specific language (DSL) with the aim of providing an easy-to-use approach for programming multi-core and multi-GPU clusters. The DSL incorporates the idea of utilizing algorithmic skeletons, which are well-known patterns for parallel programming, such as map and reduce. Based on the chosen skeleton, a user-defined function can be applied to a data structure in parallel, with the main advantage that the user does not have to worry about implementation details. So far, we had only implemented a generator for multi-core clusters; in this paper we present and evaluate two prototypes of generators for multi-GPU clusters, which are based on OpenACC and CUDA. We have evaluated the approach with four benchmark applications. The results show that the generation approach leads to execution times that are on par with an alternative library implementation.

Keywords: Algorithmic skeletons; parallel programming; High-performance computing; Model-driven development; Domain-specific language
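The map and reduce skeletons named in the abstract above can be sketched in a few lines of Python. This is only an illustration of the pattern, in which the user supplies a function and the skeleton hides how the work is split; the names `map_skeleton` and `reduce_skeleton` are hypothetical, the thread pool is a stand-in for a real parallel backend, and nothing here reflects the paper's DSL syntax or its generated OpenACC and CUDA code.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_skeleton(fn, data, workers=4):
    """Apply a user-defined function to every element; order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))


def reduce_skeleton(op, data, identity, workers=4):
    """Combine all elements with `op`.

    Assumes `op` is associative and `identity` is its neutral element,
    so partial results per slice can be combined afterwards.
    """
    if not data:
        return identity
    size = math.ceil(len(data) / workers)
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: reduce(op, part, identity), parts))
    return reduce(op, partials, identity)


# Usage: sum of squares, written only in terms of user functions.
squares = map_skeleton(lambda x: x * x, [1, 2, 3, 4])
total = reduce_skeleton(lambda a, b: a + b, squares, 0)
```

The caller never mentions threads, slices, or devices, which is the "user does not have to worry about implementation details" property the abstract attributes to skeletons.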
