ISBN: (print) 9781728101767
Performing collective communications efficiently on current high-performance computing systems is time-consuming, and on future exascale systems this communication time will grow further. However, global information is frequently required in various physical models. By exploiting domain knowledge of the model's behavior, globally needed information can be distributed more efficiently using only peer-to-peer communication, which spreads the information to all processes asynchronously over multiple communication steps. In this article, we introduce a multi-hop Manhattan Street Network (MSN) for global information exchange and show the conditions under which a local neighbor exchange is sufficient for exchanging distributed information. In many models, moreover, global information is needed only within a spatially limited region of the simulation domain. Therefore, in addition to the MSN, a second network, the local exchange network, is introduced to exploit this spatial assumption. Both non-collective global exchange networks are implemented in the massively parallel NAStJA framework. Using two models, a phase-field model for droplet simulations and the cellular Potts model for biological tissue simulations, we demonstrate by example the wide applicability of these networks. Scaling tests of the networks show nearly ideal scaling behavior, with an efficiency of over 90%. A theoretical prediction of the communication time on future exascale systems shows an enormous advantage for the presented exchange methods, whose cost is O(1) when domain knowledge is exploited.
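The multi-hop idea above can be illustrated with a small Python sketch. It assumes the classic Manhattan Street Network layout (a toroidal grid whose row and column directions alternate) and uses hypothetical names (`msn_out_neighbors`, `diameter`, `flood_steps`); NAStJA's actual implementation may differ. The sketch shows that repeated neighbor-only exchange delivers every process's item to every other process after a number of steps equal to the network diameter, while each node sends only two messages per step.

```python
from collections import deque
from itertools import product


def msn_out_neighbors(r, c, rows, cols):
    """Two outgoing links of node (r, c) in a rows x cols Manhattan Street Network.

    Assumed orientation (classic MSN): even rows point east, odd rows west;
    even columns point south, odd columns north. rows and cols should be even.
    """
    dc = 1 if r % 2 == 0 else -1
    dr = 1 if c % 2 == 0 else -1
    return [(r, (c + dc) % cols), ((r + dr) % rows, c)]


def diameter(rows, cols):
    """Longest shortest directed path, found by a BFS from every node."""
    best = 0
    for src in product(range(rows), range(cols)):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in msn_out_neighbors(*u, rows, cols):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def flood_steps(rows, cols):
    """Synchronous neighbor-exchange steps until every node knows every item.

    In each step every node forwards everything it knows to its two
    out-neighbors, so the number of messages per node per step is constant.
    """
    nodes = list(product(range(rows), range(cols)))
    known = {n: {n} for n in nodes}  # initially each node knows only its own item
    steps = 0
    while any(len(k) < len(nodes) for k in known.values()):
        if steps > len(nodes):
            raise RuntimeError("network is not strongly connected")
        new = {n: set(k) for n, k in known.items()}
        for n in nodes:
            for m in msn_out_neighbors(*n, rows, cols):
                new[m] |= known[n]
        known = new
        steps += 1
    return steps
```

For example, `flood_steps(4, 4) == diameter(4, 4)` holds: after k steps a node knows exactly the items of nodes within k hops of it. Whether this delay is acceptable is the kind of model-dependent condition the abstract refers to.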
ISBN: (print) 9783319773971
The proceedings contain 10 papers. The special focus in this conference is on. The topics include: Analysis of Mixed Workloads from Shared Cloud Infrastructure; Tuning EASY-Backfilling Queues; Don't Hurry Be Happy: A Deadline-Based Backfilling Approach; Supporting Real-Time Jobs on the IBM Blue Gene/Q: Simulation-Based Study; Towards Efficient Resource Allocation for Distributed Workflows Under Demand Uncertainties; Programmable In Situ System for Iterative Workflows; A Data Structure for Planning Based Workload Management of Heterogeneous HPC Systems; ScSF: A Scheduling Simulation Framework.
ISBN: (print) 9781450364393
Optimizing the performance of complex, computationally demanding simulation codes such as Octo-Tiger is an ongoing challenge. Octo-Tiger is an astrophysics code that simulates the evolution of star systems using the fast multipole method, with adaptive octrees as its central data structure. Octo-Tiger is implemented with high-level C++ libraries, specifically HPX and Vc, which allow it to run on different hardware platforms. We recently demonstrated excellent scalability in a distributed setting. In this paper, we study the node-level performance of Octo-Tiger on an Intel Knights Landing platform. We focus on Octo-Tiger's fast multipole method, as it is the most computationally demanding component. Using HPX and a futurization approach, we traverse the adaptive octrees efficiently in parallel. At the core level, threads process sub-grids using multiple 1074-element stencils. In numerical experiments simulating the time evolution of a rotating star on an Intel Xeon Phi 7250 Knights Landing processor, Octo-Tiger shows good parallel efficiency. For the fast multipole algorithm, we achieved up to 408 GFLOPS, a speedup of 2x over a 24-core Skylake-SP platform using the same high-level abstractions.
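The futurized octree traversal described above can be sketched in Python, with `concurrent.futures` standing in for HPX futures. This is only an illustration under stated assumptions: the `Cell`, `build`, `moments`, and `futurized_moments` names are hypothetical, only the monopole (total mass and mass-weighted position) is computed rather than Octo-Tiger's higher-order multipole expansions, and futures are spawned only at the root to keep the example short. CPython threads do not execute this Python code in parallel, so the sketch shows the dependency structure, not the performance.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Cell:
    center: tuple
    half: float
    points: list = field(default_factory=list)    # (x, y, z, mass), leaves only
    children: list = field(default_factory=list)


def build(points, center, half, leaf_size=4):
    """Adaptive octree: a cell is split only while it holds more than leaf_size points."""
    cell = Cell(center, half)
    if len(points) <= leaf_size or half < 1e-9:
        cell.points = points
        return cell
    octants = {}
    for p in points:
        key = tuple(int(p[i] >= center[i]) for i in range(3))
        octants.setdefault(key, []).append(p)
    for key, pts in octants.items():
        child_center = tuple(center[i] + (half / 2 if key[i] else -half / 2) for i in range(3))
        cell.children.append(build(pts, child_center, half / 2, leaf_size))
    return cell


def moments(cell):
    """Upward pass: total mass and mass-weighted position sum (monopole only)."""
    if not cell.children:
        mass = sum(p[3] for p in cell.points)
        return mass, [sum(p[i] * p[3] for p in cell.points) for i in range(3)]
    mass, wsum = 0.0, [0.0, 0.0, 0.0]
    for child in cell.children:
        m, s = moments(child)
        mass += m
        wsum = [a + b for a, b in zip(wsum, s)]
    return mass, wsum


def futurized_moments(root, pool):
    """Spawn one future per root child; the root combines the results once they are ready."""
    if not root.children:
        return moments(root)
    futures = [pool.submit(moments, child) for child in root.children]
    mass, wsum = 0.0, [0.0, 0.0, 0.0]
    for fut in futures:
        m, s = fut.result()
        mass += m
        wsum = [a + b for a, b in zip(wsum, s)]
    return mass, wsum


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random(), rng.random(), 1.0) for _ in range(500)]
    root = build(pts, (0.5, 0.5, 0.5), 0.5)
    with ThreadPoolExecutor(max_workers=4) as pool:
        mass, wsum = futurized_moments(root, pool)
    print("total mass:", mass, "center of mass:", [s / mass for s in wsum])
```

Spawning futures only at the root avoids pool threads blocking on their own sub-tasks; task-based runtimes such as HPX can instead attach continuations to futures so that a parent cell's work starts when its children's futures become ready.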
Earthquakes destroy buildings and have a significant impact on human life. There is currently no method that forecasts them with 100% precision. Scientists around the world make efforts...
Synchronization aspects of parallel discrete event simulation (PDES), a method for large-scale simulation, are analyzed using models of time-profile evolution. The time profile is formed with the local vi...
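To make the notion of a time profile concrete, here is a Python sketch of one widely studied model of conservative PDES synchronization: processing elements (PEs) sit on a ring, each holds a local virtual time, and a PE may advance only if it is not ahead of either neighbor (the "time-horizon" model studied by Korniss and co-workers). Because the abstract above is truncated, this is an assumption about the kind of model meant, not the paper's own model; `simulate_time_profile` is a hypothetical name. Utilization (the fraction of PEs that advance per step) and the width of the profile are the quantities such models are typically used to study.

```python
import random
import statistics


def simulate_time_profile(num_pe, steps, seed=0):
    """Conservative PDES on a ring of num_pe processing elements (PEs).

    Each PE holds a local virtual time tau[i]. In every parallel step a PE may
    process its next event only if its time does not exceed that of either ring
    neighbor; it then advances by an exponentially distributed increment.
    Returns the final time profile, the per-step utilization, and the profile width.
    """
    rng = random.Random(seed)
    tau = [0.0] * num_pe
    utilization = []
    for _ in range(steps):
        # decide from the old profile, then update all PEs "simultaneously"
        active = [tau[i] <= tau[i - 1] and tau[i] <= tau[(i + 1) % num_pe]
                  for i in range(num_pe)]
        for i, ok in enumerate(active):
            if ok:
                tau[i] += rng.expovariate(1.0)
        utilization.append(sum(active) / num_pe)
    width = statistics.pstdev(tau)  # roughness (spread) of the time profile
    return tau, utilization, width


if __name__ == "__main__":
    tau, util, width = simulate_time_profile(1000, 2000, seed=1)
    print("mean utilization over last 500 steps:", statistics.mean(util[-500:]))
    print("time-profile width:", width)
```

The PE holding the globally smallest virtual time can always advance, so utilization never drops to zero and this conservative scheme cannot deadlock; the price is that most PEs wait in any given step.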
Processing simulated and experimental data is an important issue in modern high-energy physics experiments. The high interaction rate and particle multiplicity, in addition to the long sequential processing time of million...
Theoretical and experimental investigations of the interaction of water vapor with porous materials are essential for various fields of science and technology. Mathematical modelling plays an important role in these inves...
This paper proposes an innovative algorithm for running the well-known Finite-Difference Time-Domain (FDTD) method in a distributed and parallel fashion. Given the dependence among data, for the proposed distributed version,...
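The data dependence mentioned above can be illustrated with a standard domain-decomposition sketch in Python. This is not the paper's algorithm (its description is truncated); it assumes a 1D Yee grid in normalized units, a Courant factor of 0.5, and a Gaussian source, and all function names are hypothetical. Sub-domains are simulated sequentially in one process, so the two halo exchanges per time step stand in for messages that a real implementation would send, for example with MPI.

```python
import math


def pulse(t):
    """Gaussian soft source (hypothetical parameters: peak at step 30, width 10)."""
    return math.exp(-((t - 30) / 10) ** 2)


def fdtd_serial(n, steps, src, courant=0.5):
    """Reference 1D FDTD (Yee) update in normalized units; Ez at cell 0 stays 0."""
    ez, hy = [0.0] * n, [0.0] * n
    for t in range(steps):
        for i in range(n - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += courant * (hy[i] - hy[i - 1])
        ez[src] += pulse(t)
    return ez


def fdtd_distributed(n, steps, src, parts, courant=0.5):
    """Same update with the grid split into `parts` contiguous sub-domains.

    The H update needs the first Ez of the right neighbor, and the E update
    needs the last (already updated) Hy of the left neighbor: one halo value
    per exchange and per sub-domain boundary.
    """
    assert 1 <= parts <= n
    bounds = [(k * n // parts, (k + 1) * n // parts) for k in range(parts)]
    ez = [[0.0] * (hi - lo) for lo, hi in bounds]
    hy = [[0.0] * (hi - lo) for lo, hi in bounds]
    for t in range(steps):
        # halo exchange 1: Ez from the right neighbor (None at the global edge)
        right_ez = [ez[k + 1][0] if k + 1 < parts else None for k in range(parts)]
        for k, (lo, hi) in enumerate(bounds):
            for j in range(hi - lo):
                if lo + j == n - 1:
                    continue  # last global Hy cell is never updated
                ez_next = ez[k][j + 1] if j + 1 < hi - lo else right_ez[k]
                hy[k][j] += courant * (ez_next - ez[k][j])
        # halo exchange 2: freshly updated Hy from the left neighbor
        left_hy = [hy[k - 1][-1] if k > 0 else None for k in range(parts)]
        for k, (lo, hi) in enumerate(bounds):
            for j in range(hi - lo):
                if lo + j == 0:
                    continue  # Ez at global cell 0 stays 0
                hy_prev = hy[k][j - 1] if j > 0 else left_hy[k]
                ez[k][j] += courant * (hy[k][j] - hy_prev)
        # the source is injected only by the sub-domain that owns cell `src`
        for k, (lo, hi) in enumerate(bounds):
            if lo <= src < hi:
                ez[k][src - lo] += pulse(t)
    return [v for chunk in ez for v in chunk]
```

Because each cell receives exactly the same floating-point operations as in the serial loop, `fdtd_distributed(120, 200, 40, parts)` should match `fdtd_serial(120, 200, 40)` for any number of sub-domains, which is a convenient correctness check for a decomposition.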
The proceedings contain 16 papers. The topics discussed include: computer simulation of phase transition in antiferromagnetic systems with long-range effect; computer simulation of the critical behavior of semi-infinite antiferromagnetic Ising models; community search on the pixel set of an image; image clustering by means of an artificial neural network; an automatic method for analyzing the associative rule set in research on the activities of public organizations; anomaly detection in the event log of OS Windows; detection of software source code plagiarism using mathematical complexity metrics; the class of T-share inequalities for the polytope of service schedules of requirements by parallel devices; a modification of Blom's scheme for discretionary access control; realization of simplex channels in distributed systems on the basis of Blom's key predistribution scheme; detection of stego-insertions such as LSB substitution in bitmap images; a robust algorithm for embedding digital watermarks in a video stream; a role-based access control metamodel; local optimization of the role-based access control policy; analysis of formal concepts for combining user rights in computer systems; on the application of the access control policy to the threshold key exchange protocol based on Blakley's secret sharing scheme; and on the estimation of the damage from the leakage of permissions in a role-based security model.