A large-scale simulation of polymer chains in a good solvent is performed. Implementation techniques for efficient parallel execution, optimization, and load balancing are discussed in the context of this practical application. Finally, a simple performance model is proposed.
ISBN: 9781538643686 (print)
A myriad of problems in science and engineering involve the solution of sparse triangular linear systems. These systems arise frequently as part of direct and iterative solvers for linear systems and eigenvalue problems, and hence can be considered a key building block of sparse numerical linear algebra. For this reason, their parallel solution has been studied exhaustively since the early days, and efficient implementations of this kernel can be found for almost every hardware platform. In the GPU context, the most widespread implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to aggregate the unknowns of the triangular system into level sets. This determines an execution schedule for the solution of the system, in which the level sets have to be processed sequentially while the unknowns that belong to one level set can be solved in parallel. One disadvantage of the CUSPARSE implementation is that this preprocessing stage is often extremely slow in comparison to the runtime of the solving phase. In this work, we present a parallel GPU algorithm that computes the same level sets as CUSPARSE but in significantly less time. Our experiments on a set of matrices from the SuiteSparse collection show acceleration factors of up to 44x. Additionally, we provide a routine capable of solving a triangular linear system in the same pass used to calculate the level sets, yielding important performance benefits.
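The level-set idea described in this abstract can be illustrated with a small sequential sketch. This is not the authors' GPU algorithm or the CUSPARSE routine; it only shows what a level-set schedule is. It assumes a lower-triangular matrix stored in CSR form with every diagonal entry present and nonzero, and the function names `level_sets` and `solve_by_levels` are hypothetical.

```python
def level_sets(indptr, indices):
    """Group the rows of a sparse lower-triangular CSR matrix into level sets.

    Row i is placed one level above the deepest row it depends on, so all
    rows in the same level are independent of each other.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: x[i] depends on x[j]
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_by_levels(indptr, indices, data, b, sets):
    """Solve L x = b by processing the level sets in order.

    Levels are processed sequentially; the rows inside one level could be
    solved in parallel (here they are simply looped over).
    Assumes each row stores a nonzero diagonal entry.
    """
    x = [0.0] * len(b)
    for rows in sets:
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Illustrative 4x4 lower-triangular matrix (made-up values):
# [2 0 0 0]
# [0 4 0 0]
# [1 0 1 0]
# [0 0 3 2]
indptr = [0, 1, 2, 4, 6]
indices = [0, 1, 0, 2, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 3.0, 2.0]
b = [2.0, 8.0, 3.0, 10.0]

sets = level_sets(indptr, indices)
x = solve_by_levels(indptr, indices, data, b, sets)
```

In this example rows 0 and 1 have no dependencies and form the first level, row 2 depends on row 0, and row 3 depends on row 2, so the schedule has three levels.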
ISBN: 0769506348 (print)
In the pursuit of efficiency, current practice in distributed software systems design often suffers from a lack of abstraction. An object-oriented design technique based on UML notations and a special type of high-level Petri nets is used to demonstrate how designs can be kept sufficiently abstract to be platform independent and reusable while still supporting design alternatives and their evaluation with respect to availability and principal system performance.
ISBN: 9781538655559 (print)
Scientific applications are increasingly complex and domain specific, and the underlying architectures of the parallel and distributed systems on which they are executed also continue to grow in complexity. As these high performance parallel and distributed computing applications and environments continue to grow in both complexity and computing power, there is an increasing financial cost associated with both the acquisition and maintenance of those systems. Therefore, the ability to model the performance of these applications and systems before and during their development and deployment, in order to guide cost-effective decisions about their resources and configurations, is highly important to their designers. Performance Evaluation Process Algebra (PEPA) is a modeling language and framework for modeling parallel and distributed computing and communication applications and systems, and numerous examples are present in the literature where PEPA has been used to model these systems to evaluate or predict their performance using various metrics, including throughput, utilization, and robustness. Since its development, the PEPA modeling framework has been expanded to model biological systems and networks (Bio-PEPA), and massive (on the order of 10^129 components) homogeneous systems with Grouped PEPA (GPEPA). PEPA and its derivatives are implemented in a variety of ways, ranging from plug-ins integrated with the Eclipse integrated development environment to standalone command-line based interpreters, each with its own unique and often challenging installation and configuration requirements. To help other researchers use these frameworks more easily and to facilitate increased and robust reproducibility across end-user platforms, we present and make available containerized versions of a number of these PEPA frameworks. We have validated the functionality of these containers by testing them with models available f
ISBN: 9781424416936 (print)
Energy consumption is a critical issue in parallel and distributed embedded systems. We present a novel algorithm for energy-efficient scheduling of Directed Acyclic Graph (DAG) based applications on Dynamic Voltage Scaling (DVS) enabled systems. Experimental results show that our algorithm provides near-optimal solutions for energy minimization with considerably smaller computational time and memory requirements than an existing algorithm that also provides near-optimal solutions.
ISBN: 0818626720 (print)
Replication is a common method for increasing the availability of data in a distributed environment. Our interest is in the application of replication techniques in the domain of parallel processing. This paper explores the issues concerning the degree of replication and granularity in the context of a distributed and highly available Linda tuple space. In particular, we study the performance effects of varying the number of replicas and the granularities of replication and concurrency control. Traditionally, when replication is used in databases, the granularity of replication and that of concurrency control have been the same (at the file level, for example). This is not an inherent requirement, however. In this paper we show by detailed simulation of a replicated Linda tuple space that it is useful to separate the two granularities and that this separation is an important design issue, especially in parallel processing systems.
ISBN: 9781538667057 (print)
In this paper, the idea of operating parallel inverters to mimic the dynamic stability of a synchronous generator (SG) is investigated for the case in which the inertia and damping constants differ. These inverters are virtual synchronous machines (VSMs) because they replicate the inertial dynamics inherent to SGs. Instead of using a conventional Phase-Locked Loop (PLL) to synchronize distributed generation (DG) to the grid frequency, the swing equation inherent to SG dynamics is implemented. Parallel VSM-controlled inverters can behave differently depending on the constants of their individual swing equations. This can cause the phase angles to differ beyond IEEE synchronization limits. The proposed algorithm is implemented to correct the phase angle and return parallel VSMs to acceptable operating ranges. Simulations are performed in the PSCAD-EMTDC simulation environment.
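For orientation, the textbook form of the swing equation with a damping term is given below. The abstract does not state the exact formulation used in the paper, so the symbols here are assumptions: $J$ is the (virtual) inertia constant, $D$ the damping constant, $T_m$ and $T_e$ the mechanical and electrical torques, $\omega$ the rotor angular frequency, $\omega_0$ the reference (grid) angular frequency, and $\delta$ the phase angle.

```latex
\begin{align}
  J \frac{d\omega}{dt} &= T_m - T_e - D\,(\omega - \omega_0), \\
  \frac{d\delta}{dt}   &= \omega - \omega_0 .
\end{align}
```

Under this form, two inverters with different $J$ and $D$ respond to the same torque imbalance with different frequency trajectories, and their phase angles $\delta$ therefore drift apart, which is the behavior the abstract describes.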
ISBN: 9798350369205; 9798350369199 (print)
As semiconductor design approaches physical limits, computer processing speeds are stagnating. This poses significant challenges for traffic simulations, which are becoming more and more computationally demanding. To maintain fast execution times while accommodating more complex simulations, it is essential to utilize the parallel computing capabilities of modern hardware. This paper discusses the need for an updated architectural design in the MATSim traffic simulation framework to take advantage of parallel computing infrastructures. We introduce a prototype that adapts the existing traffic simulation logic to a distributed parallel algorithm. Extensive benchmarks have been conducted to evaluate the prototype's performance and identify its limitations. The results demonstrate that the prototype performs up to 100 times faster than the current implementation. Based on these findings, we advocate for the integration of a distributed traffic simulation within the MATSim framework and outline the steps necessary to enhance the prototype.
ISBN: 9780769529172 (print)
Symbolic computing is one of the fastest-growing areas of scientific computing. An overview of the state of the art in symbolic computations on distributed architectures, in particular Web and Grid architectures, is presented. The background information, including typical application areas, is followed by a list of past and ongoing projects involving symbolic computations in distributed computing environments. To illustrate in more detail the issues involved in porting computer algebra systems to the Grid, some case studies involving popular environments are presented.