We present a suite of algorithms for migrating Lagrangian data between processors in a parallel environment when the underlying mesh is Eulerian. The collection of algorithms applies to both uniform and adaptive meshes. The algorithms are implemented in, and distributed with, FLASH, a publicly available multiphysics simulation code. Migrating Lagrangian data on an Eulerian mesh is non-trivial because the Eulerian grid points are spatially fixed whereas Lagrangian entities move with the flow of a simulation. Thus, the movement of Lagrangian data cannot use the data migration methods associated with the Eulerian mesh. Additionally, when the mesh is adaptive, as the simulation progresses the grid resolution changes. The resulting regridding process can cause complex Lagrangian data migration. The algorithms presented in this paper describe Lagrangian data movement on a static uniform mesh and on an adaptive octree based block-structured mesh. Some of the algorithms are general enough to be applicable to any block structured mesh, while some others exploit the meta-data and structure of PARAMESH, the adaptive mesh refinement (AMR) package used in FLASH. We also present an analysis of the algorithms' comparative performances in different parallel environments, and different flow characteristics. (C) 2011 Elsevier B.V. All rights reserved.
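The uniform-mesh case can be illustrated with a minimal sketch. It is not the FLASH implementation: the row-by-row block numbering, the 2-D unit domain, and the names `owner_rank` and `plan_migration` are all assumptions made for illustration. After particles move, each processor works out which block now contains each particle and queues the ones that have left its own block.

```python
from collections import defaultdict


def owner_rank(x, y, domain=(1.0, 1.0), procs=(2, 2)):
    """Rank of the processor whose block contains the point (x, y).

    Assumes a uniform mesh on [0, Lx] x [0, Ly] split into px * py equal
    blocks, one per processor, numbered row by row (hypothetical layout).
    Positions are assumed to lie inside the domain.
    """
    lx, ly = domain
    px, py = procs
    # Clamp so a particle exactly on the upper boundary stays in the last block.
    ix = min(int(x / lx * px), px - 1)
    iy = min(int(y / ly * py), py - 1)
    return iy * px + ix


def plan_migration(particles, my_rank, domain=(1.0, 1.0), procs=(2, 2)):
    """Split particles into those that stay and those to send, keyed by rank."""
    stay, send = [], defaultdict(list)
    for p in particles:
        dest = owner_rank(p[0], p[1], domain, procs)
        if dest == my_rank:
            stay.append(p)
        else:
            send[dest].append(p)
    return stay, dict(send)
```

On an adaptive mesh the destination can no longer be computed from the position alone, because block sizes and ownership change at each regridding, which is why the abstract treats that case separately.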
A new dynamic data structure has been proposed recently in *** are several algorithms for matrix *** none of them has used r-train data structure for storing and multiplying the *** this paper algorithm for matrix multiplication using r-train for parallel machine has been proposed.
Author: Kalgin, K. V. (Russian Acad Sci, Inst Computat Math & Math Geophys, Siberian Branch, Pr Akad Lavrenteva 6, Novosibirsk 630090, Russia)
This paper investigates how some parallel algorithms for asynchronous cellular automata simulation can be mapped efficiently onto the architecture of a modern 32-core computer (4 x Intel Xeon X7560). The example is a model of the CO + O = CO2 reaction on the surface of palladium particles.
In the last decade, the hierarchical matrix technique was introduced to deal with dense matrices in an efficient way. It provides a data-sparse format and allows an approximate matrix algebra of nearly optimal complexity. This paper is concerned with utilizing multiple processors to gain further speedup for the H-matrix algebra, namely matrix truncation, matrix-vector multiplication, matrix-matrix multiplication, and inversion. One of the most cost-effective solutions for large-scale computation is distributed computing. Distributed-memory architectures are increasingly popular and provide an inexpensive way for an organization to obtain parallel capabilities. In this paper, we introduce a new distribution scheme for H-matrices based on the corresponding index set. Numerical experiments applied to a BEM model complement our complexity analysis.
A large number of optimization problems have been identified as computationally challenging and/or intractable to solve within a reasonable amount of time. Due to the NP-hard nature of these problems, in practice, heuristics account for the majority of existing algorithms. Metaheuristics are one very popular type of heuristic used for many of these optimization problems. In this paper, we present a novel parallel-metaheuristic framework that makes it possible to devise parallel metaheuristics effectively, particularly ones built from heterogeneous metaheuristics. The core component of the proposed framework is its harmony-search-based coordinator. Harmony search is a recent breed of metaheuristic that mimics the improvisation process of musicians. The coordinator helps the heterogeneous metaheuristics (which together form a parallel metaheuristic) to escape local optima. Specifically, the best solutions generated by these worker metaheuristics are maintained in the harmony memory of the coordinator, and they are used to form new, possibly better, harmonies (solutions) before actual solution sharing between workers occurs; hence, their solutions are harmonized with each other. To validate applicability and evaluate performance, we have implemented a parallel hybrid metaheuristic using the framework for the task scheduling problem on multiprocessor computing systems (e.g., computer clusters). Experimental results verify that the proposed framework is a compelling approach to parallelizing heterogeneous metaheuristics.
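The coordinator idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's framework: the class name `HarmonyCoordinator`, its parameters, and the use of only the memory-consideration step of harmony search (pitch adjustment is omitted) are choices made here for brevity. A solution is assumed to be a list in which entry i is the processor assigned to task i.

```python
import random


class HarmonyCoordinator:
    """Hypothetical coordinator holding the best worker solutions."""

    def __init__(self, cost, memory_size=5, hmcr=0.9, rng=None):
        self.cost = cost                # function: solution -> number (lower is better)
        self.memory_size = memory_size  # capacity of the harmony memory
        self.hmcr = hmcr                # harmony memory considering rate
        self.rng = rng or random.Random(0)
        self.memory = []                # list of (cost, solution), best first

    def submit(self, solution):
        """Store a worker's solution, keeping only the best memory_size entries."""
        self.memory.append((self.cost(solution), list(solution)))
        self.memory.sort(key=lambda entry: entry[0])
        del self.memory[self.memory_size:]

    def improvise(self, candidates):
        """Form a new harmony component by component.

        With probability hmcr, component i is copied from position i of a
        randomly chosen stored harmony; otherwise it is drawn at random
        from candidates (the allowed values for a component).
        """
        length = len(self.memory[0][1])
        harmony = []
        for i in range(length):
            if self.rng.random() < self.hmcr:
                harmony.append(self.rng.choice(self.memory)[1][i])
            else:
                harmony.append(self.rng.choice(candidates))
        return harmony
```

In this sketch, workers would call `submit` with their current best schedules and receive the result of `improvise` as a new starting point, which mixes components from several workers' solutions.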
ISBN: (print) 9781467346177; 9781467346153
In this paper, we describe a parallel algorithm for solving large systems of first order delay differential equations. The algorithm is based on a variable stepsize variable order block method. The method produces two new approximations in a single integration step. The formulae derivation permits concurrent computation between two processors. The parallel algorithm is implemented by calling the Message Passing Interface (MPI) library. The performance of the sequential and parallel block method is compared with a sequential non-block method. Moreover, the performance of the parallel algorithm is assessed in terms of speedup and efficiency. It is shown from the numerical results that the overall performance of the block method is increased by parallelizing each point in a block.
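For reference, speedup and efficiency are conventionally defined as the ratio of sequential to parallel run-time and as speedup divided by the number of processors. The sketch below uses those standard definitions; the function names are chosen here, and the abstract does not state which variant of the definitions the authors use.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T_sequential / T_parallel."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Efficiency E = S / p, where p is the number of processors."""
    return speedup(t_sequential, t_parallel) / processors
```

With two processors, as in the block method described above, an efficiency of 1.0 would correspond to a speedup of exactly 2.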
ISBN: (print) 9781479927173
As FPGA devices grow in size, the long run-time of placement is becoming a great challenge for the FPGA design flow. Simulated annealing is the best-known method applied to this problem because of its good quality of result (QoR), but its computation time is unsatisfactory. In this paper, we propose a parallel placement algorithm named MPP-SA (Multi-core Parallel Placement algorithm based on Simulated Annealing). Our goal is to provide a fast placement algorithm with high QoR. MPP-SA has the same annealing schedule as traditional simulated annealing, but it moves blocks concurrently using multiple threads that run on different cores of the same processor. To ensure the correctness of the results, MPP-SA also uses synchronization and a lock mechanism, which bring some overhead. However, experimental results show that this overhead does not seriously affect the performance of our algorithm, especially for large circuits. Compared with the T_VPlace placement algorithm in VPR5.0, MPP-SA decreases the run-time of five benchmark circuits of different sizes by an average of 32%-42% without losing QoR.
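The combination of concurrent moves and locking can be sketched as below. This is not MPP-SA itself: the two-pin wirelength cost, the single global lock, and all function names are simplifying assumptions made here. A single lock serializes every move and so shows only the correctness side of the idea; a practical implementation would need finer-grained synchronization to gain speed.

```python
import math
import random
import threading


def wirelength(placement, nets):
    """Sum of Manhattan distances over two-pin nets (a simplified cost)."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal_worker(placement, nets, lock, temperature, moves, seed):
    """Propose block swaps; accept or undo each one under the shared lock."""
    rng = random.Random(seed)
    blocks = list(placement)
    for _ in range(moves):
        a, b = rng.sample(blocks, 2)
        with lock:  # evaluate and commit atomically so swaps never interleave
            before = wirelength(placement, nets)
            placement[a], placement[b] = placement[b], placement[a]
            delta = wirelength(placement, nets) - before
            # Metropolis rule: always keep improvements, sometimes keep worse moves.
            if delta > 0 and rng.random() >= math.exp(-delta / temperature):
                placement[a], placement[b] = placement[b], placement[a]  # undo


def parallel_anneal_step(placement, nets, temperature, threads=4, moves=50):
    """Run one temperature step with several worker threads on one placement."""
    lock = threading.Lock()
    workers = [
        threading.Thread(
            target=anneal_worker,
            args=(placement, nets, lock, temperature, moves, seed),
        )
        for seed in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return placement
```

Because every swap is applied or undone while the lock is held, the placement always remains a valid assignment of blocks to distinct locations, which is the correctness property the lock is there to protect.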
ISBN: (print) 9781450317207
Recently, a variety of indexing techniques have been proposed for optimizing keyword search on graphs. However, graph indexing has very high space and time complexity, and thus these single-machine in-memory indices are usually not affordable for massive graphs. In this paper, we propose a novel distributed disk-based index, which organizes the local topology information in the graph so that, before the search begins, heuristics can track and prune matched vertices that will not participate in the top-k answers to a specified query. The distributed index can be constructed in a MapReduce manner. Moreover, a parallel search algorithm is also developed. It runs multiple asynchronous search instances that incrementally enumerate the current best local answers and then produces the global top-k answers from them. Lastly, we perform experiments on both synthetic and real graphs with various configurations. The results show that our approach can significantly improve search efficiency on massive graphs with affordable indexing overhead.
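The final step, producing global top-k answers from local ones, can be sketched as a simple merge. The sketch assumes that each search instance reports `(score, answer)` pairs and that a lower score is better; both are assumptions made here, and the one-shot merge stands in for the incremental enumeration the abstract describes.

```python
import heapq


def global_top_k(local_answers, k):
    """Merge per-instance answer lists into the k best answers overall.

    local_answers: iterable of lists of (score, answer) pairs,
    where a lower score is assumed to mean a better answer.
    """
    all_pairs = (pair for answers in local_answers for pair in answers)
    return heapq.nsmallest(k, all_pairs, key=lambda pair: pair[0])
```

Since an answer outside an instance's local top k cannot be in the global top k, each instance only needs to report its k best answers for this merge to be exact.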
ISBN: (print) 9781467321013
In a large-scale communication system, the network is divided into different management domains, and each management domain has its own Local Management Site. The distributed nature of the network calls for a distributed architecture in which a Global Management Site works with several Local Management Sites. This paper proposes a new algorithm called PFAARM (parallel fuzzy alarm association rules mining algorithm), which can be used for parallel mining of fuzzy alarm association rules in a multi-domain distributed communication network. Alarm correlation analysis can be executed in parallel at both the Global Management Site and the Local Management Sites. Fuzzy association rules can be obtained for both intra-domain and inter-domain alarms. Building on alarm correlation analysis within a single management domain, the introduction of inter-domain fuzzy alarm association rules, which are based on inter-domain communication relationships, gives another essential clue for fault location and is of great significance for locating faults quickly and efficiently. The simulation results illustrate the algorithm's feasibility and efficiency.
In precise integration methods for computing structural dynamic responses, the time integration formula is composed of two parts: the multiplication of an exponential matrix with a vector, and the integration term. The second term can be solved by the series solution. Two hybrid-granularity parallel algorithms are designed, in which the exponential matrix and the first term are computed by a fine-grained parallel algorithm and the second term is computed by a coarse-grained parallel algorithm. Numerical examples show that these two hybrid-granularity parallel algorithms obtain higher speedup and parallel efficiency than two existing parallel algorithms.