ISBN:
(print) 0769517307; 0769517315
The PROUD module placement algorithm relies mainly on a hierarchical decomposition technique and on the solution of sparse linear systems derived from a resistive network analogy. It has been shown that, for very large circuits, PROUD produces placements comparable in quality to those of the best simulated-annealing-based placement algorithm, while running several orders of magnitude faster. A modified PROUD, MPROUD, perturbs the coefficient matrices and runs much faster than the original PROUD algorithm. However, MPROUD is unstable and its convergence is not guaranteed. We therefore proposed in [11] a convergent and numerically stable variant, the Improved PROUD algorithm (IPROUD). IPROUD solves the module placement problem at attractive computational cost by using the SYMMLQ and MINRES methods, which are based on the Lanczos process. In this paper, we propose parallel versions of the improved PROUD algorithms. The parallel algorithm is derived so that all inner products and matrix-vector multiplications within a single iteration step are independent. This significantly reduces the cost of global communication, which is the bottleneck of parallel performance on distributed-memory computers. The result is another order-of-magnitude improvement in runtime without loss of layout quality.
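The core idea of the parallel IPROUD abstract is to rearrange a Krylov iteration so that the inner products of one step do not depend on each other and can share a single global reduction. The sketch below illustrates that idea under explicit assumptions. It swaps in the Chronopoulos-Gear variant of conjugate gradients, a simpler Krylov method closely related to the Lanczos process, instead of the SYMMLQ or MINRES solvers that IPROUD actually uses. The test problem is a hypothetical one-dimensional resistive-network placement: three movable modules in a chain between fixed pads at positions 0 and 4, with unit conductances. The helper names `fused_dots` and `cg_single_reduction` are illustrative only.

```python
# Sketch: conjugate gradients rearranged (Chronopoulos-Gear style) so that the
# two inner products of each iteration are independent and could be combined
# into one global reduction. A stand-in for, not a copy of, IPROUD
# (which uses SYMMLQ/MINRES).

def matvec(A, x):
    """Dense matrix-vector product; a real solver would use a sparse format."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def fused_dots(pairs):
    """Compute several dot products at once (hypothetical helper).

    On a distributed-memory machine each processor would form local partial
    sums for all pairs and combine them in ONE global reduction (for example a
    single MPI_Allreduce on a short vector) instead of one reduction per dot.
    """
    return [sum(a * b for a, b in zip(u, v)) for u, v in pairs]

def cg_single_reduction(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the starting guess x = 0
    p = [0.0] * n        # search direction
    s = [0.0] * n        # s = A p, updated by recurrence instead of a 2nd matvec
    gamma_old = alpha_old = None
    for _ in range(max_iter):
        w = matvec(A, r)
        # Both inner products depend only on r and w, so they are independent.
        gamma, delta = fused_dots([(r, r), (w, r)])
        if gamma ** 0.5 < tol:
            break
        if gamma_old is None:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        gamma_old, alpha_old = gamma, alpha
    return x

# Hypothetical 1-D placement: pad(0) - m1 - m2 - m3 - pad(4), unit conductances.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [0.0, 0.0, 4.0]
print(cg_single_reduction(A, b))  # approximately [1.0, 2.0, 3.0]
```

Setting the gradient of the quadratic wirelength to zero gives the tridiagonal system above. Its exact solution places the modules at 1, 2 and 3. In exact arithmetic, CG needs at most three iterations for this 3×3 symmetric positive definite system.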
ISBN:
(print) 0769515797
In this study, we present a solution to the school timetabling problem that uses a parallel genetic algorithm combined with simulated annealing. We explain the hybridization of simulated annealing and the parallel genetic algorithm. We also discuss how these algorithms are run in parallel on a local network of workstations. Comparative results for the different parallel models are presented. The parallel implementations are used to construct conflict-free, satisfactory timetables for the Department of Mathematics of the University of the Philippines Diliman. The program developed in this study can easily be adapted to serve as an efficient aid in the scheduler's decision-making process.
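The hybrid scheme described above can be sketched as an island-model genetic algorithm in which each offspring passes through a simulated-annealing acceptance step. Everything below is hypothetical:

- the toy constraint (two classes with the same teacher may not share a timeslot);
- the genetic operators;
- the cooling schedule;
- the ring migration policy.

These are illustrative choices, not the models compared in the study. The islands also run sequentially here, whereas the study distributes them across workstations.

```python
import math
import random

def conflicts(timetable, teacher_of):
    """Count pairs of classes taught by the same teacher in the same timeslot."""
    n = len(timetable)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if timetable[i] == timetable[j] and teacher_of[i] == teacher_of[j]
    )

def anneal_child(child, cost, n_slots, rng, temperature):
    """SA step on an offspring: mutate one gene, accept via the Metropolis rule."""
    trial = child[:]
    trial[rng.randrange(len(trial))] = rng.randrange(n_slots)
    delta = cost(trial) - cost(child)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return trial
    return child

def evolve_island(pop, cost, n_slots, rng, temperature):
    pop.sort(key=cost)
    parents = pop[: len(pop) // 2]  # truncation selection keeps the better half
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]  # one-point crossover
        children.append(anneal_child(child, cost, n_slots, rng, temperature))
    return parents + children

def hybrid_ga_sa(teacher_of, n_slots, n_islands=4, pop_size=10,
                 generations=60, migrate_every=10, seed=0):
    rng = random.Random(seed)

    def cost(t):
        return conflicts(t, teacher_of)

    n = len(teacher_of)
    islands = [[[rng.randrange(n_slots) for _ in range(n)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    temperature = 2.0
    for gen in range(1, generations + 1):
        # In a real deployment each island would evolve on its own workstation.
        islands = [evolve_island(pop, cost, n_slots, rng, temperature) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [min(pop, key=cost) for pop in islands]
            for k, pop in enumerate(islands):
                pop.sort(key=cost)
                pop[-1] = bests[k - 1][:]
        temperature *= 0.95  # geometric cooling schedule
    return min((t for pop in islands for t in pop), key=cost)

teacher_of = ["A", "A", "B", "B", "C", "A"]  # hypothetical teacher of each class
best = hybrid_ga_sa(teacher_of, n_slots=3)
print(best, conflicts(best, teacher_of))
```

Truncation selection keeps the better half of each island, so an island's best timetable never gets worse. The annealing step still lets individual offspring accept worse moves early on, while the temperature is high.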
This paper adopts a transformational programming approach for deriving massively parallel algorithms from functional specifications. It gives a brief description of a framework for relating key higher-order functions...
In this paper, we present parallel algorithms for the coarse-grained multicomputer (CGM) and the bulk synchronous parallel computer (BSP) for solving two well-known graph problems: (1) determining whether a graph G is...
We present a class of new parallel algorithms for solving large sparse linear systems with special structure on distributed-memory multiprocessor systems such as PC clusters. The objective of these algorithms is to re...
Exploiting program locality is one of the most important problems in parallel compiler optimization, and program transformations are one of the most important means of exploiting temporal and spatial locality. This paper presents a new approach that uses non-singular loop transformations to optimize program locality, namely linear-expressing-based loop transformations. The approach first expresses the subscripts of array accesses using a minimal set of linearly independent vectors. It then constructs a non-singular loop transformation matrix that optimizes the temporal and spatial locality of those array accesses. The approach fully exploits the temporal locality of array accesses and readily determines whether temporal or spatial locality can be exploited. It can also optimize the temporal and spatial locality of a given loop nest simultaneously. Experimental results show that the linear-expressing-based approach presented in this paper is effective.
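The linear-algebra view behind non-singular loop transformations can be illustrated on a two-deep loop nest. The sketch below is a generic example of the technique, not the paper's linear-expressing construction. It performs three steps:

- it checks that a transformation matrix T is non-singular;
- it checks legality by requiring every transformed dependence distance vector to remain lexicographically positive;
- it computes the innermost-loop memory stride for a row-major N×N array from the access matrix F·T^-1.

The access pattern A[j][i] and the dependence vectors used below are hypothetical.

```python
from fractions import Fraction

def det2(T):
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def inv2(T):
    """Exact inverse of a 2x2 integer matrix; a loop transformation must be non-singular."""
    d = det2(T)
    if d == 0:
        raise ValueError("transformation matrix is singular")
    return [[Fraction(T[1][1], d), Fraction(-T[0][1], d)],
            [Fraction(-T[1][0], d), Fraction(T[0][0], d)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_legal(T, dependences):
    """Legal if every transformed distance vector T*d stays lexicographically positive."""
    for d in dependences:
        td = tuple(sum(T[i][k] * d[k] for k in range(2)) for i in range(2))
        if not td > (0, 0):  # Python tuple comparison is lexicographic
            return False
    return True

def inner_stride(F, T, n_cols):
    """Stride (in elements) of the innermost new loop for a row-major array.

    Subscripts are s = F*I. With the new iteration vector I' = T*I we get
    s = (F*T^-1)*I', so column 1 of F*T^-1 gives the change in each subscript
    when the innermost loop index I'[1] increases by one.
    """
    G = matmul(F, inv2(T))
    return n_cols * G[0][1] + G[1][1]

F = [[0, 1], [1, 0]]            # access A[j][i] in the nest "for i: for j:"
identity = [[1, 0], [0, 1]]
interchange = [[0, 1], [1, 0]]
print(inner_stride(F, identity, 100))     # 100
print(inner_stride(F, interchange, 100))  # 1
print(is_legal(interchange, [(1, 0)]), is_legal(interchange, [(1, -1)]))  # True False
```

Under the identity transformation, the inner loop walks down a column with stride N. Loop interchange reduces the stride to 1, improving spatial locality. The interchange is legal for a dependence (1, 0) but not for (1, -1), which it would reverse.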
Matrix partitioning problems that arise in the efficient estimation of sparse Jacobians and Hessians can be modeled using variants of graph coloring problems. In previous work [6], we argue that distance-2 and distanc...
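As a concrete instance of this modeling, consider the classical Curtis-Powell-Reid approach. It partitions the columns of a sparse Jacobian into groups of structurally orthogonal columns, meaning no two columns in a group share a nonzero row. This is a distance-2 coloring of the bipartite row/column graph, and each group needs only one extra function evaluation for finite-difference estimation. The greedy sketch below illustrates only that case, because the abstract is truncated and the paper's own variants are not reproduced. It uses a hypothetical 5×5 tridiagonal sparsity pattern, and `column_partition` is an illustrative name.

```python
def column_partition(pattern, n_cols):
    """Greedy partition of Jacobian columns into structurally orthogonal groups.

    pattern[r] is the set of column indices with a nonzero in row r. Two
    columns conflict when they share a row, i.e. they are at distance 2 in
    the bipartite row/column graph, so this is a greedy distance-2 coloring.
    """
    conflicts = [set() for _ in range(n_cols)]
    for row in pattern:
        for c in row:
            conflicts[c] |= row - {c}
    color = [-1] * n_cols
    for c in range(n_cols):
        used = {color[d] for d in conflicts[c] if color[d] >= 0}
        k = 0
        while k in used:  # smallest color not used by a conflicting column
            k += 1
        color[c] = k
    return color

# Hypothetical 5x5 tridiagonal sparsity pattern.
pattern = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]
print(column_partition(pattern, 5))  # [0, 1, 2, 0, 1]
```

The greedy pass uses three colors. The five columns of this tridiagonal Jacobian can therefore be estimated with three extra function evaluations instead of five.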
Data-parallel and control-parallel algorithms are described for a matrix method that detects and locates logic hazards in combinational logic circuits. Examples are given for illustration.
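The abstract does not describe the matrix method itself. The sketch below therefore swaps in the classical adjacency check for static-1 hazards in a two-level sum-of-products circuit. A hazard exists between two adjacent minterms (differing in a single input) when the function is 1 at both, but no single product term covers both. Each minterm pair is checked independently, which is the kind of work a data-parallel version could distribute. The cube encoding (a dict mapping variable index to required value) is a hypothetical choice.

```python
from itertools import product

def covers(term, m):
    """True if the product term (dict var_index -> required value) is 1 at minterm m."""
    return all(m[v] == val for v, val in term.items())

def evaluate(terms, m):
    return any(covers(t, m) for t in terms)

def static1_hazards(terms, n_vars):
    """List (m, m2, v): f is 1 at adjacent minterms m, m2 (differing in input v)
    but no single product term covers both, so a glitch is possible when v changes."""
    hazards = []
    for m in product((0, 1), repeat=n_vars):
        if not evaluate(terms, m):
            continue
        for v in range(n_vars):
            if m[v] == 1:
                continue  # visit each adjacent pair once, from its v = 0 side
            m2 = m[:v] + (1,) + m[v + 1:]
            if evaluate(terms, m2) and not any(
                covers(t, m) and covers(t, m2) for t in terms
            ):
                hazards.append((m, m2, v))
    return hazards

# f = a*b + a'*c with variables (a, b, c) = indices (0, 1, 2)
f_terms = [{0: 1, 1: 1}, {0: 0, 2: 1}]
print(static1_hazards(f_terms, 3))                   # [((0, 1, 1), (1, 1, 1), 0)]
print(static1_hazards(f_terms + [{1: 1, 2: 1}], 3))  # []
```

For f = a·b + a'·c, the check reports a hazard when input a changes with b = c = 1. Adding the consensus term b·c covers both minterms and removes it.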
The method of discrete ordinates is commonly used to solve the Boltzmann radiation transport equation in applications ranging from fire simulations to weapons effects. The equations are solved most efficiently by sweeping the radiation flux across the computational grid. On unstructured grids this poses several challenges, particularly on distributed-memory parallel machines, where the grid geometry is spread across processors. We describe an asynchronous, parallel, message-passing algorithm that performs sweeps simultaneously from many directions across unstructured grids. We identify the key factors that limit the algorithm's parallel scalability and discuss two enhancements to the basic algorithm. The first prioritizes the work within a processor's subdomain; the second improves the decomposition of the unstructured grid across processors. Performance results are given for the basic and enhanced algorithms, implemented within a radiation solver running on hundreds of processors of Sandia's Intel Tflops machine and DEC-Alpha CPlant cluster.
The increasing interest in product networks (PNs) as a way of combining the desirable properties of component networks has created a need for a general study of the algorithmic issues related to this important class of interconnection networks. In this paper, we present unified parallel algorithms for Gaussian elimination with partial and complete pivoting on product networks, together with a parallel algorithm for backward substitution. The proposed algorithms are independent of both the network and the matrix distribution method employed. They can therefore be used on a wide range of PNs, including the hypercube, mesh, and k-ary n-cube. We also present unified models for estimating computation time and interprocessor communication time, and use these models to measure the performance of the proposed algorithms on several product networks.
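A sequential sketch of Gaussian elimination with partial pivoting, followed by backward substitution, shows which step the parallel versions distribute. In a row-distributed parallel version, the pivot row would be broadcast and the independent row updates would be performed by different processors. The sketch does not model that mapping onto a product network. Complete pivoting, which the paper also covers, is omitted.

```python
def gauss_partial_pivot(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy; inputs untouched
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # These row updates are mutually independent: in a parallel version the
        # pivot row is broadcast and each processor updates the rows it owns.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Backward substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(gauss_partial_pivot(A, b))  # approximately [2.0, 3.0, -1.0]
```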