ISBN: 0780375963 (print)
A new algorithm for reducing the division operation to a series of smaller divisions is introduced. The dividend is partitioned into segments, and divisions, shifts, and accumulations are performed that take into account the weight of the dividend bits. Each partial division can be performed by any existing division algorithm. From an algorithmic point of view, the computational complexity is analyzed and compared with that of existing algorithms. From an implementation point of view, since each division can be performed by any existing divider, the designer can choose the divider that best meets the specifications. Two possible implementations of the algorithm, sequential and parallel, are derived, with several variations that allow performance, cost, and cost/performance trade-offs.
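The general idea of dividing segment by segment can be illustrated with ordinary radix-2^k long division. This is a minimal sketch and not necessarily the algorithm proposed in the paper: the function name `segmented_divide`, the parameter `segment_bits`, and the most-significant-first processing order are all assumptions made for illustration. Each loop iteration performs one small division (done here by Python's `divmod`, standing in for "any existing divider"), one shift, and one accumulation.

```python
from typing import Tuple


def segmented_divide(dividend: int, divisor: int, segment_bits: int = 8) -> Tuple[int, int]:
    """Divide by processing the dividend in segments of `segment_bits` bits.

    Illustrative radix-2^k long division; returns (quotient, remainder).
    """
    if dividend < 0 or divisor <= 0 or segment_bits <= 0:
        raise ValueError("expects dividend >= 0, divisor > 0, segment_bits > 0")

    # Partition the dividend into segments, most significant segment first.
    mask = (1 << segment_bits) - 1
    segments = []
    x = dividend
    while x:
        segments.append(x & mask)
        x >>= segment_bits
    segments.reverse()

    quotient = 0
    remainder = 0
    for segment in segments:
        # Shift: the running remainder carries the weight of one segment.
        partial = (remainder << segment_bits) | segment
        # Small division: partial < divisor * 2**segment_bits, so the
        # partial quotient always fits in one segment.
        partial_quotient, remainder = divmod(partial, divisor)
        # Accumulate the partial quotient at its weight.
        quotient = (quotient << segment_bits) | partial_quotient
    return quotient, remainder
```

For example, `segmented_divide(1000003, 97, segment_bits=4)` gives the same pair as `divmod(1000003, 97)`. Smaller segments mean more, narrower partial divisions, which is the kind of trade-off the abstract refers to.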
ISBN: 0769515126 (print)
Recent research on parallel processing on non-dedicated clusters has focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper describes the development of a coordinated checkpointing facility for the GENESIS cluster operating system. The facility was developed by exploiting existing operating system services. High performance and low overhead are achieved by allowing the processes of a parallel application to continue executing while checkpoints are created, and demands on cluster resources are kept low by using coordinated checkpointing.
ISBN: 0769515126 (print)
The external selection problem is to select the record with the K-th smallest key from N given records that are distributed evenly across the D disks of a parallel machine with D processors. Each processor has its own primary memory of M records and one disk, where N/D > M. The processors are connected in a √D × √D mesh architecture. Based on a two-stage approach, this paper presents an efficient parallel external selection algorithm for distributed-memory parallel systems. First, all processors execute local external sorting in parallel, each sorting the N/D records on its own disk. Next, they execute parallel external selection over the D sorted subfiles on the D disks. The algorithm is asymptotically optimal and has a small constant factor in its time complexity.
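The two-stage structure can be sketched sequentially as follows. This is only an in-memory stand-in under stated assumptions: the name `kth_smallest` is hypothetical, each "disk" is modelled as a Python list, and the second stage uses a plain D-way merge rather than the parallel mesh selection described in the paper.

```python
import heapq
from itertools import islice
from typing import List, Sequence


def kth_smallest(records_per_disk: Sequence[Sequence[int]], k: int) -> int:
    """Return the k-th smallest key (1-based) across all disks."""
    # Stage 1: every "processor" sorts its own records independently.
    sorted_runs: List[List[int]] = [sorted(run) for run in records_per_disk]

    total = sum(len(run) for run in sorted_runs)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the total number of records")

    # Stage 2: select from the D sorted runs. A lazy D-way merge only
    # consumes the first k records instead of merging everything.
    merged = heapq.merge(*sorted_runs)
    return next(islice(merged, k - 1, None))
```

With `disks = [[9, 1, 5], [7, 3, 8], [2, 6, 4]]`, the call `kth_smallest(disks, 4)` returns `4`.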
ISBN: 0769515126 (print)
Heterogeneous parallel systems are becoming increasingly common, especially with the growing use of cluster computers, such as PCs and networks of workstations, for parallel computing. The main concern of this paper is measuring and evaluating the performance of such parallel systems, based on a dynamic load balancing algorithm for parallel depth-first search (DFS). The dynamic load balancing implementation runs under MPI (Message Passing Interface), which allows parallel execution on a cluster of 6 heterogeneous SUN workstations (COHW) running the Solaris operating system and on a cluster of 10 PCs running the Linux operating system. The parallel dynamic load balancing program is written in C.
ISBN: 0769515126 (print)
This paper presents a two-level parallel evolutionary algorithm for solving function optimization problems that have multiple solutions. The algorithm combines the characteristics of global search and local search: the global search lets individuals draw closer to each optimal solution while keeping the genetic diversity of the individuals. Then different individuals are selected for local evolution in their appropriate neighborhoods. Numerical experiments indicate that this simple, easy-to-handle algorithm is very practical: all optimal solutions can be found in a single run of the algorithm within a fairly short period of time.
ISBN: 0769515126 (print)
This paper concerns parallel computing and its application to computing the full set of Lyapunov exponents of general nonlinear parameter-dependent continuous ordinary differential equations. Based on a standard serial algorithm developed by Wolf et al. [1], we present a parallel algorithm using the block-cyclic decomposition method and apply it to compute the Lyapunov exponents of a continuous differential equation. Performance tests on the DAWNING-2000II supercomputer show that the parallel algorithm has a high degree of parallelism, needs no message passing (little communication cost), and performs little I/O. In addition, the algorithm can be extended to ordinary differential equations of any dimension.
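A block-cyclic decomposition deals out fixed-size blocks of work items to processes in round-robin order. The sketch below shows only that index mapping. It assumes, as an illustration, that the work items are independent parameter values of the differential equation (which would be consistent with the reported absence of message passing); the function name and its parameters are hypothetical and not taken from the paper.

```python
from typing import List


def block_cyclic_indices(n_items: int, n_procs: int, block_size: int, rank: int) -> List[int]:
    """Indices of the work items owned by process `rank`.

    Item i belongs to block i // block_size, and blocks are dealt to
    processes cyclically: block b goes to process b % n_procs.
    """
    if n_procs <= 0 or block_size <= 0 or not 0 <= rank < n_procs:
        raise ValueError("invalid decomposition parameters")
    return [i for i in range(n_items) if (i // block_size) % n_procs == rank]
```

For 10 items, 3 processes, and blocks of 2, process 0 owns items `[0, 1, 6, 7]`, process 1 owns `[2, 3, 8, 9]`, and process 2 owns `[4, 5]`. Each process can then work through its own items without exchanging data with the others.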
ISBN: 0769515126 (print)
In this paper, we present the design and implementation of a new cluster file system, th-CluFS, which is based on the standard NFS protocol and is implemented entirely in user space. Such an open-platform file system is important as clusters become larger and more heterogeneous. To take advantage of the accumulated resources and the high-speed network in clusters, th-CluFS adopts a serverless architecture, hybrid distributed metadata management, and file-granular data distribution, and it uses a distributed metadata cache and a unique cache to optimize performance. To make th-CluFS more flexible, we plan to employ file migration to balance the I/O load across nodes dynamically. From the experimental results, we conclude that th-CluFS meets the requirements of a consistent file system view, performance, and scalability.
ISBN: 0769517994 (print)
As new computer architectures are developed to exploit large-scale data-level parallelism, techniques are needed to retarget legacy sequential code to these platforms. Sequential programming languages force programmers to include sequential artifacts in their code, particularly with respect to how the source code expresses data references (generally assuming a linear address space). In contrast, data-parallel programs apply many operations in parallel to elements in two-dimensional data sets, and a given data-parallel operation can access other spatially local elements along either dimension. Of key importance in exposing data parallelism is determining these two-dimensional data dependencies among elements of a matrix. This paper presents a reverse engineering technique for identifying such dependencies in sequential image processing code, using pattern matching on an attributed dataflow representation of the program. The technique is applied to common image filtering algorithms and is validated by retargeting to a Matlab program and matching the results against those of the original source.
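One way to picture the sequential artifact mentioned above: in row-major code, a neighbor access appears as a single linear offset, and the two-dimensional dependency has to be recovered from it. The sketch below illustrates only that idea and is not the pattern-matching technique of the paper. The function name `offset_to_2d`, the row-major layout, and the known image `width` are assumptions, and the recovery is only unambiguous when the column offset is smaller in magnitude than half the width.

```python
from typing import Tuple


def offset_to_2d(offset: int, width: int) -> Tuple[int, int]:
    """Recover (row_offset, col_offset) from a row-major linear offset.

    Assumes offset == row_offset * width + col_offset with
    -width/2 <= col_offset < width/2.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Round to the nearest row, then take what is left as the column part.
    row_offset = (offset + width // 2) // width
    col_offset = offset - row_offset * width
    return row_offset, col_offset


# Linear offsets as they might appear in a sequential 3x3 filter over an
# image that is 10 pixels wide, for example img[i - 11] or img[i + 9].
WIDTH = 10
linear_offsets = [-11, -10, -9, -1, 0, 1, 9, 10, 11]
stencil = [offset_to_2d(off, WIDTH) for off in linear_offsets]
```

Here `stencil` comes out as the nine `(row, column)` pairs of a 3x3 neighborhood, from `(-1, -1)` to `(1, 1)`, which is the two-dimensional dependency a data-parallel version of the filter would need.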
Biological sequence comparison is an important tool for researchers in molecular biology. There are several algorithms for sequence comparison. The Smith-Waterman algorithm, based on dynamic programming, is one of the...
ISBN: 0769515126 (print)
We propose an improved version of the CGS method for solving large, sparse linear systems of equations with unsymmetric coefficient matrices. The proposed method combines elements of numerical stability and parallel algorithm design without increasing computational costs. The algorithm is derived such that all matrix-vector multiplications, inner products, and vector updates of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the cost of global communication, which is the performance bottleneck, can be significantly reduced. In this paper, the Bulk Synchronous Parallel (BSP) model is used to design a fully efficient, scalable, and portable parallel version of the proposed algorithm and to provide accurate performance prediction for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. The performance model uses only a few system-dependent parameters, based on simple and accurate cost modelling, to provide useful insight into the time complexity of the method. The theoretical performance predictions are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.
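For orientation, the classical (unpreconditioned) CGS iteration that such a variant starts from can be sketched as below. This is the textbook method, not the improved algorithm proposed in the paper, whose restructured recurrences are not given in the abstract. The helper names, the dense list-of-lists matrix, and the stopping rule are assumptions for illustration. The comments mark the inner products, which in a distributed setting are the global communication points the abstract refers to.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def cgs(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 100) -> Vector:
    """Classical Conjugate Gradient Squared, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - A x for x = 0
    r_shadow = list(r)       # fixed shadow residual
    p = [0.0] * n
    q = [0.0] * n
    rho_old = 1.0
    if dot(r, r) ** 0.5 <= tol:
        return x
    for it in range(max_iter):
        rho = dot(r_shadow, r)            # inner product (global reduction)
        if rho == 0.0:
            raise ArithmeticError("CGS breakdown: rho is zero")
        if it == 0:
            u = list(r)
            p = list(u)
        else:
            beta = rho / rho_old
            u = axpy(beta, q, r)                   # u = r + beta q
            p = axpy(beta, axpy(beta, p, q), u)    # p = u + beta (q + beta p)
        v = matvec(A, p)                  # matrix-vector product
        sigma = dot(r_shadow, v)          # inner product (global reduction)
        if sigma == 0.0:
            raise ArithmeticError("CGS breakdown: sigma is zero")
        alpha = rho / sigma
        q = axpy(-alpha, v, u)            # q = u - alpha v
        w = [ui + qi for ui, qi in zip(u, q)]
        x = axpy(alpha, w, x)             # vector updates
        r = axpy(-alpha, matvec(A, w), r)
        if dot(r, r) ** 0.5 <= tol:
            break
        rho_old = rho
    return x
```

As a small check, `cgs([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])` converges to approximately `[0.1, 0.6]`. In this classical form each inner product must complete before the next step can begin, which is the dependency the proposed method is designed to remove.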