This paper proposes a novel genetic parallel programming (GPP) paradigm for evolving optimal parallel programs running on a multi-ALU processor by linear genetic programming. GPP uses a two-phase evolution approach. It evolves completely correct solution programs in the first phase. Then it optimizes the execution speed of the solution programs in the second phase. GPP also employs a new genetic operation that swaps sub-instructions of a solution program. Three experiments (Sextic, Fibonacci and Factorial) are given as examples to show that GPP can discover novel parallel programs that fully utilize the processor's parallelism.
ISBN: 0769514448 (print)
Traditionally, a local area network (LAN) has been used for parallel programming with PVM and MPI. The improvement of communications in wireless local area networks (WLANs), which now achieve up to 11 Mbps, makes them, according to some authors, candidates for use as a grid computing resource. In this paper we use LAMGAC, our library based on LAM/MPI, to parallelize an algorithm that finds the global minimum of a nonlinear, real-valued, continuous function. The algorithm divides the domain into small boxes and locates the extremum by means of a multiple start algorithm (MRS). Local minimization is carried out by means of steepest descent and the DFP method. The novelty of this approach is threefold: we can vary the parallel virtual machine at runtime (spawning new processes using functions defined in MPI-2), we generate algorithms in which computations and communications are efficiently overlapped, and we include a Web interface to offer our system as a grid resource. We have measured the execution time of some algorithms and of the components of LAMGAC, obtaining interesting results.
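As a rough illustration of the box-division and multiple-start strategy described in this abstract, the following is a minimal sequential sketch in Python. It is not the LAMGAC implementation: the function names, the one-dimensional domain, the backtracking step rule and the random start points are all assumptions made for illustration, and the DFP refinement and the MPI parallelization are omitted.

```python
import random


def steepest_descent(f, grad, x, lo, hi, iters=200):
    """Backtracking steepest descent in 1-D, clamped to the box [lo, hi]."""
    for _ in range(iters):
        g = grad(x)
        step = 1.0
        while step > 1e-12:
            # Move against the gradient, but never leave the box.
            cand = min(hi, max(lo, x - step * g))
            if f(cand) < f(x):
                x = cand
                break
            step *= 0.5
        else:
            break  # no improving step found: treat x as a local minimizer
    return x


def multistart_minimize(f, grad, lo, hi, boxes=8, starts_per_box=3, seed=0):
    """Split [lo, hi] into equal boxes and run local searches from random starts."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    width = (hi - lo) / boxes
    best_x = None
    for b in range(boxes):
        b_lo = lo + b * width
        b_hi = b_lo + width
        for _ in range(starts_per_box):
            x = steepest_descent(f, grad, rng.uniform(b_lo, b_hi), b_lo, b_hi)
            if best_x is None or f(x) < f(best_x):
                best_x = x
    return best_x, f(best_x)


# Hypothetical test function with two local minima; the global one is near x = -1.04.
def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def df(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


x_min, f_min = multistart_minimize(f, df, -2.0, 2.0)
```

Because each box is searched independently, the outer loop over boxes is the natural unit to hand to separate processes in a parallel version.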
ISBN: 9780769515243 (print)
The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. This results in a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is the potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution, including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and incurs only a marginal overhead while allowing static planning and dynamic restructuring.
The demand for high-density, high-speed programming in flash memories has been increasing because of their expanding applications in portable equipment such as digital still cameras and music players. A multilevel technique is one of the most effective approaches for improving memory density, but long cell programming time and the need for precise control of the memory cell's threshold voltage (Vth) degrade its programming performance. To realize fast cell programming, we have developed a so-called assist-gate (AG)-AND-type flash cell, in which programming is performed by source-side channel hot electron injection (SSI). In this paper, we develop a constant-charge-injection programming scheme, which realizes fast, precise control of Vth by suppressing the characteristic deviation. By utilizing the proposed scheme, we achieved a 10.3-MB/s programming throughput in multilevel AG-AND flash memories.
We investigate remapping multi-dimensional arrays on clusters of SMP architectures under OpenMP, MPI, and hybrid paradigms. The traditional method of array transpose needs an auxiliary array of the same size and a copy-back stage. We recently developed an in-place method using vacancy tracking cycles. The vacancy tracking algorithm outperforms the traditional two-array method, as demonstrated by extensive comparisons. The independence of vacancy tracking cycles allows efficient parallelization of the in-place method on SMP architectures at node level. The performance of multi-threaded parallelism using OpenMP is tested with different scheduling methods and different numbers of threads. The vacancy tracking method is parallelized using several parallel paradigms. At node level, pure OpenMP outperforms pure MPI by a factor of 2.76. Across the entire cluster of SMP nodes, the hybrid MPI/OpenMP implementation outperforms pure MPI by a factor of 4.44, demonstrating the validity of the parallel paradigm of mixing MPI with OpenMP.
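The idea of following vacancy cycles can be sketched for the simplest case, a two-dimensional transpose of a row-major array held in a flat list. This is a sequential illustration only, not the authors' code: the function name is hypothetical, and the `visited` list is a bookkeeping shortcut assumed here to find cycle starts, which costs extra memory that a production in-place method would presumably avoid.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in the flat list a."""
    n = rows * cols
    if n != len(a):
        raise ValueError("rows * cols must equal len(a)")
    last = n - 1  # the first and last elements never move
    visited = [False] * n  # assumption: simple bookkeeping to detect cycle starts
    for start in range(1, last):
        if visited[start]:
            continue
        # Lift one element out, which opens a vacancy at `start`.
        saved = a[start]
        vacancy = start
        while True:
            visited[vacancy] = True
            # Index of the element that belongs in the vacancy after the transpose.
            source = (vacancy * cols) % last
            if source == start:
                break
            a[vacancy] = a[source]  # fill the vacancy; it moves to `source`
            vacancy = source
        a[vacancy] = saved  # close the cycle with the lifted element
    return a


# A 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] becomes the 3 x 2 matrix [[1, 4], [2, 5], [3, 6]].
result = transpose_in_place([1, 2, 3, 4, 5, 6], 2, 3)  # → [1, 4, 2, 5, 3, 6]
```

Each cycle touches a disjoint set of indices, so separate cycles can be processed by different threads without synchronization, which is the property the abstract relies on for node-level parallelization.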
Shared object Distributed Shared Memory (DSM) minimizes the problem of false sharing by allowing the programmer to control the sharing size. This shared object approach to distributed parallel programming works well for task parallelism but not for data parallelism. When the data of a shared object is being modified, a lock on that object must be enforced to exclude any concurrent access to that same object. If the shared data within an object is large, internal false sharing becomes a problem. We present a multi-locking mechanism for shared object DSM that allows multiple locks to be applied to different data sets of a shared object and thus enhances its concurrency.
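The multi-locking idea can be illustrated with a single-process analogue using Python threads. This is only a sketch under stated assumptions: the class name, the equal-sized segments and the per-segment `threading.Lock` are hypothetical stand-ins, and nothing here models the distribution or coherence protocol of a real DSM system.

```python
import threading
from contextlib import ExitStack


class MultiLockObject:
    """Shared object whose data is split into segments, each with its own lock."""

    def __init__(self, size, num_segments):
        self._data = [0] * size
        # Ceiling division so every index maps to a valid segment.
        self._seg_len = -(-size // num_segments)
        self._locks = [threading.Lock() for _ in range(num_segments)]

    def _lock_for(self, index):
        return self._locks[index // self._seg_len]

    def add(self, index, delta):
        # Only the segment holding `index` is locked; other segments stay available.
        with self._lock_for(index):
            self._data[index] += delta

    def snapshot(self):
        # Take every lock in a fixed order to get a consistent copy without deadlock.
        with ExitStack() as stack:
            for lock in self._locks:
                stack.enter_context(lock)
            return list(self._data)


shared = MultiLockObject(size=8, num_segments=4)
shared.add(0, 5)  # locks segment 0 only
shared.add(7, 2)  # locks segment 3 only, so it cannot block the update above
```

With a single lock per object, the two updates above would serialize even though they touch unrelated data; splitting the lock by data set removes that contention.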
This article studies a static scheduling method based on workload balancing. An equation is presented for the case when the workload is equally distributed onto all the processors. An efficient load balance scheduling algorithm is developed assuming that the workload has certain properties. Finally, some computational results are given for the product of an upper diagonal matrix and a vector.
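To make the load-balancing problem concrete, here is a minimal sketch that assumes the matrix is upper triangular, so that row i of an n x n matrix costs n - i multiply-adds. It assigns contiguous row blocks to processors with a simple prefix-sum rule. This rule is an assumption chosen for illustration, not the equation or the algorithm from the article, and the function names are hypothetical.

```python
def balanced_row_blocks(n, p):
    """Split rows 0..n-1 into p contiguous half-open ranges of similar work."""
    total = n * (n + 1) // 2  # total multiply-adds for an upper triangular matrix
    bounds = [0]
    done = 0  # work assigned so far
    row = 0
    for k in range(1, p):
        target = total * k / p  # cumulative work the first k blocks should reach
        while row < n and done < target:
            done += n - row  # row `row` has n - row nonzero entries
            row += 1
        bounds.append(row)
    bounds.append(n)
    return list(zip(bounds, bounds[1:]))


def block_work(n, lo, hi):
    """Number of multiply-adds in rows lo..hi-1."""
    return sum(n - r for r in range(lo, hi))


blocks = balanced_row_blocks(8, 2)  # → [(0, 3), (3, 8)]
```

An equal split of rows would give the first processor far more work than the last, because early rows are longer; the prefix-sum rule instead gives early processors fewer rows. In the 8-row example the two blocks cost 21 and 15 multiply-adds, compared with 26 and 10 for an even 4/4 row split.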
Clusters are attractive for executing sequential and parallel applications. However, there is a need to design a cluster distributed operating system to provide a Single System Image. A cluster operating system providing both a DSM system and load balancing is attractive for efficiently executing a workload of sequential applications and shared memory parallel applications. Gobelins is a distributed operating system dedicated to clusters that provides both a DSM system and a process migration mechanism to support load balancing. In this paper, we present the implementation of the Gobelins process migration mechanism, which exploits the Gobelins kernel-level DSM system. We show that the Gobelins DSM makes it simple to implement an efficient migration mechanism that can be used to move processes or threads among cluster nodes. A prototype of Gobelins has been implemented, and some performance results are presented in this paper.
This report considers the conceptual problems of increasing the productivity and reliability of specialized computing devices (SCD) of intelligent systems on the basis of data representation and processing with the help of various forms of problem-oriented machine arithmetic (POMA). Directions for the development of data representation and processing methods are considered, new methods are proposed, and known methods are substantially advanced.