ISBN (print): 0769509878
While traditional parallel computing systems are still struggling to gain wider acceptance, the largest parallel computer ever available is currently growing along with the Internet, its communication resource. Unfortunately, it is also rarely used for parallel computation. The main reason parallel computers are rejected is the difficulty of parallel programming. In this paper we propose the Self Distributing Associative ARChitecture (SDAARC), which is derived from the Cache Only Memory Architecture (COMA). COMAs provide a distributed shared memory (DSM) with automatic distribution of data. We show how this paradigm of data distribution can be extended to the automatic distribution of instruction sequences (microthreads). We also show how microthreads can be extracted from legacy C code to produce code that SDAARC can parallelize automatically at run time. Finally, we discuss how SDAARC can be implemented on tightly coupled multiprocessor systems, on heterogeneous LAN-based computer networks (intranets), and on WANs of computing resources.
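The COMA idea that SDAARC builds on can be illustrated with a toy directory. The sketch below is a minimal, hypothetical model (not SDAARC's actual protocol): data blocks have no fixed home node, a read attracts a replica into the reader's local memory, and a write invalidates all other replicas. The `best_node` method is an assumed placement rule showing how the same bookkeeping could steer a microthread to the node that already holds most of its operands; the abstract does not say how SDAARC makes that decision.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AttractionMemory:
    """Toy COMA-style directory (hypothetical): blocks have no fixed home node."""
    # block name -> set of node ids that currently hold a copy
    holders: dict = field(default_factory=dict)

    def read(self, node: int, block: str) -> None:
        # A read attracts a replica of the block into the reader's local memory.
        self.holders.setdefault(block, set()).add(node)

    def write(self, node: int, block: str) -> None:
        # A write invalidates every other replica; the writer keeps the only copy.
        self.holders[block] = {node}

    def best_node(self, operands: list, default: int = 0) -> int:
        # Assumed placement rule: run a microthread where most of its operands live.
        counts = Counter(n for b in operands for n in self.holders.get(b, set()))
        if not counts:
            return default
        # Highest count wins; ties go to the lowest node id.
        return min(counts, key=lambda n: (-counts[n], n))


mem = AttractionMemory()
mem.write(0, "x")   # node 0 creates x
mem.read(2, "x")    # x is replicated to node 2
mem.write(3, "x")   # node 3 writes x: the copies on nodes 0 and 2 are invalidated
mem.write(2, "y")
mem.read(3, "y")    # y is now held by nodes 2 and 3
print(sorted(mem.holders["x"]), sorted(mem.holders["y"]))  # [3] [2, 3]
print(mem.best_node(["x", "y"]))  # 3
```

Migrating the computation to the data, rather than only the data to the computation, is the intuition behind extending COMA-style distribution to microthreads.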
ISBN (print): 0769515282
Cluster computing is becoming increasingly popular among users of parallel and distributed applications. However, since few clusters are dedicated solely to running individual user jobs, jobs must be coordinated among independently administered clusters effectively and with little user interaction. For this purpose, we have developed an infrastructure for inter-cluster job coordination using Voyager mobile agents. Once a user submits a job request, which is converted into XML, a mobile agent searches for the most available cluster, establishes a job execution environment there, executes the job, and reports the results back to the client. Using Voyager's dynamic aggregation feature, we also plan to make these job coordination agents evolve so that they are suitable for deployment in dynamic cluster environments. This paper presents the design principles and the current implementation status of our infrastructure, as well as advanced services using evolvable agents.
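The selection step described above (convert the request into XML, then pick the most available cluster) can be sketched as follows. This is not the Voyager API and not the authors' implementation: the XML element names, the `ClusterStatus` fields, and the availability rule (most free nodes, then shortest queue) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterStatus:
    """Hypothetical load report an agent might collect from each cluster."""
    name: str
    free_nodes: int
    queued_jobs: int


def job_request_xml(user: str, executable: str, nodes: int) -> str:
    # Serialize the job request; the element and attribute names are assumptions.
    job = ET.Element("job", user=user)
    ET.SubElement(job, "executable").text = executable
    ET.SubElement(job, "nodes").text = str(nodes)
    return ET.tostring(job, encoding="unicode")


def most_available(clusters: List[ClusterStatus], nodes: int) -> Optional[ClusterStatus]:
    # Keep only clusters that can host the job, then prefer the most free nodes
    # and, among equals, the shortest queue.
    fit = [c for c in clusters if c.free_nodes >= nodes]
    if not fit:
        return None
    return max(fit, key=lambda c: (c.free_nodes, -c.queued_jobs))


request = job_request_xml("alice", "sim", 4)
clusters = [ClusterStatus("a", 2, 0), ClusterStatus("b", 8, 3), ClusterStatus("c", 8, 1)]
target = most_available(clusters, nodes=int(ET.fromstring(request).findtext("nodes")))
print(request)      # <job user="alice"><executable>sim</executable><nodes>4</nodes></job>
print(target.name)  # c
```

In the described system the agent would then migrate to the chosen cluster, set up the execution environment, run the job, and carry the results back; those steps are not modeled here.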
ISBN (print): 0780366085
An embedded flash memory module has a 1.2 V read capability and a 1.5 V program/erase capability. The flash cell is a two-transistor (2T) FN-NOR cell in a 0.181 µm logic process. Design techniques improve observability and reduce test time.
In this paper we describe an instrumentation environment for the performance analysis and visualization of parallel applications written in JOMP, an OpenMP-like interface for Java. The environment includes two complementary approaches. The first is designed to provide a detailed analysis of parallel behavior at the level of the JOMP programming model, where the user deals with the parallel, work-sharing, and synchronization constructs that form the core of JOMP. The second is designed to support an in-depth analysis of the threaded execution inside the Java virtual machine (JVM), where the user deals with the supporting threads layer: monitors and condition variables. The paper discusses the implementation of both mechanisms and evaluates the overhead they incur.
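The first mechanism amounts to recording when each thread enters and leaves parallel, work-sharing, and synchronization constructs. The Python sketch below shows that enter/exit event idea with a hypothetical `Tracer` class; it is not JOMP's instrumentation (which targets Java and the JVM), and the construct names are illustrative. The lock taken for every recorded event is a simple example of the kind of instrumentation overhead the paper measures.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class Tracer:
    """Hypothetical tracer: records enter/exit events for named constructs per thread."""

    def __init__(self) -> None:
        self.events = []  # (thread name, construct, "enter" or "exit", timestamp)
        self._lock = threading.Lock()

    def _record(self, construct: str, kind: str) -> None:
        # The timestamp is taken under the lock, so events are stored in time order.
        with self._lock:
            self.events.append((threading.current_thread().name, construct, kind,
                                time.perf_counter()))

    @contextmanager
    def region(self, construct: str):
        self._record(construct, "enter")
        try:
            yield
        finally:
            self._record(construct, "exit")

    def time_per_construct(self) -> dict:
        # Pair each exit with its enter (same thread, same construct; no nesting of
        # the same construct is assumed) and sum the durations over all threads.
        opened, totals = {}, defaultdict(float)
        for thread, construct, kind, stamp in self.events:
            if kind == "enter":
                opened[(thread, construct)] = stamp
            else:
                totals[construct] += stamp - opened.pop((thread, construct))
        return dict(totals)


tracer = Tracer()
barrier = threading.Barrier(4)


def worker(i: int) -> None:
    with tracer.region("parallel"):
        with tracer.region("work"):
            sum(range(10_000 * (i + 1)))  # deliberately uneven work per thread
        with tracer.region("barrier"):
            barrier.wait()


threads = [threading.Thread(target=worker, args=(i,), name=f"T{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tracer.time_per_construct())
```

Threads that finish their work early accumulate more time in the `barrier` region, which is the kind of load imbalance such traces are meant to expose.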
ISBN (print): 0769512585
In this paper, we address the problem of partitioning sparse arrays whose non-zero elements are distributed non-uniformly. We consider inference schemes for Fortran 90 array intrinsics so that the non-zero structure of the output array can be deduced from the non-zero structures of the input arrays. Experiments with the Harwell-Boeing sparse matrix collection measure the effectiveness of our method. We also demonstrate that, given the sparsity structures of the source arrays, our inference schemes make it possible to predict the performance differences among a collection of equivalent Fortran 90 programs for sample online analytical processing (OLAP) tasks. The experiments are performed on an IBM SP2 cluster using our library support for sparse array intrinsics.
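A minimal sketch of structural inference follows, assuming a non-zero pattern is represented as a set of 0-based `(row, column)` positions. These are the standard conservative rules (union for elementwise `+`, intersection for elementwise `*`, an index swap for `TRANSPOSE`, a Boolean product for `MATMUL`); the paper's own inference schemes may be more refined, and the Python function names are hypothetical. The rules are conservative because numerical cancellation can still produce zeros at predicted positions.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pattern = FrozenSet[Tuple[int, int]]  # 0-based (row, col) positions that may be non-zero


def add_pattern(a: Pattern, b: Pattern) -> Pattern:
    # Elementwise A + B: non-zero wherever either operand is non-zero.
    return a | b


def mul_pattern(a: Pattern, b: Pattern) -> Pattern:
    # Elementwise A * B: non-zero only where both operands are non-zero.
    return a & b


def transpose_pattern(a: Pattern) -> Pattern:
    # TRANSPOSE(A): swap row and column indices.
    return frozenset((j, i) for i, j in a)


def matmul_pattern(a: Pattern, b: Pattern) -> Pattern:
    # MATMUL(A, B): C(i, j) may be non-zero if A(i, k) and B(k, j) are non-zero for some k.
    cols_in_row: Dict[int, Set[int]] = {}
    for k, j in b:
        cols_in_row.setdefault(k, set()).add(j)
    return frozenset((i, j) for i, k in a for j in cols_in_row.get(k, ()))


a = frozenset({(0, 0), (1, 2)})
b = frozenset({(0, 1), (2, 2)})
print(sorted(matmul_pattern(a, b)))                    # [(0, 1), (1, 2)]
print(len(add_pattern(a, b)), len(mul_pattern(a, b)))  # 4 0
```

Since the size of a predicted pattern approximates the work an intrinsic call will do, comparing predicted sizes across equivalent expressions is one plausible way such inference can inform performance prediction and partitioning.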
ISBN (print): 0769509908
This paper presents benchmark results for three parallel programming paradigms applied to an unstructured, shock-capturing numerical code for transient problems. The three methods are: (1) shared-memory programming with OpenMP on the cache-coherent non-uniform memory access (CC-NUMA) architecture of the SGI Origin2000; (2) an MPI (Message Passing Interface) implementation; and (3) a SHMEM implementation using the parallel library called "Shared Memory Access Library". Methods (2) and (3) are both based on a distributed-memory model. The SGI Origin2000 is used throughout the study. The scalability of method (1) is found to be so poor that it is impractical for the unstructured CFD code. Methods (2) and (3) scale much better than method (1), reaching computational speeds in the gigaflops range with 16 CPUs. The SHMEM version is approximately twice as fast as the MPI version.
ISBN (print): 076951426X
The close association between higher-order functions and algorithmic skeletons is a promising source of automatic parallelisation of programs. An approach to automatically synthesizing higher-order functions from functional programs through proof planning is presented. Our work has been conducted within the context of a parallelising compiler for SML, with the objective of exploiting the parallelism latent in potential uses of higher-order functions in programs.
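The target of the synthesis can be illustrated with a small example. The Python sketch below does not implement proof planning and is not SML; it only shows an explicitly recursive function, the equivalent `map` formulation that a synthesis step would aim to discover, and how that `map` can then be handed to a parallel map skeleton. The function names are hypothetical, and `ThreadPoolExecutor` is used only to keep the example self-contained (CPython threads do not speed up CPU-bound work; a process pool would be the usual choice).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def square(x: int) -> int:
    return x * x


def squares_rec(xs: List[int]) -> List[int]:
    # Explicitly recursive form, as a programmer might write it.
    if not xs:
        return []
    return [square(xs[0])] + squares_rec(xs[1:])


def squares_map(xs: List[int]) -> List[int]:
    # Equivalent form using the higher-order function map.
    return list(map(square, xs))


def squares_par(xs: List[int], workers: int = 4) -> List[int]:
    # Once map has been identified, it can be replaced by a parallel map skeleton.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, xs))


xs = list(range(8))
print(squares_rec(xs) == squares_map(xs) == squares_par(xs))  # True
```

The recursion fixes an evaluation order, whereas `map` states only that the same function is applied to every element independently; that independence is what a skeleton-based compiler can exploit.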
ISBN (print): 0769509878
The BSP model can be extended with a zero-cost synchronization mechanism that can be used when the number of messages to be received is known. This mechanism, usually known as "oblivious synchronization", implies that different processors can be in different supersteps at the same time. An unwanted consequence of these software improvements is a loss of prediction accuracy. This paper proposes an extension of the BSP complexity model to deal with oblivious barriers and demonstrates its accuracy.
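In the standard BSP cost model a superstep costs w + h·g + l, where w is the maximum local work, h the maximum number of words any processor sends or receives, g the communication gap, and l the barrier latency. The Python sketch below illustrates only the oblivious-synchronization mechanism, not the paper's extended cost model: processors in a ring each send one message per superstep and, because each knows it will receive exactly one, proceed as soon as it arrives instead of waiting at a global barrier. Messages from a neighbour that has already moved ahead are buffered by superstep tag. The ring pattern and all names are assumptions, and Python threads are used only to show the control flow.

```python
import queue
import threading
from collections import defaultdict

P, STEPS = 4, 3  # number of processors and supersteps (illustrative values)


def run() -> list:
    inboxes = [queue.Queue() for _ in range(P)]
    results = [[] for _ in range(P)]

    def processor(pid: int) -> None:
        early = defaultdict(list)  # superstep tag -> messages that arrived ahead of time
        value = pid
        for step in range(STEPS):
            # Ring exchange: send one message to the right neighbour, tagged with the step.
            inboxes[(pid + 1) % P].put((step, value))
            expected = 1  # known in advance, so no global barrier is needed
            received = early.pop(step, [])
            while len(received) < expected:
                tag, payload = inboxes[pid].get()
                if tag == step:
                    received.append(payload)
                else:
                    # The sender is already in a later superstep; keep it for then.
                    early[tag].append(payload)
            # Local computation; this processor may now be a superstep ahead of others.
            value = received[0] + 1
            results[pid].append(value)

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run())  # [[4, 4, 4], [1, 5, 5], [2, 2, 6], [3, 3, 3]]
```

The results are deterministic even though processors drift apart, which is why oblivious synchronization is safe; the timing, however, no longer follows the lock-step superstep structure, which is what makes cost prediction harder.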
ISBN (print): 0769511538
We present two generic parallel skeletons for the tabu search method, a well-known metaheuristic for approximately solving combinatorial optimization problems. The first skeleton is based on independent runs, while the second is based on the classical master-slave model. Our starting point is the design and implementation of a sequential skeleton that is later used as the basis for the two parallel skeletons. Both skeletons provide the user with the following: parallel implementations of the tabu search method for concrete combinatorial optimization problems can be obtained from existing sequential implementations; the user does not need to know either parallel programming or communication libraries; and the parallel implementation of tabu search for a concrete problem is obtained automatically from a sequential implementation of tabu search for that problem. The skeletons do, however, require the user to provide a sequential instantiation of the tabu search method for the problem at hand. The skeletons are implemented in C++ using MPI as the communication library and offer genericity, flexibility, component reuse, robustness, and time savings. We have instantiated the two skeletons for the 0-1 multidimensional knapsack problem, among others, for which we report computational results.
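The split between a generic skeleton and a user-supplied instantiation can be sketched as follows. This is a hypothetical Python illustration, not the authors' C++/MPI code: the user supplies only the problem-specific parts (`moves`, `apply_move`, `score`), while `tabu_search` and `independent_runs` stay generic. The tabu rule (a move stays tabu for `tenure` iterations unless it beats the best score so far), the penalty for infeasible knapsack solutions, the small instance, and the use of threads instead of MPI processes are all assumptions. The master-slave skeleton is not shown.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

Solution = Tuple[int, ...]
Move = int  # hypothetical move type: index of the bit to flip


def tabu_search(initial: Solution,
                moves: Callable[[Solution], Iterable[Move]],
                apply_move: Callable[[Solution, Move], Solution],
                score: Callable[[Solution], float],
                iterations: int = 100,
                tenure: int = 3) -> Tuple[Solution, float]:
    """Generic sequential skeleton (maximization); problem-specific parts are parameters."""
    current = best = initial
    best_score = score(initial)
    tabu_until = {}  # move -> last iteration in which the move is still tabu
    for it in range(iterations):
        candidates = []
        for m in moves(current):
            neighbour = apply_move(current, m)
            value = score(neighbour)
            # Aspiration criterion: a tabu move is allowed if it beats the best so far.
            if tabu_until.get(m, -1) < it or value > best_score:
                candidates.append((value, m, neighbour))
        if not candidates:
            break
        value, m, current = max(candidates, key=lambda c: c[0])
        tabu_until[m] = it + tenure
        if value > best_score:
            best, best_score = current, value
    return best, best_score


def independent_runs(starts, workers: int = 4, **kwargs) -> Tuple[Solution, float]:
    """Parallel skeleton (sketch): independent searches from different starts, keep the best."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: tabu_search(s, **kwargs), starts))
    return max(results, key=lambda r: r[1])


# Hypothetical 0-1 multidimensional knapsack instance: 5 items, 2 constraints.
VALUES = [10, 13, 7, 8, 6]
WEIGHTS = [[3, 4, 2, 3, 1],
           [2, 3, 4, 1, 3]]
CAPACITIES = [8, 7]


def knap_score(x: Solution) -> float:
    value = sum(v for v, bit in zip(VALUES, x) if bit)
    # Penalize capacity violations so that infeasible solutions score below any feasible one.
    excess = sum(max(0, sum(w for w, bit in zip(row, x) if bit) - cap)
                 for row, cap in zip(WEIGHTS, CAPACITIES))
    return value - 100 * excess


def flips(x: Solution) -> Iterable[Move]:
    return range(len(x))


def flip(x: Solution, i: Move) -> Solution:
    return x[:i] + (1 - x[i],) + x[i + 1:]


rng = random.Random(0)
starts = [(0,) * 5] + [tuple(rng.randint(0, 1) for _ in range(5)) for _ in range(3)]
best, best_value = independent_runs(starts, moves=flips, apply_move=flip,
                                    score=knap_score, iterations=50)
print(best, best_value)  # (0, 1, 0, 1, 1) 27
```

On this small instance the run started from the empty knapsack reaches the optimum of 27 (items 1, 3 and 4, counting from zero), so the best of the independent runs is that solution. Only `knap_score`, `flips`, and `flip` would change for another problem, which mirrors the separation between skeleton and instantiation described in the abstract.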