In this paper, we argue that interacting processes (IP) with multiparty interactions are an ideal model for parallel programming. The IP model with multiparty interactions was originally proposed by N. Francez and I. ...
A number of high-level parallel programming platforms for networks of workstations (NOWs) have been developed in recent times. Most of these platforms target the exploitation of data parallelism in applications. They do not allow applications to be expressed as a collection of tasks along with their precedence relationships. As a result, the control or task parallelism in an application cannot be expressed or exploited. The current work aims at integrating the notion of task parallelism and precedence relationships among constituting tasks into such high-level data parallel platforms for NOWs. Our model of integration provides for arbitrary nesting of data and task parallel modules. Also, the precedence relationships are clearly reflected in the program structure. The model relieves the programmer from the need to design applications for non-determinism in the order of completion of constituting tasks. The design of the runtime support as well as system-level bookkeeping is discussed. The model is general enough to be applied to a wide range of data parallel platforms. A specific case of integrating the model into anonymous remote computing (ARC), a data parallel programming platform, is presented. Performance-related aspects are also discussed. Copyright (C) 2000 John Wiley & Sons, Ltd.
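To make the task-plus-precedence idea concrete, the following minimal sketch models it in standard Java (CompletableFuture) rather than the ARC platform described above; all class and method names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Illustrative only: models "task parallelism with precedence
// relationships" in plain Java, not the ARC platform from the abstract.
public class PrecedenceSketch {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Tasks A and B have no mutual precedence: they run in parallel.
        CompletableFuture<int[]> a = CompletableFuture.supplyAsync(
                () -> dataParallelSquare(new int[]{1, 2, 3}), pool);
        CompletableFuture<int[]> b = CompletableFuture.supplyAsync(
                () -> dataParallelSquare(new int[]{4, 5, 6}), pool);

        // Task C depends on both A and B; the precedence relationship is
        // visible directly in the program structure, as the paper advocates.
        CompletableFuture<Integer> c = a.thenCombine(b,
                (x, y) -> IntStream.concat(IntStream.of(x), IntStream.of(y)).sum());

        System.out.println("result = " + c.join());
        pool.shutdown();
    }

    // A trivially data-parallel module nested inside a task.
    static int[] dataParallelSquare(int[] v) {
        return IntStream.of(v).parallel().map(x -> x * x).toArray();
    }
}
```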
Distributed shared memory systems provide an easy-to-program parallel environment, to harness the available computing power of PC networks. We present a layered architecture that allows a portable, scalable, and low-c...
Java's support for parallel and distributed processing makes the language attractive for metacomputing applications, such as parallel applications that run on geographically distributed (wide-area) systems. To obtain actual experience with a Java-centric approach to metacomputing, we have built and used a high-performance wide-area Java system called Manta. Manta implements the Java Remote Method Invocation (RMI) model using different communication protocols (active messages and TCP/IP) for different networks. The paper shows how wide-area parallel applications can be expressed and optimized using Java RMI. Also, it presents performance results of several applications on a wide-area system consisting of four Myrinet-based clusters connected by ATM WANs. We finally discuss alternative programming models, namely object replication, JavaSpaces, and MPI for Java. Copyright (C) 2000 John Wiley & Sons, Ltd.
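For readers unfamiliar with the RMI model that Manta implements, the sketch below shows the standard Java RMI pattern of a remote interface, an implementation object, and a client-side stub; the Compute/fib service is a made-up example, not code from Manta.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// A made-up remote service illustrating the standard RMI model
// that Manta implements; this is not Manta's own code.
interface Compute extends Remote {
    long fib(int n) throws RemoteException;
}

class ComputeImpl extends UnicastRemoteObject implements Compute {
    ComputeImpl() throws RemoteException { super(); }
    public long fib(int n) throws RemoteException {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Server side: export the object and register it by name.
        Registry reg = LocateRegistry.createRegistry(1099);
        reg.rebind("compute", new ComputeImpl());

        // Client side: look up the stub and invoke the method; the RMI
        // runtime handles marshalling and transport transparently.
        Compute stub = (Compute) reg.lookup("compute");
        System.out.println("fib(30) = " + stub.fib(30));
    }
}
```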
Programmable network interfaces provide the potential to extend the functionality of network services but lead to instruction processing overheads when compared to application-specific network interfaces. This paper aims to offset those performance disadvantages by exploiting task-level concurrency in the workload to parallelize the network interface firmware for a programmable controller with two processors. By carefully partitioning the handler procedures that process various events related to the progress of a packet, the system can minimize sharing, achieve load balance, and efficiently utilize on-chip storage. Compared to the uniprocessor firmware released by the manufacturer, the parallelized network interface firmware increases throughput by 65% for bidirectional UDP traffic of maximum-sized packets, 157% for bidirectional UDP traffic of minimum-sized packets, and 32-107% for real network services. This parallelization results in performance within 10-20% of a modern ASIC-based network interface for real network services.
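The partitioning strategy can be illustrated only as a rough analogy: the sketch below, in Java rather than NIC firmware and with invented event names, gives each of two processors exclusive ownership of a disjoint set of event queues so that handlers rarely touch shared state.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Conceptual analogy in Java (the real code is NIC firmware): partition
// event handlers between two processors so that each owns its own
// queues and per-packet state, minimizing shared data.
public class FirmwareSketch {
    // Invented event type standing in for the firmware's handler events.
    record Event(String kind, int packetId) {}

    public static void main(String[] args) throws Exception {
        BlockingQueue<Event> sendSide = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Event> recvSide = new ArrayBlockingQueue<>(1024);

        // "Processor 0" owns the send path, "processor 1" the receive
        // path; disjoint ownership avoids per-packet locking.
        Thread cpu0 = new Thread(() -> drain(sendSide, "cpu0"));
        Thread cpu1 = new Thread(() -> drain(recvSide, "cpu1"));
        cpu0.setDaemon(true);
        cpu1.setDaemon(true);
        cpu0.start();
        cpu1.start();

        sendSide.put(new Event("tx-dma-done", 1));
        recvSide.put(new Event("rx-frame", 2));
        Thread.sleep(200); // let the daemon workers print before exit
    }

    static void drain(BlockingQueue<Event> q, String cpu) {
        try {
            while (true) {
                Event e = q.take();
                System.out.println(cpu + " handles " + e.kind()
                        + " for packet " + e.packetId());
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```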
Gather and scatter are data redistribution functions of long-standing importance to high performance computing. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched by other array languages. We discuss an efficient parallel implementation, introducing three new optimizations (schedule compression, dead array reuse, and direct communication) that reduce the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks.
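The core gather/scatter semantics underlying the operator can be stated in a few lines of plain Java; note that this sketch shows only the sequential meaning, not ZPL's general operator or the parallel schedule optimizations the paper introduces.

```java
import java.util.Arrays;

// Minimal gather/scatter semantics in plain Java; ZPL's operator is
// far more general than this sketch.
public class GatherScatter {
    // Gather: out[i] = src[index[i]]
    static double[] gather(double[] src, int[] index) {
        double[] out = new double[index.length];
        for (int i = 0; i < index.length; i++) out[i] = src[index[i]];
        return out;
    }

    // Scatter: dst[index[i]] = src[i]
    static void scatter(double[] dst, int[] index, double[] src) {
        for (int i = 0; i < index.length; i++) dst[index[i]] = src[i];
    }

    public static void main(String[] args) {
        double[] a = {10, 20, 30, 40};
        int[] idx = {3, 1, 0, 2};
        System.out.println(Arrays.toString(gather(a, idx))); // [40.0, 20.0, 10.0, 30.0]
        double[] b = new double[4];
        scatter(b, idx, a);
        System.out.println(Arrays.toString(b)); // [30.0, 20.0, 40.0, 10.0]
    }
}
```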
CCJ is a communication library that adds MPI-like message passing and collective operations to Java. Rather than trying to adhere to the precise MPI syntax, CCJ aims at a clean integration of communication into Java's object-oriented framework. For example, CCJ uses thread groups to support Java's multithreading model and it allows any data structure (not just arrays) to be communicated. CCJ is implemented entirely in Java, on top of RMI, so it can be used with any Java virtual machine. The paper discusses three parallel Java applications that use collective communication. It compares the performance (on top of a Myrinet cluster) of CCJ, RMI and mpiJava versions of these applications and also compares their code complexity. A detailed performance comparison between CCJ and mpiJava is given using the Java Grande Forum MPJ benchmark suite. The results show that neither CCJ's object-oriented design nor its implementation on top of RMI imposes a performance penalty on applications compared to their mpiJava counterparts. The source of CCJ is available from our Web site http://***/manta. Copyright (C) 2003 John Wiley & Sons, Ltd.
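As a rough illustration of a collective operation over a thread group (not CCJ's actual API), here is a sum reduction in which every member of a group of Java threads contributes a value and, after a barrier, observes the combined result.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a collective sum over a thread group in plain Java.
// CCJ's actual API differs; this only illustrates the semantics of a
// reduce-to-all among threads rather than MPI processes.
public class CollectiveSketch {
    static final int GROUP = 4;
    static final AtomicLong acc = new AtomicLong();
    static final CyclicBarrier barrier = new CyclicBarrier(GROUP);

    public static void main(String[] args) {
        for (int rank = 0; rank < GROUP; rank++) {
            final int r = rank;
            new Thread(() -> {
                long local = (r + 1) * 10L; // each member's contribution
                acc.addAndGet(local);        // combine phase
                try {
                    barrier.await();         // wait for all group members
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                // After the barrier, every member sees the reduced value.
                System.out.println("rank " + r + " sum = " + acc.get());
            }).start();
        }
    }
}
```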
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user-transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP and MPI implementations, and combinations thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.
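The aggregation idea can be sketched as follows; the Aggregator class and its methods are invented for illustration and do not reflect ARMI's interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the aggregation idea: buffer fine-grain
// asynchronous requests and send them as one coarse message. Class and
// method names are invented; this is not ARMI's interface.
public class Aggregator {
    private final List<String> pending = new ArrayList<>();
    private final int threshold;
    private final Consumer<List<String>> transport;

    Aggregator(int threshold, Consumer<List<String>> transport) {
        this.threshold = threshold;
        this.transport = transport;
    }

    // A fine-grain asynchronous "RMI": queued, not sent immediately.
    void asyncInvoke(String request) {
        pending.add(request);
        if (pending.size() >= threshold) flush();
    }

    // Coarsen parallelism: one network message carries many requests.
    void flush() {
        if (!pending.isEmpty()) {
            transport.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Aggregator a = new Aggregator(3, batch ->
                System.out.println("sending batch: " + batch));
        a.asyncInvoke("put(0, 1.5)");
        a.asyncInvoke("put(1, 2.5)");
        a.asyncInvoke("put(2, 3.5)"); // threshold reached: batch is sent
        a.flush();                    // drain any remainder
    }
}
```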
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a re-usable artifact. Parallel design patterns reflect commonly occurring parallel communication and synchronization structures. Our tools, CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) and MetaCO2P3S, use generative design patterns. A programmer selects the parallel design patterns that are appropriate for an application, and then adapts the patterns for that specific application by selecting from a small set of code-configuration options. CO2P3S then generates a custom framework for the application that includes all of the structural code necessary for the application to run in parallel. The programmer is only required to write simple code that launches the application and to fill in some application-specific sequential hook routines. We use generative design patterns to take an application specification (parallel design patterns + sequential user code) and use it to generate parallel application code that achieves good performance in shared memory and distributed memory environments. Although our implementations are for Java, the approach we describe is tool and language independent. This paper describes generalizing CO2P3S to generate distributed-memory parallel solutions.
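A hypothetical flavor of what a generated framework plus user-supplied hooks might look like (the pattern, class names, and hooks below are invented, not actual CO2P3S output):

```java
// Sketch of the generative-pattern idea: the tool would emit structural
// code like this framework class, and the programmer only fills in the
// sequential hook methods. All names here are invented for illustration.
abstract class MasterWorkerFramework<T, R> {
    // Structural (generated) code: parallel dispatch of work items.
    final java.util.List<R> run(java.util.List<T> work) {
        return work.parallelStream().map(this::handleItem).toList();
    }

    // Application-specific sequential hook supplied by the programmer.
    protected abstract R handleItem(T item);
}

public class PatternSketch extends MasterWorkerFramework<Integer, Integer> {
    @Override
    protected Integer handleItem(Integer item) {
        return item * item; // user code: purely sequential
    }

    public static void main(String[] args) {
        var results = new PatternSketch().run(java.util.List.of(1, 2, 3, 4));
        System.out.println(results); // [1, 4, 9, 16]
    }
}
```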