A number of high-level parallel programming platforms for networks of workstations (NOWs) have been developed in recent times. Most of these platforms target the exploitation of data parallelism in applications. They do not allow applications to be expressed as a collection of tasks along with their precedence relationships. As a result, the control or task parallelism in an application cannot be expressed or exploited. The current work aims at integrating the notion of task parallelism and precedence relationships among constituent tasks into such high-level data parallel platforms for NOWs. Our model of integration provides for arbitrary nesting of data and task parallel modules. Also, the precedence relationships are clearly reflected in the program structure. The model relieves the programmer of the need to design applications for non-determinism in the order of completion of constituent tasks. The design of the runtime support as well as system-level bookkeeping is discussed. The model is general enough to be applied to a wide range of data parallel platforms. A specific case of integrating the model into anonymous remote computing (ARC), a data parallel programming platform, is presented. The performance-related aspects are also discussed. Copyright (C) 2000 John Wiley & Sons, Ltd.
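The idea of tasks with precedence relationships can be illustrated with a small sketch. It is not the ARC API or the paper's model: `run_task_graph`, `demo`, and the task names are hypothetical, and a thread pool from the Python standard library (Python 3.9+ for `graphlib`) stands in for a network of workstations. A task starts only after all of its predecessors have finished, so the caller never has to handle the order in which tasks complete.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, max_workers=4):
    """Run the callables in `tasks`, honouring `deps` (task -> predecessors).

    Each task is called with a dict holding the results of its predecessors.
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on cyclic precedence
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all completed.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unlocks the successors of this task
    return results


def demo():
    # "square" and "negate" are independent (task parallel); each one maps
    # over a list, which stands in for a nested data parallel module.
    tasks = {
        "load": lambda _: list(range(8)),
        "square": lambda inp: [x * x for x in inp["load"]],
        "negate": lambda inp: [-x for x in inp["load"]],
        "combine": lambda inp: sum(inp["square"]) + sum(inp["negate"]),
    }
    deps = {
        "square": {"load"},
        "negate": {"load"},
        "combine": {"square", "negate"},
    }
    return run_task_graph(tasks, deps)
```

In this sketch the precedence relationships live in one `deps` mapping, which is one way of making them visible in the program structure.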
The paper focuses on the development of a numerical code for the computation of basins of attraction using parallel programming. Two different approaches based on the Message Passing Interface (MPI) standard are presented; the performance analysis suggests using massive communication between nodes only on architectures with few cores. The critical issues arising from the study of a generic dynamical system are discussed, while the computation of basins is performed on a benchmark system described by Duffing's equation. We paid attention to optimizing the computing time as well as the workload on each node in order to develop an efficient and portable code. For the presented codes, both the scalability of an implementation on a professional cluster and the capability of the parallel approach to compute basins from a large set of initial conditions have been tested. (C) 2015 Elsevier Ltd. All rights reserved.
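A basin computation of this kind is naturally parallel because each initial condition can be integrated independently. The sketch below is not the paper's code: it assumes an unforced, damped double-well Duffing equation x'' + δx' − x + x³ = 0 with δ = 0.2, a fixed-step RK4 integrator, and a static block decomposition of the initial conditions. The names `classify`, `partition`, and `compute_basin` are hypothetical, and a serial loop over ranks stands in for MPI processes.

```python
def duffing_rhs(x, v, delta):
    # Assumed form: x'' + delta*x' - x + x**3 = 0 (attractors near x = +1, -1).
    return v, -delta * v + x - x ** 3


def classify(x0, v0, delta=0.2, dt=0.01, steps=5000):
    """Integrate one initial condition with RK4; return +1 or -1 for the well."""
    x, v = x0, v0
    for _ in range(steps):
        k1x, k1v = duffing_rhs(x, v, delta)
        k2x, k2v = duffing_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, delta)
        k3x, k3v = duffing_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, delta)
        k4x, k4v = duffing_rhs(x + dt * k3x, v + dt * k3v, delta)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    # Points left exactly on x = 0 are arbitrarily assigned to the -1 well.
    return 1 if x > 0 else -1


def partition(n, size, rank):
    """Half-open index range [start, stop) owned by `rank` out of `size`."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def compute_basin(xs, vs, size=4):
    """Classify every (x0, v0) pair; each emulated rank handles one block."""
    points = [(x, v) for x in xs for v in vs]
    labels = []
    for rank in range(size):  # under MPI, each rank would run only its block
        start, stop = partition(len(points), size, rank)
        labels.extend(classify(x, v) for x, v in points[start:stop])
    return labels  # concatenating in rank order plays the role of a gather
```

With this decomposition the ranks exchange no data until the final gather, which corresponds to the low-communication end of the trade-off the abstract mentions.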
In this paper we present ALua, an event-driven communication mechanism for developing distributed parallel applications, based on the interpreted language Lua. We propose a dual programming model for parallel applications, where ALua acts as a gluing element, allowing precompiled program parts to run on different machines. We show, through examples, how three types of applications can benefit from the flexibility that derives from this model. We then present a study of ALua's performance, by comparing execution times of two parallel applications written in ALua with their counterparts written in PVM. (C) 2002 Elsevier Science Ltd. All rights reserved.
Sequential programs are often difficult to parallelize because of the complexity in their implementation and the uncertainty in their behavior. We will demonstrate behavior-oriented parallelization (BOP), which provid...
As frame rates rise and image sizes grow, processing image sequences becomes harder. A multi-core DSP can ensure good real-time performance in an embedded image processing system. TMS320C6670 which is the ...
Skeleton and pattern based parallel programming promise significant benefits but remain absent from mainstream practice. We consider why this situation has arisen and propose a number of design principles which may help to redress it. We sketch the eSkel library, which represents a concrete attempt to apply these principles. eSkel is based on C and MPI, thereby embedding its skeletons in a conceptually familiar framework. We present an application of eSkel and analyse it as a response to our manifesto. (C) 2004 Elsevier B.V. All rights reserved.
Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is a lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a version of Linda that addresses this problem by providing two major enhancements that facilitate the writing of fault-tolerant applications: stable tuple spaces and atomic execution of tuple space operations. The former is a type of stable storage in which tuple values are guaranteed to persist across failures, while the latter allows collections of tuple operations to be executed in an all-or-nothing fashion despite failures and concurrency. The design of these enhancements is presented in detail and illustrated by examples drawn from both the Linda and fault-tolerance domains. An implementation of FT-Linda for a network of workstations is also described. The design is based on replicating the contents of stable tuple spaces to provide failure resilience and then updating the copies using atomic multicast. This strategy allows an efficient implementation in which only a single multicast message is needed for each atomic collection of tuple space operations.
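The all-or-nothing execution of a collection of tuple operations can be illustrated with a single-process sketch. This is not FT-Linda syntax and has no replication or failure handling: the class `TupleSpace`, its `atomic` method, and the use of `None` as a wildcard field are assumptions made for illustration, with one lock standing in for the ordering that atomic multicast provides across replicas.

```python
import threading


class TupleSpace:
    """In-memory tuple space; None in a pattern matches any field."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def out(self, tup):
        """Deposit a tuple and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _wait_for_match(self, pattern, timeout):
        found = self._cond.wait_for(
            lambda: self._find(pattern) is not None, timeout
        )
        if not found:
            raise TimeoutError(f"no tuple matching {pattern!r}")
        return self._find(pattern)

    def rd(self, pattern, timeout=None):
        """Return a matching tuple without removing it."""
        with self._cond:
            return self._wait_for_match(pattern, timeout)

    def in_(self, pattern, timeout=None):
        """Withdraw and return a matching tuple (Linda's `in`)."""
        with self._cond:
            tup = self._wait_for_match(pattern, timeout)
            self._tuples.remove(tup)
            return tup

    def atomic(self, removes, adds):
        """Withdraw every pattern in `removes` and deposit `adds`, or do nothing.

        Returns the withdrawn tuples, or None if any pattern has no match.
        """
        with self._cond:
            snapshot = list(self._tuples)
            taken = []
            for pattern in removes:
                tup = self._find(pattern)
                if tup is None:
                    self._tuples = snapshot  # roll back partial withdrawals
                    return None
                self._tuples.remove(tup)
                taken.append(tup)
            self._tuples.extend(tuple(t) for t in adds)
            self._cond.notify_all()
            return taken
```

A shared counter update, for example, becomes `ts.atomic([("count", None)], [("count", 2)])`: other processes see either the old tuple or the new one, never a state in which the counter is missing.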
In this article, the author describes the trade-offs in using high-level tools for parallel computing, focusing particularly on those that integrate with existing scientific computing software on the desktop.
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy to use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
Teaching and training for high-performance computing in our college have not kept pace with the level of HPC research. Thus, it is imperative to promote teaching reform in the parallel computing course in our college. Our first ...