As the frame rate rises and the image size grows, processing image sequences becomes harder. A multi-core DSP in an embedded image processing system can ensure good real-time performance. TMS320C6670 which is the ...
Skeleton and pattern based parallel programming promise significant benefits but remain absent from mainstream practice. We consider why this situation has arisen and propose a number of design principles which may help to redress it. We sketch the eSkel library, which represents a concrete attempt to apply these principles. eSkel is based on C and MPI, thereby embedding its skeletons in a conceptually familiar framework. We present an application of eSkel and analyse it as a response to our manifesto. (C) 2004 Elsevier B.V. All rights reserved.
Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is a lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a version of Linda that addresses this problem by providing two major enhancements that facilitate the writing of fault-tolerant applications: stable tuple spaces and atomic execution of tuple space operations. The former is a type of stable storage in which tuple values are guaranteed to persist across failures, while the latter allows collections of tuple operations to be executed in an all-or-nothing fashion despite failures and concurrency. The design of these enhancements is presented in detail and illustrated by examples drawn from both the Linda and fault-tolerance domains. An implementation of FT-Linda for a network of workstations is also described. The design is based on replicating the contents of stable tuple spaces to provide failure resilience and then updating the copies using atomic multicast. This strategy allows an efficient implementation in which only a single multicast message is needed for each atomic collection of tuple space operations.
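The all-or-nothing execution of a collection of tuple operations can be illustrated with a minimal single-process sketch. This is not FT-Linda's syntax or implementation: the class `TupleSpace`, its methods `out`, `rdp`, and `atomic`, and the use of `None` as a wildcard are all hypothetical names chosen for illustration, and the sketch omits stable storage, replication, and atomic multicast entirely. It only shows the idea that a batch of operations either takes effect as a whole or leaves the tuple space unchanged.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (illustrative only, not FT-Linda).

    A pattern is a tuple in which None acts as a wildcard field.
    """

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    @staticmethod
    def _matches(pattern, tup):
        # Same length, and every non-wildcard field must be equal.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def out(self, tup):
        """Deposit a tuple into the space."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def rdp(self, pattern):
        """Non-blocking read: return a matching tuple (left in place) or None."""
        with self._lock:
            for tup in self._tuples:
                if self._matches(pattern, tup):
                    return tup
            return None

    def atomic(self, ops):
        """Apply ("out", tuple) and ("inp", pattern) operations all-or-nothing.

        Returns the list of withdrawn tuples, or None if any "inp" finds no
        match, in which case the tuple space is left untouched.
        """
        with self._lock:
            staged = list(self._tuples)  # work on a copy, commit at the end
            taken = []
            for kind, arg in ops:
                if kind == "out":
                    staged.append(tuple(arg))
                elif kind == "inp":
                    match = next(
                        (t for t in staged if self._matches(arg, t)), None
                    )
                    if match is None:
                        return None  # abort: the copy is simply discarded
                    staged.remove(match)
                    taken.append(match)
                else:
                    raise ValueError(f"unknown operation: {kind!r}")
            self._tuples = staged  # commit every operation at once
            return taken


ts = TupleSpace()
ts.out(("task", 1))

# Withdraw a task and mark it in progress as one indivisible step.
claimed = ts.atomic([("inp", ("task", None)), ("out", ("in_progress", 1))])

# This batch aborts because no ("missing", ...) tuple exists,
# so its "out" never becomes visible.
aborted = ts.atomic([("out", ("x",)), ("inp", ("missing", None))])
```

Staging the batch on a copy and committing it in one assignment is a design choice made here for brevity; in a replicated setting such as the one the abstract describes, the corresponding step would be delivering the whole batch to every replica in a single message.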
In this article, the author describes the trade-offs in using high-level tools for parallel computing, focusing particularly on those that integrate with existing scientific computing software on the desktop.
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy to use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
Turing's model is a reaction-diffusion model capable of forming skin patterns on an animal. In this paper, Turing's model was investigated, with the model improvisation by Barrio et al. [12...
Teaching and training for high-performance computing in our college have not kept up with the level of HPC research. Thus, it is imperative to promote teaching reform of the parallel computing course in our college. Our first ...
In this work, we take up the challenge of performance portable programming of heterogeneous stencil computations across a wide range of modern shared-memory systems. An important example of such computations is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), the second major part of the dynamic core of the EULAG geophysical model. For this aim, we develop a set of parametric optimization techniques and a four-step procedure for customization of the MPDATA code. Among these techniques are: islands-of-cores strategy, (3+1)D decomposition, exploiting data parallelism and simultaneous multithreading, data flow synchronization, and vectorization. The proposed adaptation methodology helps us to develop the automatic transformation of the MPDATA code to achieve high sustained scalable performance for all tested ccNUMA platforms with Intel processors of recent generations. This means that for a given platform, the sustained performance of the new code is kept at a similar level, independently of the problem size. The highest performance utilization rate of about 41-46% of the theoretical peak, measured for all benchmarks, is provided for any of the two-socket servers based on Skylake-SP (SKL-SP), Broadwell, and Haswell CPU architectures. At the same time, the four-socket server with SKL-SP processors achieves the highest sustained performance of around 1.0-1.1 Tflop/s, which corresponds to about 33% of the peak.
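The relation between sustained performance and utilization quoted for the four-socket server can be checked with a short calculation. The abstract does not state the theoretical peak itself, so the value below is only an inference from the two quoted figures, and the helper name `implied_peak` is hypothetical.

```python
def implied_peak(sustained_tflops: float, utilization: float) -> float:
    """Peak implied by a sustained rate and the fraction of peak it represents."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return sustained_tflops / utilization


# Figures quoted in the abstract: 1.0-1.1 Tflop/s at about 33% of peak.
low = implied_peak(1.0, 0.33)
high = implied_peak(1.1, 0.33)
print(f"implied peak: about {low:.1f} to {high:.1f} Tflop/s")
```

Under these assumptions the implied peak of the four-socket system is roughly 3.0 to 3.3 Tflop/s; because "about 33%" is itself rounded, this should be read as an order-of-magnitude check rather than a hardware specification.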
It is maintained that to exploit fully the parallelism inherent in animate vision systems, an integrated vision architecture must support multiple models of parallelism. To support this claim, the hardware base of a typical animate vision laboratory and the software requirements of applications are described. A brief overview is then given of the Psyche operating system, which was designed to support multimodel programming. A complex animate vision application, checkers, constructed as a multimodel program under Psyche, is also described. Checkers demonstrates the advantages of decomposing animate vision systems by function and independently selecting an appropriate parallel-programming model for each function.
This paper describes emulation of parallel execution of a program written in standard occam(TM) source code. The occam language is a high-level language specifically designed to accommodate concurrent programming. The emulator checks and executes most instructions in the occam 2 language, providing a useful tool for debugging simple occam programs, and also provides accessibility to allow monitoring of execution. A "user-friendly" graphical interface is an integral part of the emulator. The paper describes the emulator and its use in teaching the occam language and parallel programming concepts to final year undergraduates. The teaching context is given and laboratory notes outlined along with sample programs that illustrate features of the language.