The mobile agents model has the potential to provide a flexible framework for facing the challenges of high performance computing, especially when targeted towards heterogeneous distributed architectures. We developed a framework for supporting the programming and execution of mobile-agent-based distributed applications, the MAGDA (Mobile Agents Distributed Applications) toolset. It supplements mobile agent technology with a set of features for supporting parallel programming on a dynamic, heterogeneous distributed environment.
Recent commercial hardware platforms for embedded real-time systems feature heterogeneous processing units and computing accelerators on the same System-on-Chip. When designing complex real-time application for such a...
ISBN: (print) 9781424400546
This paper presents the alternatives available to support threadprivate data in OpenMP and evaluates them. We show how current compilation systems rely on custom techniques for implementing thread-local data, even though the ELF binary specification already supports data sections that become threadprivate by default; the ELF name for such areas is thread-local storage (TLS). Our experiments demonstrate that implementing threadprivate on top of the TLS support is straightforward and more efficient. This proposal is in line with the forthcoming implementation of OpenMP in the GNU Compiler Collection. In addition, our experience with the use of threadprivate in OpenMP applications shows that it is usually better to avoid it: threadprivate variables reside in common blocks, which prevents the compiler from fully optimizing the code. It is therefore better to treat threadprivate as a temporary technique that only eases the porting of MPI codes to OpenMP.
In this paper, the performance of parallel computing is discussed thoroughly in the domain of image matching. Image matching is widely used in security, medical and computer vision applications that require comparing two images for similarities. However, depending on the size of the images, it is quite possible that the computation cannot be handled by a single processor running a sequential algorithm. To overcome this limitation, parallel computing is introduced through the Message Passing Interface (MPI) library. In this project, the two images to be compared are first converted to grayscale and then compared using the Sum of Square Differences (SSD) algorithm. A parallel network of 12 processors was implemented for image matching and used to measure the performance of the SSD algorithm on both images. The performance gain with 12, 8, 4 and 2 processors was compared against a single processor. The comparison results show a linear relationship between the performance gain and the number of processors used for execution, demonstrating that SSD applications benefit significantly from parallelism.
Using the Message Passing Interface (MPI) in C++ has been difficult up to this point, because of the lack of suitable C++ bindings and C++ class libraries. The existing MPI standard provides language bindings only for C and Fortran 77, precluding their direct use in object-oriented programming. Even the proposed C++ bindings in MPI-2 are at a fairly low level and are not directly suitable for object-oriented programming. In this paper, we present the requirements, analysis and design for Object-Oriented MPI (OOMPI), a C++ class library for MPI. Although the OOMPI class library is specified in C++, in some sense the specification is a generic one that uses C++ as the program description language. Thus, the OOMPI specification can also be considered a generic object-oriented class library specification, which can form the basis for MPI class libraries in other object-oriented languages.
ISBN: (print) 9783981080162
Nowadays, most embedded devices need to support multiple applications running concurrently. In contrast to desktop computing, the set of applications is very often known at design time, and the designer needs to ensure that critical applications meet their constraints in every possible use-case. To do so, all possible use-cases, i.e. subsets of applications running simultaneously, have to be verified thoroughly. One approach to reducing the verification effort is composability analysis, which has been studied for sets of applications modeled as Synchronous Dataflow Graphs. In this paper we introduce a framework that supports a more general parallel programming model based on the Kahn Process Networks Model of Computation and integrates a complete MPSoC programming environment that includes compiler-centric analysis, performance estimation, simulation, as well as mapping and scheduling of multiple applications. In our solution, composability analysis is performed on parallel traces obtained by instrumenting the application code. A case study performed on three typical embedded applications, JPEG, GSM and MPEG-2, demonstrated the applicability of our approach.
The GAME system is a programming environment being developed at University College London as part of a major European project aimed at promoting and demonstrating the use of genetic algorithms in real-world applications. GAME is targeted at the development and execution of complex sequential, concurrent or parallel applications based on the genetic algorithm (GA) paradigm. Its object-oriented design and implementation provide the levels of abstraction required to describe and configure applications for a broad range of domains. GAME addresses the basic requirements involved in the design cycle of a GA application; it offers a set of genetic-oriented data structures, objects and straightforward programming interfaces that permit the implementation of a variety of GAs and parallel GAs. The underlying infrastructure provides the mechanisms for problem-independent manipulation of data structures, monitoring, and execution on a virtual computing environment supporting multiple parallel computation models. Applications are constructed from parameterised libraries containing algorithm and genetic operator modules. GAME is highly customisable, and its libraries can easily be expanded with new parameterised modules. Novice users can rapidly configure and execute pre-defined applications by simply setting a few parameters. Programmers can create new applications by combining pre-defined algorithms and genetic operators, or by directly programming new algorithms using the set of C++ classes provided. A graphic interface and monitoring facilities are also available in GAME.
Abstract program schemes, such as scan or homomorphism, can capture a wide range of data-parallel programs. While versatile, these schemes are of limited practical use on their own. A key problem is that the more natural sequential specifications may lack the associative combine operators these schemes require, so the schemes often fail to be identified immediately. To resolve this problem, the authors propose a method to systematically derive parallel programs from sequential definitions. This method is special in that it can automatically invent the auxiliary functions needed by associative combine operators. Apart from a formalisation, they also provide new theorems, based on the notion of context preservation, that guarantee parallelization for a precise class of sequential programs.
ISBN: (print) 0769514928
In this paper, we introduce DLoVe, a new paradigm for designing and implementing distributed and nondistributed virtual reality applications using one-way constraints. DLoVe allows programs written in its framework to be executed on multiple computers for improved performance. It also allows easy specification and implementation of multi-user interfaces. DLoVe hides all the networking aspects of message passing among the machines in the distributed environment and performs the needed network optimizations. As a result, a user of DLoVe does not need to understand parallel and distributed programming to use the system; he or she needs only be able to use the serial version of the user interface description language. Parallelizing the computation is performed by DLoVe, without modifying the interface description.
Parallel computing systems have been based on multicore CPUs and specialized coprocessors such as GPUs. Work stealing is a scheduling technique that has been used to distribute and redistribute the workload among resources efficiently. This work proposes, implements and validates a work-stealing scheduling approach for parallel systems that use CPUs and GPUs simultaneously. Results show that our approach, called WORMS, delivers competitive performance compared with a reference tool for multicore CPUs (Cilk). In the hybrid scenario, WORMS with multicore+GPU outperforms both WORMS and Cilk on multicore only, as well as the GPU reference tool (Thrust).