Evolutionary multi-agent systems (EMAS) play a critical role in many artificial intelligence applications in use today. In this paper, we present a new generic skeleton in Erlang for parallel EMAS computations. The skeleton enables us to capture a wide variety of concrete evolutionary computations that can exploit the same underlying parallel implementation. We demonstrate the use of our skeleton on two different evolutionary computing applications: (1) computing the minimum of the Rastrigin function; and (2) solving an urban traffic optimisation problem. We show that we can obtain very good speedups (up to 142.44 times the sequential performance) on a variety of different parallel hardware, while requiring very little parallelisation effort.
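For readers unfamiliar with the first benchmark, the Rastrigin function is a standard multimodal test function with its global minimum of 0 at the origin. The following is a minimal Python sketch of its usual textbook definition, f(x) = A·n + Σ(x_i² − A·cos(2πx_i)) with A = 10. It is not taken from the paper, whose implementation is in Erlang; the function and parameter names are illustrative assumptions.

```python
import math


def rastrigin(x, a=10.0):
    """Textbook Rastrigin function (illustrative, not the paper's code).

    x is a sequence of floats; a is the conventional constant A = 10.
    """
    return a * len(x) + sum(
        xi * xi - a * math.cos(2.0 * math.pi * xi) for xi in x
    )
```

An evolutionary computation would treat each candidate vector `x` as an agent's genotype and use `rastrigin(x)` as the fitness to minimise.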
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to execute correctly and with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This article presents some preliminary results from our development of a performance-portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA graphics processing units (GPUs), Intel Xeon Phis, and multicore CPUs.
ISBN: 9781450300193 (print)
Traditional data-oriented programming languages such as dataflow languages and stream languages provide a natural abstraction for parallel programming. In these languages, a developer focuses on the flow of data through the computation, and these systems free the developer from the complexities of low-level, thread-oriented concurrency primitives. This simplification comes at a cost: traditional data-oriented approaches restrict the mutation of state and, in practice, the types of data structures a program can effectively use. Bamboo borrows from work in typestate and software transactions to relax the traditional restrictions of data-oriented programming models to support mutation of arbitrary data structures. We have implemented a compiler for Bamboo which generates code for the TILEPro64 many-core processor. We have evaluated this implementation on six benchmarks: Tracking, a feature tracking algorithm from computer vision; KMeans, a K-means clustering algorithm; Monte Carlo, a Monte Carlo simulation; Filter Bank, a multi-channel filter bank; Fractal, a Mandelbrot set computation; and Series, a Fourier series computation. We found that our compiler generated implementations that obtained speedups ranging from 26.2x to 61.6x when executed on 62 cores.
ISBN: 9781450343510 (print)
Heterogeneous computing platforms with a multicore host system and many-core accelerator devices have taken a major step forward in the mainstream HPC market this year with the announcement that the HP Apollo 6000 System's ProLiant XL250a server features Intel(R) Xeon Phi(TM) coprocessors. Although many application developers attempt to use it in the same way as GPGPU acceleration platforms, doing so forfeits the processing capability of the multicore host processors and introduces power inefficiency in business operations. In this paper, we propose an application optimization framework to turn sequential legacy applications into highly parallel applications that use the hardware resources on both the host CPU and the accelerator devices to enable simultaneous heterogeneous computing. As a case study, we look at how to apply this framework and adopt a structured methodology to develop option pricing applications that take advantage of a heterogeneous computing environment.
ISBN: 9783642544194; 9783642544200 (print)
Intel's Single-chip Cloud Computer (SCC) is a prototype architecture for on-chip many-core systems. By incorporating 48 cores into a single die, it provides unique opportunities to gain insights into many-core software development. Earlier results have shown that programming efficient and reliable software for many-core processors is difficult due to a lack of appropriate programming tools. In this paper, we present a programming framework to execute multiple applications specified as Kahn process networks on the SCC. These applications might be started or stopped at runtime based on requests of the user. The proposed application programming interface (API) abstracts low-level implementation details from the application designer, enabling high-level performance analysis and automated mapping optimization. To efficiently execute the workload specified by the proposed API, a lightweight runtime system and an automated program synthesis backend are presented. Extensive experiments are carried out to characterize the performance of the proposed framework.
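As background, a Kahn process network consists of sequential processes that communicate only through unbounded FIFO channels with blocking reads, which makes the network's output deterministic regardless of scheduling. The sketch below illustrates that model with Python threads and `queue.Queue`. It is not the API proposed in the paper; the process names, the `run_network` helper, and the use of `None` as an end-of-stream marker are all assumptions made for illustration.

```python
import queue
import threading


def producer(out_ch, n):
    """Source process: emits 0..n-1, then an end-of-stream marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)  # end-of-stream sentinel (assumption of this sketch)


def square(in_ch, out_ch):
    """Transform process: blocks on each read, writes the squared token."""
    while True:
        value = in_ch.get()  # blocking read, as the Kahn model requires
        if value is None:
            out_ch.put(None)
            return
        out_ch.put(value * value)


def run_network(n):
    """Wire producer -> square and collect the output stream."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        value = b.get()
        if value is None:
            break
        results.append(value)
    for t in threads:
        t.join()
    return results
```

Because each process only blocks on reads and never polls a channel, the collected stream is the same on every run, whichever thread the scheduler favours.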