The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Although such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address its unique architectural features has presented many challenges. We present a distributed shared memory (DSM) model supported transparently in software using C++ template metaprogramming techniques. The approach offers an extremely simple parallel programming model that is well suited to the architecture. Initial results demonstrate the approach and provide insight into the efficiency of the programming model, as well as the ability of the NoC to support a DSM without explicit control over data movement and localization.
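To make the idea of a transparent, template-based DSM more concrete, here is a minimal C++ sketch. It is not the authors' implementation: the class name dsm_array, the blocked distribution of elements across cores, and the use of host-side std::vector storage in place of each core's local memory are all assumptions for illustration. The proxy type returned by operator[] is a standard C++ technique for intercepting reads and writes without changing the syntax at the call site.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a global array whose elements are block-distributed
// across the local memories of NCORES simulated cores. Indexing through
// operator[] hides which core owns an element. On real hardware the proxy
// would issue a NoC load/store instead of touching a std::vector.
template <typename T, std::size_t NCORES>
class dsm_array {
public:
    explicit dsm_array(std::size_t n)
        : n_(n), chunk_((n + NCORES - 1) / NCORES) {
        for (auto& mem : local_) mem.assign(chunk_, T{});
    }

    // Proxy returned by operator[]: converts to T on read, forwards on write.
    class ref {
    public:
        ref(dsm_array& a, std::size_t i) : a_(a), i_(i) {}
        operator T() const { return a_.slot(i_); }                 // remote read
        ref& operator=(const T& v) { a_.slot(i_) = v; return *this; }  // remote write
        ref& operator=(const ref& other) { return *this = static_cast<T>(other); }
    private:
        dsm_array& a_;
        std::size_t i_;
    };

    ref operator[](std::size_t i) { assert(i < n_); return ref(*this, i); }
    std::size_t size() const { return n_; }
    // Core that owns global index i under the blocked distribution.
    std::size_t owner(std::size_t i) const { return i / chunk_; }

private:
    T& slot(std::size_t i) { return local_[owner(i)][i % chunk_]; }

    std::size_t n_, chunk_;
    std::array<std::vector<T>, NCORES> local_;  // stand-in for per-core memory
};
```

Code such as a[i] = a[j] or sum += a[i] reads like ordinary array code even though the elements may be owned by different cores; in this sketch, the proxy's conversion and assignment operators are the points where remote loads and stores would occur.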
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating-point calculations, as well as parallel scalability. Yet despite these interesting architectural features, no compelling programming model has been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional distributed parallel cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and to achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix-matrix multiplication, N-body particle interaction, five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture. Published by Elsevier B.V.
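The five-point 2D stencil benchmark maps naturally onto a 2D mesh: each MPI rank owns a tile of the grid and trades edge values with its four nearest neighbours. The sketch below shows only the per-rank pieces, under stated assumptions: ranks laid out row-major on the core mesh, a one-cell halo around each tile, and a Jacobi update that averages each point's four neighbours. The names mesh_neighbours, stencil_sweep and kNoNeighbour are hypothetical, and the halo exchange itself (which would use MPI point-to-point calls) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the per-core work in a five-point 2D stencil.
constexpr int kNoNeighbour = -1;  // plays the role of MPI_PROC_NULL

struct Neighbours { int north, south, west, east; };

// Ranks are assumed to be laid out row-major on a rows x cols core mesh.
Neighbours mesh_neighbours(int rank, int rows, int cols) {
    int r = rank / cols, c = rank % cols;
    return {
        r > 0        ? rank - cols : kNoNeighbour,
        r < rows - 1 ? rank + cols : kNoNeighbour,
        c > 0        ? rank - 1    : kNoNeighbour,
        c < cols - 1 ? rank + 1    : kNoNeighbour,
    };
}

// One Jacobi sweep over the interior of an (nx + 2) x (ny + 2) tile that
// includes a one-cell halo. Halo values are copied through unchanged; in
// the MPI version they would be refreshed from neighbours before each sweep.
std::vector<float> stencil_sweep(const std::vector<float>& u, int nx, int ny) {
    const int w = nx + 2;
    assert(u.size() == static_cast<std::size_t>(w * (ny + 2)));
    std::vector<float> out(u);
    for (int j = 1; j <= ny; ++j)
        for (int i = 1; i <= nx; ++i)
            out[j * w + i] = 0.25f * (u[(j - 1) * w + i] + u[(j + 1) * w + i] +
                                      u[j * w + i - 1] + u[j * w + i + 1]);
    return out;
}
```

Each sweep needs fresh halo values, so the cost of those nearest-neighbour transfers recurs on every iteration.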
ISBN: (print) 9781509035250
The Adapteva Epiphany MIMD architecture is a scalable 2D array of RISC cores with a fast network-on-chip (NoC) for parallel processing. The work presented here discusses the suitability of the architecture for software-defined radio (SDR) applications such as Finite Impulse Response (FIR) filters. This paper discusses an implementation of the Hilbert filter using the COPRTHR 2.0 SDK, which includes a Pthread-like interface for offloading the thread function. We present timing and performance results for our implementation.
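A Hilbert filter is commonly realised as an odd-length FIR filter whose ideal taps are 2/(πk) at odd offsets k from the centre tap and zero elsewhere. The sketch below computes such taps and applies them with a direct-form FIR loop, which is the kind of per-sample work a thread function offloaded to the cores would perform. The Hamming window, the function names hilbert_taps and fir_filter, and the tap count used are illustrative assumptions; no COPRTHR API calls are shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Odd-length (type III) FIR approximation of the Hilbert transformer.
// Ideal response: h[k] = 2 / (pi * k) for odd k, 0 for even k (including
// k = 0), where k is the offset from the centre tap. A Hamming window is
// applied here as an illustrative choice to reduce ripple.
std::vector<double> hilbert_taps(std::size_t num_taps) {
    assert(num_taps % 2 == 1 && num_taps >= 3);
    const double pi = std::acos(-1.0);
    const long m = static_cast<long>(num_taps / 2);  // centre index
    std::vector<double> h(num_taps, 0.0);
    for (long n = 0; n < static_cast<long>(num_taps); ++n) {
        long k = n - m;
        if (k % 2 != 0) {
            double window = 0.54 - 0.46 * std::cos(2.0 * pi * n / (num_taps - 1));
            h[n] = 2.0 / (pi * k) * window;
        }
    }
    return h;
}

// Direct-form FIR: y[n] = sum_j h[j] * x[n - j], treating samples before the
// start of x as zero. The output has the same length as the input.
std::vector<double> fir_filter(const std::vector<double>& h,
                               const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t j = 0; j < h.size() && j <= n; ++j)
            y[n] += h[j] * x[n - j];
    return y;
}
```

Because every even-offset tap is zero, a real implementation can skip roughly half of the multiplies, which matters on cores with limited local memory and cycles.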
The low-power Adapteva Epiphany RISC array processor offers high computational energy efficiency and parallel scalability. However, extracting performance with a standard parallel programming model remains a great challenge. We present an effective programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard, adapted for coprocessor offload. Using MPI exploits the similarities between the Epiphany architecture and a networked, distributed parallel cluster. Furthermore, our approach enables codes written with MPI to execute on the RISC array processor with little modification. We present experimental results for matrix-matrix multiplication using MPI and highlight the importance of fast inter-core data transfers. Using MPI, we demonstrate an on-chip performance of 9.1 GFLOPS with an efficiency of 15.3 GFLOPS/W. Threaded MPI exhibits the highest performance reported for the Epiphany architecture using a standard parallel programming model. (C) 2015 Elsevier B.V. All rights reserved.
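Dense matrix-matrix multiplication on a 2D core mesh is often done with Cannon's algorithm, which needs only nearest-neighbour block shifts. Assuming that scheme (the abstract does not name the algorithm), the sketch below simulates it in a single process: each of the p x p simulated cores holds one block of A, B and C, and the block shifts stand in for MPI sends and receives between neighbouring ranks. The function names get_block and cannon_matmul are hypothetical. For scale, the reported 9.1 GFLOPS at 15.3 GFLOPS/W implies a power draw of roughly 9.1 / 15.3 ≈ 0.6 W.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Block = std::vector<double>;  // b x b block, row-major

// Copy block (bi, bj) of size b out of an n x n row-major matrix.
Block get_block(const std::vector<double>& m, int n, int b, int bi, int bj) {
    Block blk(b * b);
    for (int r = 0; r < b; ++r)
        for (int c = 0; c < b; ++c)
            blk[r * b + c] = m[(bi * b + r) * n + bj * b + c];
    return blk;
}

// Cannon's algorithm simulated on a p x p grid of cores in one process.
// Each step is a local block multiply-accumulate followed by shifting A one
// core west and B one core north (nearest-neighbour transfers on a mesh).
std::vector<double> cannon_matmul(const std::vector<double>& A,
                                  const std::vector<double>& B, int n, int p) {
    assert(n % p == 0);
    const int b = n / p;
    std::vector<Block> a(p * p), bb(p * p), c(p * p, Block(b * b, 0.0));
    // Initial skew: core (i, j) starts with A(i, i+j) and B(i+j, j), mod p.
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < p; ++j) {
            a[i * p + j] = get_block(A, n, b, i, (i + j) % p);
            bb[i * p + j] = get_block(B, n, b, (i + j) % p, j);
        }
    for (int step = 0; step < p; ++step) {
        // Local multiply-accumulate on every core: C += A_blk * B_blk.
        for (int core = 0; core < p * p; ++core)
            for (int r = 0; r < b; ++r)
                for (int k = 0; k < b; ++k)
                    for (int col = 0; col < b; ++col)
                        c[core][r * b + col] += a[core][r * b + k] * bb[core][k * b + col];
        // Shift: core (i, j) receives A from (i, j+1) and B from (i+1, j).
        std::vector<Block> na(p * p), nb(p * p);
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < p; ++j) {
                na[i * p + j] = a[i * p + (j + 1) % p];
                nb[i * p + j] = bb[((i + 1) % p) * p + j];
            }
        a.swap(na);
        bb.swap(nb);
    }
    // Gather the C blocks back into one n x n matrix.
    std::vector<double> C(n * n);
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < p; ++j)
            for (int r = 0; r < b; ++r)
                for (int col = 0; col < b; ++col)
                    C[(i * b + r) * n + j * b + col] = c[i * p + j][r * b + col];
    return C;
}
```

Every transfer in the loop is between mesh neighbours, so per-step communication cost depends on how fast adjacent cores can exchange a block rather than on global bandwidth.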