HPRC (High-Performance Reconfigurable Computing) systems include multicore processors and reconfigurable devices acting as custom coprocessors. Due to economic constraints, the number of reconfigurable devices is usually smaller than the number of processor cores, which rules out a 1:1 mapping between cores and coprocessors. This paper presents a solution to this problem, based on the virtualization of reconfigurable coprocessors. A Virtual Coprocessor Monitor (VCM) has been devised for the XtremeData XD2000i In-Socket Accelerator, and a thread-safe API is available for user applications to communicate with the VCM. Two reference applications, an IDEA cipher and an Euler CFD solver, have been implemented in order to validate the proposed architecture and execution model. Results show that the benefits of coprocessor virtualization outweigh its overhead, especially when the software portion of the code is significant. (c) 2012 Elsevier B.V. All rights reserved.
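The core idea of this abstract, many threads sharing fewer coprocessors through a thread-safe interface, can be illustrated with a minimal sketch. This is not the VCM or its API, whose details the abstract does not give; `CoprocessorPool`, `run`, `fake_kernel`, and `demo` are hypothetical names, and the "device" is just an integer ID.

```python
import queue
import threading


class CoprocessorPool:
    """Hypothetical stand-in for a coprocessor monitor (not the paper's VCM).

    Hands out a small number of device IDs to a larger number of threads.
    """

    def __init__(self, num_devices):
        self._free = queue.Queue()  # thread-safe queue of idle device IDs
        for dev_id in range(num_devices):
            self._free.put(dev_id)

    def run(self, task, *args):
        dev_id = self._free.get()  # blocks until some device is idle
        try:
            return task(dev_id, *args)
        finally:
            self._free.put(dev_id)  # always hand the device back


def demo(num_threads=8, num_devices=2):
    """Run num_threads workers against num_devices shared devices."""
    pool = CoprocessorPool(num_devices)
    lock = threading.Lock()
    state = {"in_use": 0, "peak": 0}
    results = []

    def fake_kernel(dev_id, x):
        # Placeholder for offloaded work; also tracks concurrent device use.
        with lock:
            state["in_use"] += 1
            state["peak"] = max(state["peak"], state["in_use"])
        y = x * x
        with lock:
            state["in_use"] -= 1
        return y

    def worker(x):
        y = pool.run(fake_kernel, x)
        with lock:
            results.append(y)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), state["peak"]
```

Calling `demo(8, 2)` runs eight threads against two devices; the recorded peak never exceeds the number of devices, because a thread can only enter the kernel while holding a device ID.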
Current distributed computing systems built from commodity computers, such as a Network of Workstations (NOW), must deploy multicore processors to raise their performance. However, because multicore processors w...
ISBN: 9783642328206 (print)
High-performance architecture and compilation are the foundation on which modern computer systems are built. The two sub-topics are very strongly related, and only in combination can they deliver the performance levels we have come to expect from systems. The topic is quite broad, with sub-areas of interest ranging from multicore and multi-threaded processors to large-scale parallel machines, and from program analysis and program transformation to automatic discovery and management of parallelism, programmer productivity tools, concurrent and sequential languages, and other compiler issues.
ISBN: 9781467350518 (print)
In this paper we lay out a case for the use of dataflow programming and the CAL language as a way of addressing current challenges in programming parallel hardware such as multicore systems and FPGAs. We show how the design of the CAL language balances conflicting concerns of expressiveness, analyzability, and implementability, making it a promising tool for the implementation of parallel stream processing applications. The language itself as well as the design considerations are presented and illustrated with a number of different use cases from a wide range of application domains.
ISBN: 9780769546759 (print)
Requirements for efficient parallelization of many complex and irregular applications can be cast as a hypergraph partitioning problem. The current state-of-the-art software libraries that provide tool support for the hypergraph partitioning problem were designed and implemented before the game-changing advancements in multi-core computing. Hence, analyzing the structure of those tools in order to design multithreaded versions of the algorithms is a crucial task. The most successful partitioning tools are based on the multi-level approach. In this approach, a given hypergraph is coarsened to a much smaller one, a partition is obtained on the smallest hypergraph, and that partition is projected to the original hypergraph while refining it on the intermediate hypergraphs. The coarsening operation corresponds to clustering the vertices of a hypergraph and is the most time-consuming task in a multi-level partitioning tool. We present three efficient multithreaded clustering algorithms which are well suited for multi-level partitioners. We compare their performance with that of the algorithms used in today's hypergraph partitioners. We show on a large number of real-life hypergraphs that our implementations, integrated into a commonly used partitioning library, PaToH, achieve good speedups without reducing the clustering quality.
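As an illustration of the coarsening step described in this abstract, here is a minimal sequential sketch of one clustering pass. It is not one of the paper's three multithreaded algorithms and does not use the PaToH API; it assumes a simple greedy rule (pair each unclustered vertex with the unclustered neighbor that shares the most hyperedges with it), and the names `cluster_vertices` and `coarsen` are hypothetical.

```python
from collections import defaultdict


def cluster_vertices(num_vertices, hyperedges):
    """Greedy pairwise clustering (assumed rule, not the paper's algorithms).

    hyperedges is a list of vertex lists. Returns a list mapping each
    vertex to the ID of its cluster.
    """
    # incident[v] lists the hyperedges that contain vertex v.
    incident = defaultdict(list)
    for e, pins in enumerate(hyperedges):
        for v in pins:
            incident[v].append(e)

    cluster = [-1] * num_vertices  # -1 means "not clustered yet"
    next_id = 0
    for v in range(num_vertices):
        if cluster[v] != -1:
            continue
        # Count hyperedges shared with each still-unclustered neighbor.
        shared = defaultdict(int)
        for e in incident[v]:
            for u in hyperedges[e]:
                if u != v and cluster[u] == -1:
                    shared[u] += 1
        cluster[v] = next_id
        if shared:
            # Highest shared count wins; ties go to the lowest vertex ID.
            best = max(sorted(shared), key=lambda u: shared[u])
            cluster[best] = next_id
        next_id += 1
    return cluster


def coarsen(hyperedges, cluster):
    """Build the coarse hypergraph: map pins to clusters, drop size-1 edges."""
    coarse = []
    for pins in hyperedges:
        merged = sorted({cluster[v] for v in pins})
        if len(merged) > 1:
            coarse.append(merged)
    return coarse
```

For the hypergraph with hyperedges `[[0, 1], [0, 1, 2], [2, 3], [3, 4]]` on five vertices, the pass groups vertices 0 and 1, then 2 and 3, and leaves 4 alone, so the coarse hypergraph has three vertices and two hyperedges. A multi-level partitioner would repeat such passes until the hypergraph is small enough to partition directly.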
In this paper, five kinds of typical multi-core processors are compared in terms of threads, cache, inter-core interconnect, etc. Two kinds of multi-core programming environments and some new programming languages are introdu...
Parallelization is attractive for speeding up dynamic program analysis on multicores. However, inter-thread communication overhead may outweigh any benefit from parallel execution. We propose deferred methods, a high-...
This paper implements basic computational kernels of scientific computing, such as matrix-vector product, matrix product, and Gaussian elimination, on multi-core platforms using several parallel programming tools. ...
The reconfigurable data-stream hardware software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead, in terms of resource utilization, is reported, and the ability to integrate hard and soft processor cores with purely hardware kernels is demonstrated.
ISBN: 9781467311830 (print)
Building on traditional image-based diagnosis, many research results have shown that comparative analysis is more comprehensive, intuitive, and targeted. In actual clinical environments, comparing medical imaging data on clinician workstations is inefficient and cumbersome because there is limited support for presenting these data in an integrated way, which has motivated the integrated visualization of these data. With large amounts of medical imaging data distributed across archive servers, integrated visualization on traditional clinician workstations often leads to poor user interaction. Starting from the relatively mature framework of traditional clinician workstations, this paper locates the bottlenecks in data transmission and data parsing. It then provides several optimization schemes against these bottlenecks, such as a streaming approach, multi-core programming, and an optimization based on Intel IPP, to improve user interaction for the integrated visualization of imaging data. This optimized framework has been deployed in many hospitals and proven to meet clinical requirements.