ISBN: 9781538671504 (print)
As the volume of satellite data grows, the processing time of remote sensing algorithms grows with it. Improving processing performance on accelerators such as Graphics Processing Units (GPUs) has therefore become essential. However, the programming models needed to run programs on accelerators are very difficult to use. For scientists in particular, ease of programming and performance often matter more than a large set of optimization functions and directives. To meet these requirements, we developed an accelerated code generator based on Open Computing Language (OpenCL) for processing ocean color remote sensing data. As a case study, we applied our generator to an atmospheric correction program for GOCI data with little effort; the generated program achieved a speedup of about 29.2x, comparable to a hand-written OpenCL program. We hope that scientists looking for such a tool will take advantage of our generator.
Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library. (C) 2017 Elsevier Inc. All rights reserved.
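The idea of a task dependency graph that partially orders tasks and is executed in a data-driven fashion can be illustrated with a toy scheduler. This is only a minimal sketch and not Bamboo's implementation: the function `run_data_driven`, the task names, and the single-threaded ready queue are all hypothetical, and no real MPI calls are made. It shows that a task with no pending dependencies (here, interior computation) may run between posting a receive and waiting on it.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run tasks in an order consistent with the partial order in deps.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names that must finish first
           (every name in deps is assumed to be a key of tasks)
    Returns the list of task names in the order they were executed.
    """
    # Unsatisfied dependencies per task, and the reverse edges.
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, needs in remaining.items():
        for needed in needs:
            dependents[needed].append(name)

    # A task becomes ready as soon as all of its inputs are available.
    ready = deque(name for name in tasks if not remaining[name])
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order


# Hypothetical halo-exchange step: task bodies are placeholders.
log = []
example_tasks = {
    "post_recv": lambda: log.append("post_recv"),
    "compute_interior": lambda: log.append("compute_interior"),
    "wait_recv": lambda: log.append("wait_recv"),
    "compute_boundary": lambda: log.append("compute_boundary"),
}
example_deps = {
    "wait_recv": {"post_recv"},
    "compute_boundary": {"wait_recv"},
}
example_order = run_data_driven(example_tasks, example_deps)
```

In this toy run, `compute_interior` is executed before `wait_recv`, which is the reordering that lets communication proceed while useful work is done. A real runtime would execute ready tasks concurrently and react to message arrival rather than use a sequential queue.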
ISBN: 9781538639153 (print)
We discuss early results with Toucan, a source-to-source translator that automatically restructures C/C++ MPI applications to overlap communication with computation. We co-designed the translator and runtime system to enable dynamic, dependence-driven execution of MPI applications, and require only a modest amount of programmer annotation. Co-design was essential to realizing overlap through dynamic code block reordering and avoiding the limitations of static code relocation and inlining. We demonstrate that Toucan hides significant communication in four representative applications running on up to 24K cores of NERSC's Edison platform. Using Toucan, we have hidden from 33% to 85% of the communication overhead, with performance meeting or exceeding that of painstakingly hand-written overlap variants.
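One way to read a figure such as "33% to 85% of the communication overhead hidden" is as the share of communication time that no longer appears in the total run time once overlap is applied. The helper below is a sketch under that assumption; the abstract does not state how the metric was measured, and the function name and the sample timings are hypothetical.

```python
def fraction_of_communication_hidden(t_compute, t_comm, t_overlapped):
    """Share of communication time removed from the run time by overlap.

    t_compute:    time spent computing (no communication)
    t_comm:       time spent communicating when nothing overlaps
    t_overlapped: measured run time of the overlapped variant
    Assumes the non-overlapped run time is t_compute + t_comm.
    """
    if t_comm <= 0:
        raise ValueError("t_comm must be positive")
    t_serial = t_compute + t_comm
    return (t_serial - t_overlapped) / t_comm


# Hypothetical timings in seconds: 10 s compute, 4 s communication,
# 11 s total after overlap, so 3 of the 4 communication seconds are hidden.
example_fraction = fraction_of_communication_hidden(10.0, 4.0, 11.0)
```

Under this definition a value of 1.0 would mean communication is fully hidden behind computation, and 0.0 would mean no overlap was achieved.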