ISBN: 9781479914487 (print)
Conventional software speculative parallel models face challenges from the growing number of processor cores and the diversification of applications. The performance of a guest program under a software speculative parallel execution model depends closely on the model's speculation accuracy, control overhead, and rollback overhead. To improve speculation accuracy and load balance, and to reduce the overhead of the conventional model, this paper proposes a novel speculative parallel model named HEUSPEC. HEUSPEC includes two key techniques: heuristic value prediction (HVP) and dynamic task granularity resizing (DTGR). We have implemented the runtime system of the model in ANSI C. Experimental results show that the speedup of the HEUSPEC model reaches 4.51 on average (12% higher than the conventional model) when the speculative depth equals 7. It also shows good scalability and lower memory cost.
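To make the interplay of prediction, validation, and rollback concrete, here is a minimal sketch of value-prediction-based speculation over a chain of dependent tasks. It is not the HEUSPEC runtime: the names `speculative_chain`, `step`, and `predict` are hypothetical, the predictor is a plain user-supplied function rather than the paper's heuristic scheme, and `depth` only loosely mirrors the speculative depth mentioned in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def speculative_chain(step, start, n_tasks, predict, depth=4):
    """Compute x = step(x) n_tasks times, speculating on each task's input.

    step    -- the real task; its output is the next task's input
    predict -- hypothetical predictor guessing step(x) without running it
    depth   -- how many tasks are launched together (assumed parameter)
    Returns (final_value, number_of_rollbacks).
    """
    value = start          # last committed (non-speculative) value
    rollbacks = 0
    done = 0
    with ThreadPoolExecutor(max_workers=depth) as pool:
        while done < n_tasks:
            window = min(depth, n_tasks - done)
            # The first input is known; later inputs are predicted.
            guesses = [value]
            for _ in range(window - 1):
                guesses.append(predict(guesses[-1]))
            # Run all tasks of the window speculatively in parallel.
            results = list(pool.map(step, guesses))
            # Commit in program order, validating each guessed input.
            for guess, result in zip(guesses, results):
                if guess != value:
                    # Misprediction: discard the result and re-execute.
                    rollbacks += 1
                    result = step(value)
                value = result
                done += 1
    return value, rollbacks


if __name__ == "__main__":
    final, missed = speculative_chain(lambda x: x + 3, 0, 10, lambda x: x + 3)
    print(final, missed)
```

With an accurate predictor no work is discarded; with a poor one the result is still correct, but every misprediction costs a re-execution, which is why speculation accuracy and rollback overhead dominate performance.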
We propose a source-to-source parallelizing tool that transforms an incorrectly synchronized Jomp parallel program into a correctly synchronized and well-optimized one. The tool deals with accesses (reads and writes) to primitive variables. The approach consists of (1) analyzing variable accesses and dependences between statements, (2) slicing the program into independent subsets of statements, (3) combining all accesses to each variable, and (4) enforcing a set of transformation rules. The tool aims to be less error-prone and to generate optimized code. We demonstrated the feasibility of the tool by applying the transformation rules by hand. Experimental results indicate that programs processed by the tool achieve better performance.
ISBN: 9780818671203 (print)
A Motif-based graphical tool, XTracker, is described. XTracker can show Gantt-like charts of the activities on each node, or it can show the event messages as traffic between simulation objects. XTracker can take its data from sequential simulation runs and simulate a parallel execution under a number of simulation methods. XTracker can also act as a performance modeling tool.
Component-based programming has been applied to address the requirements of applications in high performance computing (HPC). However, the usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding support for parallelism. This paper looks at extensions to the usual notion of service connector to meet such requirements, using the # component model as a substrate and thereby demonstrating its expressiveness.
Presents reference cards that describe the special features, system design, memory capabilities, platform layers, and processing capabilities of the Khronos Group's OpenCL (Open Computing Language) computing platform.
ISBN: 9781479908530 (print)
The future of computer-based systems lies in multi-core and many-core architectures. Thanks to the availability of different multi-core processors, many parallelization tools and techniques have emerged. However, the majority of them rely on the shared memory architecture model, in which data is directly accessible to all cores. In this paper we present a simple hardware abstraction that targets the features of a multicore DSP processor with a distributed memory architecture, aiming to support program parallelization. Both manual and automatic code parallelization approaches can use the library routines described in this paper. We demonstrate the capabilities of the presented approach by validating the performance of multiple manually created test cases. Performance is estimated by measuring the time necessary for DMA data transfer between the cores, using GPIO pins attached to the DSP. In addition, a previously developed C code parallelization technique for the same DSP is extended to use this library, providing a fully working solution verified on real hardware.
A file data model for algorithmic skeletons is proposed, focusing on transparency and efficiency. Algorithmic skeletons correspond to a high-level programming model that takes advantage of nestable programming patterns to hide the complexity of parallel and distributed applications. Transparency is achieved using a workspace factory abstraction and the proxy pattern to intercept calls on File type objects. This allows programmers to continue using their accustomed programming libraries without the burden of explicitly introducing non-functional code to deal with the distribution aspects of their data. A hybrid file fetching strategy is proposed (instead of a purely lazy or eager one) that takes advantage of annotated functions and pipelined multithreaded interpreters to transfer files in advance or on demand. Experiments with a BLAST skeleton application show that the hybrid strategy provides a good tradeoff between bandwidth usage and CPU idle time.
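The proxy idea can be illustrated with a small sketch. It is not the paper's implementation: `LazyFileProxy` and the `fetch` callback are hypothetical names, the "remote" store is a plain dictionary, and only the lazy (on-demand) half of the hybrid strategy is shown. The proxy stands in for a file object and transfers the data the first time any file method is called.

```python
import io


class LazyFileProxy:
    """Stand-in for a file object that fetches its content on first use.

    fetch is a hypothetical callable mapping a file name to its bytes,
    e.g. a function that downloads the file from a remote node.
    """

    def __init__(self, name, fetch):
        self._name = name
        self._fetch = fetch
        self._file = None       # local copy, created lazily
        self.fetch_count = 0    # number of transfers performed

    def _ensure_local(self):
        # Transfer the data only once, on the first intercepted call.
        if self._file is None:
            self._file = io.BytesIO(self._fetch(self._name))
            self.fetch_count += 1
        return self._file

    def __getattr__(self, attr):
        # Called only for attributes the proxy itself lacks (read, seek, ...),
        # so every ordinary file operation is forwarded to the local copy.
        return getattr(self._ensure_local(), attr)


if __name__ == "__main__":
    remote_store = {"query.fasta": b">seq1\nACGT\n"}
    f = LazyFileProxy("query.fasta", remote_store.__getitem__)
    print(f.fetch_count)   # nothing transferred yet
    print(f.read())        # first use triggers the fetch
    print(f.fetch_count)
```

Client code keeps calling `read` and `seek` as usual; an eager or hybrid variant would differ only in when `_ensure_local` is triggered.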
We present ongoing research concerning an object-oriented associative memory. It is a massively parallel architecture, fully programmable and configurable, composed of VLSI circuits. It is well adapted to genome data processing. Alignment of DNA and protein sequences is a very important application in biology research. Standard sequence alignment usually relies on software implementations with prohibitive execution times (typically one year). A parallel implementation can solve this problem. Rapid-2 will be able to execute and accelerate genome tasks. We programmed and simulated several variants of the Needleman and Wunsch algorithm (1970) with progressive complexity (gaps, Dayhoff mutation matrix). Execution time evaluations of all these methods are presented. We estimate that Rapid-2 could improve on software implementations by a factor of 100.
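For readers unfamiliar with the algorithm named above, the following is a minimal software sketch of Needleman-Wunsch global alignment scoring. The scoring parameters (`match`, `mismatch`, `gap`) and the function name `nw_score` are assumptions chosen for illustration; the abstract's variants use a Dayhoff mutation matrix, which would replace the simple match/mismatch rule here.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and a flat match/mismatch score
    (illustrative assumptions, not the paper's scoring scheme).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(nw_score("GAT", "GCT"))
```

The table has (len(a)+1) x (len(b)+1) cells and each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. That regularity is what makes the algorithm a natural fit for massively parallel hardware.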
A programming tool, called parallelizer, for the static optimization of concurrent programs is considered. The tool partitions the alternative command lists of a nondeterministic iterative command into distinct elements that are concurrently executed. To improve the program's performance, the tool determines a decomposition where the granularity of the resulting processes is close to optimal for the target parallel architecture. This requires that some parameters of the target architecture are taken into account. Search techniques traditionally used in artificial intelligence are exploited to determine an optimal alternative assignment. The implementation of the parallelizer is described and an example of its application is considered.