A term rewriting system (TRS) is a model of computation used in various applications, such as algebraic specification. TRSs have inherent concurrency, which makes them suitable for parallel computing. We previously proposed BOB (Bundle Of Branches), a data-management mechanism for parallel rewriting, together with a model of parallel rewriting based on BOB, and implemented a TRS simulator based on this model on a shared-memory parallel computer. Because the simulator depends entirely on a feature of the shared-memory architecture, namely that any process can access any memory element, it is hard to port to a distributed-memory parallel computer. In this paper, we propose the autonomous BOB model, which suits distributed-memory architectures because processes communicate through a message-passing protocol and a load-balancing method is provided. We implemented a TRS simulator based on this model on a distributed-memory architecture; it runs about 30 times faster on 64 processors than on a single processor.
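To make the "inherent concurrency" of term rewriting concrete, here is a minimal Python sketch of one parallel rewriting strategy. It is not BOB or the autonomous BOB model, whose data structures the abstract does not describe; it only shows that redexes at disjoint positions can be rewritten in the same step. The Peano-addition rules and all names (`Term`, `parallel_step`, `normalize`, etc.) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Term:
    """A first-order term: a function symbol applied to argument terms."""
    head: str
    args: Tuple["Term", ...] = ()


ZERO = Term("0")


def s(t: Term) -> Term:
    return Term("s", (t,))


def add(x: Term, y: Term) -> Term:
    return Term("add", (x, y))


def num(n: int) -> Term:
    """Peano numeral s(s(...s(0)...)) with n applications of s."""
    t = ZERO
    for _ in range(n):
        t = s(t)
    return t


def rewrite_root(t: Term) -> Optional[Term]:
    """Try the (illustrative) rules at the root of t:
       add(0, y)    -> y
       add(s(x), y) -> s(add(x, y))"""
    if t.head == "add":
        x, y = t.args
        if x == ZERO:
            return y
        if x.head == "s":
            return s(add(x.args[0], y))
    return None


def parallel_step(t: Term) -> Optional[Term]:
    """Rewrite all outermost redexes of t at once; None if t is in normal form.
    Outermost redexes occupy disjoint positions, so each rewrite here is
    independent of the others and could be assigned to a different processor."""
    reduced = rewrite_root(t)
    if reduced is not None:
        return reduced
    new_args, changed = [], False
    for a in t.args:
        r = parallel_step(a)
        new_args.append(a if r is None else r)
        changed = changed or r is not None
    return Term(t.head, tuple(new_args)) if changed else None


def normalize(t: Term) -> Tuple[Term, int]:
    """Apply parallel steps until no redex remains; return (normal form, steps)."""
    steps = 0
    while (nxt := parallel_step(t)) is not None:
        t, steps = nxt, steps + 1
    return t, steps
```

For example, `normalize(add(add(num(2), num(1)), add(num(1), num(1))))` reaches `num(5)` in 7 parallel steps; in the first step the two inner sums are rewritten simultaneously because the root itself is not a redex.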
Authors: Zhang, Huirong; Cao, Jianwen
Chinese Acad Sci, Grad Univ, Lab Parallel Software & Computat Sci Software, Inst Software, Beijing, Peoples R China
Chinese Acad Sci, Inst Software, Lab Parallel Software & Computat Sci Software, Beijing, Peoples R China
ISBN (print): 9781479941698
In this paper, we consider second-order elliptic ODE eigenproblems on general grids. We construct an efficient algorithm for computing the eigenvalues using a weighted-mean combination of the linear finite element method and the corresponding second-order finite difference method. We first take the arithmetic mean of the two methods. We then compute quasi-optimal combination parameters for different eigenvalues to further improve the algorithm. The resulting algorithm converges faster and achieves higher accuracy than either the linear finite element method or the corresponding second-order finite difference method. Numerical examples on both uniform and nonuniform meshes illustrate the computational cost of the different numerical methods for solving eigenvalue problems. For efficiency, all matrices in our algorithm use sparse storage.
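Why averaging helps can be seen on the simplest model problem. The sketch below assumes -u'' = λu on (0, 1) with u(0) = u(1) = 0 and a uniform mesh; it does not reproduce the paper's nonuniform meshes or its quasi-optimal weights, and it averages the computed eigenvalues, which is only one possible reading of "weighted mean combination". On this mesh the linear FEM eigenvalues lie above the exact values k²π² and the second-order FD eigenvalues lie below them, with O(h²) leading errors of equal size and opposite sign, so the arithmetic mean cancels that term and leaves an O(h⁴) error. Dense NumPy matrices are used for brevity (the abstract states that the authors use sparse storage), and the function names are hypothetical.

```python
import numpy as np


def tridiag(n: int, off: float, diag: float) -> np.ndarray:
    """Dense symmetric tridiagonal matrix with constant diagonals."""
    return (np.diag(np.full(n, diag))
            + np.diag(np.full(n - 1, off), -1)
            + np.diag(np.full(n - 1, off), 1))


def fd_eigenvalues(n_elems: int) -> np.ndarray:
    """Second-order central differences: (1/h^2) tridiag(-1, 2, -1) u = lambda u."""
    h = 1.0 / n_elems
    A = tridiag(n_elems - 1, -1.0, 2.0) / h**2
    return np.linalg.eigvalsh(A)  # ascending order


def fem_eigenvalues(n_elems: int) -> np.ndarray:
    """Linear FEM: K u = lambda M u with the consistent mass matrix M."""
    h = 1.0 / n_elems
    K = tridiag(n_elems - 1, -1.0, 2.0) / h
    M = tridiag(n_elems - 1, 1.0, 4.0) * h / 6.0
    # Reduce the generalized problem to a standard symmetric one via M = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    return np.linalg.eigvalsh(L_inv @ K @ L_inv.T)


def combined_eigenvalues(n_elems: int, weight: float = 0.5) -> np.ndarray:
    """Weighted mean of FEM and FD eigenvalues; weight=0.5 is the arithmetic mean."""
    return weight * fem_eigenvalues(n_elems) + (1.0 - weight) * fd_eigenvalues(n_elems)


if __name__ == "__main__":
    exact = np.pi**2  # smallest exact eigenvalue (k = 1)
    for n in (16, 32, 64):
        fd, fem, mix = (f(n)[0] for f in (fd_eigenvalues, fem_eigenvalues, combined_eigenvalues))
        print(f"n={n:3d}  FD err={fd - exact:+.2e}  FEM err={fem - exact:+.2e}  "
              f"mean err={mix - exact:+.2e}")
```

Halving h should reduce the FD and FEM errors by roughly a factor of 4 and the error of the mean by roughly a factor of 16.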
In this paper, we present and evaluate Inhambu, a distributed object-oriented system that supports the execution of data mining applications on clusters of PCs and workstations. The system provides a resource management layer, built on top of Java/RMI, that supports the execution of the data mining tool Weka. We evaluate the performance of Inhambu through several experiments on homogeneous, heterogeneous, and non-dedicated clusters, and compare the results with those of a similar system named Weka-parallel. Inhambu outperforms its counterpart for coarse-grained applications, mainly on heterogeneous and non-dedicated clusters. Our system also provides additional advantages, such as application checkpointing, support for dynamically adding hosts to the cluster, automatic restarting of failed tasks, and more effective use of the cluster. Inhambu is therefore a promising tool for efficiently executing real-world data mining applications. The software is available on the project's website at http://***/projects/inhambu/. (c) 2006 Elsevier Inc. All rights reserved.
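One feature listed above, automatic restarting of failed tasks, can be sketched as a simple task farm. This is a minimal sketch under stated assumptions: it uses Python's `concurrent.futures` thread pool in place of Java/RMI hosts, and the retry policy, `run_with_restart`, and the `make_flaky` failure simulator are hypothetical. It does not reflect Inhambu's actual scheduler or its checkpointing.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Hashable


def run_with_restart(tasks: Dict[Hashable, Callable[[], object]],
                     workers: int = 4, max_attempts: int = 3) -> Dict[Hashable, object]:
    """Run independent tasks on a worker pool, resubmitting any task that raises,
    up to max_attempts attempts per task (hypothetical policy)."""
    results: Dict[Hashable, object] = {}
    attempts = {key: 0 for key in tasks}
    pending = set(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Submit every pending task for this round.
            futures = {}
            for key in pending:
                attempts[key] += 1
                futures[pool.submit(tasks[key])] = key
            pending = set()
            for fut in as_completed(futures):
                key = futures[fut]
                try:
                    results[key] = fut.result()
                except Exception as exc:
                    if attempts[key] >= max_attempts:
                        raise RuntimeError(f"task {key!r} failed {attempts[key]} times") from exc
                    pending.add(key)  # restart this task in the next round
    return results


def make_flaky(value: int, failures: int) -> Callable[[], int]:
    """Hypothetical task that fails `failures` times (e.g. a host drops out) before succeeding."""
    state = {"left": failures}

    def task() -> int:
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("simulated host failure")
        return value * value

    return task
```

With `tasks = {i: make_flaky(i, failures=i % 3) for i in range(6)}`, `run_with_restart(tasks)` returns the squares of 0 to 5 even though several tasks fail once or twice.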
ISBN (print): 9780769541105
In this paper, two kinds of parallel genetic algorithm (PGA) are implemented using the MATLAB® Parallel Computing Toolbox™ and Distributed Computing Server™ software. Parallel for-loops (parfor), SPMD (Single Program Multiple Data) blocks, and codistributed arrays, the three basic parallel programming modes in MATLAB, are used to implement the global and coarse-grained PGAs. To validate and compare our implementations, both PGAs are applied to the problem of range image registration. Experiments show that MATLAB is a convenient and effective way to parallelize existing algorithms, and that clearly higher speed-up and better performance can be obtained.
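The structure of a coarse-grained (island-model) PGA can be sketched independently of MATLAB. The sketch below is in Python, and its islands are evolved one after another in a single process, so it shows only the algorithmic structure (independent subpopulations plus periodic ring migration), not the speed-up obtained when each island runs on its own worker, as in an SPMD block. The OneMax fitness function, the operators, and all parameters are hypothetical stand-ins for the range-image-registration objective used in the paper.

```python
import random
from typing import List

Genome = List[int]


def onemax(g: Genome) -> int:
    """Hypothetical fitness: number of 1-bits."""
    return sum(g)


def evolve_island(pop: List[Genome], rng: random.Random, p_mut: float) -> List[Genome]:
    """One generation: elitism, binary tournaments, one-point crossover, bit-flip mutation."""
    pop = sorted(pop, key=onemax, reverse=True)
    nxt = [pop[0][:]]  # keep the island's best individual unchanged
    while len(nxt) < len(pop):
        a = max(rng.sample(pop, 2), key=onemax)
        b = max(rng.sample(pop, 2), key=onemax)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        nxt.append([bit ^ (rng.random() < p_mut) for bit in child])
    return nxt


def island_ga(n_islands: int = 4, pop_size: int = 20, n_bits: int = 40,
              generations: int = 60, migrate_every: int = 10, seed: int = 0) -> List[int]:
    """Return the best fitness over all islands after each generation."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    history = []
    for gen in range(1, generations + 1):
        # Each island evolves independently (the part a parallel version distributes).
        islands = [evolve_island(pop, rng, 1.0 / n_bits) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island i-1's best replaces island i's worst.
            bests = [max(pop, key=onemax)[:] for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(pop_size), key=lambda j: onemax(pop[j]))
                pop[worst] = bests[i - 1]
        history.append(max(onemax(g) for pop in islands for g in pop))
    return history
```

Because each island keeps its elite and migrants only replace the worst individual, the best fitness in `island_ga()` never decreases from one generation to the next. A global (master-slave) PGA would instead keep one population and distribute only the fitness evaluations, which is the pattern a parfor loop expresses.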
ISBN (print): 9781538621622
In this paper, we consider the problem of programming heterogeneous computer systems consisting of CPUs and various accelerators such as GPUs. We introduce several of the most popular models for heterogeneous parallel programming, including OpenCL (Open Computing Language), CUDA (Compute Unified Device Architecture), OpenACC, OpenHMPP (Hybrid Multicore Parallel Programming), C++ AMP (Accelerated Massive Parallelism), and HPL (Heterogeneous Programming Library), among others.
ISBN (print): 9781538694435
Expertise in distributed systems is critical for system maintenance and improvement. However, keeping this knowledge up to date is challenging because of the systems' complexity and continuous updates. Hence, computing platform providers study how to extract knowledge directly from system behavior. In this paper, we propose a methodology called KEREP that automatically extracts knowledge of distributed system behavior from request execution paths. We devise techniques to construct component structures, to depict in-depth dynamic behavior, and to identify the heartbeat mechanisms of target distributed systems. Experiments on two real-world distributed systems show that KEREP extracts accurate knowledge of request processing and discovers undocumented features, with good execution performance.
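The abstract does not say how KEREP identifies heartbeat mechanisms. One plausible heuristic, sketched here purely as an assumption and not as the authors' technique, flags a message stream between two components as a heartbeat candidate when its inter-arrival times are nearly constant, measured by a low coefficient of variation. The record format `(timestamp, src, dst, msg_type)`, the thresholds, the component names, and `find_heartbeats` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

Message = Tuple[float, str, str, str]  # (timestamp in seconds, src, dst, msg_type)


def find_heartbeats(messages: Iterable[Message], min_count: int = 5,
                    max_cv: float = 0.1) -> Dict[Tuple[str, str, str], float]:
    """Return {(src, dst, msg_type): mean period} for streams that look periodic."""
    streams: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for ts, src, dst, kind in messages:
        streams[(src, dst, kind)].append(ts)
    result = {}
    for key, times in streams.items():
        if len(times) < min_count:
            continue  # too few samples to judge periodicity
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        avg = mean(gaps)
        # Coefficient of variation of the gaps: near 0 for a fixed-period sender.
        if avg > 0 and pstdev(gaps) / avg <= max_cv:
            result[key] = avg
    return result
```

A stream sent every 3 s with small jitter is reported with a period close to 3.0, while bursty client requests between the same components are not.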
ISBN (print): 9781424437511
While modern large-scale computing tasks have grown to span many machines, each with many cores, traditional programming models have not kept up with these advances, making it difficult to exploit these computing resources with only modest programmer effort. Thalweg seeks to address this breakdown in several ways. It provides a model for designing algorithms that have the potential to scale to multiple cores and machines, with subsequent optimization by software engineers. Based on this concept, Thalweg presents an API for handling these algorithms, for transferring data to and from nodes and coprocessors, and for verifying the correct operation of the hardware. Finally, Thalweg presents a set of concepts and a laboratory framework for pedagogical use that will educate the next generation of software engineers to work in a world where multi-core and distributed computing are everywhere.
ISBN (print): 9780769546759
Distributed software transactional memory (D-STM) is an emerging alternative concurrency control model for distributed systems that promises to alleviate the difficulties of lock-based distributed synchronization, e.g., distributed deadlocks, livelocks, and lock convoying. We consider Herlihy and Sun's dataflow D-STM model, in which objects migrate to the invoking transactions, and the closed nesting model for managing inner (distributed) transactions. We present a transactional scheduler called the reactive transactional scheduler (RTS) to boost the throughput of closed-nested transactions. RTS determines, according to the level of contention, whether a conflicting parent transaction must be aborted or enqueued. If a transaction is enqueued, its nested inner transactions do not have to retrieve objects again, which reduces communication delays. Our implementation of RTS in the HyFlow D-STM framework and experimental evaluations reveal that RTS improves throughput over D-STM without RTS by as much as 88%.
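The abort-or-enqueue decision can be illustrated with a toy, single-process model. In this sketch, contention for an object is approximated by the length of its wait queue and compared with a fixed threshold; both the metric and the threshold are assumptions, since the abstract does not specify how RTS measures contention, and `ReactiveScheduler`, `request`, and `commit` are hypothetical names, not part of HyFlow.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Deque, Dict, Optional


class Decision(Enum):
    GRANTED = "granted"
    ENQUEUED = "enqueued"
    ABORTED = "aborted"


@dataclass
class ObjectState:
    owner: Optional[str] = None
    waiters: Deque[str] = field(default_factory=deque)


class ReactiveScheduler:
    """Toy model: on conflict, enqueue the requesting parent transaction while
    contention (queue length) is low, and abort it once contention is high."""

    def __init__(self, max_queue: int = 2):
        self.max_queue = max_queue  # hypothetical contention threshold
        self.objects: Dict[str, ObjectState] = {}

    def request(self, tx: str, obj: str) -> Decision:
        st = self.objects.setdefault(obj, ObjectState())
        if st.owner is None:
            st.owner = tx
            return Decision.GRANTED
        if len(st.waiters) < self.max_queue:
            # Low contention: wait instead of aborting. Per the abstract, an enqueued
            # parent's nested transactions need not retrieve their objects again.
            st.waiters.append(tx)
            return Decision.ENQUEUED
        return Decision.ABORTED

    def commit(self, tx: str, obj: str) -> Optional[str]:
        """Release obj and hand it to the next waiter, if any; return the new owner."""
        st = self.objects[obj]
        if st.owner != tx:
            raise ValueError(f"{tx} does not own {obj}")
        st.owner = st.waiters.popleft() if st.waiters else None
        return st.owner
```

With `max_queue=2`, the first requester gets the object, the next two wait, and a fourth is aborted; once the owner commits, the object passes to the first waiter and the queue has room again.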
ISBN (print): 9781424416936
The complexity of today's embedded real-time systems is continuously growing, with high demands on dependability, resource efficiency, and reusability. Two solution approaches address these needs. First, in the component-based software engineering (CBSE) paradigm, software is decomposed into self-contained components with explicit interactions and context dependencies; connectors represent the abstraction of interactions between these components. Second, components can be shifted from software to reconfigurable hardware, typically field-programmable gate arrays (FPGAs), in order to meet real-time constraints. This paper proposes a component-based concept to support efficient hardware/software co-design: a hardware component, together with the hardware/software connector, can seamlessly replace a software component with the same functionality, while the particularities of the alternative interaction are encapsulated in the component connector. Our approach provides tools that can generate all necessary interaction mechanisms between hardware and software components. A proof-of-concept application demonstrates the advantages of our concept: different partitioning decisions can be changed and compared rapidly thanks to the automated, error-free generation of the hardware/software connectors.
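The claim that a hardware component plus its connector can replace a software component with the same functionality amounts to programming against a shared interface. The Python sketch below models this with an abstract component interface, a software implementation, and a connector that forwards calls to a simulated FPGA register file. The FIR-filter component, the register names, and the classes `SoftwareFir`, `SimulatedFpga`, and `HwSwConnector` are hypothetical; in the paper the connector code is generated by tools rather than written by hand, and a real connector would talk to actual hardware.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class FirFilter(ABC):
    """Component interface: clients depend only on this, not on where it runs."""

    @abstractmethod
    def apply(self, samples: Sequence[int]) -> List[int]: ...


class SoftwareFir(FirFilter):
    """Software component: direct-form FIR filter with integer taps."""

    def __init__(self, taps: Sequence[int]):
        self.taps = list(taps)

    def apply(self, samples: Sequence[int]) -> List[int]:
        n = len(self.taps)
        return [sum(self.taps[k] * samples[i - k] for k in range(n) if i - k >= 0)
                for i in range(len(samples))]


class SimulatedFpga:
    """Stand-in for a memory-mapped FPGA core with named registers."""

    def __init__(self) -> None:
        self.regs: Dict[str, List[int]] = {}

    def write(self, reg: str, values: Sequence[int]) -> None:
        self.regs[reg] = list(values)

    def read(self, reg: str) -> List[int]:
        return self.regs[reg]

    def run(self) -> None:
        # Simulated hardware behavior: reuse the reference algorithm.
        self.regs["out"] = SoftwareFir(self.regs["taps"]).apply(self.regs["in"])


class HwSwConnector(FirFilter):
    """Connector: same interface as SoftwareFir, but marshals data to the device."""

    def __init__(self, device: SimulatedFpga, taps: Sequence[int]):
        self.device = device
        device.write("taps", taps)

    def apply(self, samples: Sequence[int]) -> List[int]:
        self.device.write("in", samples)
        self.device.run()
        return self.device.read("out")
```

Client code written against `FirFilter` works unchanged with either `SoftwareFir([1, 2, 1])` or `HwSwConnector(SimulatedFpga(), [1, 2, 1])`, which is what makes comparing partitioning decisions cheap.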