Runtime systems are critical to the implementation of concurrent object-oriented programming languages. This paper describes Balinda C++, a concurrent object-oriented programming language running on a distributed-memory system, and its runtime implementation. The runtime system is built on top of the Nexus communication library. The tuplespace is the key abstraction in Balinda C++. A distributed tuplespace model is presented to improve data locality. Experiments were conducted to verify the model; the results indicate that it effectively improves system performance.
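A minimal sketch of the tuplespace abstraction the abstract refers to, in plain Java (the class, method names, and single-process scope are illustrative; Balinda C++'s actual distributed API is not shown in the abstract). Processes coordinate by depositing tuples with out() and withdrawing matching tuples with in(), which blocks until a match appears:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-process tuplespace: a monitor-protected bag of tuples.
public class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    // out(): deposit a tuple and wake any blocked readers.
    public synchronized void out(Object... tuple) {
        tuples.add(tuple);
        notifyAll();
    }

    // in(): remove and return the first matching tuple, blocking until one exists.
    public synchronized Object[] in(Predicate<Object[]> match) throws InterruptedException {
        while (true) {
            for (Iterator<Object[]> it = tuples.iterator(); it.hasNext(); ) {
                Object[] t = it.next();
                if (match.test(t)) { it.remove(); return t; }
            }
            wait(); // block until out() adds a new tuple
        }
    }

    // rd(): like in(), but leaves the tuple in the space.
    public synchronized Object[] rd(Predicate<Object[]> match) throws InterruptedException {
        while (true) {
            for (Object[] t : tuples) if (match.test(t)) return t;
            wait();
        }
    }
}
```

The distributed model in the paper partitions such a space across nodes to improve data locality; this single-monitor version only shows the coordination semantics.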
Dataflow computation models enable simpler and more efficient management of the memory hierarchy, a key barrier to the performance of many parallel programs. This paper enumerates some advantages of using the dataflow model: it argues that the programming model is simple and easily managed by a programmer, and it demonstrates some of the efficiencies that the dataflow model allows an underlying run-time system to achieve.
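The dataflow firing rule can be sketched with standard Java futures (this is an illustration of the model, not the paper's runtime): a node executes only when all of its input tokens are available, so scheduling follows data dependencies rather than program order.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative dataflow graph: two producer nodes feed one combining node.
public class DataflowDemo {
    public static int run() {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 2);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 3);
        // The combining node "fires" only once both operand tokens have arrived.
        return a.thenCombine(b, (x, y) -> x * y).join();
    }
}
```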
Parallel simulation has the potential to accelerate the execution of simulation applications. However, developing a parallel discrete-event simulation from scratch requires in-depth knowledge of the mapping process from the physical model to the simulation model, and a substantial effort in optimising performance. This paper presents an overview of the SPaDES (Structured Parallel Discrete-Event Simulation) parallel simulation framework. We focus on the performance analysis of SPaDES/C++, an implementation of SPaDES on a distributed-memory Fujitsu AP3000 parallel computer. SPaDES/C++ hides the complex underlying parallel simulation synchronization and parallel programming details from the simulationist. Our empirical results show that the SPaDES framework can deliver good speedup if the process granularity is properly optimised.
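The core of any discrete-event simulation is the future event list that SPaDES parallelises: events are processed in timestamp order and the simulation clock jumps between them. A minimal sequential sketch (the two event times are illustrative, not from SPaDES):

```java
import java.util.PriorityQueue;

// Minimal sequential discrete-event loop over a future event list (FEL).
public class EventLoop {
    public static double run() {
        PriorityQueue<Double> fel = new PriorityQueue<>();
        fel.add(5.0); // e.g. a "depart" event time (illustrative)
        fel.add(1.0); // e.g. an "arrive" event time (illustrative)
        double clock = 0.0;
        while (!fel.isEmpty()) {
            clock = fel.poll(); // always advance to the earliest pending timestamp
        }
        return clock;           // clock ends at the last event time
    }
}
```

The hard part a framework like SPaDES hides is distributing this loop across processors while guaranteeing that each logical process still consumes events in timestamp order.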
Gang scheduling has been widely used as a practical solution to the dynamic parallel job scheduling problem. Parallel threads of a single job are scheduled for simultaneous execution on a parallel computer even if the job does not fully utilize all available processors. Non-allocated processors go idle for the duration of the time quantum assigned to the threads. In this paper we propose a class of scheduling policies, dubbed Concurrent Gang, that generalizes gang scheduling and allows the flexible simultaneous scheduling of multiple parallel jobs, thus improving the space-sharing characteristics of gang scheduling. At the same time, all the advantages of gang scheduling, such as responsiveness, efficient sharing of resources, and ease of programming, are maintained.
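The space-sharing gain can be sketched with a toy packing of one time quantum (job sizes and the greedy policy are illustrative, not the paper's algorithm): plain gang scheduling dedicates the quantum to one job and leaves its unused processors idle, while a Concurrent Gang-style policy backfills them with another job's threads.

```java
// Toy model of one scheduling quantum on a machine with `processors` CPUs.
public class GangRound {
    // Greedily co-schedule whole gangs (jobs must run all-or-nothing);
    // returns how many processors the quantum actually uses.
    public static int packed(int processors, int... jobSizes) {
        int used = 0;
        for (int size : jobSizes) {
            if (used + size <= processors) used += size; // this gang fits: co-schedule it
        }
        return used;
    }
}
```

With 8 processors, a lone 5-thread job uses 5 and idles 3; co-scheduling a second 3-thread job in the same quantum fills all 8 while each gang still runs simultaneously.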
While Java provides a mechanism for concurrent programming implemented as language constructs, it is too rudimentary for most programmers and has certain limitations that make programs unnecessarily complex and prevent fine-grained concurrency. We have implemented Java4P, an extension of the Java language that offers a simpler concurrency model and overcomes Java's limitations. Threads are no longer associated with thread objects, allowing concurrency at any level of granularity. Thread creation is made implicit, and synchronisation is achieved through method guards. The synchronisation specification is separated from the functional specification to provide a parallel programming model closer to sequential programming.
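A method guard is a condition that must hold before a method body runs. In plain Java this must be hand-coded as a monitor loop inside the body (the class and guard below are illustrative); Java4P's contribution is to let the guard be declared separately from the functional code, which this sketch cannot express.

```java
// Plain-Java emulation of a guarded method on a one-element slot.
public class GuardedSlot {
    private Integer value = null;

    public synchronized void put(int v) {
        value = v;
        notifyAll(); // re-evaluate guards of blocked callers
    }

    // guard: value != null  (hand-written as a wait loop below)
    public synchronized int get() throws InterruptedException {
        while (value == null) wait();
        int v = value;
        value = null;
        return v;
    }
}
```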
For an effective Internet-based distributed parallel computing platform, the Java-Internet Computing Environment (JICE) is designed and implemented with the multithreading and remote method invocation mechanisms provided in Java. Specifically, JICE supports a shared-memory system model for communication between any two nodes. Under JICE, communication time is a major performance bottleneck. To reduce this communication overhead, a method of grouping is designed based on the optimal communication time. The communication performance given by grouping is evaluated through an analysis of execution time and verified via experiments. The results show that communication time can be reduced by about 80% when executing some Java benchmarks on JICE.
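A saving of that order is consistent with a simple linear cost model for message grouping (the model and the numbers below are illustrative assumptions, not taken from the paper): sending n items separately costs n*(a + b), while one grouped message costs a + n*b, where a is the per-message startup overhead and b the per-item transfer cost, so grouping amortises the startup term.

```java
// Toy linear communication-cost model for message grouping.
public class GroupingModel {
    // n items sent as n individual messages.
    public static double separate(int n, double a, double b) { return n * (a + b); }
    // n items batched into a single message.
    public static double grouped(int n, double a, double b)  { return a + n * b; }
}
```

When startup overhead dominates (a >> b), the grouped cost approaches a single startup plus the raw transfer time, which is where large percentage reductions come from.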
This paper introduces a new parallel performance profiling system for the Bulk Synchronous Parallel (BSP) model. The profiling system, called BSP Pro, consists of a performance profiling tool, BSP Profiler, and a performance visualisation tool, BSP Visualiser. The aim of BSP Pro is to assist in the analysis and improvement of BSP program performance by minimising load imbalance among processes. BSP Pro differs from other systems, such as the profiling tools within the Oxford BSP toolset, in both its features and its implementation. It uses BSP Profiler to trace and generate more comprehensive profiling information from BSP program executions. The profiling information is then visualised as performance profiling graphs using BSP Visualiser. The visualising component of BSP Pro is fully developed in Java and uses Java graphics to expose and highlight process load imbalance in both computation and interprocess communication.
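Why load imbalance dominates BSP performance can be seen from the standard BSP cost model (the parameter values below are illustrative): a superstep costs max_i(w_i) + h*g + l, where w_i is the computation of process i, h the largest message volume, g the per-word communication cost, and l the barrier latency. The slowest process gates everyone, so the gap between the maximum and mean work is exactly what a tool like BSP Visualiser exposes.

```java
import java.util.Arrays;

// Standard BSP superstep cost: max work + communication + barrier.
public class BspCost {
    public static double superstep(double[] work, int h, double g, double l) {
        double maxWork = Arrays.stream(work).max().orElse(0.0);
        return maxWork + h * g + l;
    }

    // Imbalance = max work minus mean work; zero when perfectly balanced.
    public static double imbalance(double[] work) {
        double maxWork = Arrays.stream(work).max().orElse(0.0);
        double mean = Arrays.stream(work).average().orElse(0.0);
        return maxWork - mean;
    }
}
```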