Massively parallel computers consisting of a large number of processing elements have been developed and are expected to serve as high-performance computers in advanced science and technology. A practical parallel computation model is needed to analyze parallel algorithms on such machines. We present LogPQ, a practical parallel computation model that extends the LogP model with communication queues. The LogPQ model has three queues for each communication line and four parameters in addition to those of the LogP model. This paper analyzes the performance of parallel matrix multiplication using the LogPQ model. Performance on the CM-5 parallel machine is compared under the LogP and LogPQ models, and the LogPQ model predicts execution times more accurately than the LogP model.
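The following sketch gives a reference point for the LogP baseline that LogPQ extends. It computes the standard LogP time for one processor to deliver k back-to-back messages to another, using latency L, overhead o and gap g. It does not model LogPQ's queues or its four additional parameters, because the abstract does not specify them. The function name is illustrative.

```python
def logp_delivery_time(L: float, o: float, g: float, k: int = 1) -> float:
    """Time until the last of k back-to-back point-to-point messages is
    received under the LogP model (one sender, one receiver, no contention)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Consecutive sends start max(g, o) apart: the sender pays overhead o per
    # message and must also respect the gap g between injections.
    last_send_start = (k - 1) * max(g, o)
    # The last message then pays send overhead o, network latency L,
    # and receive overhead o.
    return last_send_start + o + L + o
```

For example, with L=6, o=2 and g=4, a single message takes 10 cycles and three pipelined messages take 18.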
Multicast (group) communication has been widely recognized by both research and industry. Multicast is very useful for many network applications, such as distributed (replicated) databases, video/audio conferencing, information distribution and server location. However, designing and implementing multicast communication systems in networks is complicated, especially when applications require quality of service (QoS) properties such as real-time delivery and reliability. Good tools are crucial for designing and implementing multicast communication quickly. This paper presents a novel object-oriented (O-O), QoS-driven approach for the rapid design and prototyping of multicast communication systems that must meet QoS requirements for message transmission and reception, such as real-time delivery, total ordering, atomicity and fault tolerance.
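Total ordering is one of the QoS properties listed above. A common way to implement it uses a fixed sequencer that stamps every multicast with a global sequence number. Receivers then hold back each message until all lower-numbered ones have been delivered. The sketch below is a minimal illustration of that standard technique, not the paper's O-O approach. The `Sequencer` and `Receiver` classes are hypothetical, and they ignore message loss and sequencer failure.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sequencer:
    """Assigns a global, gap-free sequence number to each multicast message."""
    next_seq: int = 0

    def assign(self, msg: str) -> Tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


@dataclass
class Receiver:
    """Delivers messages strictly in sequence-number order."""
    expected: int = 0
    pending: List[Tuple[int, str]] = field(default_factory=list)  # min-heap
    delivered: List[str] = field(default_factory=list)

    def receive(self, seq: int, msg: str) -> None:
        heapq.heappush(self.pending, (seq, msg))
        # Deliver every buffered message that is next in the global order.
        while self.pending and self.pending[0][0] == self.expected:
            _, m = heapq.heappop(self.pending)
            self.delivered.append(m)
            self.expected += 1
```

Two receivers that see the same stamped messages in different network orders still deliver them identically.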
This paper briefly reviews some of the more popular parallel-computer models that employ optical interconnects: the pipelined optical bus and OTIS interconnect models. It also describes the interconnect topology of each model and some simple algorithms for it.
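The usual OTIS arrangement has n groups of n processors each. Its optical (transpose) links connect processor p of group g to processor g of group p. The sketch below enumerates those links under that assumption and omits the electronic intra-group links. `otis_optical_links` is an illustrative name.

```python
from typing import FrozenSet, Set, Tuple

Node = Tuple[int, int]  # (group, processor-within-group)


def otis_optical_links(n: int) -> Set[FrozenSet[Node]]:
    """Undirected optical links of an OTIS network with n groups of n nodes."""
    if n < 1:
        raise ValueError("n must be positive")
    links: Set[FrozenSet[Node]] = set()
    for g in range(n):
        for p in range(n):
            # Transpose rule: (g, p) <-> (p, g); nodes with g == p map to
            # themselves and therefore have no optical partner.
            if g != p:
                links.add(frozenset({(g, p), (p, g)}))
    return links
```

Because each transpose pair is stored once as an unordered set, the network has n(n-1)/2 optical links.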
A key issue for parallel systems is the development of useful programming abstractions that can coexist with good performance. We describe a communication library that supports an object-based abstraction with a bulk-synchronous communication style; this is the first time such a library has been proposed and implemented. By restricting the library to the exclusive use of barrier synchronization, we are able to design a simple and easy-to-use object system. By exploiting established techniques based on the bulk-synchronous parallel (BSP) model, we are able to design algorithms and library implementations that work well across platforms.
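The standard BSP cost model charges each superstep `max(w_i) + g*h + l`. Here w_i is a processor's local work, h is the largest number of words any processor sends or receives, g is the per-word communication cost and l is the barrier cost. The sketch below applies that textbook formula. It is not part of the library described above, and the function names are illustrative.

```python
from typing import Sequence, Tuple


def superstep_cost(work: Sequence[float], sent: Sequence[int],
                   received: Sequence[int], g: float, l: float) -> float:
    """Cost of one BSP superstep: max local work + g * h + l."""
    if not work or not (len(work) == len(sent) == len(received)):
        raise ValueError("need one entry per processor in each sequence")
    # h-relation: the most words any single processor sends or receives.
    h = max(max(s, r) for s, r in zip(sent, received))
    return max(work) + g * h + l


def program_cost(supersteps: Sequence[Tuple[Sequence[float], Sequence[int],
                                             Sequence[int]]],
                 g: float, l: float) -> float:
    """Total cost is the sum of the costs of all supersteps."""
    return sum(superstep_cost(w, s, r, g, l) for w, s, r in supersteps)
```

Because every superstep ends at a barrier, the total cost is simply the sum of the per-superstep costs.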
The investigation of strong quantum effects is one of the major research areas in physics, with important applications such as high-temperature superconductors. The only reliable approaches to these systems ar...
Many naive parallel processing schemes were not as successful as researchers expected, because of the heavy communication and synchronization costs introduced by parallelization. In this paper, we identify the reasons for this poor performance and the compiler requirements for improving it. We observed that parallelization decisions should be derived from overhead information, and we incorporated this idea into the SUIF automatic parallelizing compiler. We replaced SUIF's original backend with our own MPI-based backend, which can validate parallelization decisions against overhead parameters. This backend converts shared-memory parallel programs into distributed-memory parallel programs with MPI function calls while avoiding excessive parallelization, which degrades performance.
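The abstract does not give the backend's cost model, so the sketch below uses a deliberately simple, hypothetical one. Under this model, a loop is parallelized only if its estimated parallel time beats running it sequentially. The parallel time is the sequential time divided by p, plus communication and synchronization overhead. The function and its parameters are illustrative and are not SUIF's API.

```python
def should_parallelize(seq_time: float, p: int,
                       comm_overhead: float, sync_overhead: float) -> bool:
    """Hypothetical overhead-aware check: parallelize only if it pays off."""
    if p < 1:
        raise ValueError("p must be at least 1")
    # Ideal speedup on p processors, then add the costs parallelization introduces.
    parallel_time = seq_time / p + comm_overhead + sync_overhead
    return parallel_time < seq_time
```

For example, a 100-unit loop on 4 processors with 15 units of total overhead is worth parallelizing (40 < 100). A 10-unit loop with the same overhead is not (17.5 > 10).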
ISBN: 3540658319 (print)
The proceedings contain 137 papers. The special focus in this conference is on High-Level Parallel Programming Models and Supportive Environments. The topics include: efficient program partitioning based communication; a flexible base for transparent shared memory programming models on clusters of PCs; flexible collective operations for distributed object groups; a framework for performance evaluation of scalable computing; recursive individually distributed objects; a flexible combination of on-stack execution and work-stealing; an automatic distribution front-end for Java; concurrent language support for interoperable applications; on the distributed implementation of aggregate data structures by program transformation; a transformational framework for skeletal programs; implementing a non-strict functional programming language on a threaded architecture; the biological basis of the immune system as a model for intelligent agents; a formal definition of the phenomenon of collective intelligence and its IQ measure; implementation of data flow logical operations via self-assembly of DNA; a parallel hybrid evolutionary metaheuristic for the period vehicle routing problem; distributed scheduling with decomposed optimization criterion; a parallel genetic algorithm for task mapping on parallel machines; evolution-based scheduling of fault-tolerant programs on multiple processors; a genetic-based fault-tolerant routing strategy for multiprocessor networks; regularity considerations in instance-based locality optimization; parallel ant colonies for combinatorial optimization problems; an analysis of synchronous and asynchronous parallel distributed genetic algorithms with structured and panmictic islands; and GA-based parallel image registration on parallel clusters.
The paper introduces a new parallel performance profiling system for the Bulk Synchronous Parallel (BSP) model. The profiling system, called BSP Pro, consists of a performance profiling tool, BSP Profiler, and a performance visualisation tool, BSP Visualiser. The aim of BSP Pro is to help analyse and improve BSP program performance by minimising load imbalance among processes. BSP Pro differs from other systems, such as the profiling tools in the Oxford BSP toolset, in both its features and its implementation. It uses BSP Profiler to trace BSP program executions and generate more comprehensive profiling information. BSP Visualiser then displays this information as performance profiling graphs. The visualisation component of BSP Pro is written entirely in Java and uses Java graphics to expose and highlight process load imbalance in both computation and interprocess communication.
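BSP Pro's own metrics are not described here. One hypothetical way to quantify the load imbalance it targets is to divide the slowest process's time by the mean time in each superstep, where 1.0 means perfectly balanced. The sketch below computes that ratio and flags supersteps above a chosen threshold. The names and the 1.2 default threshold are assumptions.

```python
from statistics import mean
from typing import List, Sequence


def imbalance_ratio(times: Sequence[float]) -> float:
    """Slowest process time divided by mean time (1.0 = perfectly balanced)."""
    if not times:
        raise ValueError("need at least one process time")
    avg = mean(times)
    if avg == 0:
        return 1.0  # every process idle: treat as balanced
    return max(times) / avg


def flag_imbalanced(per_superstep: Sequence[Sequence[float]],
                    threshold: float = 1.2) -> List[int]:
    """Indices of supersteps whose imbalance ratio exceeds the threshold."""
    return [i for i, times in enumerate(per_superstep)
            if imbalance_ratio(times) > threshold]
```

In BSP, every process waits at the barrier for the slowest one, so this ratio approximates how much of a superstep is spent idle.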
A new approach to developing parallel and distributed scheduling algorithms for multiprocessor systems is proposed. Its main innovation lies in evolving a decomposition of the global optimization criteria. For this purpo...
The choice of sharing model (objects vs. plain shared memory), memory consistency model and coherence protocol are all fundamental aspects of the design of distributed shared data systems. Unfortunately, no single sharing model, memory model or coherence protocol is suitable for all applications. In this paper, we describe the design of an extensible distributed shared data framework, called the Extensible Coherence Interface (ECI), that gives the application layer complete control over how data is shared. This includes whether to share data on a per-object or per-page basis, which memory model to use, and how coherence is enforced for a region of shared memory. The ECI implementation itself simply acts as a layer that transports and dispatches distributed events among objects at the application layer.
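The abstract describes ECI as a layer that transports and dispatches distributed events to application-level objects, which then decide how coherence is enforced. Below is a minimal, hypothetical sketch of the dispatch half. Each shared region registers its own handler, so different regions can use different coherence protocols. None of these names come from ECI itself, and event transport is omitted.

```python
from typing import Callable, Dict

# A handler receives an event name and its payload (hypothetical signature).
Handler = Callable[[str, dict], None]


class EventDispatcher:
    """Routes coherence events to the handler registered for each region."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, region: str, handler: Handler) -> None:
        # The application chooses the protocol per region by choosing the handler.
        self._handlers[region] = handler

    def dispatch(self, region: str, event: str, payload: dict) -> None:
        handler = self._handlers.get(region)
        if handler is None:
            raise KeyError(f"no coherence handler registered for region {region!r}")
        handler(event, payload)
```

Keeping the dispatcher protocol-agnostic mirrors the abstract's point that the framework only moves events, while policy lives at the application layer.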