We introduce JShield, a dialect of Java for concurrent object-oriented programming whose primary design goal is robustness. JShield preserves Java's syntax and the semantics of sequential Java programs. It modifies the semantics of concurrent programs to avoid data races completely, without relying on the programmer's skill and without requiring special annotations. This is achieved by combining Hoare's monitors with Java's remote method invocation, ensuring that a lock is requested before shared data is manipulated. We show that this can be done with a reasonable execution-time overhead compared to Java.
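The monitor idea in the JShield abstract above can be illustrated with a minimal sketch. It is written in Python rather than in the paper's Java dialect, and it is not JShield's mechanism: the class `MonitorCounter` and the helper `run_workers` are hypothetical names, and the locking is written out by hand, whereas JShield is described as enforcing it without programmer effort.

```python
import threading


class MonitorCounter:
    """Shared counter in monitor style: every public method holds one lock.

    Hypothetical illustration only; JShield itself is a Java dialect and
    is described as inserting the locking automatically.
    """

    def __init__(self):
        self._lock = threading.RLock()  # one lock guards all shared state
        self._value = 0

    def increment(self):
        with self._lock:  # the lock is requested before touching shared data
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_workers(n_threads, n_increments):
    """Start n_threads threads that each increment a shared counter."""
    counter = MonitorCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

Because every access to `_value` goes through the lock, no increment is lost, whatever the thread interleaving.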
The adaptive parallelism environment is introduced as a means of effectively utilizing MPP processing resources in a multi-programmed MIMD or distributed system. It achieves this by dynamically calculating the optimal number of threads for a given program at runtime. The optimality calculation weighs the number of processing elements available to handle additional workloads and the estimated computational speedup gained from additional threads against the communication overhead those threads incur. The adaptive parallelism environment is composed of three components: a load balancer, which migrates threads and provides information regarding the availability of processing elements; a code analyzer, which estimates the number and composition of instructions in a potential thread as well as the number of communications needed to synchronize the potential thread; and a runtime environment, which gathers information from the load balancer and code analyzer and performs the runtime calculations to estimate the optimal number of threads.
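The trade-off described in the adaptive parallelism abstract above, speedup against communication overhead, can be sketched with a toy cost model. The model is an assumption made for illustration, not the paper's formula: estimated time is `work / n` plus a fixed communication cost for each thread beyond the first. The names `estimated_time` and `optimal_thread_count` are hypothetical.

```python
def estimated_time(work, comm_cost_per_thread, n_threads):
    """Toy model (assumed): compute time shrinks as 1/n, while
    communication cost grows linearly with each extra thread."""
    return work / n_threads + comm_cost_per_thread * (n_threads - 1)


def optimal_thread_count(work, comm_cost_per_thread, available_pes):
    """Return the thread count in 1..available_pes with the lowest
    estimated time under the toy model above."""
    if available_pes < 1:
        raise ValueError("need at least one processing element")
    return min(
        range(1, available_pes + 1),
        key=lambda n: estimated_time(work, comm_cost_per_thread, n),
    )
```

In this sketch `available_pes` stands in for the load balancer's report, and `work` and `comm_cost_per_thread` stand in for the code analyzer's estimates. With `work=1000` and `comm_cost_per_thread=10`, the model's best count is 10 threads when 16 processing elements are free, and 4 when only 4 are free.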
ISBN: 0769520693 (print)
New high-level control mechanisms for the design of parallel programs are introduced in the paper. Special synchronizer processes collect information on the states of the application's parallel processes and construct strongly consistent global states using time-interval timestamps. Based on consistent global states, the synchronizers evaluate synchronization and execution control predicates. As a result, synchronization/control signals are sent to application processes. The signals can trigger the activation or cancellation of asynchronous computations. The proposed synchronization framework is integrated with a message passing system and included in the GRADE graphical parallel program design environment. Architecture and implementation aspects of such a system are discussed.
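A rough sketch of the synchronizer's check described above follows. It assumes that a set of local states counts as strongly consistent when their timestamp intervals all share a common instant; the paper's exact definition may differ. The names `common_interval` and `evaluate_predicate` and the data layout are hypothetical.

```python
def common_interval(intervals):
    """Return the (start, end) window shared by all intervals, or None.

    Each interval is a (start, end) pair bounding when a local state held.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    latest_start = max(start for start, _ in intervals)
    earliest_end = min(end for _, end in intervals)
    if latest_start <= earliest_end:
        return (latest_start, earliest_end)
    return None


def evaluate_predicate(local_states, predicate):
    """Evaluate a control predicate only on a consistent global state.

    local_states maps a process name to (value, (start, end)).
    Returns False when the intervals share no common instant.
    """
    window = common_interval([iv for _, iv in local_states.values()])
    if window is None:
        return False
    values = {proc: value for proc, (value, _) in local_states.items()}
    return bool(predicate(values))
```

A synchronizer built this way would send a control signal only when `evaluate_predicate` returns `True`, so signals are never based on states that could not have coexisted.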
The following topics are dealt with: Grid and distributed computing; scheduling task systems; shared-memory multiprocessors; imaging and visualization; testing and debugging; performance analysis and real-time systems; scheduling for heterogeneous resources; networking; peer-to-peer and mobile computing; compiler technology and run-time systems; load balancing; network routing; parallel programming models; parallel algorithms; scheduling and storage; parallel and distributed performance; software for high performance clusters; decentralized algorithms; multithreading and VLIW; parallel and distributed real-time systems; high-level parallel programming models and supportive environments; Java for parallel and distributed computing; nature inspired distributed computing; high performance computational biology; advances in parallel and distributed computational models; reconfigurable architectures; communication architecture for clusters; next generation systems; fault-tolerant parallel and distributed systems; wireless, mobile and ad hoc networks; parallel and distributed image processing, video processing, and multimedia; formal methods for parallel programming; Internet computing and e-commerce; parallel and distributed scientific and engineering computing with applications; massively parallel processing; performance modeling, evaluation, and optimization of parallel and distributed systems; and parallel and distributed systems: testing and debugging.
We have developed a finite element method (FEM) software repository tool named feelfem that serves as a code generator. One important feature of feelfem is that it is designed to generate various program models of FEM analysis, including users' own newly developed numerical schemes. Another feature is that interfaces to newly developed parallel programming paradigms and parallel solvers can easily be added to it. Software reuse is an important target of the feelfem system. To achieve flexibility and expandability for the system, we adopt an object-oriented technique and implementation-oriented pseudo-code representation of numerical algorithms. In its latest released version, feelfem has strong interaction with the personal pre/post processor GiD. By using a combination of feelfem and GiD, users can generate prototype parallel FEM applications with newly developed solvers very easily and quickly.
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: the OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads, and the Paraver graphical user interface for inspecting and analyzing the generated traces. We apply the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable to shared-memory computer architectures.
The X4CP32 is a parallel/reconfigurable microprocessor with two programming levels. Although it is a general-purpose microprocessor, it has the reliable performance of a reconfigurable architecture. We present its architecture and programming levels, and discuss the powerful interaction between parallel programming and reconfiguration. We show two performance-optimized implementations of matrix multiplication using both parallel and reconfigurable paradigms and a parallel implementation of miner intelligent agents.
We propose a general-purpose natural heuristic for static block and block-cyclic heterogeneous data decomposition over the processes of a parallel program mapped onto a multidimensional grid. This heuristic extends the intuitively clear heterogeneous data distribution for the one-dimensional case. It is compared to an advanced heuristic for heterogeneous data decomposition proposed for solving linear algebra problems on a two-dimensional process grid. We show experimentally that for a typical local network (12 Windows 2000 PCs interconnected via a Fast Ethernet switch) and for typical linear algebra problems, the two heuristics have almost the same efficiency. We demonstrate the efficiency of the proposed natural decomposition for a three-dimensional process grid, using 3D modeling of a supernova explosion as an example.
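The one-dimensional case that the abstract above calls intuitively clear can be sketched as follows, assuming each process receives a block proportional to its relative speed. The rounding rule (largest remainder) and the name `proportional_blocks` are assumptions made for illustration; the paper's multidimensional heuristic is not reproduced here.

```python
def proportional_blocks(n_items, speeds):
    """Split n_items into per-process block sizes proportional to speeds.

    Assumed rounding rule: take the floor of each ideal share, then hand
    the leftover items to the processes with the largest fractional parts.
    """
    if n_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_items >= 0 and positive speeds")
    total = sum(speeds)
    ideal = [n_items * s / total for s in speeds]
    counts = [int(share) for share in ideal]  # floor, since shares are >= 0
    leftover = n_items - sum(counts)
    # Processes ordered by how much their floor undershoots the ideal share.
    by_remainder = sorted(
        range(len(speeds)),
        key=lambda i: ideal[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

For example, 100 items over processes with relative speeds 1, 2 and 2 yield blocks of 20, 40 and 40 items. A block-cyclic variant would apply the same split within each cycle of blocks.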