In this paper, we mainly study the parallelization aspects of accelerated waveform relaxation algorithms for the transient simulation of semiconductor devices on parallel distributed-memory computers, since these methods are competitive with standard pointwise methods on serial machines but significantly faster on parallel computers. We propose an efficient variant of the GMRES method for solving the sequence of time-varying sparse linear differential-algebraic initial-value problems (IVPs) that arises at each linearization step of waveform Newton. Because the number of inner products represents the bottleneck of parallel performance, we describe a more efficient alternative that avoids the global communication of inner products and requires only local communication on massively parallel distributed-memory computers. Experimental results obtained on a Parsytec GC/PowerPlus are described, comparing this approach with other acceleration techniques for waveform relaxation, such as convolution SOR and waveform GMRES, as well as with pointwise methods.
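The claimed bottleneck is the global reductions behind the Arnoldi inner products. The paper's specific GMRES variant is not spelled out here, so the sketch below, in Python/NumPy with invented function names, only illustrates the general trick: classical Gram-Schmidt lets all inner products of one Arnoldi step be batched into a single (all)reduce, whereas modified Gram-Schmidt needs one reduction per basis vector.

```python
import numpy as np

def arnoldi_step_mgs(V, w):
    """Modified Gram-Schmidt: one inner product, i.e. one global reduction, per column of V."""
    h = np.zeros(V.shape[1])
    for j in range(V.shape[1]):
        h[j] = V[:, j] @ w           # distributed: a separate allreduce for every j
        w = w - h[j] * V[:, j]
    return w, h

def arnoldi_step_cgs(V, w):
    """Classical Gram-Schmidt: all inner products batched into a single global reduction."""
    h = V.T @ w                      # distributed: ONE allreduce for the whole vector h
    w = w - V @ h
    return w, h

# Tiny check that both variants orthogonalize w against the columns of V.
V, _ = np.linalg.qr(np.random.rand(50, 5))
w = np.random.rand(50)
for step in (arnoldi_step_mgs, arnoldi_step_cgs):
    w_new, _ = step(V, w)
    assert np.allclose(V.T @ w_new, 0, atol=1e-10)
```

In a distributed run the batched version costs one collective per Arnoldi step instead of one per basis vector, at the price of weaker numerical stability, which is commonly repaired by reorthogonalization.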
The complexity of emerging distributed applications mandates real-time analysis and tuning of system and application performance. To meet this need, a prototype system is designed that integrates collaborative, immersive performance visualization with real-time performance measurement and adaptive control of applications on computational grids. This system combines the SvPablo instrumentation system, the Autopilot real-time adaptive control toolkit, and the Virtue virtual environment. The combination of these tools enables physically distributed collaborators to explore and steer, in real time, the behavior of complex software. Users can pose interactive queries and modify application parameters and behavior during execution.
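As a rough illustration only (the SvPablo/Autopilot/Virtue APIs are not shown in the abstract), the Python sketch below mimics the sensor/actuator pattern that closed-loop steering systems of this kind rely on: a sensor samples a metric from the running application and an actuator changes one of its parameters in response. All names and the steering rule are hypothetical.

```python
import random
import time

class Sensor:
    """Periodically samples a performance metric from the running application."""
    def __init__(self, read_metric):
        self.read_metric = read_metric

    def sample(self):
        return self.read_metric()

class Actuator:
    """Applies a steering decision by changing an application parameter in place."""
    def __init__(self, params):
        self.params = params

    def set(self, name, value):
        self.params[name] = value

def steering_loop(sensor, actuator, steps=5):
    # Toy rule: if the observed I/O wait fraction is high, enlarge the buffer.
    for _ in range(steps):
        io_wait = sensor.sample()
        if io_wait > 0.3:
            actuator.set("buffer_size", actuator.params["buffer_size"] * 2)
        time.sleep(0.1)

params = {"buffer_size": 4096}
sensor = Sensor(lambda: random.random())   # stand-in for a real measurement
steering_loop(sensor, Actuator(params))
print(params)
```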
This paper describes a technique that allows an MPI code to be encapsulated into a component. Our technique is based on an extension to the Common Object Request Broker Architecture (CORBA) from the OMG (Object Management Group). The proposed extensions do not modify the CORBA core infrastructure (the Object Request Broker), so they can fully coexist with existing CORBA applications. An MPI code is seen as a new kind of CORBA object that hides most of the cumbersome problems of dealing with parallelism. Such a technique can be used to connect MPI codes to existing CORBA software infrastructures now being developed in the framework of several research and development projects, such as JACO3, JULIUS, or TENT from DLR. To illustrate the concept of a parallel CORBA object, we present a virtual reality application built by coupling a light simulation application (radiosity) with a visualization tool using VRML and Java.
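The extended CORBA interface itself is not given in the abstract, so the following Python sketch (using mpi4py rather than CORBA) only illustrates the underlying idea: a single object-style entry point that hides the SPMD plumbing of an MPI code from its caller. Class and method names are invented for the example.

```python
# Run with: mpirun -n 4 python parallel_object.py
from mpi4py import MPI
import numpy as np

class ParallelSimulation:
    """A single 'object' interface in front of an SPMD MPI code.

    A client only calls run(); the broadcast/partition/reduce plumbing that
    spreads the work across ranks is hidden inside the object.
    """
    def __init__(self):
        self.comm = MPI.COMM_WORLD
        self.rank = self.comm.Get_rank()
        self.size = self.comm.Get_size()

    def run(self, n):
        n = self.comm.bcast(n, root=0)           # share the request with all ranks
        lo = self.rank * n // self.size          # each rank takes a slice of the work
        hi = (self.rank + 1) * n // self.size
        local = np.sum(np.arange(lo, hi, dtype=np.float64) ** 2)
        return self.comm.reduce(local, op=MPI.SUM, root=0)   # meaningful on rank 0

sim = ParallelSimulation()
result = sim.run(1_000_000)
if sim.rank == 0:
    print("sum of squares:", result)
```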
We describe a methodology intended to avoid the bottlenecks that typically arise when data consumers must access and process large amounts of data that was generated on, and resides on, other hosts and must pass through a central data cache before the consumer can use it. The methodology is based on the fundamental paradigm that the end result rendered by a data consumer can be produced from a reduced data set that has been distilled or filtered from the original data. Data distribution bottlenecks for visualization applications are reduced by avoiding the transfer of large amounts of raw data in favor of considerably distilled visual data.
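A minimal sketch of the paradigm, assuming a NumPy array as the raw data: the producing host distills the data set down to a subsample plus summary statistics, and only that payload crosses the network to the consumer. The function name and the particular reduction are illustrative, not taken from the paper.

```python
import numpy as np

def distill(raw, stride=16):
    """Producer-side reduction: keep a strided subsample plus summary statistics
    instead of shipping the full raw array to the visualization host."""
    return {
        "sample": raw[::stride].copy(),   # coarse view, roughly 1/stride of the bytes
        "min": float(raw.min()),
        "max": float(raw.max()),
        "mean": float(raw.mean()),
    }

raw = np.random.rand(1_000_000)           # lives on the producing host
payload = distill(raw)                    # only this crosses the network
print(len(payload["sample"]), payload["mean"])
```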
MATmarks is an extension of the MATLAB tool that enables shared-memory programming on a network of workstations by adding a small set of commands. A high-level overview of the MATmarks system, the commands added to MATLAB, and the performance gains achieved are presented. Performance results show that linear speedup can be achieved on a moderate number of workstations. While transforming a serial program into a shared-memory parallel one is much easier than writing a message-passing parallel program, good performance still requires care in coding the parallel algorithm.
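MATmarks' actual commands are MATLAB extensions and are not listed in this abstract, so the sketch below uses plain Python multiprocessing as a loose analogy for the serial-to-parallel transformation the text describes, and for the kind of work partitioning that the closing caveat about careful coding refers to. Names and the partitioning scheme are illustrative only.

```python
from multiprocessing import Pool
import numpy as np

def chunk_sum(bounds):
    lo, hi = bounds
    # Each worker touches only its own contiguous block of the iteration space,
    # the sort of partitioning care the abstract's closing caveat alludes to.
    return np.sum(np.arange(lo, hi, dtype=np.float64) ** 2)

def parallel_sum(n, workers=4):
    bounds = [(i * n // workers, (i + 1) * n // workers) for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(chunk_sum, bounds))

if __name__ == "__main__":
    print(parallel_sum(1_000_000))
```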
Coupling computer resources to work on a given problem has been a successful strategy for years. With the availability of high-bandwidth WANs it has become possible to couple big machines into clusters that outperform the most powerful existing single computers. In the following we refer to such a powerful cluster as a metacomputer. While the message-passing model for MPPs was standardized four years ago, a standard for interoperable MPI has only been published recently. In this paper we briefly present an approach to MPI-based metacomputing called PACX-MPI. We also outline other approaches and give an overview of the technical concept.
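PACX-MPI's internal design is not detailed in this abstract; the toy Python model below only captures the general metacomputing idea of mapping global ranks onto several machines and routing intra-machine traffic over the native fast network while inter-machine traffic is forwarded through per-machine gateways over the WAN. All names are invented for the example.

```python
class MetaComm:
    """Toy model of a metacomputing communication layer.

    Each global rank is mapped to a (machine, local_rank) pair.  A send between
    ranks on the same machine uses the native network; a send between machines
    is forwarded through that machine's gateway over the WAN.
    """
    def __init__(self, ranks_per_machine):
        self.location = {}
        g = 0
        for m, n in enumerate(ranks_per_machine):
            for r in range(n):
                self.location[g] = (m, r)
                g += 1

    def route(self, src, dst):
        (m_src, _), (m_dst, _) = self.location[src], self.location[dst]
        if m_src == m_dst:
            return "native MPI inside machine %d" % m_src
        return "machine %d gateway -> WAN -> machine %d gateway" % (m_src, m_dst)

comm = MetaComm([128, 128])          # two coupled MPPs with 128 ranks each
print(comm.route(3, 100))            # intra-machine message
print(comm.route(3, 200))            # inter-machine message, via gateways
```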
We are developing a system for collaborative research and development for a distributed group of researchers at different institutions around the world. In a new paradigm for collaborative computational science, the computer code and its supporting infrastructure itself becomes the collaborating instrument, just as an accelerator becomes the collaborating tool for large numbers of distributed researchers in particle physics. The design of this `Collaboratory' allows many users, with very different areas of expertise, to work coherently together on distributed computers around the world. Different supercomputers may be used separately, or, for problems exceeding the capacity of any single system, multiple supercomputers may be networked together through high-speed gigabit networks. Central to this Collaboratory is a new type of community simulation code, called `Cactus'. The scientific driving force behind this project is the simulation of Einstein's equations for studying black holes, gravitational waves, and neutron stars, which has brought together researchers in very different fields from many groups around the world to make advances in the study of relativity and astrophysics. But the system is also being developed to provide scientists and engineers, without expert knowledge of parallel or distributed computing, mesh refinement, and so on, with a simple framework for solving any system of partial differential equations on many parallel computer systems, from traditional supercomputers to networks of workstations.
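The Cactus API itself is not described in this abstract. The sketch below, in Python with invented names, only illustrates the framework idea the last sentence alludes to: the user registers the right-hand side of a PDE, and the framework supplies the grid storage and time stepping (and, in a real system, the parallelism and mesh refinement).

```python
import numpy as np

class PDEFramework:
    """Toy framework: the user registers only the right-hand side of u_t = f(u);
    grid storage and time stepping are handled by the framework."""
    def __init__(self, initial, dx):
        self.u = np.asarray(initial, dtype=float)
        self.dx = dx
        self.rhs = None

    def register_rhs(self, rhs):
        self.rhs = rhs

    def evolve(self, dt, steps):
        for _ in range(steps):
            self.u = self.u + dt * self.rhs(self.u, self.dx)   # forward Euler
        return self.u

def heat_rhs(u, dx):
    # Second-order finite difference for u_xx with fixed boundary values.
    d2 = np.zeros_like(u)
    d2[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    return d2

x = np.linspace(0.0, 1.0, 101)
sim = PDEFramework(np.sin(np.pi * x), dx=x[1] - x[0])
sim.register_rhs(heat_rhs)
print(sim.evolve(dt=2e-5, steps=100).max())
```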
We present a performance-based methodology for designing a high-bandwidth radar application on commodity platforms. Unlike many real-time systems, our approach works for commodity processors running commodity operating systems. Our technique is innovative because it uses stochastic models of the processing time at each step in the process to allow for the variability of running on a non-real-time operating system. We show how our system synthesizes the runtime parameters for a synthetic aperture radar application under a variety of loading conditions.
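A hedged illustration of the stated idea, with made-up step names and timing distributions: model the latency of each processing step as a random variable, then pick the largest per-frame workload whose high-percentile latency still fits the deadline. This mirrors the notion of synthesizing runtime parameters under OS-induced variability without claiming the paper's actual model or numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_frame(work_per_frame, n_trials=10_000):
    """Monte Carlo model of one processing frame: three steps whose latencies
    vary because the code runs on a non-real-time commodity OS."""
    fft_step   = rng.normal(2.0e-3, 0.3e-3, n_trials) * work_per_frame
    filt_step  = rng.normal(1.5e-3, 0.4e-3, n_trials) * work_per_frame
    image_step = rng.normal(3.0e-3, 0.5e-3, n_trials)
    return fft_step + filt_step + image_step

def largest_feasible_work(deadline, quantile=0.99):
    """Pick the largest per-frame workload whose 99th-percentile latency
    still fits under the real-time deadline."""
    for work in range(16, 0, -1):
        latency = simulate_frame(work)
        if np.quantile(latency, quantile) <= deadline:
            return work
    return 0

print(largest_feasible_work(deadline=0.040))
```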
With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support for concurrent applications engaging in nontrivial interaction has become an important problem. Two principal barriers to harnessing parallelism are, first, efficient mechanisms that achieve transparent dependency maintenance while preserving semantic correctness and, second, scheduling algorithms that match coupled processes to distributed resources while explicitly incorporating their communication costs. This paper describes a set of performance features, their properties, and their implementation in a system support environment called DUNES that achieves transparent dependency maintenance - IPC, file access, memory access, process creation/termination, process relationships - under dynamic load balancing. The two principal performance features are push/pull-based active and passive end-point caching and communication-sensitive load balancing. Collectively, they mitigate the overhead introduced by the transparent dependency maintenance mechanisms. In addition, communication-sensitive load balancing governs the scheduling of distributed resources to application processes, with both communication and computation costs explicitly taken into account. DUNES' architecture endows commodity operating systems with distributed operating system functionality while achieving transparency with respect to their existing application base. DUNES also preserves semantic correctness with respect to single-processor semantics. We show performance measurements of a UNIX-based implementation on Sparc and x86 architectures over high-speed LAN environments. We show that significant performance gains in terms of system throughput and parallel application speed-up are achievable.
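Communication-sensitive load balancing is only named, not specified, in the abstract; the toy cost model below sketches one plausible reading in Python: place a process on the host that minimizes estimated compute load plus the cost of its traffic with peers that are already placed. Data structures and weights are invented for the example.

```python
def place_process(proc, hosts, traffic, placement):
    """Choose a host for `proc` by minimizing estimated compute cost plus the
    cost of its communication with already-placed peer processes.

    hosts:     {host: current_load}
    traffic:   {(proc, peer): bytes exchanged per second}
    placement: {peer: host} for peers already assigned
    """
    def cost(host):
        compute = hosts[host]                        # a more loaded host is slower
        comm = 0.0
        for (p, peer), volume in traffic.items():
            if p == proc and peer in placement:
                # Remote traffic is penalized; co-located traffic is free.
                comm += volume if placement[peer] != host else 0.0
        return compute + 1e-6 * comm
    return min(hosts, key=cost)

hosts = {"hostA": 0.2, "hostB": 0.7}
traffic = {("p2", "p1"): 5_000_000}                  # p2 talks heavily to p1
placement = {"p1": "hostB"}
print(place_process("p2", hosts, traffic, placement))
```

In this toy instance the heavily communicating process p2 is co-located with its partner p1 on the more loaded hostB, because the modeled communication penalty outweighs the extra compute load.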
In this paper, we present ParaPART, a parallel version of a mesh partitioning tool, called PART, for distributed systems. PART takes into consideration the heterogeneities in processor performance, network performance...