Computational efficiency and memory scalability are major subjects in large-scale reservoir *** order to solve large-scale problems within the required time, parallel codes running on distributed/shared memory systems are required. This paper presents a parallel simulator developed especially for distributed memory systems, and some numerical results of test cases on parallel computers are *** show that this parallel code exhibits satisfactory scalability and computational speedup. The paper also presents Krylov subspace methods with two types of preconditioners, one based on ILU decomposition and the other based on an iterative *** tests show that different Krylov subspace methods with an appropriate preconditioner are able to achieve similar performance.
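To illustrate how a preconditioner plugs into a Krylov subspace iteration, here is a minimal sketch of the preconditioned conjugate gradient method in plain Python. It is not the simulator's solver: conjugate gradient applies only to symmetric positive definite systems, and a simple Jacobi (diagonal) preconditioner is swapped in for the ILU and iterative preconditioners named in the abstract. The function names and the small test matrix are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [dot(row, x) for row in A]


def jacobi_preconditioner(A):
    """Return a function applying M^-1 r with M = diag(A).

    Assumption: this stands in for an ILU-based preconditioner.
    """
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = apply_preconditioner(r)      # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def build_test_system(n):
    """Hypothetical tridiagonal SPD system whose exact solution is all ones."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 + i
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    b = matvec(A, [1.0] * n)
    return A, b


A, b = build_test_system(8)
solution, iterations = pcg(A, b, jacobi_preconditioner(A))
```

Only `apply_preconditioner` changes when a different preconditioner is used, which is why the same Krylov iteration can be compared across preconditioners.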
We present the design, implementation and evaluation of a framework that uses JavaSpaces [1] to support this type of opportunistic adaptive parallel/distributed computing over networked clusters in a non-intrusive manner. The framework targets applications exhibiting coarse-grained parallelism and has three key features: (1) portability across heterogeneous platforms, (2) minimal configuration overheads for participating nodes, and (3) automated system state monitoring (using SNMP) to ensure non-intrusive behavior. Experimental results presented in this paper demonstrate that for applications that can be broken into coarse-grained, relatively independent tasks, the opportunistic adaptive parallel computing framework can provide performance gains. Furthermore, the results indicate that monitoring and reacting to the current system state minimizes the intrusiveness of the framework.
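The pattern described above, workers pulling coarse-grained tasks from a shared space only while their node is idle, can be sketched roughly as follows. This is not the JavaSpaces framework itself: a thread-safe Python queue stands in for the shared space, and `is_node_idle` is a hypothetical callback standing in for the SNMP-based system state monitoring.

```python
import queue
import threading
import time


def run_workers(tasks, num_workers, is_node_idle):
    """Process tasks from a shared bag using workers that yield when busy.

    tasks: iterable of integers (each task squares its input, as a placeholder).
    is_node_idle: hypothetical load probe, called with the worker id.
    Returns a sorted list of (task, result, worker_id) tuples.
    """
    space = queue.Queue()            # stand-in for the shared tuple space
    for task in tasks:
        space.put(task)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            if space.empty():
                return               # no work left, leave quietly
            if not is_node_idle(worker_id):
                time.sleep(0.001)    # node is in use: back off, take nothing
                continue
            try:
                task = space.get_nowait()
            except queue.Empty:
                return               # another worker took the last task
            value = task * task      # placeholder for a coarse-grained task
            with results_lock:
                results.append((task, value, worker_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


# Worker 0 is treated as permanently busy, so it should never take a task.
completed = run_workers(range(10), 3, lambda worker_id: worker_id != 0)
```

Because workers pull tasks rather than having them pushed, a node that becomes busy simply stops taking work, and the remaining tasks are picked up by idle nodes.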
ISBN: 0769509908 (print)
NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of geographically distributed computers, databases and human expertise, in order to solve large-scale realistic computational problems. This type of metacomputing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme, called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. Experimental results show that MinEX is an effective load balancer in a distributed IPG environment.
ISBN: 0769509908 (print)
All-to-all communication is one of the densest collective communication patterns and occurs in many important applications in parallel and distributed computing. In this paper, we present a new all-to-all broadcast algorithm for multidimensional all-port mesh and torus networks. We propose a broadcast pattern which ensures a balanced traffic load in all dimensions of the network so that the all-to-all broadcast algorithm can achieve a very tight near-optimal transmission time. The algorithm also takes advantage of overlapping message switching time and transmission time, and the total communication delay asymptotically matches the lower bound of all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node, so it can be easily implemented in hardware and achieves near-optimal performance in practice.
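For orientation, the sketch below simulates a much simpler baseline than the algorithm in the abstract: an all-to-all broadcast on a 2D torus done as a ring exchange along rows followed by a ring exchange along columns, using one link per node per step. It also computes a standard counting lower bound for an all-port torus, where each node has two links per dimension and must receive one message from every other node. The function names are hypothetical, and the bound assumes every dimension has at least three nodes so that the two neighbors in a dimension are distinct.

```python
import math


def ring_all_to_all_broadcast(rows, cols):
    """Simulate a dimension-by-dimension ring broadcast on a rows x cols torus.

    Returns (steps, have) where have maps each node to the set of source
    nodes whose messages it holds. For simplicity each node forwards
    everything it knows; a real ring exchange forwards only the newest block.
    """
    have = {(r, c): {(r, c)} for r in range(rows) for c in range(cols)}
    steps = 0

    # Phase 1: circulate messages around each row ring.
    for _ in range(cols - 1):
        snapshot = {node: set(msgs) for node, msgs in have.items()}
        for (r, c) in have:
            have[(r, c)] |= snapshot[(r, (c - 1) % cols)]
        steps += 1

    # Phase 2: circulate the gathered row data around each column ring.
    for _ in range(rows - 1):
        snapshot = {node: set(msgs) for node, msgs in have.items()}
        for (r, c) in have:
            have[(r, c)] |= snapshot[((r - 1) % rows, c)]
        steps += 1

    return steps, have


def all_port_lower_bound(dims):
    """Counting lower bound on message-transmission steps in an all-port torus.

    Each node has 2 * len(dims) incoming links and must receive N - 1 messages.
    Assumes every dimension size is at least 3.
    """
    total_nodes = math.prod(dims)
    ports = 2 * len(dims)
    return math.ceil((total_nodes - 1) / ports)


steps, have = ring_all_to_all_broadcast(4, 4)
bound = all_port_lower_bound((4, 4))
```

The gap between `steps` and `bound` comes from leaving most links idle in each phase, which is the kind of imbalance a pattern that loads all dimensions evenly is meant to remove.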
ISBN: 0769509908 (print)
Providing variable granularities is an attractive way to achieve good speedups for various classes of parallel applications. A few systems achieve this goal by instrumenting an application with code that checks the state of shared data. Although these systems can provide arbitrary granularities flexibly, they suffer from severe race conditions inherent to software-only approaches as well as the run-time overhead of the instrumentation. In this paper, we propose a new mechanism, which has low overhead and incurs no race conditions while providing variable granularities in software. The unique idea of our mechanism is to delegate the state checks to the segmentation hardware of the Intel x86. The instrumented code only maintains the state of shared data so that the segmentation hardware can use it. Because the hardware atomically performs the required state checks and corresponding references, our mechanism is free from difficult race conditions. This feature efficiently improves the response time to remote requests via an interrupt mechanism without additional synchronization overheads for avoiding race conditions. The run-time overhead decreases further owing to the reduced work to be done by software. The evaluation results show that our mechanism exhibits sufficiently low overhead even without any optimization.
ISBN: 0769509908 (print)
We propose the robust algorithm-configured emulation (RACE) scheme for efficient parallel computation and communication in the presence of faults. A wide variety of algorithms originally designed for fault-free meshes, tori, and k-ary n-cubes can be transformed into corresponding robust algorithms through RACE. In particular, optimal robust algorithms can be derived for total exchange (TE) and ascend/descend operations with a factor of 1+o(1) slowdown. Also, RACE can tolerate a large number of faulty elements, without relying on hardware redundancy or any assumption about the availability of a complete subarray.
ISBN: 0769511406 (print)
Many decision support applications are built upon data mining and OLAP tools and allow users to answer information requests based on a data warehouse that is managed by a powerful DBMS. We focus on tools that generate sequences of SQL statements in order to produce the requested information. Our thorough analysis revealed that many sequences of queries that are generated by commercial tools are not very efficient. An optimized system architecture is suggested for these applications. The main component is a DSS optimizer that accepts previously generated sequences of queries and remodels them according to a set of optimization strategies, before they are executed by the underlying database system. The advantages of this extended architecture are discussed and a couple of appropriate optimization strategies are identified. Experimental results are given, showing that these strategies are appropriate to optimize query sequences of OLAP applications.
ISBN: 0769506348 (print)
In order to obtain efficiency, current practice in distributed software systems design often suffers from a lack of abstraction. An object-oriented design technique based on UML notations and a special type of high-level Petri-Nets is used to demonstrate how designs can be kept sufficiently abstract to be platform independent and re-usable but still support design alternatives and their evaluation w.r.t. availability and principle system performance.