Biological ants organize themselves into forager groups that converge to shortest paths to and from food sources. This has motivated development of a large class of biologically inspired agent-based graph search techn...
详细信息
ISBN:
(纸本)9780889868113
Biological ants organize themselves into forager groups that converge to shortest paths to and from food sources. This has motivated development of a large class of biologically inspired agent-based graph search techniques, called Ant Colony Optimization, to solve diverse combinatorial problems. Our approach to parallel graph search uses multiple ant agent populations distributed across processors and clustered computers to solve largescale graph search problems. We discuss our implementation using the NIST Data Flow System II, and show good scalability of our parallel search algorithm.
Modern science requires a close collaboration of scientists who are possibly scattered all over the world. The ongoing spreading of the internet and the emergence of grid and cloud techniques in recent years have caus...
详细信息
ISBN:
(纸本)9780889868113
Modern science requires a close collaboration of scientists who are possibly scattered all over the world. The ongoing spreading of the internet and the emergence of grid and cloud techniques in recent years have caused intensified efforts in the development of infrastructures that are suitable to enhance a world-wide collaboration of researchers. The term 'eScience' designates a scientific paradigm which has the primary goal to employ a shared digital infrastructure to improve the collaboration of researchers in the core areas of science. Nevertheless, a mere interconnection of hardware resources, programs, and data would be insufficient to achieve this aim. In this paper we will propose the concept of Shared Workspaces which will simplify the worldwide interdisciplinary collaboration. A platform based on this concept implements a controlled and secure communication, facilitates a connection with high performance computers, and enables the exchange and archiving of programs, data, communications and (intermediate) results. It also allows a connection to legacy projects and repositories.
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos use...
详细信息
ISBN:
(纸本)9780889868113
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos uses a restart scheme in order to improve the convergence of the desired portion of the spectrum, maintaining the orthogonality of the Krylov basis. The presented implementation is suitable for distributed memory architectures, specially PC clusters. In the parallel solution, a subdomain by subdomain approach was implemented and overlapping and non-overlapping mesh partitions were used. Compressed data structures in the formats CSRC and CSRC/CSR were used to store the global matrices coefficients. The parallelization of numerical linear algebra operations presented in both Krylov and implicitly restarted methods are discussed. In order to point out the efficiency and applicability of the proposed algorithms, a numerical example is shown.
The progress of semiconductor technology enables to implement a large system to only one chip, and physical problems such as IR-drop, electro migration, etc. become serious problems for VLSI circuit. To alleviate thes...
详细信息
ISBN:
(纸本)9780889868113
The progress of semiconductor technology enables to implement a large system to only one chip, and physical problems such as IR-drop, electro migration, etc. become serious problems for VLSI circuit. To alleviate these problems, VLSI circuit optimization which includes fast and accurate circuit simulator is necessary. This paper describes fast and accurate parallel transient simulator for RLC power grid circuit by GPU (Graphics Processing Unit) using CUDA. The GPU is a processor specified for graphic processing, and its architecture is quite unique. This paper proposes transient simulation method considering the feature of GPU architecture. Experimental results show that proposed transient simulator can achieve 173 times faster simulation than CPU, and the simulation error between proposed simulator and simulator executed on CPU is under 0.01%.
We are eloping a task parallel script language MegaScript for executing large-scale workflows on widely distributed heterogeneous environments. For efficient execution of this language, we have proposed a multi-layere...
详细信息
ISBN:
(纸本)9780889868113
We are eloping a task parallel script language MegaScript for executing large-scale workflows on widely distributed heterogeneous environments. For efficient execution of this language, we have proposed a multi-layered task scheduling scheme: the upper layer making rough global scheduling, and the lower layer making precise local scheduling. However, the cost for local scheduling is still a serious issue. Therefore, we propose an adaptive scheduling scheme appropriate to this kind of workflow. The scheme adaptively switches DAG scheduling and independent task scheduling, reducing the scheduling cost for independent task sets in the workflow. The results of our evaluation show our scheme achieved a 540 times speedup of total scheduling time when each host executes 100 tasks on average without serious extension of the makespan less than 7%.
A recent study characterizing failures in computer networks shows that transient single element (node/link) failures are the dominant failures in large communication networks like the Internet. Thus, having the routin...
详细信息
ISBN:
(纸本)9780889868113
A recent study characterizing failures in computer networks shows that transient single element (node/link) failures are the dominant failures in large communication networks like the Internet. Thus, having the routing paths globally recomputed on a failure does not pay off since the failed element recovers fairly quickly, and the recomputed routing paths need to be discarded. In this paper, we present the first distributed algorithm that computes the alternate paths required by some proactive recovery schemes for handling transient failures. Our algorithm computes paths that avoid a failed node, and provides an alternate path to a particular destination from an upstream neighbor of the failed node. With minor modifications, we can have the algorithm compute alternate paths that avoid a failed link as well. To the best of our knowledge all previous algorithms proposed for computing alternate paths are centralized, and need complete information of the network graph as input to the algorithm.
Multi-dimensional fixed-point fast Fourier transform(FFT) methods were developed and tested on two generalpurpose many-core architecture platforms. One is the highly-parallel fine-grained eXplicit Multi-Threaded (XMT)...
详细信息
ISBN:
(纸本)9780889868113
Multi-dimensional fixed-point fast Fourier transform(FFT) methods were developed and tested on two generalpurpose many-core architecture platforms. One is the highly-parallel fine-grained eXplicit Multi-Threaded (XMT) single-chip parallel architecture that targets reducing single task completion time. The second is 8xDual Core AMD Opteron 8220. The results show that the former outperforms the latter not only in speedup and ease of programming, but also with small data sets (small-scale parallelism). One of our results on XMT was a super-linear speedup (by a factor larger than the number of processors) observed under some rather unique circumstances.
Breadth-first Search (BFS) is a fundamental graph theory algorithm that is extensively used to abstract various challenging computational problems. Due to the fine-grained irregular memory accesses, parallelization of...
详细信息
ISBN:
(纸本)9780889868113
Breadth-first Search (BFS) is a fundamental graph theory algorithm that is extensively used to abstract various challenging computational problems. Due to the fine-grained irregular memory accesses, parallelization of BFS can exhibit limited performance on cache-based systems. In this paper, we study the relationship between the topology of input graphs and the performance of BFS on multicore systems. We propose a model to estimate the scalability of BFS with respect to a given graph. Using this model, we propose a topologically adaptive parallel BFS algorithm on multicore systems. The proposed algorithm estimates scalability of each iteration of BFS with respect to the input graph at runtime. An adaptive barrier is developed for this algorithm, which dynamically adjusts the number of threads participating in the BFS according to the estimated scalability. In this way, we reduce the synchronization overhead. We evaluate the proposed algorithm using various graphs on state-of-the-art multicore systems. The proposed method exhibits improved performance compared with traditional parallel BFS algorithms for which the number of threads is fixed.
Availability is a critical indicator in evaluating modern distributed file systems, however, it's very difficult to correctly evaluate the availability of those systems due to their complexity. For example, in add...
详细信息
ISBN:
(纸本)9780889868113
Availability is a critical indicator in evaluating modern distributed file systems, however, it's very difficult to correctly evaluate the availability of those systems due to their complexity. For example, in addition to measuring the availability in terms of the number of nines, it might be more useful and necessary to characterize other metrics such as performance degradation under faulty conditions, the time from fault to failure, and to discover correlation among failures if exists. In this paper, we propose a user-oriented benchmark tool (D-FACT) to evaluate a distributed file system under various faulty scenarios. There are several advantages of the proposed benchmark tool. First, it provides a mechanism for users to define their customized evaluation policies which are independent of different target file systems. Second, it is able to model the consequences of various real world faults, which provides the flexibility to emulate any user-defined faults, and also lets users focus on the layer of the target file system. Third, it can be extended to apply in a distributed system. A case study on PVFS2 is conducted using our benchmark, and the preliminary results demonstrate its effectiveness.
Data Grid, a class of Grid computing, aims at providing services and infrastructure to data-intensive distributed applications which need to access, transfer and modify large data storages. A common issue on Data Grid...
详细信息
ISBN:
(纸本)9780889868113
Data Grid, a class of Grid computing, aims at providing services and infrastructure to data-intensive distributed applications which need to access, transfer and modify large data storages. A common issue on Data Grids is the data access optimization, which has been addressed through different approaches such as: bio-inspired and replication (LRU, LFU, Economic Model) strategies. However, few of these approaches consider application features to optimize data access operations (read-and-write). These features define the application behavior, which can support the optimization of operations and, consequently, improve the global system performance. Motivated by the need of efficient data access in large scale distributed environments and by the use of application characteristics, this paper proposes a new heuristic to optimize data accesses (read-andwrite operations) based on application historical behavior. Simulation results confirm that the heuristic reduces application execution time when compared to other approaches commonly considered.
暂无评论