Graph Partitioning has several important applications in Computer Science, including VLSI circuit layout, image processing, and distributing workloads for parallel computation. It is known to be NP-hard. In this paper...
详细信息
ISBN:
(纸本)9780889867048
Graph Partitioning has several important applications in Computer Science, including VLSI circuit layout, image processing, and distributing workloads for parallel computation. It is known to be NP-hard. In this paper we present in detail the K-Graph Partitioning Problem and the Dynamic distributed Double Guided Genetic Algorithm. this algorithm consists of agents dynamically created and cooperated in order to solve the problem. Each agent performs its own genetic algorithm, guided by the min-conflict-heuristic. the paper also presents the results of application the algorithm for the $K$-Graph Partitioning Problem using a multilevel paradigm.
this paper presents comprehensive evaluations of parallel double Divide and Conquer for singular value decomposition on a super computer, HPC2500. For bidiagonal SVD, double Divide and Conquer was proposed. It first c...
详细信息
ISBN:
(纸本)9780889867048
this paper presents comprehensive evaluations of parallel double Divide and Conquer for singular value decomposition on a super computer, HPC2500. For bidiagonal SVD, double Divide and Conquer was proposed. It first computes singular values by a compact version of Divide and Conquer. the corresponding singular vectors are then computed by twisted factorization. the speed and accuracy of double Divide and Conquer are as good or even better than standard algorithms such as QR and the original Divide and Conquer. Moreover, it shows high scalability even on a PC cluster, distributed memory architecture. parallel algorithm of dDC and numerical results in some architectural options, matrix sizes and types on HPC2500, SMP cluster is shown.
Large-scale graph problems are becoming increasingly important in science and engineering. the irregular, sparse instances are especially challenging to solve on cache-based architectures as they are known to incur er...
详细信息
ISBN:
(纸本)9780889867048
Large-scale graph problems are becoming increasingly important in science and engineering. the irregular, sparse instances are especially challenging to solve on cache-based architectures as they are known to incur erratic memory access patterns. Yet many of the algorithms also exhibit some degree of regularity with memory accesses. It is important to characterize the locality behavior in order to bridge the gap between algorithm and architecture. In our study we quantify the locality of several fundamental graph algorithms, both sequential and parallel, and correlate our observations withthe algorithmic design. Our study of locality behavior brings insight into the impact of different cache architectures on the performance of both sequential and parallel graph algorithms.
Data clustering is a common technique for data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Due to the continuous increase of...
详细信息
ISBN:
(纸本)9780889867048
Data clustering is a common technique for data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Due to the continuous increase of datasets size and the intensive computation of clustering algorithms when used for analyzing large datasets, developing of efficient clustering algorithms is needed for processing time reduction. this paper describes the design and implementation of a recently developed clustering algorithm RACAL [1], which is a RAdius based Clustering ALgorithm. the proposed parallel algorithm (PRACAL) has the ability to cluster large datasets of high dimensions in a reasonable time, which leads to a higher performance computing.
We are developing a task parallel script language named MegaScript for mega-scale parallel processing. MegaScript regards sequential/parallel programs as tasks, and controls them for massively parallel execution. Alth...
详细信息
ISBN:
(纸本)9780889867048
We are developing a task parallel script language named MegaScript for mega-scale parallel processing. MegaScript regards sequential/parallel programs as tasks, and controls them for massively parallel execution. Although MegaScript programs require optimizations and extensions specific to the application and the computing environment, modifying the runtime system or task programs greatly reduces portability and reusability. To satisfy these conflicting requirements, we propose a user-level dynamic extension scheme named Adapter. In this scheme, the user defines a customization code and hooks to it a specific event. the runtime system calls back the code for the event locally, enabling it to extend or optimize system behavior without modifying the runtime or task programs. the results of our evaluation of the scheme show that the overhead and programming cost are both small enough for practical use.
Speculative Locking protocol (SL) is a concurrency control protocol that allows for parallel execution of conflicting transactions through a method of multilevel lending and versioning. the SL protocol shows performan...
详细信息
ISBN:
(纸本)9780889867048
Speculative Locking protocol (SL) is a concurrency control protocol that allows for parallel execution of conflicting transactions through a method of multilevel lending and versioning. the SL protocol shows performance improvements over the standard two-phase locking (2PL) protocol, but relies on several assumptions that would make it unsuitable in real-world scenarios. In this paper, we have proposed an adaptive speculative locking (ASL) protocol that improves performance of real-time distributed database systems by augmenting the SL protocol with four features: distributed real-time database system support;simultaneous multi-threading or page execution;control of transaction execution through transaction queue management;and restricting system memory through the use of virtual memory. the simulation results demonstrate the superiority of the ASL protocol over the SL protocols through the reduction of data contention caused by finite memory and the overall increase in transaction throughput.
this paper propose a new approach to improve the Dynamic distributed Double Guided Genetic Algorithm (D3G2A) dealing with Maximal Constraint Satisfaction Problems. Inspired by the NEO-DARWINISM theory and the nature l...
详细信息
ISBN:
(纸本)9780889867048
this paper propose a new approach to improve the Dynamic distributed Double Guided Genetic Algorithm (D3G2A) dealing with Maximal Constraint Satisfaction Problems. Inspired by the NEO-DARWINISM theory and the nature laws, D3G2A consists in creating agents cooperating together to solve problems. In D3G2A, It was proved that the spent CPU time could be improved. the new approach Inspired by the D 3G2A for CSOP and ΣCSPs, will redistribute the load of species agents more equally in order to better the CPU time. this improvement allows not only reduction in species agent's number but also decrease communications agents cost. thus, a sub-population is composed of chromosomes violating a number of constraints in the same interval. In the present paper, the new approach is first described and then compared withthe old one. Results of experimentations are analyzed and discussed.
Multi-cluster schedulers can dramatically improve average job turn-around time performance by making use of fragmented node resources available throughout the grid. By carefully mapping jobs across potentially many cl...
详细信息
ISBN:
(纸本)9780889867048
Multi-cluster schedulers can dramatically improve average job turn-around time performance by making use of fragmented node resources available throughout the grid. By carefully mapping jobs across potentially many clusters, jobs that would otherwise wait in the queue for local cluster resources can begin execution much earlier;thereby improving system utilization and reducing average queue waiting time. Recent research in this area leverages user-provided estimates of job communication characteristics to effectively partition the job across cluster boundaries. In this paper, we address the impact of inaccuracies in these estimates on overall system performance. Furthermore, we demonstrate that multi-site job scheduling techniques benefit from these estimates, even in the presence of considerable inaccuracy.
parallel applications are notorious for their intractability to performance debugging. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, are promising in alleviating the difficulty of...
详细信息
ISBN:
(纸本)9780889867048
parallel applications are notorious for their intractability to performance debugging. Automatic performance analysis techniques, such as those used by Kojak and KappaPI, are promising in alleviating the difficulty of discovering performance inefficiencies in parallel applications. However, as we show in this paper, the results produced by these tool can be potentially misleading and sometimes, outright incorrect. the reason is that the overhead due to performance inefficiencies originating at a certain point in the program can causally propagate and manifest itself at other points. Current techniques perform a flat analysis, i.e., they do not account for causal propagation. In this paper, we present a method of causal analysis that current analysis techniques can be retrofitted with to account for causal propagation of overhead to arrive at a more accurate description of performance bottlenecks. We also show various advantages rendered by this technique to improving the effectiveness of automatic performance analysis. In this paper, we only tackle overhead related to communication operations in MPI parallel application. In general, however, our technique can be used for non-communication related overhead for any parallel programming paradigm.
Currently, clusters of shared memory symmetric multiprocessors (SMPs) are one of the most common parallelcomputingsystems, for which some existing environments have between 8 to 32 processors per node. Examples of s...
详细信息
ISBN:
(纸本)9780889867048
Currently, clusters of shared memory symmetric multiprocessors (SMPs) are one of the most common parallelcomputingsystems, for which some existing environments have between 8 to 32 processors per node. Examples of such environments include some supercomputers: DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputing Center, and Seaborg and Bassi at the DOE National Energy Research Scientific computing Center. In this paper, we quantify the performance gap resulting from using different number of processors per node for application execution (for which we use the term processor partitioning), and conduct detailed performance experiments to identify the major application characteristics that affect processor partitioning. We use the STREAM memory benchmarks and Intel's MPI benchmarks to explore the performance impact of different application characteristics. the results are then utilized to explain the performance results of processor partitioning using three NAS parallel Application benchmarks. the experimental results indicate that processor partitioning can have a significant impact on performance of a parallel scientific application as determined by its communication and memory requirements.
暂无评论