ISBN (print): 9780889867048
This paper presents comprehensive evaluations of parallel double Divide and Conquer (dDC) for singular value decomposition on the HPC2500 supercomputer. Double Divide and Conquer was proposed for bidiagonal SVD: it first computes the singular values by a compact version of Divide and Conquer, and the corresponding singular vectors are then computed by twisted factorization. The speed and accuracy of double Divide and Conquer are as good as or better than those of standard algorithms such as QR and the original Divide and Conquer. Moreover, it shows high scalability even on a PC cluster, a distributed-memory architecture. The parallel algorithm of dDC and numerical results for several architectural options, matrix sizes, and matrix types on the HPC2500 SMP cluster are shown.
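The two-phase structure described above (singular values first, then vectors) can be illustrated with a short sketch. The code below is not the paper's dDC implementation; it stands in for phase one with a library singular-value routine and for twisted factorization with plain shifted inverse iteration on the Gram matrix of a small bidiagonal matrix, just to show how the vectors can be recovered once the values are known.

import numpy as np

def bidiagonal(d, e):
    # Upper bidiagonal matrix from diagonal d and superdiagonal e.
    B = np.diag(np.asarray(d, dtype=float))
    B[np.arange(len(e)), np.arange(1, len(e) + 1)] = e
    return B

def two_phase_svd(d, e, iters=3):
    B = bidiagonal(d, e)
    # Phase 1: singular values only (stand-in for the compact Divide and Conquer).
    sigma = np.linalg.svd(B, compute_uv=False)
    # Phase 2: one right singular vector per value via shifted inverse iteration,
    # a simplified stand-in for twisted factorization.
    n = B.shape[1]
    T = B.T @ B                                  # tridiagonal Gram matrix
    V = np.empty((n, n))
    for i, s in enumerate(sigma):
        M = T - (s ** 2 - 1e-12) * np.eye(n)     # tiny offset keeps M invertible
        v = np.ones(n)
        for _ in range(iters):
            v = np.linalg.solve(M, v)
            v /= np.linalg.norm(v)
        V[:, i] = v
    U = (B @ V) / sigma                          # B v_i = sigma_i u_i
    return U, sigma, V

d, e = [4.0, 3.0, 2.0, 1.0], [0.5, 0.4, 0.3]
U, s, V = two_phase_svd(d, e)
print(np.allclose(bidiagonal(d, e), (U * s) @ V.T))   # True: B = U diag(s) V^T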
ISBN (print): 9780889867048
Data clustering is a common technique for data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Because dataset sizes keep growing and clustering algorithms are computationally intensive when applied to large datasets, efficient clustering algorithms are needed to reduce processing time. This paper describes the design and implementation of a parallel version of the recently developed clustering algorithm RACAL [1], a RAdius based Clustering ALgorithm. The proposed parallel algorithm (PRACAL) is able to cluster large, high-dimensional datasets in a reasonable time, leading to higher-performance computing.
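As a rough illustration of what "radius based" means here, the sketch below implements a naive leader-style clustering in which a point joins the nearest existing cluster center if it lies within a fixed radius and otherwise starts a new cluster. The actual RACAL algorithm of [1] and its parallel version PRACAL are more elaborate, and the radius value used here is arbitrary; in a parallel setting the point set would additionally be partitioned across processes.

import numpy as np

def radius_cluster(points, r):
    # Assign each point to the nearest existing center within radius r,
    # otherwise promote the point to a new center.
    centers, labels = [], []
    for p in points:
        dists = [np.linalg.norm(p - c) for c in centers]
        if dists and min(dists) <= r:
            labels.append(int(np.argmin(dists)))
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return np.array(centers), np.array(labels)

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.2, (50, 3)), rng.normal(3.0, 0.2, (50, 3))])
centers, labels = radius_cluster(data, r=1.0)
print(len(centers), "clusters found")   # expect 2 for these well-separated blobs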
ISBN (print): 9780889867048
Many prediction techniques based on historical data have been proposed to reduce the over-estimations of job runtimes provided by users. They were shown to improve the accuracy of runtime estimates and the scheduling performance of backfill policies, according to particular error metrics and average performance measures. However, using a more complete set of performance measures and a new error metric, we show potential performance problems of using previous prediction techniques for job scheduling. Furthermore, we show that simply adding half of the requested runtime to each initial prediction greatly reduces these problems.
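The correction described in the last sentence is simple enough to state directly. The sketch below pads a history-based prediction with half of the user's requested runtime; the cap at the full request is our own assumption, added because a backfill scheduler would not normally trust a job beyond its requested limit.

def padded_prediction(predicted_runtime: float, requested_runtime: float) -> float:
    # Pad the historical prediction with half of the request, never exceeding the request.
    return min(requested_runtime, predicted_runtime + 0.5 * requested_runtime)

# A job requests 120 minutes; history predicts it will run 40 minutes.
print(padded_prediction(40.0, 120.0))    # 100.0 is handed to the backfill scheduler
print(padded_prediction(100.0, 120.0))   # 120.0, capped at the user's request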
ISBN (print): 9780889867048
Large-scale graph problems are becoming increasingly important in science and engineering. The irregular, sparse instances are especially challenging to solve on cache-based architectures, as they are known to incur erratic memory access patterns. Yet many of the algorithms also exhibit some degree of regularity in their memory accesses. It is important to characterize this locality behavior in order to bridge the gap between algorithm and architecture. In our study we quantify the locality of several fundamental graph algorithms, both sequential and parallel, and correlate our observations with the algorithmic design. Our study of locality behavior brings insight into the impact of different cache architectures on the performance of both sequential and parallel graph algorithms.
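One simple way to quantify the locality of a graph traversal, in the spirit of the study described above (the paper's own metrics may differ), is to record the vertex-visit trace of a BFS and compute reuse distances, i.e. how many distinct vertices are touched between two accesses to the same vertex:

from collections import deque
import random

def bfs_trace(adj, source=0):
    # Record every vertex touched, whether dequeued or scanned as a neighbor.
    trace, seen, queue = [], {source}, deque([source])
    while queue:
        u = queue.popleft()
        trace.append(u)
        for v in adj[u]:
            trace.append(v)
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return trace

def reuse_distances(trace):
    # Number of distinct vertices seen since the previous access to the same vertex.
    last_seen, dists = {}, []
    for i, x in enumerate(trace):
        if x in last_seen:
            dists.append(len(set(trace[last_seen[x] + 1:i])))
        last_seen[x] = i
    return dists

random.seed(1)
n = 200
adj = [random.sample(range(n), 4) for _ in range(n)]   # random sparse digraph
d = reuse_distances(bfs_trace(adj))
print("mean reuse distance:", sum(d) / max(1, len(d)))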
ISBN (print): 9780889866379
We consider the problem of scheduling parallel applications, represented by directed acyclic graphs (DAGs), onto Grid-style resource pools. The core issue is that the availability and performance of grid resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. Typical scheduling methods in the literature only partially address this issue because they consider static heterogeneous computing environments (i.e., heterogeneous resources that are dedicated and unchanging over time). This paper presents the Grid Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling of an executing application in response to significant variations in resource characteristics. GTP considers the impact of partial completion of tasks and of task migration. We compare the performance of GTP with that of the well-known, and static, Heterogeneous Earliest Finish Time (HEFT) algorithm.
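For reference, the static HEFT baseline mentioned above orders tasks by their upward rank, rank_u(t) = avg_cost(t) + max over successors s of (avg_comm(t, s) + rank_u(s)), and then maps them greedily to the resource giving the earliest finish time. The sketch below computes upward ranks for a small invented DAG; the GTP rescheduling method itself is the paper's contribution and is not reproduced here.

from functools import lru_cache

# Hypothetical 4-task DAG: task -> list of (successor, average communication cost).
succ = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("D", 3.0)],
    "C": [("D", 1.0)],
    "D": [],
}
avg_cost = {"A": 5.0, "B": 6.0, "C": 4.0, "D": 2.0}   # mean execution cost over resources

@lru_cache(maxsize=None)
def rank_u(task: str) -> float:
    # Upward rank: own average cost plus the costliest chain of communication
    # and successor ranks below this task.
    tail = max((comm + rank_u(s) for s, comm in succ[task]), default=0.0)
    return avg_cost[task] + tail

order = sorted(succ, key=rank_u, reverse=True)
print(order)          # ['A', 'B', 'C', 'D'] for this toy DAG
print(rank_u("A"))    # 5 + 4 + 6 + 3 + 2 = 20.0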
ISBN (print): 9780889867048
This paper explores the characteristics of job migration between different parallel compute sites in a decentralized Grid scenario sharing a central job pool. Independent users are assumed to submit their jobs to their local site's MPP installation, which in turn is allowed to decline the local execution of jobs by offering them to a central job pool. The simulation results are obtained using real workload traces and are compared to the single-site EASY backfilling algorithm. It is shown that even simple job pooling is beneficial for all highly utilized sites, as it is possible to achieve shorter response times for jobs than the best single-site scheduling results. Furthermore, new insights are provided about the amount, characteristics, and distribution of the migrated jobs.
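The sharing pattern itself can be sketched in a few lines: a site keeps a locally submitted job or offers it to the central pool, and an underutilized site later pulls migrated jobs from that pool. The queue-length decline rule below is purely illustrative and is not the policy evaluated in the paper.

from collections import deque

central_pool = deque()   # jobs declined by their home site

class Site:
    def __init__(self, name, max_queue):
        self.name, self.queue, self.max_queue = name, deque(), max_queue

    def submit(self, job):
        # Local submission: keep the job, or decline it by offering it to the pool.
        if len(self.queue) < self.max_queue:
            self.queue.append(job)
        else:
            central_pool.append((self.name, job))

    def pull_from_pool(self):
        # An underutilized site picks up migrated work.
        while central_pool and len(self.queue) < self.max_queue:
            origin, job = central_pool.popleft()
            self.queue.append(job)
            print(f"{self.name} runs job {job} migrated from {origin}")

busy, idle = Site("siteA", max_queue=2), Site("siteB", max_queue=2)
for j in range(4):
    busy.submit(j)        # jobs 2 and 3 overflow into the central pool
idle.pull_from_pool()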
ISBN (print): 9780889866379
In this paper we present a data model view of the MEDIOGRID project. Based on the general project architecture, we define the data model requirements and technology dependencies. We introduce a two-layer data management architecture based on highly interoperable web service interfaces and describe computational orchestration. We suggest a resource-centric view and highlight improvements at the scheduling level based on the proposed data model view.
ISBN (print): 9780889867048
We are developing a task-parallel script language named MegaScript for mega-scale parallel processing. MegaScript regards sequential and parallel programs as tasks, and controls them for massively parallel execution. Although MegaScript programs require optimizations and extensions specific to the application and the computing environment, modifying the runtime system or the task programs greatly reduces portability and reusability. To satisfy these conflicting requirements, we propose a user-level dynamic extension scheme named Adapter. In this scheme, the user defines customization code and hooks it to a specific event. The runtime system calls the code back locally when the event occurs, which extends or optimizes system behavior without modifying the runtime or the task programs. The results of our evaluation of the scheme show that the overhead and the programming cost are both small enough for practical use.
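The Adapter mechanism described here amounts to an event-hook registry: the user registers customization code against a named runtime event, and the runtime calls it back when the event fires. The sketch below conveys that idea only; the event names and callback signature are invented and do not reflect MegaScript's actual API.

from collections import defaultdict
from typing import Callable, Dict, List

class Runtime:
    def __init__(self):
        self._hooks: Dict[str, List[Callable]] = defaultdict(list)

    def add_hook(self, event: str, fn: Callable) -> None:
        # User-level extension point: attach customization code to an event.
        self._hooks[event].append(fn)

    def _fire(self, event: str, **info) -> None:
        for fn in self._hooks[event]:
            fn(**info)                 # the runtime calls the user code back locally

    def run_task(self, name: str) -> None:
        self._fire("task_start", task=name)
        # ... the task program would be launched here, unmodified ...
        self._fire("task_end", task=name)

rt = Runtime()
rt.add_hook("task_end", lambda task: print(f"user adapter: {task} finished"))
rt.run_task("render_frame_042")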
ISBN (print): 9780889867048
The Speculative Locking (SL) protocol is a concurrency control protocol that allows parallel execution of conflicting transactions through a method of multilevel lending and versioning. The SL protocol shows performance improvements over the standard two-phase locking (2PL) protocol, but it relies on several assumptions that make it unsuitable for real-world scenarios. In this paper, we propose an adaptive speculative locking (ASL) protocol that improves the performance of real-time distributed database systems by augmenting the SL protocol with four features: support for distributed real-time database systems; simultaneous multi-threading or page execution; control of transaction execution through transaction queue management; and restriction of system memory through the use of virtual memory. The simulation results demonstrate the superiority of the ASL protocol over the SL protocol through the reduction of data contention caused by finite memory and the overall increase in transaction throughput.
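A highly simplified picture of the speculation underlying SL and ASL: when a transaction hits a read-write conflict, it forks one speculative branch per possible outcome of the lock holder, one using the committed before-image (holder aborts) and one using the uncommitted after-image (holder commits), and the wrong branch is discarded once the holder finishes. The sketch below shows only this versioning idea; the real protocols' multilevel speculation, transaction queues, and memory management are not modeled.

class DataItem:
    def __init__(self, value):
        self.committed = value      # before-image visible to everyone
        self.uncommitted = None     # after-image written by the current lock holder

    def speculative_reads(self):
        # Versions a conflicting reader may speculate on.
        versions = [("holder aborts", self.committed)]
        if self.uncommitted is not None:
            versions.append(("holder commits", self.uncommitted))
        return versions

x = DataItem(100)
x.uncommitted = 150                 # an in-flight writer updated x but has not committed yet

# A conflicting transaction runs one speculative branch per version.
branches = {outcome: value * 2 for outcome, value in x.speculative_reads()}
print(branches)                     # {'holder aborts': 200, 'holder commits': 300}

# The writer commits, so only the matching branch survives.
x.committed, x.uncommitted = x.uncommitted, None
print("kept branch:", branches["holder commits"])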
ISBN (print): 9780889867048
Currently, clusters of shared-memory symmetric multiprocessors (SMPs) are one of the most common parallel computing systems, and some existing environments have between 8 and 32 processors per node. Examples of such environments include several supercomputers: the DataStar p655 (P655 and P655m) and P690 at the San Diego Supercomputer Center, and Seaborg and Bassi at the DOE National Energy Research Scientific Computing Center. In this paper, we quantify the performance gap resulting from using different numbers of processors per node for application execution (for which we use the term processor partitioning), and we conduct detailed performance experiments to identify the major application characteristics that affect processor partitioning. We use the STREAM memory benchmarks and Intel's MPI benchmarks to explore the performance impact of different application characteristics. The results are then used to explain the processor-partitioning performance of three NAS Parallel Benchmark applications. The experimental results indicate that processor partitioning can have a significant impact on the performance of a parallel scientific application, as determined by its communication and memory requirements.
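To give a concrete sense of what the STREAM side of these experiments measures, the sketch below times a NumPy stand-in for the STREAM "triad" kernel and converts it to a bandwidth figure. The real study uses the C/Fortran STREAM benchmarks and Intel's MPI benchmarks on the machines listed above; this fragment only shows the kind of per-node memory traffic that drives the processor-partitioning effects.

import time
import numpy as np

N = 10_000_000                       # elements per array (~80 MB each as float64)
b = np.random.rand(N)
c = np.random.rand(N)
q = 3.0

start = time.perf_counter()
a = b + q * c                        # triad: two array reads and one array write
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8              # three float64 arrays streamed once each
print(f"triad bandwidth ~ {bytes_moved / elapsed / 1e9:.1f} GB/s")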