ISBN (print): 0769521320
QR methods for solving Toeplitz tridiagonal systems are well developed, with applications in numerous interdisciplinary fields. Because such systems are significant in many scientific applications, there is a strong motivation to develop faster, more efficient and, more importantly, scalable algorithms to factor them. In this paper, we present two parallel QR factorization algorithms used to solve Toeplitz tridiagonal systems. QR factorization is accomplished using Householder reflections and Givens rotations. These parallel algorithms exhibit high scalability and near-linear to superlinear speedup on large system sizes when implemented on a distributed system.
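To make the Givens-rotation variant concrete, here is a minimal sequential sketch, not the paper's parallel algorithm. It assumes a dense list-of-lists matrix for readability; the helper names `toeplitz_tridiagonal` and `givens_qr_solve` are hypothetical. Each rotation zeroes one subdiagonal entry, so only n-1 rotations are needed, and R ends up with a diagonal plus two superdiagonals.

```python
import math


def toeplitz_tridiagonal(n, sub, diag, sup):
    """Dense n x n Toeplitz tridiagonal matrix with constant diagonals."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = diag
        if i > 0:
            A[i][i - 1] = sub
        if i < n - 1:
            A[i][i + 1] = sup
    return A


def givens_qr_solve(A, b):
    """Solve A x = b for tridiagonal A: apply Givens rotations (Q^T) to A and b,
    then back-substitute on the upper-triangular R."""
    n = len(A)
    R = [row[:] for row in A]
    y = list(b)
    for k in range(n - 1):
        a, c = R[k][k], R[k + 1][k]
        r = math.hypot(a, c)
        if r == 0.0:
            continue  # nothing to eliminate in this column
        cs, sn = a / r, c / r
        # Rows k and k+1 are nonzero only in columns k..k+2 at this stage.
        for j in range(k, min(k + 3, n)):
            t1, t2 = R[k][j], R[k + 1][j]
            R[k][j] = cs * t1 + sn * t2
            R[k + 1][j] = -sn * t1 + cs * t2
        R[k + 1][k] = 0.0  # the eliminated subdiagonal entry
        y[k], y[k + 1] = cs * y[k] + sn * y[k + 1], -sn * y[k] + cs * y[k + 1]
    # R has a main diagonal plus two superdiagonals (fill-in from the rotations).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, min(i + 3, n)))
        x[i] = s / R[i][i]
    return x
```

For example, with `A = toeplitz_tridiagonal(5, -1.0, 4.0, -1.0)` and right-hand side `[3.0, 2.0, 2.0, 2.0, 3.0]`, the exact solution is a vector of ones. A parallel version would partition the rows across processors; that part is not attempted here.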
ISBN (print): 0769521320
A symmetric algorithm is proposed for detecting distributed termination in a dynamic system with asynchronous communication networks. The correctness of the algorithm is proven. In the system, active processes may create new processes or accept outside processes to join the basic computation. No process can be destroyed or leave the system until the computation terminates. The network model exploited in the algorithm is a combination of a logical ring and computation trees. This model is more general and especially suitable for applications on Internet networks. The algorithm is more efficient than those in previous works in terms of the number of control messages used in the detection protocol.
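The computation-tree half of this model can be illustrated with a Dijkstra-Scholten-style detector, sketched below under stated assumptions: this is a swapped-in classic technique, not the paper's symmetric ring-plus-tree algorithm. The toy workload (each work message of depth d > 0 spawns `fanout` brand-new processes) and all names are hypothetical. A process joins the tree under whoever first engages it, counts its unacknowledged messages in `deficit`, and acknowledges its parent only once it is passive with zero deficit; the root detecting zero deficit signals termination.

```python
from collections import deque


class Proc:
    """One process in the basic computation (hypothetical model)."""

    def __init__(self, pid):
        self.pid = pid
        self.parent = None  # engaging process; None = not in the computation tree
        self.deficit = 0    # basic messages sent but not yet acknowledged
        self.active = False


def simulate(depth, fanout=2):
    """Run a toy computation in which every work message with depth d > 0
    spawns `fanout` new processes; return (termination_detected, num_procs)."""
    procs = {0: Proc(0)}  # process 0 is the root (initiator)
    queue = deque()       # FIFO channel of ("work" | "ack", src, dst, depth)
    detected = False

    def send_work(src, d):
        dst = len(procs)  # dynamically create a new process
        procs[dst] = Proc(dst)
        procs[src].deficit += 1
        queue.append(("work", src, dst, d))

    def maybe_detach(p):
        nonlocal detected
        if p.active or p.deficit > 0:
            return
        if p.pid == 0:
            detected = True  # root passive and all its messages acknowledged
        elif p.parent is not None:
            queue.append(("ack", p.pid, p.parent, None))  # leave the tree
            p.parent = None

    root = procs[0]
    root.active = True
    for _ in range(fanout):
        send_work(0, depth)
    root.active = False
    maybe_detach(root)

    while queue:
        kind, src, dst, d = queue.popleft()
        p = procs[dst]
        if kind == "ack":
            p.deficit -= 1
            maybe_detach(p)
            continue
        if p.parent is None and p.pid != 0:
            p.parent = src  # first engaging message: join the tree under src
        else:
            queue.append(("ack", dst, src, None))  # already engaged: ack at once
        p.active = True
        if d > 0:
            for _ in range(fanout):
                send_work(dst, d - 1)
        p.active = False
        maybe_detach(p)

    return detected, len(procs)
```

`simulate(2)` creates 15 processes in total and reports termination only after every spawned process has acknowledged. Dynamic creation fits naturally because a newly created process simply becomes a child of its creator.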
ISBN (print): 0769521320
We characterize high-performance streaming applications as a new and distinct domain of programs that is becoming increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analysis and optimizations. In this paper, we motivate, describe and justify the language features of StreamIt, which include a structured model of streams, a messaging system for control, and a natural textual syntax.
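StreamIt's structured model composes filters into pipelines and split-joins rather than wiring arbitrary graphs. The Python generator sketch below only mimics that composition style; it is not StreamIt syntax and models neither the compiler nor the messaging system. All function names are hypothetical, and the joiner simply takes one item from each branch in turn.

```python
from itertools import tee


def source(n):
    """Filter with no input: pushes 0, 1, ..., n-1."""
    yield from range(n)


def scale(k):
    """Filter factory: pops one item, pushes item * k."""
    def run(stream):
        for x in stream:
            yield x * k
    return run


def moving_sum(window):
    """Stateful filter: peeks `window` items, pops one per output."""
    def run(stream):
        buf = []
        for x in stream:
            buf.append(x)
            if len(buf) == window:
                yield sum(buf)
                buf.pop(0)
    return run


def pipeline(*stages):
    """Connect stages in series: the output of each feeds the next."""
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def splitjoin_duplicate(*branches):
    """Duplicate the input to every branch, then join round-robin (one item each)."""
    def run(stream):
        copies = tee(stream, len(branches))
        outputs = [branch(c) for branch, c in zip(branches, copies)]
        for items in zip(*outputs):
            yield from items
    return run


program = pipeline(scale(2), splitjoin_duplicate(scale(1), scale(10)), moving_sum(2))
```

Running `list(program(source(4)))` yields `[0, 2, 22, 24, 44, 46, 66]`. Because every stage has a fixed, visible shape, a compiler can reason about the whole graph, which is the motivation the abstract gives for the structured model.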
ISBN (print): 078038623X
Efficient task scheduling is essential for achieving high performance in computing applications on distributed systems. Most existing real-time systems consider schedulability as the main goal and ignore other effects such as machine failures. In this paper, we develop an algorithm to efficiently schedule parallel task graphs (fork-join structures). Our scheduling algorithm considers more than one factor at the same time: schedulability, the reliability of the participating processors, and the achieved degree of parallelism. To achieve as many of these goals as possible, we composed an objective function that combines these different factors simultaneously. The proposed objective function is adjustable, giving the user a way to prefer one factor over the others. The simulation results indicate that our algorithm produces schedules in which the applications' deadlines are met, reliability is maximized, and the application parallelism is exploited.
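The abstract does not give the objective function itself, so the sketch below assumes a simple weighted sum of three terms: a 0/1 deadline indicator, an exponential no-failure probability per processor (a common reliability model, assumed here), and the fraction of tasks placed on distinct processors. The weights play the role of the adjustable preference. Names and the brute-force search are hypothetical and only suit small fork-join graphs.

```python
import math
from itertools import product


def evaluate(assignment, task_costs, speeds, failure_rates, deadline, weights):
    """Score one mapping of the fork's parallel tasks onto processors."""
    busy = [0.0] * len(speeds)
    for task, proc in enumerate(assignment):
        busy[proc] += task_costs[task] / speeds[proc]
    makespan = max(busy)  # the join completes when the slowest processor finishes
    schedulable = 1.0 if makespan <= deadline else 0.0
    # Assumed exponential failure model: P(no failure on p) = exp(-lambda_p * busy_p)
    reliability = math.exp(-sum(l * b for l, b in zip(failure_rates, busy)))
    parallelism = len(set(assignment)) / len(assignment)
    w_s, w_r, w_p = weights
    return w_s * schedulable + w_r * reliability + w_p * parallelism


def best_schedule(task_costs, speeds, failure_rates, deadline, weights):
    """Exhaustively search every task-to-processor mapping."""
    candidates = product(range(len(speeds)), repeat=len(task_costs))
    return max(
        candidates,
        key=lambda a: evaluate(a, task_costs, speeds, failure_rates, deadline, weights),
    )
```

With two equal tasks, an unreliable second processor, and a tight deadline, equal weights `(1, 1, 1)` spread the tasks across both processors to meet the deadline, while weights `(0, 1, 0)` pack both tasks onto the reliable processor. This shows how shifting the weights changes which factor wins.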
ISBN (print): 0769521320
The exponentially increasing complexity of many scientific applications and the high cost of supercomputing force us to explore new, sustainable, and affordable high-performance computing platforms. Recent significant advances in FPGA technology and the inherent advantages of configurable logic have brought about new research efforts in the configurable computing field: parallel processing on configurable chips. We explore here parallel LU factorization of large sparse block-diagonal-bordered (BDB) matrices on a configurable multiprocessor that we have designed and implemented. A dynamic load balancing strategy is proposed and analyzed. Performance results for IEEE power test systems are provided. Our research provides evidence that configurable logic can be a viable alternative platform for high-performance scientific computing.
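In a BDB matrix the diagonal blocks can be factored independently, so the main load-balancing question is how to spread blocks of unequal size across processors. The paper's own strategy is not described in the abstract. The sketch below simulates a generic self-scheduling policy as a stand-in: idle processors pull the largest remaining block, with cost estimated as roughly 2n^3/3 flops for dense LU of an n x n block. Names are hypothetical, and the border (Schur complement) phase is not modeled.

```python
import heapq


def lu_cost(block_size):
    """Approximate flop count for dense LU of an n x n block (about 2n^3/3)."""
    return 2 * block_size ** 3 / 3


def dynamic_assign(block_sizes, num_procs):
    """Simulate processors pulling the largest remaining diagonal block
    whenever they become idle. Returns (blocks per processor, makespan)."""
    order = sorted(range(len(block_sizes)), key=lambda b: -block_sizes[b])
    idle = [(0.0, p) for p in range(num_procs)]  # (time processor is free, id)
    heapq.heapify(idle)
    assigned = [[] for _ in range(num_procs)]
    for b in order:
        t, p = heapq.heappop(idle)  # the earliest-free processor takes the block
        assigned[p].append(b)
        heapq.heappush(idle, (t + lu_cost(block_sizes[b]), p))
    makespan = max(t for t, _ in idle)
    return assigned, makespan
```

For four equal blocks of size 3 on two processors, each processor receives two blocks. When one block is much larger, that block alone occupies one processor while the other absorbs the small ones, which is the kind of imbalance a dynamic strategy aims to absorb at run time.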
ISBN (print): 0769521320
High-performance applications place great demands on the computation and communication resources of a distributed computing platform. If the availability of the resources changes dynamically, application performance may suffer. This is especially true for cluster environments, which are often heterogeneous and require tedious tuning for high-performance applications. In this paper, we describe a packet probing technique to detect contention on the cluster nodes to which an application is mapped. The technique is lightweight and can be tuned a priori to a given network type, so that it can be used at an application's run time. We also show how packet probing can easily be integrated as a module of a recently developed communication middleware, which provides an application with run-time access to dynamic computing-system information and invokes application adaptations.
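One plausible reading of "tuned a priori, used at run time" is sketched below: calibrate a baseline probe round-trip time on an idle network once, then at run time flag a node as contended when its median probe RTT exceeds that baseline by some factor. The median statistic, the threshold of 1.5, and all names are assumptions for illustration, not the paper's detector, and actually sending the probe packets is left out.

```python
from statistics import median


def calibrate(idle_rtts_us):
    """A priori tuning: median probe round-trip time on an idle network."""
    return median(idle_rtts_us)


def is_contended(probe_rtts_us, baseline_us, threshold=1.5):
    """Flag contention when the median run-time probe RTT exceeds
    `threshold` times the idle baseline (threshold is an assumed tunable)."""
    return median(probe_rtts_us) / baseline_us > threshold


def contended_nodes(samples_by_node, baseline_us, threshold=1.5):
    """Return the mapped nodes whose probes indicate contention."""
    return sorted(
        node
        for node, rtts in samples_by_node.items()
        if is_contended(rtts, baseline_us, threshold)
    )
```

Using the median rather than the mean keeps a single delayed probe from triggering an adaptation. The calibration step is where per-network-type tuning would happen.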
ISBN (print): 0769521320
Code coupling applications can be divided into communicating modules that may be executed on different clusters in a cluster federation. As a cluster federation comprises a large number of nodes, there is a high probability of a node failure. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters and a communication-induced technique between clusters. This protocol fits the characteristics of a cluster federation (a large number of nodes, and high-latency, low-bandwidth networking technologies between clusters). A preliminary performance evaluation performed using a discrete event simulator shows that the protocol is suitable for code coupling applications.
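The two levels can be sketched as follows, under the assumption that the inter-cluster rule is the classic index-based communication-induced scheme. The paper's exact rule may differ. Each cluster checkpoints all its nodes together and keeps a checkpoint index; inter-cluster messages piggyback the sender's index, and a receiving cluster with a smaller index takes a forced coordinated checkpoint before delivery. Class and function names are hypothetical, and the coordinated checkpoint itself is reduced to a log entry.

```python
class Cluster:
    """A cluster whose nodes always checkpoint together (coordinated)."""

    def __init__(self, name):
        self.name = name
        self.sn = 0            # index of the cluster's latest checkpoint
        self.checkpoints = []  # log of (sn, reason)

    def checkpoint(self, reason, new_sn=None):
        # Stands in for a blocking, cluster-wide coordinated checkpoint.
        self.sn = self.sn + 1 if new_sn is None else new_sn
        self.checkpoints.append((self.sn, reason))


def send_between_clusters(src, payload):
    """Inter-cluster messages piggyback the sender cluster's checkpoint index."""
    return (src.sn, payload)


def receive_between_clusters(dst, message):
    """Index-based communication-induced rule: if the sender has a newer
    checkpoint index, the receiving cluster takes a forced checkpoint before
    delivery, so checkpoints with equal indices form a consistent recovery line."""
    sender_sn, payload = message
    if sender_sn > dst.sn:
        dst.checkpoint("forced", new_sn=sender_sn)
    return payload
```

Intra-cluster messages need no such check because the coordinated checkpoint already captures a consistent cluster state. Only the rarer, slower inter-cluster messages pay for the forced-checkpoint test, which matches the federation's high-latency links.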
ISBN (print): 0769521320
The past few years have seen the emergence of application domains that need to process data elements arriving as a continuous stream. Recently, several architectures for processing database queries over these data streams have been proposed in the literature. Although these architectures may be suitable for general-purpose query processing in a centralized setting, they have serious limitations when it comes to supporting data mining queries in a distributed setting. Data mining is an interactive process, and it is crucial to provide the user with interactive response times. In addition, many data mining applications, such as network intrusion detection, need to process data streams arriving at distributed end-points. Centralized processing of data streams for network intrusion detection would be overwhelming. These fundamental issues for data mining over data streams are addressed in this paper. Our schemes give controlled interactive response times when processing data streams in a distributed setting.
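The contrast with centralized processing can be illustrated with a generic pattern, assumed here rather than taken from the paper. Each end-point summarizes its own stream locally and ships only the compact summary, and a coordinator merges the summaries to answer a frequent-item query. The class and function names are hypothetical, and exact counters stand in for the approximate sketches a real system would likely use.

```python
from collections import Counter


class Endpoint:
    """A monitoring point that summarizes its local stream (hypothetical)."""

    def __init__(self):
        self.counts = Counter()
        self.seen = 0

    def observe(self, item):
        self.counts[item] += 1
        self.seen += 1

    def summary(self):
        # Ship only the compact summary, never the raw stream.
        return self.counts.copy(), self.seen


def frequent_items(summaries, min_support):
    """Coordinator: merge per-endpoint counts and return the items whose global
    frequency is at least `min_support` (a fraction of all items seen)."""
    total = Counter()
    seen = 0
    for counts, n in summaries:
        total.update(counts)
        seen += n
    return {item for item, c in total.items() if c >= min_support * seen}
```

The coordinator's work grows with the number of distinct items rather than with the raw stream rate, which is what keeps response times interactive as end-points are added.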
ISBN (print): 0769521320
Random number generators are among the most common numerical library functions used in scientific applications. The standard random number generator provided within Java is fine for most purposes; however, it does not adequately meet the needs of large-scale scientific applications, such as Monte Carlo simulations. Previous work has addressed some of these problems by extending the standard Random API in Java and providing an implementation that includes a choice of several different generator algorithms. One issue not addressed in that work was concurrency. Implementations of the standard Java random number generator use synchronized methods to support use of the generator across multiple Java threads; however, this is a sequential bottleneck for parallel applications. Here we present a proposal for further extending the standard API to support parallel generation of random number streams, which we have implemented in JAPARA, a Java parallel Random Number Generator Library for high-performance computing.
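A standard way to remove the shared-generator bottleneck is to give each thread its own disjoint stream. The sketch below shows one such technique, leapfrogging a linear congruential generator, where stream k of p takes every p-th element of a single sequence. This is not JAPARA's API. It is written in Python for consistency with the other examples, and the LCG constants are the widely published a = 1664525, c = 1013904223, m = 2^32, chosen only for illustration since a plain LCG is too weak for serious Monte Carlo work.

```python
NR_A, NR_C, NR_M = 1664525, 1013904223, 2 ** 32  # widely published LCG constants


class LCG:
    """Sequential generator x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed, a=NR_A, c=NR_C, m=NR_M):
        self.x, self.a, self.c, self.m = seed % m, a, c, m

    def next(self):
        self.x = (self.a * self.x + self.c) % self.m
        return self.x


class LeapfrogStream:
    """Returns every p-th element of the parent sequence via x_{n+p} = A*x_n + C."""

    def __init__(self, start, A, C, m):
        self.x, self.A, self.C, self.m = start, A, C, m

    def next(self):
        value = self.x
        self.x = (self.A * self.x + self.C) % self.m
        return value


def leapfrog_streams(seed, p, a=NR_A, c=NR_C, m=NR_M):
    """Split one LCG sequence into p disjoint streams; stream k yields
    elements k+1, k+1+p, k+1+2p, ... of the sequential sequence."""
    A, C = 1, 0
    for _ in range(p):  # compose x -> a*x + c with itself p times
        A, C = (a * A) % m, (a * C + c) % m
    base = LCG(seed, a, c, m)
    return [LeapfrogStream(base.next(), A, C, m) for _ in range(p)]
```

Each thread would own one `LeapfrogStream` object, so no lock is needed. Together, the p streams reproduce exactly the sequence a single synchronized generator would have produced, only interleaved.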
ISBN (print): 0769521320
Computational Grids have been proposed as the next-generation computing platform for solving large-scale problems in science, engineering, and commerce. There is an enormous amount of interest in applications called Grid Workflows, in which a number of otherwise independent programs are run in a "pipeline". In practice, a number of different mechanisms can be used to couple the models, ranging from loosely coupled file-based IO to tightly coupled message passing. In this paper, we propose a flexible IO architecture that provides a wide range of mechanisms for building Grid Workflows without the need for any source code modification and without the need to fix them at design time. Further, the architecture works with legacy applications. We evaluate the performance of our prototype system using a workflow in computational mechanics.
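The core idea of choosing the coupling mechanism at run time rather than at design time can be sketched with a logical channel whose backend comes from configuration. In this hypothetical sketch, a file backend stands for loose coupling and an in-memory queue stands in for message passing. Unlike the paper's architecture, it requires the stages to call the channel abstraction, so it does not handle unmodified legacy binaries. All names and the config layout are assumptions.

```python
import queue


class FileChannel:
    """Loose coupling: the producer writes a file, the consumer reads it afterwards."""

    def __init__(self, path):
        self.path = path

    def write(self, records):
        with open(self.path, "w") as f:
            for r in records:
                f.write(f"{r}\n")

    def read(self):
        with open(self.path) as f:
            return [line.rstrip("\n") for line in f]


class QueueChannel:
    """Tight coupling: records pass through memory (stand-in for message passing)."""

    def __init__(self):
        self.q = queue.Queue()

    def write(self, records):
        for r in records:
            self.q.put(str(r))
        self.q.put(None)  # end-of-stream marker

    def read(self):
        out = []
        while (r := self.q.get()) is not None:
            out.append(r)
        return out


def open_channel(name, config):
    """Bind a logical channel name to a mechanism chosen at run time from config."""
    kind = config[name]["kind"]
    if kind == "file":
        return FileChannel(config[name]["path"])
    if kind == "queue":
        return QueueChannel()
    raise ValueError(f"unknown channel kind: {kind}")


def producer(out):
    out.write(i * i for i in range(4))  # the stage never names a file or queue


def consumer(inp):
    return [int(x) for x in inp.read()]


def run_workflow(config):
    ch = open_channel("mesh", config)
    producer(ch)
    return consumer(ch)
```

Switching `{"mesh": {"kind": "queue"}}` to `{"mesh": {"kind": "file", "path": ...}}` changes how the two stages are coupled without touching `producer` or `consumer`.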