In this paper, a nonsymmetric sparse linear system solver based on the exploitation of multilevel parallelism is proposed. One of the main issues addressed is the application of tearing techniques to enhance large gra...
Address computations and indirect, hence double, memory accesses in sparse matrix application software generally make sparse computations inefficient. The authors propose memory architectures that support the storage of sparse vectors and matrices. In a first design, called vector storage, a matrix is handled as an array of sparse vectors, stored as singly-linked lists. Deletion and insertion of a vector are done row-wise or column-wise only. In a second design, called matrix storage, a higher level of sophistication is achieved. A sparse matrix is stored as a bi-directionally threaded doubly-linked list of elements. This approach enables both row- and column-wise operations. A pipelined variant with 3-fold interleaved memory and write buffers yields high efficiency, close to one sparse matrix element per memory cycle for all basic vector operations.
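As a software analogy for the vector storage design described in the abstract above, the following is a minimal sketch of a sparse vector held as a singly-linked list of nonzeros, with a matrix treated as an array of such vectors. The names `Node`, `SparseVector`, `insert`, and `dot` are hypothetical and chosen for illustration; the paper describes a hardware memory architecture, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    index: int                      # position of the nonzero within the vector
    value: float
    next: Optional["Node"] = None   # link to the next nonzero (singly linked)


class SparseVector:
    """Nonzeros kept in a singly-linked list sorted by index (illustrative only)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def insert(self, index: int, value: float) -> None:
        # Walk to the insertion point; overwrite if the index already exists.
        prev, cur = None, self.head
        while cur is not None and cur.index < index:
            prev, cur = cur, cur.next
        if cur is not None and cur.index == index:
            cur.value = value
            return
        node = Node(index, value, cur)
        if prev is None:
            self.head = node
        else:
            prev.next = node

    def dot(self, other: "SparseVector") -> float:
        # Merge-style traversal: advance whichever list has the smaller index.
        a, b, total = self.head, other.head, 0.0
        while a is not None and b is not None:
            if a.index == b.index:
                total += a.value * b.value
                a, b = a.next, b.next
            elif a.index < b.index:
                a = a.next
            else:
                b = b.next
        return total


def make_vector(entries: dict) -> SparseVector:
    vec = SparseVector()
    for index, value in entries.items():
        vec.insert(index, value)
    return vec


# A matrix in "vector storage" form: an array of sparse row vectors.
rows = [make_vector({0: 1.0, 3: 2.0, 7: 4.0}), make_vector({9: 3.0})]
x = make_vector({3: 5.0, 7: 0.5, 9: 1.0})

# Sparse matrix-vector product, one row at a time.
y = [row.dot(x) for row in rows]
```

Every step of `dot` follows a pointer before it can read a value, which is the kind of indirect access the abstract identifies as the source of inefficiency in software.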
ISBN: 9780818666056 (print)
Emerging high-speed networks provide several hundred megabits per second to several gigabits per second of raw communication bandwidth. However, the maximum achievable throughput available to the end-user or application is quite limited. In order to fully utilize the network bandwidth and to improve the performance at the application level, a careful examination of I/O subsystems is essential. In this paper, we study one emerging high-speed network, the Fibre Channel network. The objectives of this study are: to understand how the I/O subsystem relates to network operations; to evaluate and analyze the performance of such a subsystem; and to propose possible approaches for improving the maximum achievable bandwidth. By modifying only device driver code, we show a 75% improvement in maximum achievable bandwidth. Other ways of improving network performance are also discussed.
Model-based evaluation of reliable distributed and parallel systems is difficult due to the complexity of these systems and the nature of the dependability measures of interest. The complexity creates problems for analytical model solution techniques, and the fact that reliability and availability measures are based on rare events makes traditional simulation methods inefficient. Importance sampling is a well-known technique for improving the efficiency of rare event simulations. However, finding an importance sampling strategy that works well in general is a difficult problem. The best strategy for importance sampling depends on the characteristics of the system and the dependability measure of interest. This fact motivated the development of an environment for importance sampling that would support the wide variety of model characteristics and interesting measures. The environment is based on stochastic activity networks, and importance sampling strategies are specified using the new concept of the importance sampling governor. The governor supports dynamic importance sampling strategies by allowing the stochastic elements of the model to be redefined based on the evolution of the simulation. The utility of the new environment is demonstrated by evaluating the unreliability of a highly dependable fault-tolerant unit used in the well-known MARS architecture. The model is non-Markovian, with Weibull-distributed failure times and uniformly distributed repair times.
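To illustrate why importance sampling helps with rare events, as the abstract above argues, the following is a toy sketch that is not taken from the paper: it estimates P(X > 20) for an exponentially distributed X with rate 1, a probability of about 2e-9 that plain simulation would almost never observe. The function name `rare_event_probability`, the threshold, and the biased sampling rate are all assumptions made for this example; it does not model stochastic activity networks or the importance sampling governor.

```python
import math
import random


def rare_event_probability(threshold: float, biased_rate: float,
                           n_samples: int, seed: int = 0) -> float:
    """Estimate P(X > threshold) for X ~ Exp(1) by sampling from Exp(biased_rate)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the biased distribution, under which the event is common.
        x = rng.expovariate(biased_rate)
        if x > threshold:
            # Likelihood ratio f(x) / g(x) corrects for the change of distribution:
            # f is the Exp(1) density, g is the Exp(biased_rate) density.
            total += math.exp(-x) / (biased_rate * math.exp(-biased_rate * x))
    return total / n_samples


threshold = 20.0
estimate = rare_event_probability(threshold, biased_rate=1.0 / threshold,
                                  n_samples=100_000)
exact = math.exp(-threshold)  # closed form available only because this is a toy case
```

The biased rate here is fixed in advance. A dynamic strategy of the kind the abstract attributes to the governor would instead change the sampling distribution as the simulation evolves.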
ISBN: 9780818666056 (print)
Communication between processors has long been the bottleneck of distributed network computing. However, recent progress in switch-based high-speed local area networks (LANs) may be changing the situation. Asynchronous transfer mode (ATM) is one of the most widely accepted emerging high-speed network standards and can potentially satisfy the communication needs of distributed network computing. We investigate distributed network computing over local ATM networks. We first study the performance characteristics of end-to-end communication in an environment that includes several types of workstations interconnected via a Fore Systems ASX-100 ATM switch. We then compare the communication performance of four different application programming interfaces (APIs): the Fore Systems ATM API, the BSD socket programming interface, Sun's Remote Procedure Call (RPC), and the Parallel Virtual Machine (PVM) message passing library. Each API represents distributed programming at a different communication protocol layer. We evaluate parallel matrix multiplication over the local ATM network. The experimental results show that network computing is promising over local ATM networks.
We outline a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques also apply to higher dimensional meshes and hypercubes. We stress a general approach, addressing the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. This requires the development of general techniques for building hybrid algorithms. Finally, the approach also supports collective communication within a group of nodes, which is required by many scalable algorithms. Results from the Intel Paragon system are included.
In the past, compilers have focused on exploiting either functional or data parallelism, but not both. The PARADIGM compiler project at the University of Illinois is among the first to incorporate techniques for simultaneous exploitation of both. The work in this paper describes the techniques used in the PARADIGM compiler and analyzes the optimality of these techniques. It is the first of its kind to use realistic cost models and includes data transfer costs, which all previous researchers have neglected. Preliminary results on the CM-5 show the efficacy of our methods and the significant advantages of using functional and data parallelism together for execution of real applications.
In large-scale multiprocessors, whether loosely or tightly coupled, some memory is cheaper to access than other memory. Because direct management of memory on these machines is quite burdensome to the programmer, much...