A new class of normalized approximate inverse matrix techniques, based on the concept of sparse normalized approximate factorization procedures, is introduced for solving sparse linear systems derived from the finite difference discretization of partial differential equations. Normalized explicit preconditioned conjugate gradient type methods, in conjunction with normalized approximate inverse matrix techniques, are presented for the efficient solution of sparse linear systems. Theoretical results on the rate of convergence of the normalized explicit preconditioned conjugate gradient scheme and estimates of the required computational work are presented. The application of the proposed methods to two-dimensional initial/boundary value problems is discussed and numerical results are given. The parallel and systolic implementation of the dominant computational part is also investigated.
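To make the term "explicit preconditioned conjugate gradient" concrete, the following is a minimal sketch and not the method of the paper. It assumes a small 1D finite difference Laplacian as the test matrix and uses the inverse of the diagonal (a Jacobi approximate inverse) as a stand-in for the normalized approximate inverse, which the abstract does not specify. The names `explicit_pcg`, `poisson_1d`, `matvec` and `dot` are hypothetical. The point it illustrates is that an explicit approximate inverse `M` is applied by a matrix-vector product rather than by solving a system.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def explicit_pcg(A, b, M, tol=1e-10, max_iter=500):
    """Conjugate gradients for A x = b with an explicit approximate inverse M."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the start vector x = 0
    z = matvec(M, r)         # preconditioning step: a product, not a solve
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = matvec(M, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix of a 1D finite difference Laplacian."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


n = 8
A = poisson_1d(n)
# Assumed stand-in for the approximate inverse: inverse of the diagonal of A.
M = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iterations = explicit_pcg(A, b, M)
residual = max(abs(bi - yi) for bi, yi in zip(b, matvec(A, x)))
```

Because each iteration consists only of matrix-vector products, inner products and vector updates, the dominant work is the kind of operation that lends itself to parallel implementation.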
ISBN: 9781538649756 (print)
The ability to teach parallel programming principles and techniques is becoming fundamental to preparing a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher-level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and Networking", where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.
ISBN: 1601320841 (print)
For bidiagonal SVD, double Divide and Conquer was proposed. It first computes singular values by a compact version of Divide and Conquer. The corresponding singular vectors are then computed by twisted factorization. The speed and accuracy of double Divide and Conquer are as good as or even better than those of standard algorithms such as QR and the original Divide and Conquer. Moreover, it shows high scalability even on a PC cluster, a distributed-memory architecture. This paper presents evaluations of parallel double Divide and Conquer for singular value decomposition on a 16-core architecture.
ISBN: 9780769549392; 9781467353212 (print)
Oncoming many-core platforms are a hot topic these days, and this next-generation hardware places a new focus on energy and thermal awareness. With an increasingly dense packing of transistors, the system must be made energy aware so that it does not suffer from overheating and energy waste. As a step towards increased energy efficiency, we intend to add the notion of QoS handling to the OS level and to applications. We suggest the design of a QoS manager as a plug-in OS extension capable of providing applications with the necessary resources, leading to better energy efficiency.
ISBN: 9781538637906 (print)
With the development of the ICT industry, the volume of produced data is experiencing tremendous growth, which drives demand for more storage capacity. Because of the limited storage capacity of users' terminals, more and more applications prefer to upload data to cloud platforms. However, it is well known that security should not be neglected in existing cloud storage architectures. Motivated by the increasing popularity of emerging blockchain technology, we propose a blockchain-based security architecture for distributed cloud storage. Moreover, we customize a genetic algorithm to solve the file block replica placement problem between multiple users and multiple data centers in the distributed cloud storage environment. Numerical experimental results show that the proposed architecture outperforms traditional cloud storage architectures in terms of security, with acceptable network transmission delay.
ISBN: 9781932415605 (print)
Distributed systems that are deployed using Jini technology employ the concept of dynamic registration, discovery and utilization of distributed services. Jini uses a central registry (lookup service), which is the primary means for service providers to advertise their services and allows clients to locate and enlist the help of those services. Because the lookup service is a central component of Jini's runtime infrastructure, its reliability and fault tolerance become essential requirements. This paper presents the design and evaluation of a fault-tolerant distributed Jini lookup service that utilizes group communication systems. The proposed design enhances the reliability and performance of the Jini lookup service. In addition, the paper presents experimental results that evaluate the performance and reliability of the proposed distributed lookup service.
ISBN: 9781932415605 (print)
With the advent of IP over IEEE 1394 as a network technology, a new contender for a low-cost next-generation cluster interconnect is on the horizon. In this paper, we benchmark IEEE 1394 and compare it to other cluster interconnects, namely Fast Ethernet and Gigabit Ethernet. For a meaningful comparison, benchmark experiments are carried out at three levels: TCP/IP networking, MPI parallel programming, and parallel application benchmarks. Using high-end PCs (Pentium IV 800 MHz) and standard system software (Linux and MPICH), our results show that IEEE 1394 is a viable alternative.
This paper discusses performance improvements achieved in two power system software modules through the use of parallel processing techniques. The first software module, EVARISTE, outputs a voltage stability indicator for various power system situations. This module was designed for extended real-time use and is therefore required to give guaranteed response times. The second module, MEXICO, assesses power system reliability and operating costs by simulating a large number of contingencies for generation and transmission equipment. This module, used for power system planning purposes, uses a Monte Carlo method to build the various system states, and makes heavy demands on CPU time for running simulations. Like many power system computation packages, both software modules are well suited to coarse-grain parallel processing. The first module was parallelized on a distributed-memory machine and the second on a shared-memory machine. In this paper, we start with a description of the parallelization process used in these two cases, then go on to give details of the performance levels achieved, discussing aspects of programming, parameter selection (number of situations processed, number of processors), and machine characteristics (limitations due to the interprocessor communications network, for instance).
ISBN: 9780769557854 (print)
The computation core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. However, existing parallel programming systems such as Spark lack a programming abstraction and an efficient implementation for general matrix computations. In this paper, we present MatrixMap, a unified and efficient data-parallel system for general matrix computations. MatrixMap provides a powerful yet simple abstraction, consisting of a distributed data structure called bulk key matrix and a computation interface defined by matrix patterns. Users can easily load data into bulk key matrices and program algorithms into parallel matrix patterns. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, an asynchronous computation pipeline with optimized data shuffling strategies for specific matrix patterns, and an in-memory data structure that reuses data across iterations. Moreover, it can automatically handle the parallelization and distributed execution of programs on a large cluster. The experimental results show that MatrixMap is 12 times faster than Spark.
ISBN: 1892512416 (print)
The performance of a software distributed shared memory (DSM) system can be improved if load sharing is employed. However, traditional load sharing algorithms are not directly suitable for DSM systems since they do not consider the memory access patterns of tasks. This paper presents a load sharing algorithm that takes into account memory access patterns as well as individual processor load information to distribute tasks in a DSM environment. A vector that keeps track of the frequency of page accesses by tasks is used to determine the processor with the best locality of access. The general idea is to minimize the number of remote page accesses. Simulation results are presented to illustrate the behavior of the algorithm.
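The idea of combining a page access frequency vector with processor load can be sketched as follows. This is an illustrative guess at the general shape of such a policy, not the algorithm of the paper: the function `choose_processor`, the `max_load` cutoff, the tie-breaking rule and the data layout are all assumptions introduced here.

```python
def choose_processor(access_freq, page_owner, load, max_load):
    """Pick a processor for a task (hypothetical policy).

    access_freq: dict page -> number of accesses the task makes to that page
    page_owner:  dict page -> processor currently holding that page
    load:        dict processor -> number of tasks already assigned
    max_load:    processors at or above this load are skipped (assumed rule)
    """
    # Count how many of the task's page accesses would be local on each processor.
    local_hits = {p: 0 for p in load}
    for page, freq in access_freq.items():
        owner = page_owner.get(page)
        if owner in local_hits:
            local_hits[owner] += freq

    # Only consider processors that are not overloaded; fall back to all of them.
    candidates = [p for p in load if load[p] < max_load]
    if not candidates:
        candidates = list(load)

    # Most local accesses first; ties go to the more lightly loaded processor.
    return max(candidates, key=lambda p: (local_hits[p], -load[p]))


page_owner = {0: "P0", 1: "P0", 2: "P1", 3: "P1"}
load = {"P0": 1, "P1": 3}
access_freq = {0: 2, 2: 10, 3: 5}

best_by_locality = choose_processor(access_freq, page_owner, load, max_load=4)
best_when_p1_full = choose_processor(access_freq, page_owner, load, max_load=3)
```

In this example the task touches pages held by `P1` far more often, so `P1` is chosen while it has spare capacity; once `P1` reaches the assumed load limit, the task goes to `P0` and incurs more remote page accesses.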