ISBN: 3540664432 (print)
In this paper, we present a heterogeneous parallel computer dedicated to high-realism computer graphics. A small network, with a reduced chip set, allows us to reduce rendering time by a very attractive factor. The low-level mechanisms of the network are designed to manage the wide variety of data and algorithms used in computer graphics. Some nodes of the network may be specialized in the most time-consuming parts of the algorithm and have specific data paths. Thanks to the function composition scheme, we unify the management of both specialization and parallelism. These mechanisms allow flexibility and easy design of programs.
ISBN: 3540664432 (print)
We introduce a novel methodology for the quantitative assessment of the effectiveness and portability of models of parallel computation. Specifically, we relate the effectiveness of a model M, adopted for algorithm design, with respect to a platform M', where algorithms developed for M are ultimately executed, to the product of cross-simulation slowdowns between M and M'. The portability of M with respect to a class of platforms can be estimated by its minimum effectiveness over the platforms in the class. We apply our methodology to assess the portability of enhanced variants of the BSP model with respect to processor networks, with particular emphasis on multidimensional arrays.
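The two relations described in this abstract can be written compactly. The symbols $\eta$, $S$, $\pi$, $f$, and $\mathcal{C}$ are notation assumed here for illustration and are not taken from the paper. The abstract says only that effectiveness is related to the product of the two cross-simulation slowdowns, so the exact dependence is left as an unspecified placeholder function $f$.

```latex
% Assumed notation: S(M, M') is the slowdown of simulating M on M',
% eta is effectiveness, pi is portability, C is a class of platforms.
\eta(M, M') = f\bigl(S(M, M') \cdot S(M', M)\bigr),
\qquad
\pi(M, \mathcal{C}) = \min_{M' \in \mathcal{C}} \eta(M, M')
```

The second relation is the worst-case reading of portability stated in the abstract: a model is only as portable as its least effective pairing within the class.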
ISBN: 3540663630 (print)
Convolution decomposition allowed the creation of fast computation algorithms within the scope of sequential processing [1], [2]. However, in one case the decomposition methods made the algorithm structure redundant, whereas in the other case they imposed restrictions on the decomposition parameters, which need to be mutually prime numbers. Parallel processing requires structural flexibility of algorithms; therefore, decomposition methods that are inherently characterized by redundancy and by restrictions on the parameters are not effective. The methods oriented to parallel processing were created on the basis of the group-theoretic approach to decomposition. The approach is comprehensive in character, that is, it is oriented towards the decomposition of a number of basic functions of digital signal processing: convolution, correlation, and the discrete Fourier transform (DFT). The objective of this paper is to develop a collection of methods for the parallel computation of convolution by generalizing and extending the results of the group-theoretic decomposition of the DFT and convolution [3], [4].
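As background for the link between convolution and the DFT mentioned in this abstract, the following minimal sketch checks the convolution theorem for cyclic convolution. It is not the group-theoretic decomposition of the paper; the function names are hypothetical, and the DFT is the naive O(n^2) form chosen for readability rather than speed.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse transform includes the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cyclic_convolution_direct(a, b):
    """Cyclic convolution from the definition: c[i] = sum_j a[j] * b[(i - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def cyclic_convolution_dft(a, b):
    """Cyclic convolution via the convolution theorem (assumes real inputs)."""
    fa, fb = dft(a), dft(b)
    # Each frequency bin is multiplied independently of the others,
    # which is the step that lends itself to parallel execution.
    prod = [u * v for u, v in zip(fa, fb)]
    return [v.real for v in dft(prod, inverse=True)]
```

Both functions should agree up to rounding error for real sequences of equal length. The pointwise product in the frequency domain has no dependencies between bins, which is one reason transform-based formulations are attractive for parallel processing.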
ISBN: 3540664432 (print)
In this paper, we present the design of Pi (it), an ALDOR library to express parallel programs. ALDOR is a general-purpose programming language designed for computer algebra, and Pi (it) provides a low-level ALDOR interface that interacts with hardware or system tools in order to express parallelism. Additionally, Pi (it) provides an API that hides low-level details such as sending messages and creating threads, and provides an interface for data parallelism. This paper presents our design decisions and our implementation, as well as examples of how easily ALDOR programmers can implement parallel algorithms in a high-level, abstract way with Pi (it).
ISBN: 3540664432 (print)
In this paper, a multithreaded implementation technique for piecewise execution of memory-intensive nested data-parallel programs is presented. The execution model and some experimental results are described.
In this paper we have presented an all-electron, full-potential ab initio simulation method that introduces a mixed basis, and have cited several typical examples which indicate that it is possible to predict properties of materials prior to experiment. Based on the ab initio calculation of the total energy, cluster variation, and direct methods function, it is possible to bridge the limited scheme of the ab initio treatment to real complex materials. Furthermore, to overcome limited computer power, we have developed parallel processing codes and tested their efficiency as well.
ISBN: 3540664432 (print)
We discuss several issues relevant to parallel wavelet transforms and their possible implications for the choice of a proper programming paradigm for the corresponding multiprocessor implementations.
ISBN: 3540664432 (print)
In most interior point methods for linear programming, a sequence of weighted linear least squares problems is solved, where the only changes from one iteration to the next are the weights and the right-hand side. The weighted least squares problems are usually solved as weighted normal equations by the direct method of Cholesky factorization. In this paper, we consider solving the weighted normal equations by a preconditioned conjugate gradient method at every other iteration. We use a class of preconditioners based on a low-rank correction to a Cholesky factorization obtained from the previous iteration. Numerical results show that, when properly implemented, the approach of combining direct and iterative methods is promising.
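The idea of reusing an earlier factorization as a preconditioner can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: the preconditioner is the plain Cholesky factor of the normal matrix built with the previous weights, and the low-rank correction described in the abstract is omitted. All function names are hypothetical, and the matrices are small dense lists for readability.

```python
import math


def normal_matrix(A, d):
    """Return A * diag(d) * A^T for a dense matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    return [[sum(A[i][k] * d[k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]


def cholesky(M):
    """Lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(M, b, L_prev, tol=1e-10, max_iter=100):
    """Preconditioned CG for M x = b, preconditioned by the stale factor L_prev.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = cholesky_solve(L_prev, r)    # apply the preconditioner
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            return x, it
        Mp = matvec(M, p)
        alpha = rz / dot(p, Mp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        z = cholesky_solve(L_prev, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

In use, one would factor `normal_matrix(A, d_prev)` once with `cholesky`, then call `pcg(normal_matrix(A, d_new), b, L_prev)` at the following iteration. When the weights change little between iterations, the stale factor remains a good approximation of the new matrix, which is the motivation for alternating direct and iterative solves.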
ISBN: 3540664432 (print)
The proceedings contain 210 papers. The topics discussed include: adaptive scheduling for task farming with grid middleware; applying human factors to the design of performance tools; building the teraflops/petabytes production supercomputing center; a coming of age for Beowulf-class computing; using preemptive thread migration to load-balance data-parallel applications; multi-protocol communications and high speed networks; an online algorithm for dimension-bound analysis; improving the performance of distributed shared memory environments on grid multiprocessors; performance analysis of wormhole switching with adaptive routing in a two-dimensional torus; set associative cache behavior optimization; a performance study of modern web server applications; performance evaluation of object oriented middleware; and performance evaluation and benchmarking of native signal processing.
ISBN: 3540660682 (print)
The fine-grain, data-driven parallelism shown by neural models such as the Boltzmann machine cannot be implemented in an entirely efficient way either on general-purpose multicomputers or on networks of computers, which are nowadays the most common parallel computer architectures. In this paper we present a parallel implementation of a modified Boltzmann machine where the processors, each with a disjoint subset of neurons allocated, asynchronously compute the evolution of their neurons by using values for the remaining neurons that might not be up to date, thus reducing interprocessor communication requirements. An evolutionary algorithm is used to learn the rules that allow the processors to cooperate by interchanging the local optima that they find while concurrently exploring different zones of the Boltzmann machine state space. Thus, the way the processors interact changes dynamically during execution of the algorithm and adapts to the problem at hand. Good speedup figures with respect to the Boltzmann machine computation on a uniprocessor computer have been obtained experimentally.