Ray tracing is one of the computer graphics techniques used to render high quality images. Ray tracing complex scenes can require large amounts of CPU time and memory storage. In this paper, we present a parallel impl...
详细信息
Ray tracing is one of the computer graphics techniques used to render high quality images. Ray tracing complex scenes can require large amounts of CPU time and memory storage. In this paper, we present a parallel implementation of ray tracing algorithm on the Intel Delta parallel computer. Two key issues of efficient implementation are load balancing and database distribution. In our database distribution, one part of database is duplicated on each processor and the remaining part is evenly distributed among groups of processors. We balance load among processors by distributing subimages to processors in a global fashion based on previous workload requests.
Commodities-built clusters, a low cost alternative for distributedparallelprocessing, brought high-performance computing to a wide range of users. However the existing widespread tools for distributedparallel progr...
详细信息
ISBN:
(纸本)0769520464
Commodities-built clusters, a low cost alternative for distributedparallelprocessing, brought high-performance computing to a wide range of users. However the existing widespread tools for distributedparallel programming, such as messaging passing libraries, does not attend new software engineering requirements that nave emerged due to increase in complexity of applications. Haskell(#) is a parallel programming language intending to reconcile higher abstraction and modularity with scalable performance. In this paper it is demonstrated the use of Haskell(#) in the programming of three SPMD benchmark programs, which have lower-level MPI implementations available.
We present a parallel algorithm for the Euclidean distance transformation (EDT). It is a `divide-and-conquer' algorithm based on a fast sequential algorithm for the Signed EDT (SEDT). The combining step that follo...
详细信息
ISBN:
(纸本)081864222X
We present a parallel algorithm for the Euclidean distance transformation (EDT). It is a `divide-and-conquer' algorithm based on a fast sequential algorithm for the Signed EDT (SEDT). The combining step that follows the local partial calculation of the SEDT can be done efficiently after reformulating the SEDT problem as the partial calculation of a Voronoi Diagram. This leads to an algorithm with two local calculation steps whose computational complexity is proportional to the number of pixels of the subregions and a global calculation step with complexity proportional to the image perimeter. This article contains a description of the algorithm, a complexity analysis, a discussion on load balance and timings obtained on an iPSC/2.
This talk will introduce the massively parallel computer research and development of the Japanese Real World Computing Project (RWC). RWC is a ten-year project whose budget is expected to be 600 million to 700 million...
详细信息
This talk will introduce the massively parallel computer research and development of the Japanese Real World Computing Project (RWC). RWC is a ten-year project whose budget is expected to be 600 million to 700 million dollars (US). We plan to develop an original 16 K PE stand-alone computer with a parallel C++ language, a more abstract object-oriented language, and a massively parallel operating system.
distributed search is important for finding solutions to hard problems in artificial intelligence. Building distributed search systems can be difficult because the steps required to solve these problems are interdepen...
详细信息
ISBN:
(纸本)9781479941162
distributed search is important for finding solutions to hard problems in artificial intelligence. Building distributed search systems can be difficult because the steps required to solve these problems are interdependent. Fortunately, aspects of search systems exhibit commonalities that allow them to be distributed using several different paradigms. These paradigms can be used as the basis for libraries to make implementing new systems, or distributing existing systems, easier. DisSLib:CC is such a library. DisSLib:CC implements the distributed search paradigm search with a central common search state. In this paradigm, agents collaborate to update a central search state. Systems built with DisSLib:CC require very little extra code to implement compared to the standalone versions. These systems can also show improvements in wall-clock run-times, which can be improved by varying the meta-parameters of the distribution paradigm. One such parameter is the number of steps, transitions, that the agents execute before consulting the common search state and in this paper we show how varying this meta-parameter improves the efficiency of the systems.
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implicit reduction operations, including t...
详细信息
ISBN:
(纸本)0818684038
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implicit reduction operations, including those involving statements at multiple loop nesting levels and intermixed with conditional control flow. We introduce two new optimizations: factoring which increases data locality for SUM and PRODUCT reductions, and index encoding which enables a single global communication to accomplish both an extreme value reduction and an extreme value location reduction. We have implemented these techniques in the dHPF compiler for High Performance Fortran (HPF). We evaluate their effectiveness experimentally by compiling several reduction benchmarks with dHPF and two commercial HPF compilers, and comparing the performance of the generated code on an IBM SP2. Our results show that our recognition techniques are more powerful and that our index encoding and factoring optimizations can improve performance by a factor of two where they apply.
The increasingly widespread use of cluster architectures has resulted in many new application scenarios for data replication. While data replication is, in principle, a well understood problem, recovery of replicated ...
详细信息
ISBN:
(纸本)0769516599
The increasingly widespread use of cluster architectures has resulted in many new application scenarios for data replication. While data replication is, in principle, a well understood problem, recovery of replicated systems has not yet received enough attention. In the case of clusters, recovery procedures are particularly important since they have to keep a high level of availability even during recovery. In fact, recovery is part of the normal operations of any cluster as the cluster is expected to continue working while sites leave or join the system. However, traditional recovery techniques usually require stopping processing. Once a quiescent state has been reached, the system proceeds to synchronize the state of failed or new replicas. In this paper, we concentrate on how to perform recovery in a replication middleware without having to stop processing. The proposed protocol focuses on how to minimize the redundancies that take place during concurrent recovery of several sites.
In this paper, we consider the problem of covering vertices distributed over a two-dimensional plane with as small number of unit disks as possible. This restricted version of the set cover problem is motivated by the...
详细信息
ISBN:
(纸本)9781479941162
In this paper, we consider the problem of covering vertices distributed over a two-dimensional plane with as small number of unit disks as possible. This restricted version of the set cover problem is motivated by the problem of reducing the network cost of Wireless Sensor Networks which have been widely used in recent years. The main contribution of the current paper is the proposal of an exact algorithm for solving the minimum set cover by unit disks which outputs an optimum solution in O(n2((10/epsilon)root nlog2 n)) time, where epsilon is the minimum distance between district vertices in the plane. This result indicates that if sensor nodes are "sparsely" distributed over the region so that the distance to the closest sensor is long enough compared with the transmission radius of the base station (i.e., when c = omega(1/root n)), we can significantly reduce the running time of the previous algorithm which solves the ordinary set cover problem in an exact manner.
We describes a highly parallel integrated floating point multiply/divide/square root unit which can achieve optimal speedup with regards to the typical O(n2) implementation cost of these operations. The core is a redu...
详细信息
This paper describes the work of an objectual framework designed to be used in the parallelization of a set of related algorithms. As a concrete application a parallel Ant Colony Optimization algorithm (ACO) for the T...
详细信息
ISBN:
(纸本)0769522106
This paper describes the work of an objectual framework designed to be used in the parallelization of a set of related algorithms. As a concrete application a parallel Ant Colony Optimization algorithm (ACO) for the Travelling Salesman Problem (TSP) is presented. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can he used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. The ACO for TSP implemented by means of the parallel framework proves to have good performances: approximatively linear speedup and low communication cost.
暂无评论