Padded sorting requires n input keys to be output in sorted order in an array with slightly more than n locations, unused locations being filled with a special null value. We show that a deterministic CRCW PRAM with k...
详细信息
Simulated annealing is a powerful and general algorithm for solving combinatorial optimization problems, but takes an extremely long time to execute. the resulting cost error from a reduction of the costly synchroniza...
详细信息
ISBN:
(纸本)0818642009
Simulated annealing is a powerful and general algorithm for solving combinatorial optimization problems, but takes an extremely long time to execute. the resulting cost error from a reduction of the costly synchronization in distributed memory multicomputers is tolerated by the hill climbing nature of simulated annealing. this paper addresses the behavior of algorithms that contain cost error. the new cost error measurement and relaxing synchronization method predicts the amount of cost error that an algorithm will tolerate and still converge. this method also explains three interesting phenomena of the cost error statistically. Finally, this paper applies the new method to the problem composite material stock cutting. Implementation results of the parallel space-decomposition algorithm on Intel iPSC/2 are reported.
this paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a fine-graine...
详细信息
ISBN:
(纸本)0818638109
this paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a fine-grained abstract parallel machine, TAM. A final compilation step is unique to each machine and optimizes for specifics of the architecture. By determining the cost of the primitives and weighting them by their dynamic frequency in parallel programs, we quantify the effectiveness of the following mechanisms individually and in combination. Efficient processor/network coupling proves valuable. Message dispatch is found to be less valuable without atomic operations that allow the scheduling levels to cooperate. Multiple hardware contexts are of small value when the contexts cooperate and the compiler can partition the register set. Tagged memory provides little gain. Finally, the performance of the overall system is strongly influenced by the performance of the memory system and the frequency of control operations.
We describe an efficient parallel algorithm to solve the single function coarsest partition problem. the algorithm runs in O(logn) time using 0(n log log n) operations on the Arbitrary CRCW PRAM. the previous best kno...
详细信息
Professionals in various fields such as medical imaging, biology and civil engineering require rapid access to huge amounts of uncompressed pixmap image data. To fulfill these requirements, a parallel image server arc...
详细信息
ISBN:
(纸本)081863460X
Professionals in various fields such as medical imaging, biology and civil engineering require rapid access to huge amounts of uncompressed pixmap image data. To fulfill these requirements, a parallel image server architecture is proposed, based on arrays of intelligent disk nodes, with each disk node composed of one processor and one disk. this contribution shows how images can be partitioned into extents and efficiently distributed among available intelligent disk nodes. the image server's performance is analyzed according to various parameters such as the number of cooperating disk nodes, the sizes of image file extents, the available communication throughput and the processing power of disk node and image server processors. Important image access speed improvements are obtained by image extent caching and image part extraction in disk nodes. With T800 transputer-based technology, a system composed of eight disk nodes offers access to three full-color 512×512 pixmap image parts per second (2.4 megabytes per second). For the same configuration but withthe recently announced T9000 transputer, image access throughput is eight images per second (6.8 megabytes per second).
We present an elegant deterministic load balancing strategy for distribution sort that is applicable to a wide variety of parallel disks and parallel memory hierarchies with both single and parallel processors. the si...
详细信息
the proceedings contain 33 papers. the special focus in this conference is on Programming Language Implementation and Logic Programming. the topics include: Executable specifications for language implementation;avoidi...
ISBN:
(纸本)9783540571865
the proceedings contain 33 papers. the special focus in this conference is on Programming Language Implementation and Logic Programming. the topics include: Executable specifications for language implementation;avoiding dynamic delays in functional logic programs;a debugging model for functional logic programs;a conservative approach to meta-programming in constraint logic programming;the versatility of handling disjunctions as constraints;improvements in compile-time analysis for ground prolog;a new top-down parsing algorithm for left-recursive DCGs;specification and implementation of grammar couplings using attribute grammars;programming language specification and prototyping using the MAX system;flang and its implementation;efficient lazy narrowing using demandedness analysis;a demand driven computation strategy for lazy narrowing;functional programming languages with logical variables: a linear logic view;objects with state in contextual logic programming;a novel method for parallel implementation of findall;a parallel implementation for AKL;inlining to reduce stack space;a WAM-based implementation of a logic language with sets;executing bounded quantifications on shared memory multiprocessors;a lattice of abstract graphs;higher-order chaotic iteration sequences;proving the correctness of compiler optimisations based on strictness analysis;abstract complexity of prolog based on WAM;development of rewriting strategies;narrowing approximations as an optimization for equational logic programs;a back end generator;selflog: language and implementation;embedding declarative subprograms into imperative constructs;stack management of runtime structures in distributed implementations and efficient register allocation for large basic blocks.
this paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along withtheir implementation and performance on the Connection Machine CM-5, and networked wor...
详细信息
this paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along withtheir implementation and performance on the Connection Machine CM-5, and networked workstations. this algorithm distributes boththe data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. the volume data, once distributed, is left intact. the processing nodes perform local raytracing of their subvolume concurrently. No communication between processing units is needed during this locally ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
the ability to generate visual representations of data, and the ability to enhance data into a suitable form for the purpose of visual representation, form two key components in a scientific visualization system. By a...
详细信息
the ability to generate visual representations of data, and the ability to enhance data into a suitable form for the purpose of visual representation, form two key components in a scientific visualization system. By a visual representation we mean the ability to render the data, using visual cues, such that the important features are readily perceived by the user. By the ability to enhance data we mean the ability to apply transformations to the data so that salient features embedded in the data become discernible and quantifiable. the rendering of data, computer graphics, and the enhancement of data, image processing, have emerged over the last twenty years into separate scientific disciplines. However, in scientific visualization and other applications of empirical data interpretation, we are increasingly confronted withthe need to combine both data rendering and data transformation capabilities under one system framework. this paper describes the design issues and implementation of a program for visualizing and enhancing volume data on distributed memory architectures. Our design is motivated by the desire to interactively view, transform, and interpret volume data acquired using seismic imaging techniques. Experimental results derived from an implementation on the Connection Machine CM-5 are described.
the proceedings contain 70 papers. the topics discussed include: software caching on cache-coherent multiprocessors;cache coherent shared memory hypercube multiprocessors;efficient evaluation of arbitrary set-associat...
ISBN:
(纸本)0818632003
the proceedings contain 70 papers. the topics discussed include: software caching on cache-coherent multiprocessors;cache coherent shared memory hypercube multiprocessors;efficient evaluation of arbitrary set-associative caches on multiprocessors;the scalable tree protocol-a cache coherence approach for large-scale multiprocessors;hierarchical shuffle-exchange and de Bruijn networks;efficient networks for realizing point-to-point assignments in parallel processors;a new class of interconnection networks based on alternating group;hyperweave: a fault-tolerant expandable interconnection network;and programming environment for phase-reconfigurable parallel programming on SuperNode.
暂无评论