ISBN: 0818684046 (print)
We describe an efficient parallel algorithm for hidden-surface removal for terrain maps. The algorithm runs in O(log^4 n) steps on the CREW PRAM model with a work bound of O((n+k) polylog(n)), where n and k are the input and output sizes, respectively. To achieve the work bound we use a number of techniques, among which our use of persistent data structures is somewhat novel in the context of parallel algorithms. To the best of our knowledge, this is the most efficient parallel algorithm for hidden-surface removal for an important class of 3-D scenes.
ISBN: 0818684046 (print)
In array radar signal processing applications, the processing demands range from tens of GFLOPS to several TFLOPS. To address this, as well as the size and power dissipation issues, a special-purpose "array signal processing" architecture is proposed. We argue that a combined MIMD-SIMD system can give flexibility, scalability, and programmability as well as high computing density. The MIMD system level, where SIMD modules are interconnected by a fiber-optic real-time network, provides the high-level flexibility, while the SIMD module level provides the compute density. In this paper we evaluate different design alternatives and show how the VEGA architecture was derived. By examining the applications and the algorithms used, the SIMD mesh processor is found to be sufficient. However, the smaller the meshes, the better the flexibility and efficiency. Then, based on prototype VLSI implementations and on instruction statistics, we find that a relatively large pipelined processing element maximizes the performance per area. It is thereby concluded that a small SIMD mesh processor array with powerful processing elements is the best choice. These observations are further exploited in the design of the single-chip SIMD processor array to be included in the MIMD-style overall system. The system scales from 6.4 GFLOPS to several TFLOPS of system peak performance.
ISBN: 3540649522 (print)
This paper studies load balancing issues for classes of problems with certain bisection properties. A class of problems has alpha-bisectors if every problem in the class can be subdivided into two subproblems whose weight (i.e., workload) is not smaller than an alpha-fraction of the original problem. It is shown that the maximum weight of a subproblem produced by Algorithm HF, which partitions a given problem into N subproblems by always subdividing the problem with maximum weight, is at most a factor of $\lfloor 1/\alpha \rfloor \cdot (1-\alpha)^{\lfloor 1/\alpha \rfloor - 2}$ greater than the theoretical optimum (uniform partition). This bound is proved to be asymptotically tight. Two strategies to use Algorithm HF for load balancing distributed hierarchical finite element simulations are presented. For this purpose, a certain class of weighted binary trees representing the load of such applications is shown to have 1/4-bisectors. The maximum resulting load is at most a factor of 9/4 larger than in a perfectly uniform distribution in this case.
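As a rough illustration of the heaviest-first idea behind Algorithm HF, the following Python sketch repeatedly bisects the heaviest subproblem until N pieces exist. It is not the authors' implementation: the function names `hf_partition` and `hf_bound` are hypothetical, every bisection is assumed to hit the worst allowed split (alpha versus 1 - alpha), and the bracketed terms in the stated bound are read as floors, which is an assumption.

```python
import heapq
import math


def hf_partition(total_weight, n_parts, alpha):
    """Heaviest-first partitioning sketch (hypothetical helper).

    Assumes every bisection splits a piece of weight w into
    alpha * w and (1 - alpha) * w, the most unbalanced split
    that an alpha-bisector still permits.
    """
    heap = [-total_weight]  # heapq is a min-heap, so store negated weights
    while len(heap) < n_parts:
        heaviest = -heapq.heappop(heap)  # always subdivide the heaviest piece
        heapq.heappush(heap, -(alpha * heaviest))
        heapq.heappush(heap, -((1 - alpha) * heaviest))
    return sorted((-w for w in heap), reverse=True)


def hf_bound(alpha):
    """Stated worst-case ratio to a uniform partition, brackets read as floors."""
    k = math.floor(1 / alpha)
    return k * (1 - alpha) ** (k - 2)


if __name__ == "__main__":
    parts = hf_partition(1.0, 8, 0.25)
    ratio = parts[0] / (1.0 / 8)  # heaviest piece relative to the uniform share
    print(parts, ratio, hf_bound(0.25))
```

With alpha = 1/4 the bound evaluates to 4 · (3/4)^2 = 9/4, which matches the factor quoted for the finite element case.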
This work presents MUSE, a graphical environment for modeling interactive networked multimedia applications. Through an advanced graphic interface and a new high-level authoring model, it is possible to create complex ...
This work describes a Visual Environment for the Development of Parallel Real-Time Programs, a tool whose aim is to facilitate the generation and debugging of source code of applications developed for the parallel Ker...
MPEG-4 is currently being developed by MPEG to specify the technologies for supporting current and emerging multimedia applications. Because of its object-based features and flexible toolbox approach, it is much more complex than previous video coding standards. We believe that software-based implementation on parallel and distributed computing systems is a natural and viable option. In this paper, we describe such an approach on the MPEG-4 video encoder using a cluster of workstations. We propose to use hierarchical Petri Nets as a modeling tool to describe the temporal relations and time constraints among various video objects at different levels. This would allow us to perform scheduling with a guarantee of synchronization among multiple objects. A dynamic shape-adaptive data parallel approach is used in the spatial domain for further speed-up gain. Our preliminary results indicate that real-time MPEG-4 encoding using distributed and parallel computing is achievable.
ISBN: 0818685794 (print)
The proceedings contain 33 papers. The topics discussed include: application experiences with the Globus toolkit; Strings: a high-performance distributed shared memory for symmetrical multiprocessor clusters; two-stage transaction processing in client-server DBMSs; authorization for metacomputing applications; on the effectiveness of distributed checkpoint algorithms for domino-free recovery; Hectiling: an integration of fine and coarse-grained load-balancing strategies; Otter: bridging the gap between MATLAB and ScaLAPACK; matchmaking: distributed resource management for high throughput computing; distant I/O: one-sided access to secondary storage on remote processors; adaptive utilization of communication and computational resources in high-performance distribution systems: the EMOP approach; TeleMed: wide-area, secure, collaborative object computing with Java and CORBA for healthcare; optimizing protocol parameters to large scale PC cluster and evaluation of its effectiveness with parallel data mining; Autopilot: adaptive control of distributed applications; prediction and adaptation in Active Harmony; a resource query interface for network-aware applications; high-performance distributed digital libraries: building the Interspace on the grid; and cooperative caching of dynamic content on a distributed web server.
In this paper, we present a methodology for mapping an Embedded Signal Processing (ESP) application onto HPC platforms such that the throughput performance is maximized. Previous approaches used a linear pipelined exe...
Parallel programming continues to be difficult and error-prone, whether starting from specifications or from an existing sequential program. This paper presents (1) a methodology for parallelizing sequential applicatio...
In this paper we introduce a page-based Lazy Release Consistency protocol called ADSM that constantly and efficiently adapts to the applications' sharing patterns. Adaptation in ADSM is based on our dynamic categorization of the type of sharing experienced by each page. Pages can be categorized as falsely-shared, migratory, or producer/consumer(s). Migratory and producer/consumer(s) pages are managed in single-writer mode, while falsely-shared data are managed in multiple-writer mode. Coherence is kept with invalidations for most types of the shared data, but updates are used for lock-protected data in migratory state and barrier-protected data in producer/consumer(s) state. We performed experiments with 6 parallel applications on an 8-node SP2 system, comparing our protocol against standard TreadMarks and a version of TreadMarks that also adapts to sharing patterns. Our results show that ADSM consistently outperforms its competitors; our protocol can improve the TreadMarks speedups by as much as 155%, while surpassing the performance of the adaptive TreadMarks implementation by as much as 67%. Our main conclusions are that our categorization and adaptation strategies are useful techniques for improving the performance of page-based software DSMs, while ADSM is a highly efficient option for low-cost parallel computing.
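The per-page policy stated in the abstract above can be summarized as a small decision table. The Python sketch below only encodes that stated mapping from sharing category and synchronization type to write mode and coherence action. It does not model how ADSM detects the categories, and the names `Sharing`, `Sync`, and `page_policy` are hypothetical, not part of ADSM or TreadMarks.

```python
from enum import Enum


class Sharing(Enum):
    FALSELY_SHARED = "falsely-shared"
    MIGRATORY = "migratory"
    PRODUCER_CONSUMER = "producer/consumer(s)"


class Sync(Enum):
    LOCK = "lock"
    BARRIER = "barrier"
    NONE = "none"


def page_policy(sharing, sync):
    """Return (write_mode, coherence_action) for one page (hypothetical helper)."""
    # Falsely-shared pages use multiple-writer mode; the other two
    # categories are managed in single-writer mode.
    if sharing is Sharing.FALSELY_SHARED:
        write_mode = "multiple-writer"
    else:
        write_mode = "single-writer"

    # Updates are used only for lock-protected migratory data and
    # barrier-protected producer/consumer(s) data; all other cases
    # fall back to invalidations.
    use_update = (
        (sharing is Sharing.MIGRATORY and sync is Sync.LOCK)
        or (sharing is Sharing.PRODUCER_CONSUMER and sync is Sync.BARRIER)
    )
    return write_mode, "update" if use_update else "invalidate"


if __name__ == "__main__":
    for sharing in Sharing:
        for sync in Sync:
            print(sharing.value, sync.value, page_policy(sharing, sync))
```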