ISBN: 3540664432 (print)
In the medical field, volume rendering provides good-quality 3D visualizations but is still not interactive enough for day-to-day practice. The most efficient sequential algorithm is the Shear-Warp algorithm, which renders up to 10 images per second for a small dataset. The goal of this paper is to present an efficient parallel implementation of the Shear-Warp algorithm for a distributed-memory architecture: a cluster of PCs connected by a high-speed network.
ISBN: 3540664432 (print)
General-purpose microprocessors have long been considered a computing platform unsuited to image processing and vision tasks. The so-called von Neumann paradigm and the associated memory bottleneck have motivated research into various forms of parallel processing and into special processors for vision. The outcome of this long-standing effort is negligible if one considers only the computing platforms that became true products. Recently, the micro-architecture of some general-purpose microprocessors has been augmented with extensions to support multimedia processing. It is worth considering how much speed-up can actually be obtained from the limited SIMD processing mode embedded in these extensions. This paper presents experimental results obtained on a very simple algorithm, the Haar transform, coded for the HP and Intel multimedia microengines. The preliminary results reported here show that the system environment (type and dimensions of the first- and second-level caches, and compiler efficiency) considerably affects the speed-up theoretically attainable from the SIMD microengine.
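For readers unfamiliar with the Haar transform named in this abstract, the following is a minimal scalar sketch in Python. It is not the paper's code: it uses the unnormalized averaging variant (pairwise mean and half-difference) as an assumption, and the function names `haar_step`, `haar_transform`, and `inverse_haar_transform` are hypothetical. The pairwise add and subtract in `haar_step` is the kind of operation a packed SIMD instruction would apply to several pairs at once.

```python
def haar_step(signal):
    """One Haar level: pairwise means and half-differences (averaging variant)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_transform(signal):
    """Full decomposition; output is [overall mean, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n < 1 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    coeffs = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        coeffs = detail + coeffs  # coarser details go in front of finer ones
    return approx + coeffs


def inverse_haar_transform(coeffs):
    """Rebuild the signal from the layout produced by haar_transform."""
    approx = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend((a + d, a - d))  # undo mean / half-difference
        approx = rebuilt
    return approx
```

For example, `haar_transform([4, 6, 10, 12])` yields the overall mean followed by one coarse and two fine detail coefficients, and `inverse_haar_transform` recovers the original samples.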
ISBN: 3540664432 (print)
We describe a software library for dynamic load balancing of finite element codes. The application code has to provide the current distributed mesh and information on the calculation and communication requirements, and it receives from the library all the information necessary to re-allocate the application data. The library computes a new partitioning, either via direct mesh migration or via parallel graph re-partitioning, by interfacing to the ParMetis or Jostle package. We describe the functionality of the DRAMA library and present some results.
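To make the idea of re-allocating mesh data concrete, here is a small Python sketch of a load-imbalance measure and a greedy migration loop. It is an illustration only and makes several assumptions: it is not the DRAMA, ParMetis, or Jostle interface, it ignores communication cost and mesh connectivity entirely, and the names `processor_loads`, `imbalance`, and `greedy_rebalance` are hypothetical.

```python
def processor_loads(weights, owner, nprocs):
    """Sum the computational weight of the elements owned by each processor."""
    loads = [0.0] * nprocs
    for w, p in zip(weights, owner):
        loads[p] += w
    return loads


def imbalance(loads):
    """Ratio of the heaviest load to the average load (1.0 means perfectly balanced)."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg > 0 else 1.0


def greedy_rebalance(weights, owner, nprocs, tolerance=1.05):
    """Move elements from the most loaded to the least loaded processor.

    Assumption: cost is computation only; a real repartitioner would also
    weigh communication and migration volume.
    """
    owner = list(owner)
    loads = processor_loads(weights, owner, nprocs)
    while imbalance(loads) > tolerance:
        src = max(range(nprocs), key=loads.__getitem__)
        dst = min(range(nprocs), key=loads.__getitem__)
        gap = loads[src] - loads[dst]
        # Only moves that strictly narrow the gap are allowed, so the loop terminates.
        candidates = [i for i, p in enumerate(owner)
                      if p == src and 0 < weights[i] < gap]
        if not candidates:
            break
        i = max(candidates, key=weights.__getitem__)
        owner[i] = dst
        loads[src] -= weights[i]
        loads[dst] += weights[i]
    return owner
```

With eight unit-weight elements split 6/2 across two processors, `greedy_rebalance` migrates two elements and ends with a 4/4 split.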
Geographical Information Systems (GIS) are able to manipulate spatial data. Such spatial data can be available in a variety of formats, one of the most important of which is the vector-topological format. This format retains...
A sort-last 3D parallel rendering machine distributes the triangles to draw to different processors. When building such a machine with each processor having a texture cache, the texture locality is worse and the perfo...
ISBN: 3540664432 (print)
Real-time multimedia applications need large processing power and yet require a low-power implementation in an embedded programmable parallel processor context. Our main contribution in this context is the proposal of a formalized DTSE (data transfer and storage exploration) methodology, which significantly reduces system bus load and hence improves overall system performance and lowers power consumption. We demonstrate the complementarity of this methodology by coupling DTSE with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life video and image processing applications show that this combined approach heavily reduces memory accesses and bus loading, and hence power, and also significantly reduces the total execution time. Decomposing the detailed parallelization and DTSE issues into two different stages is important to obtain the benefits of both stages without exploding the complexity of solving all the issues simultaneously.
ISBN: 0769500439 (print)
The crossbreeding between advanced microprocessor design and Field Programmable Gate Arrays (FPGAs) has produced the Field Programmable Processor Array, code-named FPPA. The first integrated version has been targeted at low-power parallel processing. The FPPA is composed of a 10x10 array of RISC microcontrollers offering up to 500 MIPS at 5 MHz for processors (20 MHz for communications). The very low power consumption of the core processor results in a 1 Watt power consumption for the whole array at 5 MHz and makes it particularly interesting for portable devices that require quite complex algorithms. In addition, the FPPA principle, i.e., a fault-tolerant large array of cells interconnected with an asynchronous communication scheme, is applicable to alternative structures for the cell architecture.
ISBN: 3540664432 (print)
Maintaining and reusing parallel numerical applications is not an easy task. We propose an OO design which enables very good code reuse for both sequential and parallel linear algebra applications. A linear algebra class library called LAKe is implemented using our design method. We show how the same code is used to implement both the sequential and the parallel versions of the iterative methods implemented in LAKe. We show that polymorphism is insufficient to achieve our goal and that both genericity and polymorphism are needed. We propose a new design pattern as part of the solution. Some numerical experiments validate our approach and show that efficiency is not sacrificed.
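The combination of genericity and polymorphism described in this abstract can be illustrated with a short Python sketch. It is not LAKe's design or API: the class names (`Vector`, `Operator`, `SeqVector`, `BlockVector`, `SeqDense`, `BlockRowMatrix`) are hypothetical, conjugate gradient is chosen here only as a representative iterative method, and the "parallel" classes merely simulate per-process blocks inside one process. The point is that `conjugate_gradient` is written once, parameterized by the vector type `V`, and runs unchanged on both layouts.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

V = TypeVar("V", bound="Vector")


class Vector(ABC):
    @abstractmethod
    def dot(self: V, other: V) -> float: ...
    @abstractmethod
    def add_scaled(self: V, alpha: float, other: V) -> V: ...  # self + alpha*other


class Operator(ABC, Generic[V]):
    @abstractmethod
    def apply(self, x: V) -> V: ...


def conjugate_gradient(A: Operator[V], b: V, x0: V, tol=1e-10, max_iter=100) -> V:
    """Solve A x = b for symmetric positive definite A; one code path for any V."""
    x = x0
    r = b.add_scaled(-1.0, A.apply(x))
    p, rs = r, r.dot(r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Ap = A.apply(p)
        alpha = rs / p.dot(Ap)
        x = x.add_scaled(alpha, p)
        r = r.add_scaled(-alpha, Ap)
        rs_new = r.dot(r)
        p = r.add_scaled(rs_new / rs, p)
        rs = rs_new
    return x


class SeqVector(Vector):
    def __init__(self, data: List[float]):
        self.data = data
    def dot(self, other):
        return sum(a * b for a, b in zip(self.data, other.data))
    def add_scaled(self, alpha, other):
        return SeqVector([a + alpha * b for a, b in zip(self.data, other.data)])


class SeqDense(Operator[SeqVector]):
    def __init__(self, rows: List[List[float]]):
        self.rows = rows
    def apply(self, x):
        return SeqVector([sum(a * b for a, b in zip(row, x.data)) for row in self.rows])


class BlockVector(Vector):
    """Each inner list stands in for the piece owned by one process."""
    def __init__(self, blocks: List[List[float]]):
        self.blocks = blocks
    def dot(self, other):
        # Sum of per-block partial products (a global reduction in a real code).
        return sum(sum(a * b for a, b in zip(u, v))
                   for u, v in zip(self.blocks, other.blocks))
    def add_scaled(self, alpha, other):
        return BlockVector([[a + alpha * b for a, b in zip(u, v)]
                            for u, v in zip(self.blocks, other.blocks)])


class BlockRowMatrix(Operator[BlockVector]):
    """Rows grouped by owning process; apply() gathers x, then works per block."""
    def __init__(self, row_blocks: List[List[List[float]]]):
        self.row_blocks = row_blocks
    def apply(self, x):
        full = [v for block in x.blocks for v in block]  # stands in for an all-gather
        return BlockVector([[sum(a * b for a, b in zip(row, full)) for row in rows]
                            for rows in self.row_blocks])
```

Calling `conjugate_gradient(SeqDense(rows), SeqVector(b), SeqVector(zeros))` and the corresponding call with `BlockRowMatrix` and `BlockVector` exercise exactly the same solver body.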
ISBN: 3540664432 (print)
Large-vocabulary continuous-speech recognition (LVCR) speaker-independent systems which integrate cross-word context-dependent acoustic models and n-gram language models are difficult to parallelize because of their interwoven structure, large dynamic data structures, and complex object-oriented software design. This paper shows how retrospective decomposition can be achieved if a quantitative analysis is made of dynamic system behaviour. A design which accommodates unforeseen effects and future modifications is presented.
In the last two or three decades, task scheduling onto parallel processing systems has been extensively studied. The structure of the parallel processing systems of the scheduling problem which many researchers h...