Various researchers have recognized the value of loop fusion for evaluating dense (pointwise) array expressions. Recently, template metaprogramming in C++ has been used to significantly speed up the evaluation of array expressions, allowing C++ programs to achieve performance comparable to or better than FORTRAN for numerical analysis applications. Unfortunately, the template metaprogramming technique suffers from several limitations in applicability, portability, and potential performance. We present a framework for evaluating dense array expressions in object-oriented programming languages. We demonstrate how this technique supports both common subexpression elimination and threaded implementation, and we compare its performance to object-library and hand-generated code.
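As a rough illustration of the loop fusion idea mentioned in this abstract, the sketch below defers pointwise arithmetic into an expression tree and then evaluates the whole tree in a single pass, so no intermediate arrays are allocated. It is not the authors' framework: the names `Expr`, `Leaf`, `BinOp`, and `evaluate` are hypothetical, and common subexpression elimination and threading are not modeled.

```python
# Minimal sketch of loop fusion through lazy expression objects.
# All class and function names here are hypothetical illustrations.
import operator


class Expr:
    """Base class: arithmetic builds a tree instead of computing a result."""

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)


class Leaf(Expr):
    """Wraps a concrete array (here a plain list of numbers)."""

    def __init__(self, data):
        self.data = list(data)

    def at(self, i):
        return self.data[i]

    def __len__(self):
        return len(self.data)


class BinOp(Expr):
    """Deferred pointwise operation on two subexpressions."""

    def __init__(self, op, left, right):
        if len(left) != len(right):
            raise ValueError("operands must have equal length")
        self.op, self.left, self.right = op, left, right

    def at(self, i):
        # Element i of the result is computed on demand from element i
        # of each operand, so no temporary array is ever materialized.
        return self.op(self.left.at(i), self.right.at(i))

    def __len__(self):
        return len(self.left)


def evaluate(expr):
    """Single fused loop over the index space of the expression."""
    return [expr.at(i) for i in range(len(expr))]


a, b, c = Leaf([1, 2, 3]), Leaf([4, 5, 6]), Leaf([7, 8, 9])
result = evaluate(a + b * c)  # one pass, no temporary for b * c
```

An eager array library would instead compute `b * c` into a temporary and then make a second pass to add `a`. The fused form trades that temporary for per-element dispatch.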
Computing the distribution of light in a given environment is an important problem in computer-aided photorealistic image generation. The radiosity method has been proposed to address this problem, but it requires an enormous amount of calculation and memory. The hierarchical radiosity method is a recent approach that reduces these computational requirements through careful error analysis; its idea derives from solution methods for N-body problems. Although the hierarchical approach greatly reduces the amount of calculation, processing times are still unsatisfactory. Exploiting parallelism is a practical way to reduce the computation time further. In this thesis, we have designed and implemented a parallel hierarchical radiosity algorithm for distributed memory computers. Because of its highly irregular computational structure, the hierarchical radiosity algorithm does not yield easily to parallelization on distributed memory machines. The dynamically changing computational patterns of the algorithm cause severe load imbalances. Therefore, we have developed a dynamic load balancing technique for the parallel hierarchical radiosity calculation.
Positron Emission Tomography (PET) images can be reconstructed using Fourier transform methods. This paper describes the performance of a fully 3-D Backprojection-Then-Filter (BPF) algorithm on the Cray T3E machine and on a cluster of workstations. PET reconstruction of small animals is a class of problems characterized by poor counting statistics. The low-count nature of these studies necessitates 3-D acquisition and reconstruction: including axially oblique Lines Of Response (LORs) significantly improves the sensitivity of the PET system. The BPF method is widely used in clinical studies because of its speed and easy implementation. Moreover, the BPF method is suitable for on-line 3-D reconstruction, as it does not need any sinogram or rearranged data. In order to investigate the possibility of on-line processing, we reconstruct a phantom using the data stored in list-mode format by the data acquisition system. We show how the intrinsically parallel nature of the BPF method makes it suitable for on-line reconstruction on a MIMD system such as the Cray T3E. Lastly, we analyze the performance of this algorithm on a cluster of workstations.
ISBN: (print) 0769503500
In a sort-last-sparse parallel volume rendering system on distributed memory multicomputers, the rendering phase achieves good speedup as the number of processors increases, because each processor renders images locally without communicating with other processors. In the compositing phase, however, a processor has to exchange local images with other processors. When the number of processors exceeds a threshold, the image compositing time becomes a bottleneck. In this paper, we propose three compositing methods to reduce the compositing time in such a system: the binary-swap with bounding rectangle method, the binary-swap with run-length encoding and static load-balancing method, and the binary-swap with bounding rectangle and run-length encoding method. The proposed methods were implemented on an SP2 parallel machine along with the binary-swap compositing method. The experimental results show that the binary-swap with bounding rectangle and run-length encoding method has the best performance among the four methods.
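The two data-reduction ideas named in this abstract, cropping a sparse local image to its bounding rectangle and run-length encoding the pixels, can be sketched as follows. This is only an illustration under assumptions: the abstract does not specify the encoding format, so the `[value, count]` run layout, the `blank` pixel value, and all function names are hypothetical. The binary-swap exchange between processors is not modeled.

```python
# Illustrative sketch: shrink a sparse local image before sending it.
# Function names and the run format are assumptions, not the paper's.

def bounding_rectangle(image, blank=0):
    """Return inclusive (top, left, bottom, right) of non-blank pixels, or None."""
    rows = [r for r, row in enumerate(image) if any(p != blank for p in row)]
    if not rows:
        return None
    cols = [c for row in image for c, p in enumerate(row) if p != blank]
    return rows[0], min(cols), rows[-1], max(cols)


def run_length_encode(pixels):
    """Encode a pixel sequence as a list of [value, count] runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([p, 1])  # start a new run
    return runs


def pack_subimage(image, blank=0):
    """Crop to the bounding rectangle, then run-length encode each row."""
    rect = bounding_rectangle(image, blank)
    if rect is None:
        return None, []
    top, left, bottom, right = rect
    rows = [run_length_encode(row[left:right + 1])
            for row in image[top:bottom + 1]]
    return rect, rows


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 5, 5, 0, 0],
    [0, 0, 5, 0, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
rect, rows = pack_subimage(image)
```

Here only the 2 x 3 region inside `rect` is encoded, so the blank border never has to be transmitted.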
We study the problem of exploiting parallelism in search-based AI systems on distributed machines. We propose stack-splitting, a technique for implementing or-parallelism which, when coupled with appropriate scheduling strategies, leads to (i) reduced communication during distributed execution and (ii) distribution of larger-grained work to processors. The modified technique can also be implemented on shared memory machines and should be quite competitive with existing methods. Indeed, an implementation has been carried out on shared memory machines, and the results are reported here.
Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two MIMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines.
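For readers unfamiliar with exhaustive search motion estimation, a sequential sketch of the per-block computation is given below. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not state, and all function names and parameters are hypothetical. The parallel versions studied in the paper would distribute blocks or subimages across processors, which is not shown here.

```python
# Sequential sketch of exhaustive-search block matching for one block.
# SAD as the cost metric and all names are assumptions for illustration.

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at
    (bx, by) and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Try every displacement in [-search, search]^2; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def frame_with_patch(px, py, size=6, n=2, value=9):
    """Build a blank size x size frame with one n x n bright patch."""
    frame = [[0] * size for _ in range(size)]
    for y in range(n):
        for x in range(n):
            frame[py + y][px + x] = value
    return frame


ref = frame_with_patch(1, 1)  # patch at x=1, y=1 in the reference frame
cur = frame_with_patch(3, 2)  # same patch moved to x=3, y=2
vector = best_motion_vector(cur, ref, bx=3, by=2, n=2, search=2)
```

Because every block's search is independent of the others, the outer loop over blocks is a natural unit for the rectangular or stripe partitioning the abstract mentions.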
Authors: Mailly, P; Gastard, M; Cupo, A
Affiliations: Univ Paris 06, CNRS URA 1488, Inst Neurosci, Dept Neurobiol Signaux Intercellulaires, F-75252 Paris 05, France; Univ Paris 06, CNRS URA 1488, Inst Neurosci, Dept Neurochim Anat, F-75252 Paris, France; CNRS, Inst Pharmacol Mol & Cellulaire, UPR 411, F-06560 Valbonne, France
Using a specific rat monoclonal anti-idiotypic antibody, we have examined the subcellular distribution of delta-opioid receptors in various neuronal subtypes of the rat spinal cord. The immunofluorescence was detected with a confocal microscope, and in some cases serial images were processed for a three-dimensional (3-D) reconstruction of the neurons. Immunolabelling was found to be distributed throughout the spinal cord grey matter, especially in the most superficial layers of the dorsal horn, around the central canal, and in the region of motoneurons of the ventral horn. The 3-D reconstruction made on large neurons of lamina IX in the ventral horn and on neurons of lamina X around the central canal allowed the visualization of delta-opioid receptors in the cytoplasm of the soma and proximal neurites of immunofluorescent neurons. Some immunolabelled receptors were also detected at the level of the plasma membrane of the cell bodies and in the nuclear matrix. Interestingly, a particular arrangement of delta-opioid receptors organized along parallel alignments was observed on the plasma membrane of some neurons. This study emphasizes the potential usefulness of 3-D reconstruction in the study of the spatial arrangement of cellular components. (C) 1999 Elsevier Science B.V. All rights reserved.
This paper presents a parallel implementation of connected component labeling algorithms for gray and binary images on a one-dimensional DSP array. The system is a distributed memory MIMD machine, and all the algorithms are developed with this platform in mind. The performance of several parallel connected component labeling methods is evaluated. The multi-DSP system has demonstrated viable performance.
Very long running queries in database systems are not uncommon in nontraditional application domains such as image processing or data warehousing analysis. Query optimization, therefore, is important. However, estimates of the query characteristics before query execution are usually inaccurate. Further, system configuration and resource availability may change during a long evaluation period. As a result, queries are often evaluated with suboptimal plan configurations. To remedy this situation, we have designed a novel approach to re-optimize suboptimal query plan configurations on the fly with Conquest, an extensible and distributed query processing system. A dynamic optimizer considers reconfiguration cost as well as execution cost in determining the best query plan configuration. Experimental results are presented.
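The trade-off described in the last technical sentence of this abstract, weighing reconfiguration cost against execution cost, can be sketched as a simple comparison. This is an assumption-laden illustration, not Conquest's optimizer: the abstract gives no cost model, so the sketch assumes both costs are additive and expressed in the same unit, and `choose_plan` and the plan names are hypothetical.

```python
# Hypothetical sketch of a mid-query reconfiguration decision.
# Assumes additive costs in a common unit; not Conquest's actual model.

def choose_plan(current, candidates):
    """Pick the plan with the lowest total remaining cost.

    current:    (name, estimated_remaining_execution_cost)
    candidates: list of (name, estimated_remaining_execution_cost,
                         reconfiguration_cost)
    Returns (name, total_cost). The current plan wins ties because
    switching is only worthwhile when it is strictly cheaper.
    """
    best_name, best_total = current
    for name, remaining, reconfiguration in candidates:
        total = remaining + reconfiguration
        if total < best_total:
            best_name, best_total = name, total
    return best_name, best_total


choice = choose_plan(
    ("plan_a", 100.0),
    [
        ("plan_b", 60.0, 50.0),  # faster, but too expensive to switch to
        ("plan_c", 70.0, 20.0),  # slower than plan_b, cheaper overall
    ],
)
```

The example shows why execution cost alone is misleading: `plan_b` has the lowest remaining execution cost, yet `plan_c` is the better choice once the cost of switching is included.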
The proceedings contain 15 papers from the conference on Parallel and Distributed Methods for Image Processing II. The topics discussed include: parallel DSP with memory and I/O processors; analog VLSI implementation of a morphological associative memory; real-time parallel video image processing on a PC cluster; thread concept for automatic task parallelization in image analysis; new parallel vision environment in heterogeneous networked computing; and toolkit for parallel image processing.