Volume visualization is an important tool in many scientific applications, requiring intensive processing and dealing with large amounts of data; the size of these data sets frequently exceeds the processing and visualization capacities of conventional workstations. The authors present their work in progress on data prefetching and data prediction techniques for volume visualization on low-cost PC clusters. The approach is based on the distributed shared memory (DSM) paradigm (M.K. Zuffo et al., 1998), which can greatly facilitate parallel programming efforts in volume visualization applications.
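As a rough illustration of the data-prefetching idea in such a setting, the sketch below overlaps rendering of one volume brick with fetching of the next one on a worker thread. The brick size and the fetch_brick/render_brick functions are hypothetical placeholders, not the authors' API; the DSM read is reduced to an opaque call.

/* Hypothetical double-buffered prefetch loop: overlap fetching the next
   volume brick with rendering the current one.  fetch_brick() stands in for
   the remote/DSM read and render_brick() for the local ray-casting step. */
#include <pthread.h>

#define BRICK_VOXELS (64 * 64 * 64)            /* illustrative brick size */

typedef struct { unsigned char voxels[BRICK_VOXELS]; } Brick;

extern void fetch_brick(int id, Brick *dst);   /* assumed: pull brick from DSM */
extern void render_brick(const Brick *b);      /* assumed: local rendering     */

typedef struct { int id; Brick *dst; } FetchJob;

static void *fetch_worker(void *arg)
{
    FetchJob *job = arg;
    fetch_brick(job->id, job->dst);            /* runs while the main thread renders */
    return NULL;
}

void render_all(int nbricks)
{
    static Brick buf[2];                       /* double buffer */
    fetch_brick(0, &buf[0]);                   /* prime the pipeline */

    for (int i = 0; i < nbricks; i++) {
        pthread_t t;
        FetchJob job = { i + 1, &buf[(i + 1) % 2] };
        int prefetching = (i + 1 < nbricks);

        if (prefetching)
            pthread_create(&t, NULL, fetch_worker, &job);

        render_brick(&buf[i % 2]);             /* computation overlaps communication */

        if (prefetching)
            pthread_join(t, NULL);             /* next brick is now resident */
    }
}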
Volume rendering has great potential for parallelization due to the tremendous number of computations required. Besides the enormous computational power needed, the memory interface is usually of crucial importance and frequently the bottleneck. The paper presents an implementation of a parallel ray casting algorithm for orthogonal projections on a new single-chip SIMD architecture. Concurrent processing of rays is scheduled such that redundant memory accesses by the individual processing elements can be detected by the channel controller; data can therefore be read efficiently in a block-wise manner. For improved image quality, a permutation of the Shear-Warp algorithm with trilinear interpolation is used. The steps of the ray casting algorithm are carefully mapped onto the architecture, avoiding expensive floating-point operations and giving performance superior to previously reported results. A detailed analysis illustrates the timing of the individual computations and memory accesses, identifying the costliest parts of the implementation.
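The integer-only resampling such a mapping relies on can be sketched as below; the 8.8 fixed-point format and the voxel() accessor are assumptions for illustration, not the layout used in the paper.

/* Trilinear interpolation of an 8-bit volume using 8.8 fixed-point weights,
   i.e. no floating-point operations in the inner sampling loop. */
#include <stdint.h>

#define FRAC_BITS 8
#define FRAC_ONE  (1 << FRAC_BITS)

/* Hypothetical volume accessor: returns the voxel at integer (x, y, z). */
extern uint8_t voxel(int x, int y, int z);

/* The sample position is given in 24.8 fixed point (integer part + fraction). */
uint8_t trilinear_sample(int32_t xf, int32_t yf, int32_t zf)
{
    int x = xf >> FRAC_BITS, y = yf >> FRAC_BITS, z = zf >> FRAC_BITS;
    int32_t fx = xf & (FRAC_ONE - 1);
    int32_t fy = yf & (FRAC_ONE - 1);
    int32_t fz = zf & (FRAC_ONE - 1);

    /* Interpolate along x on the four edges of the cell. */
    int32_t c00 = voxel(x, y,     z    ) * (FRAC_ONE - fx) + voxel(x + 1, y,     z    ) * fx;
    int32_t c10 = voxel(x, y + 1, z    ) * (FRAC_ONE - fx) + voxel(x + 1, y + 1, z    ) * fx;
    int32_t c01 = voxel(x, y,     z + 1) * (FRAC_ONE - fx) + voxel(x + 1, y,     z + 1) * fx;
    int32_t c11 = voxel(x, y + 1, z + 1) * (FRAC_ONE - fx) + voxel(x + 1, y + 1, z + 1) * fx;

    /* Then along y, then along z, rescaling after each stage. */
    int32_t c0 = (c00 * (FRAC_ONE - fy) + c10 * fy) >> FRAC_BITS;
    int32_t c1 = (c01 * (FRAC_ONE - fy) + c11 * fy) >> FRAC_BITS;
    int32_t c  = (c0  * (FRAC_ONE - fz) + c1  * fz) >> FRAC_BITS;

    return (uint8_t)(c >> FRAC_BITS);
}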
The hydrophobicity of polymeric insulating material surfaces, such as silicone rubber insulators (SIR), was studied through image analysis of the sample surface. Using a PVM (parallel virtual machine) program running under the UNIX operating system, the hydrophobicity of the polymer surface was evaluated from images of water droplets on the sample surface. The hydrophobic surface images, taken while distilled water was sprayed onto the sample, were captured with a CCD camcorder. The video was split into individual frames, and the PVM program analyzed the frames in parallel. Image indexes of the droplets, such as size and shape factor fc, were evaluated from each hydrophobic image frame, and the distributions of size and fc were then computed. The time variation of these hydrophobicity indexes during the application of a high electric field, which deforms the droplets, was also evaluated.
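The abstract does not define the shape factor fc; a common choice is circularity, fc = 4*pi*A / P^2, which equals 1 for a circular droplet and decreases as the droplet deforms. A minimal sketch under that assumption, taking per-droplet area and perimeter (e.g. from a connected-component analysis of a binarized frame) as given:

/* Circularity-style shape factor fc = 4*pi*A / P^2 for each detected droplet
   (assumed definition; A = area in pixels, P = perimeter in pixels). */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

typedef struct {
    double area;       /* droplet area, pixels            */
    double perimeter;  /* droplet boundary length, pixels */
} Droplet;

double shape_factor(const Droplet *d)
{
    return 4.0 * M_PI * d->area / (d->perimeter * d->perimeter);
}

/* Accumulate the size / fc statistics over one frame's droplets. */
void frame_statistics(const Droplet *drops, int n)
{
    double mean_area = 0.0, mean_fc = 0.0;
    for (int i = 0; i < n; i++) {
        mean_area += drops[i].area;
        mean_fc   += shape_factor(&drops[i]);
    }
    if (n > 0) {
        mean_area /= n;
        mean_fc   /= n;
    }
    printf("droplets=%d  mean area=%.1f px  mean fc=%.3f\n", n, mean_area, mean_fc);
}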
ISBN (print): 0769514081
The current Java memory model is flawed and has many unintended implications. As multithreaded programming becomes increasingly popular in Java and hardware memory architectures become more aggressively parallel, it is of significant importance to provide a framework for formally analyzing the Java memory model. The Murφ verification system is applied to study the commit/reconcile/fence (CRF) memory model, one of the thread semantics proposed to replace the present Java memory model. The CRF proposal is formally specified using the Murφ description language. A suite of test programs is designed to reveal pivotal properties of the model. The results demonstrate the feasibility of applying model checking techniques to language-level memory model specifications. Not only can it help designers debug their designs, it also provides a formal mechanism for Java programmers to understand the subtleties of the Java memory model.
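Such memory-model test programs typically take the form of small litmus tests. A classic example is store buffering, shown below as a C11 analogue (not one of the paper's Murφ test programs): the memory-model specification must say whether both loads may return 0.

/* Store-buffering litmus test: can r1 == 0 and r2 == 0 both happen?
   With relaxed atomics the answer is yes; a memory-model specification
   must pin down exactly which outcomes are allowed. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int x, y;
int r1, r2;

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("r1=%d r2=%d\n", r1, r2);   /* r1==0 && r2==0 is a permitted outcome */
    return 0;
}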
A relatively new trend in scheduling for parallel programming is so-called mixed task and data scheduling. It has been shown that mixing task and data parallelism to solve large computational applications often yields better speedups than applying either pure task parallelism or pure data parallelism. In this paper we present a new compile-time heuristic, named critical path and allocation (CPA), for scheduling data-parallel task graphs. Designed to have a very low cost, its complexity is lower than that of existing approaches such as TSAS, TwoL, or CPR by an order of magnitude or more. Experimental results based on graphs derived from real problems, as well as synthetic graphs, show that the performance loss of CPA relative to the above algorithms does not exceed 50%. These results are also confirmed by performance measurements of two real applications (complex matrix multiplication and Strassen matrix multiplication) running on a cluster of workstations.
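The critical-path half of such a heuristic amounts to computing, for every task, the length of the longest path to an exit task (its bottom level) and prioritizing tasks accordingly. A minimal sketch on an adjacency-list task graph, with illustrative data structures (edge communication costs are omitted for brevity):

/* Bottom level (critical-path length to an exit task) of every node in a
   task DAG, computed in reverse topological order.  Tasks are assumed to be
   numbered 0..n-1 in topological order already. */
#include <stdlib.h>

typedef struct {
    int    nsucc;
    int   *succ;    /* successor task ids          */
    double cost;    /* execution cost of the task  */
} Task;

/* blevel[i] = cost[i] + max over successors j of blevel[j]. */
double *bottom_levels(const Task *tasks, int n)
{
    double *blevel = malloc(n * sizeof *blevel);
    for (int i = n - 1; i >= 0; i--) {         /* reverse topological order */
        double best = 0.0;
        for (int k = 0; k < tasks[i].nsucc; k++) {
            int j = tasks[i].succ[k];
            if (blevel[j] > best)
                best = blevel[j];
        }
        blevel[i] = tasks[i].cost + best;      /* longest path starting at task i */
    }
    return blevel;
}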
This paper proposes a new AND-type flash memory cell with an assist gate (AG), which has achieved a 20-MB/s programming throughput. For high-speed parallel programming on the order of kilobytes, fast cell programming (10 μs) and an extremely low channel current (I_ds ≤ 100 nA/cell) are necessary. These features were achieved by using the low-current source-side injection method, in which the AG is used as a program gate. The memory cell size has also been reduced to 0.104 μm² by taking advantage of AG-based field isolation and a self-aligned floating gate. These technologies are key to giga-scale flash memories, whose main application is content downloading.
ISBN (print): 0769511627
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stalls due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
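A generic example (not taken from the paper's benchmarks) of a loop that motivates this hardware support is an update through an index array: the compiler cannot prove that different iterations touch different elements, so it must serialize, whereas speculative parallelization runs the iterations in parallel and squashes only those that actually conflict.

/* A loop a compiler typically cannot parallelize statically: whether
   iterations i1 != i2 conflict depends on the run-time contents of idx[].
   Speculative parallelization executes iterations in parallel and re-runs
   the (rare) ones that violate a true dependence. */
void scatter_update(double *a, const int *idx, const double *v, int n)
{
    for (int i = 0; i < n; i++)
        a[idx[i]] += v[i];     /* may or may not collide with another iteration */
}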
This paper proposes a set of extensions to the OpenMP programming model for expressing complex pipelined computations. This is accomplished by defining, in the form of directives, precedence relations among the tasks originated from work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs; the programmer then defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and inserting explicit synchronization actions that make the program difficult to understand and maintain. This work is done transparently by the compiler with the support of the OpenMP runtime library. The proposal is motivated and evaluated with a synthetic multi-block example. The paper also includes a description of the compiler and runtime support in the framework of the NanosCompiler for OpenMP.
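Later versions of OpenMP express similar precedence relations through task dependences; the fragment below uses that mechanism purely to illustrate the idea of declaring ordering constraints instead of hand-coding synchronization, and is not the directive syntax proposed in the paper. The stage1/stage2 functions and the block count are placeholders.

/* A two-stage pipeline over NB blocks expressed with OpenMP task
   dependences: each task declares what it produces or consumes, and the
   runtime enforces the ordering without explicit synchronization code. */
#include <omp.h>

#define NB 8

extern void stage1(int b);   /* assumed: read / preprocess block b   */
extern void stage2(int b);   /* assumed: solve / postprocess block b */

void pipeline(void)
{
    int done1[NB];           /* used only as dependence tokens */

    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < NB; b++) {
            #pragma omp task depend(out: done1[b]) firstprivate(b)
            stage1(b);

            #pragma omp task depend(in: done1[b]) firstprivate(b)
            stage2(b);       /* runs only after stage1(b) has completed */
        }
    }
}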
UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC is built around the distributed shared-memory programming model, with constructs that allow programmers to exploit memory locality by placing data close to the threads that manipulate them, minimizing remote accesses. Under the UPC memory sharing model, each thread owns a private memory and has a logical association (affinity) with a partition of the shared memory. This paper discusses an early release of UPC Bench, a benchmark designed to reveal performance weaknesses in UPC compilers and uncover opportunities for compiler optimization. Experimental results from UPC Bench on the Compaq AlphaServer SC show that UPC Bench is capable of discovering such compiler performance problems. Further, they show that when such performance pitfalls are avoided through compiler optimizations, distributed shared-memory programming paradigms can deliver high performance while retaining ease of programming.
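A minimal UPC fragment illustrating the affinity idea described above (the block size and array shape are arbitrary choices for illustration): the affinity expression of upc_forall assigns each iteration to the thread that owns the element it touches, so the update stays local.

/* UPC (the parallel extension of ANSI C discussed above): a blocked shared
   array and a upc_forall loop whose affinity expression &a[i] makes each
   thread execute only the iterations that access its own block. */
#include <upc.h>

#define BLOCK 64

shared [BLOCK] double a[BLOCK * THREADS];   /* one block per thread */

void scale(double s)
{
    int i;
    upc_forall (i = 0; i < BLOCK * THREADS; i++; &a[i])
        a[i] *= s;              /* owner computes: the access is always local */

    upc_barrier;                /* synchronize before anyone reads the result */
}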