Direct volume rendering of irregular 3D datasets demands high computational power and memory bandwidth. Recent research on optimizing volume rendering algorithms is exploring the high processing power offered by a ne...
We propose different implementations of sparse matrix-dense vector multiplication (SPMV) for finite fields and rings Z/mZ. We take advantage of graphics card processors (GPUs) and multi-core architectures. Our aim...
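As a rough illustration of the operation named in this abstract, the sketch below multiplies a sparse matrix by a dense vector over Z/mZ. It is not the authors' implementation: the compressed sparse row (CSR) layout, the function name `spmv_mod`, and the small example matrix are assumptions chosen for illustration, and the code is sequential, whereas the abstract targets GPUs and multi-core CPUs.

```python
def spmv_mod(indptr, indices, data, x, m):
    """Return y = A @ x (mod m) for a matrix A stored in CSR form.

    indptr[r]..indptr[r+1] delimits the nonzeros of row r;
    indices holds their column numbers and data their values.
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = 0
        for k in range(indptr[row], indptr[row + 1]):
            # Reduce after every multiply-add so acc stays below m.
            acc = (acc + data[k] * x[indices[k]]) % m
        y.append(acc)
    return y


# Hypothetical 3x3 example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1, 2, 3, 4, 5]
y = spmv_mod(indptr, indices, data, [1, 2, 3], 7)
print(y)
```

Each row is independent of the others, which is what makes the operation a natural fit for parallel hardware. In fixed-width arithmetic, reducing less often than after every term is a common optimization, but it needs a bound on m to avoid overflow.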
ISBN: 9780769542652 (print)
The proceedings contain 15 papers. The topics discussed include: a general lock-free algorithm for parallel state space construction; GPU-PRISM: an extension of PRISM for general purpose graphics processing units; three high performance architectures in the parallel APMC boat; industrial strength distributed explicit state model checking; a BSP algorithm for the state space construction of security protocols; implementation of Smith-Waterman algorithm in OpenCL for GPUs; enhancing the scalability of simulations by embracing multiple levels of parallelization; parallel particle-based reaction diffusion: a GPU implementation; parallel computing algorithms for reverse-engineering and analysis of genome-wide gene regulatory networks from gene expression profiles; parameter scanning by parallel model checking with applications in systems biology; and predicting the effects of parameters changes in stochastic models through parallel synthetic experiments and multivariate analysis.
ISBN: 9780769542768 (print)
The proceedings contain 12 papers. The topics discussed include: ring pipelined algorithm for the algebraic path problem on the CELL broadband engine; performance evaluation of optimized implementations of finite difference method for wave propagation problems on GPU architecture; exploring data streaming to improve 3D FFT implementation on multiple GPUs; effective dynamic scheduling on heterogeneous multi/manycore desktop platforms; towards a power-aware application level scheduler for a multithreaded runtime environment; I/O performance evaluation on multicore clusters with atmospheric model environment; OpenMP-based parallel algorithms for solving Kronecker descriptors; parallel implementations of an immune network model using POSIX threads and OpenMP; and parallel implementation of a computational model of the HIS using OpenMP and MPI.
Numerical analysis of Markovian models is relevant for performance evaluation and probabilistic analysis of systems' behavior from several fields such as Bioinformatics, Economics, and Engineering. These models ca...
ISBN: 3642022847 (print)
The proceedings contain 14 papers. The topics discussed include: parallel simulation of bevel gear cutting processes with OpenMP Tasks; evaluation of multicore processors for embedded systems by parallel benchmark program using OpenMP; extending automatic parallelization to optimize high-level abstractions for multicore; scalability evaluation of barrier algorithms for OpenMP; evaluating OpenMP 3.0 run time systems on unbalanced task graphs; dynamic task and data placement over NUMA architectures: an OpenMP runtime perspective; providing observability for OpenMP 3.0 applications; performance profiling for OpenMP tasks; a proposal to extend the OpenMP tasking model for heterogeneous architectures; and identifying inter-task communication in shared memory programming models.
ISBN: 9783642031373 (print)
The demands for high quality, real-time performance and multi-format video support in consumer multimedia products are ever increasing. In particular, future multimedia systems require efficient video coding algorithms and corresponding adaptive high-performance computational platforms. The H.264/AVC video coding algorithms provide high enough compression efficiency to be utilized in these systems, and multimedia processors are able to provide the required adaptability, but the algorithms' complexity demands more efficient computing platforms. Heterogeneous (re-)configurable systems composed of multimedia processors and hardware accelerators constitute the main part of such platforms. In this paper, we survey the hardware accelerator architectures for Context-based Adaptive Binary Arithmetic Coding (CABAC) of the Main and High profiles of H.264/AVC. The purpose of the survey is to deliver critical insight into the proposed solutions, and thereby facilitate further research on accelerator architectures, architecture development methods and supporting EDA tools. The architectures are analyzed, classified and compared based on the core hardware acceleration concepts, algorithmic characteristics, video resolution support and performance parameters, and some promising design directions are discussed. The comparative analysis shows that the parallel pipeline accelerator architecture seems to be the most promising.
ISBN: 9783642031373 (print)
The GCA (Global Cellular Automata) model consists of a collection of cells which change their states synchronously depending on the states of their neighbors, as in the classical CA (Cellular Automata) model. In contrast to the CA model, the neighbors are not fixed and local; they are variable and global. The GCA model is applicable to a wide range of parallel algorithms. In this paper a general purpose multiprocessor architecture for the massively parallel GCA model is presented. In contrast to a special purpose implementation of a GCA algorithm, the multiprocessor system allows a flexible implementation through programming. The architecture mainly consists of a set of processors (Nios II) and a network. The Nios II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The network is a well-known omega network. Only read accesses through the network are necessary in the GCA model, leading to a simplified structure. A system with up to 32 processors was implemented as a prototype on an FPGA. The analysis and implementation results have shown that the performance of the system scales with the number of processors.
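The synchronous, read-only neighbor access described in this abstract can be sketched in a few lines of Python. This is a minimal software illustration, not the paper's Nios II architecture. The helper names `gca_step` and `gca_max`, the choice of a maximum reduction as the example algorithm, and the power-of-two cell count are all assumptions.

```python
def gca_step(data, neighbor_of):
    """One synchronous generation: every cell reads exactly one
    global neighbor from the OLD state and computes its new state."""
    return [max(value, data[neighbor_of(i)]) for i, value in enumerate(data)]


def gca_max(data):
    """Spread the maximum to all cells in log2(n) generations.

    Assumes len(data) is a power of two (an assumption of this sketch).
    In the generation with distance g, cell i reads cell i XOR g, so the
    neighbor is global and changes from generation to generation.
    """
    n = len(data)
    g = 1
    while g < n:
        data = gca_step(data, lambda i, g=g: i ^ g)
        g *= 2
    return data


print(gca_max([3, 9, 1, 4]))
```

Because `gca_step` builds a new list from the old one, no cell ever writes into another cell's state. This mirrors the read-only network accesses mentioned in the abstract.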
ISBN: 9783540685487 (print)
In this paper we consider parallel algorithms to partition an array with respect to a pivot. We focus on implementations for current widely available multi-core architectures. After reviewing existing algorithms, we propose a modification to obtain the minimal number of comparisons. We have implemented these algorithms and drawn an experimental comparison.
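To make the problem statement concrete, here is a minimal sketch of a chunk-wise partition around a pivot. It is not one of the algorithms reviewed or proposed in the paper. The names `partition_chunk` and `parallel_partition`, the use of a thread pool, and the out-of-place merge are assumptions made for illustration. Each element is compared with the pivot exactly once.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_chunk(chunk, pivot):
    """Split one chunk into (elements < pivot, elements >= pivot)."""
    low, high = [], []
    for value in chunk:
        (low if value < pivot else high).append(value)
    return low, high


def parallel_partition(values, pivot, workers=4):
    """Partition values around pivot; return (partitioned list, split index)."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: partition_chunk(c, pivot), chunks))
    # Concatenate the per-chunk results in chunk order.
    low = [v for lo, _ in parts for v in lo]
    high = [v for _, hi in parts for v in hi]
    return low + high, len(low)


result, split = parallel_partition([5, 1, 8, 3, 9, 2], pivot=4, workers=2)
print(result, split)
```

In CPython the thread pool mainly illustrates the decomposition, since the interpreter lock limits speedup for pure-Python loops. An in-place variant on real multi-core hardware would need extra coordination between workers.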