Heterogeneous multi-core architectures have become an integral component of high-performance systems and high-performance scientific computing (HPC). The use of these systems has been vital for research applications b...
ISBN: (print) 9781467348904; 9781467348911
This paper presents a streaming processor specifically designed for adaptronic and biomedical engineering applications. The main characteristic of the streaming processor is its flexibility in implementing the floating-point scientific computations commonly performed in digital signal processing applications. The floating-point operators are connected to dual-port memories through three separate operand buses and two result buses. Synthesized in 130-nm technology, the Spectron processor can be clocked at 480 MHz. It can perform 4 parallel streaming/pipelined floating-point operations using its FPMAC and CORDIC cores, resulting in a performance of about 4 × 485 = 1.94 GFLOPS (giga floating-point operations per second), which is suitable for high-performance image processing in biomedical electronic engineering applications.
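The abstract names CORDIC cores as one of the processor's two operator types. As a rough illustration of what such a core computes, here is a minimal software sketch of rotation-mode CORDIC for sine and cosine, assuming 32 iterations and ordinary Python floats; it is not the Spectron's hardware design, which the abstract does not describe.

```python
import math

# Assumed iteration count; each iteration adds roughly one bit of precision.
ITERATIONS = 32

# Elementary rotation angles atan(2^-i) and the aggregate gain correction K.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
K = 1.0
for _i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2] via rotation-mode CORDIC."""
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")
    # Start on the x-axis, pre-scaled by K so the final vector has unit length.
    x, y, z = K, 0.0, theta
    for i in range(ITERATIONS):
        # Rotate toward the residual angle z; in hardware 2^-i is a shift, not a multiply.
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x
```

In hardware each iteration needs only shifts, additions, and one lookup in a small angle table, which is why CORDIC is a common companion to a multiply-accumulate unit for transcendental functions.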
ISBN: (print) 9783642246685
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN: (print) 9783642246494
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN: (electronic) 9783642314643
ISBN: (print) 9783642314636; 9783642314643
We consider the deconvolution of 3D fluorescence microscopy RGB images, describing the benefits of tackling medical imaging problems on modern graphics processing units (GPUs), which are inexpensive parallel processing devices available in many up-to-date personal computers. We found that the execution time of the CUDA version is about two orders of magnitude lower than that of the sequential algorithm. The experiments also prompt some reflections on the best settings for the CUDA-based algorithm; in particular, we note the need to model GPU architectures and their characteristics in order to better describe the performance of GPU algorithms and what can be expected of them.
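The abstract does not say which deconvolution algorithm was ported to CUDA. Purely as an illustration, the sketch below uses Richardson–Lucy, a common choice for fluorescence microscopy, reduced to a 1-D, pure-Python form with zero-padded boundaries; the function names and the flat starting estimate are assumptions. Every output sample of each convolution and of the multiplicative update is computed independently, which is the kind of per-pixel work that maps naturally onto one GPU thread per voxel.

```python
def convolve(signal, kernel):
    """Zero-padded 1-D convolution; output has len(signal). Kernel length must be odd."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - j + half
            if 0 <= idx < n:
                acc += k * signal[idx]
        out[i] = acc
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Richardson-Lucy deconvolution of a non-negative 1-D signal (illustrative only)."""
    # Convolving with the mirrored PSF applies the transpose of the blur operator.
    psf_mirror = psf[::-1]
    # Flat, strictly positive starting estimate with the same total intensity.
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve(estimate, psf)
        # Ratio of observed data to the re-blurred estimate; guard against division by zero.
        ratio = [o / b if b > eps else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve(ratio, psf_mirror)
        # Multiplicative update keeps the estimate non-negative.
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate
```

Blurring a single spike with `[0.25, 0.5, 0.25]` and running the loop sharpens the estimate back toward the spike while preserving total intensity.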
Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel algorithms with shared A or B matrix in the memory for the special massively multit...
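The abstract is truncated, so the paper's actual scheme is unknown. As a hypothetical illustration of the "shared B" idea only, the sketch below splits the rows of A across worker threads while every worker reads the same B; the function names and the chunking policy are assumptions. Python threads share memory but, because of the GIL, do not speed up this pure-Python loop, so the point here is the data partitioning, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of A @ B; B is only read, so it can be shared."""
    inner = len(b)
    cols = len(b[0])
    block = []
    for i in range(row_start, row_end):
        row = [0.0] * cols
        for k in range(inner):
            a_ik = a[i][k]
            b_k = b[k]
            for j in range(cols):
                row[j] += a_ik * b_k[j]
        block.append(row)
    return block


def parallel_matmul_shared_b(a, b, workers=4):
    """Hypothetical scheme: partition the rows of A; all workers read one shared B."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    n = len(a)
    chunk = (n + workers - 1) // workers
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(lambda r: matmul_rows(a, b, r[0], r[1]), bounds))
    # map() returns results in submission order, so concatenation rebuilds C row by row.
    return [row for block in blocks for row in block]
```

The symmetric "shared A" variant would instead partition the columns of B and let every worker read all of A.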
ISBN: (print) 9780769546766
A Pareto-optimal temporal partitioning methodology was developed for splitting large data flow graphs (DFGs) and mapping them onto a coarse-grained reconfigurable architecture (CGRA). A multi-objective genetic algorithm (MOGA) derived from the SPEA-II algorithm was introduced to the temporal partitioning realm for the first time, to simultaneously optimize multiple mutually exclusive objectives. Experiments carried out on the ESL (electronic system level) model of the REmus processor show that the MOGA-based temporal partitioning algorithm is superior to a heuristic algorithm, reducing execution delay by 5%–28% and communication overhead by 16%–37% without degrading resource efficiency. Furthermore, comparisons with a weight-based multi-objective simulated annealing algorithm show that the Pareto-optimal algorithm achieves a slightly better latency objective (3%) while reducing communication overhead dramatically, by up to 21%, without worsening resource efficiency.
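SPEA-II adds strength-based fitness, density estimation, and archive truncation on top of Pareto dominance; none of that is shown here. The sketch below covers only the dominance test and non-dominated filtering that any Pareto-optimal selection relies on, using the three objectives named in the abstract, all expressed as quantities to minimize (resource efficiency becomes 1 − efficiency). The candidate names and numbers are invented for illustration and are not results from the paper.

```python
from typing import Dict, Tuple

# Objective vector for one candidate temporal partition, all minimized:
# (execution delay, communication overhead, 1 - resource efficiency).
Objectives = Tuple[float, float, float]


def dominates(p: Objectives, q: Objectives) -> bool:
    """p Pareto-dominates q: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def pareto_front(candidates: Dict[str, Objectives]) -> Dict[str, Objectives]:
    """Keep only non-dominated candidates (a vector never dominates itself)."""
    return {
        name: obj
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other in candidates.values())
    }


# Invented example partitions, not data from the paper.
candidates = {
    "p1": (100.0, 40.0, 0.20),
    "p2": (90.0, 50.0, 0.20),
    "p3": (110.0, 45.0, 0.25),  # worse than p1 in every objective
    "p4": (95.0, 35.0, 0.30),
}
```

Here `pareto_front(candidates)` keeps p1, p2, and p4: each trades one objective against another, so none is uniformly better, whereas p3 is dominated by p1. A weight-based method would instead collapse the three objectives into one scalar and return a single point.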
This article introduces a C++ template library dedicated to vectorizing algorithms for different target architectures: Multi-Target Parallel Skeleton (MTPS). Skeletons describing the data structures and algorithms are...
ISBN: (print) 9781450307475
Many phenomena and artifacts such as road networks, social networks, and the web can be modeled as large graphs and analyzed using graph algorithms. However, given the size of the underlying graphs, efficiently implementing basic operations such as connected component analysis, approximate shortest paths, and link-based ranking (***) becomes challenging. This paper presents an empirical study of computations on such large graphs in three well-studied platform models, viz., a relational model, a data-parallel model, and a special-purpose in-memory model. We choose a prototypical member of each platform model and analyze the computational efficiencies and requirements for five basic graph operations used in the analysis of real-world graphs, viz., PageRank, SALSA, Strongly Connected Components (SCC), Weakly Connected Components (WCC), and Approximate Shortest Paths (ASP). Further, we characterize each platform in terms of these computations using model-specific implementations of these algorithms on a large web graph. Our experiments show that no single platform performs best across different classes of operations on large graphs. While relational databases are powerful and flexible tools that support a wide variety of computations, some computations benefit from special-purpose storage systems and others can exploit data-parallel platforms. Copyright 2012 ACM.
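For readers unfamiliar with two of the five operations, the sketch below gives textbook in-memory formulations of PageRank (power iteration) and Weakly Connected Components (union-find). These are not the paper's platform-specific implementations; the damping factor 0.85, the fixed iteration count, and the uniform redistribution of rank from dangling nodes are assumed conventions.

```python
from typing import Dict, List


def pagerank(edges: Dict[int, List[int]], n: int, damping=0.85, iterations=100) -> List[float]:
    """Power-iteration PageRank over nodes 0..n-1; dangling nodes spread rank uniformly."""
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        # Total rank held by nodes with no out-links.
        dangling = sum(rank[v] for v in range(n) if not edges.get(v))
        for v in range(n):
            out = edges.get(v)
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
        # Redistribute the dangling mass evenly so the ranks still sum to 1.
        for v in range(n):
            new[v] += damping * dangling / n
        rank = new
    return rank


def weakly_connected_components(edges: Dict[int, List[int]], n: int) -> List[int]:
    """Label each node with a component representative, ignoring edge direction."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v, out in edges.items():
        for w in out:
            rv, rw = find(v), find(w)
            if rv != rw:
                parent[rv] = rw
    return [find(v) for v in range(n)]
```

PageRank repeatedly sweeps every edge, which suits data-parallel platforms, while union-find relies on many small random accesses, which favors an in-memory store; this contrast is one way to read the paper's finding that no single platform wins everywhere.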
ISBN: (print) 9783642281440; 9783642281457
Erasure codes can improve the availability of distributed storage compared with replication systems. In this paper, we investigate how to systematically map the Reed-Solomon and Cauchy Reed-Solomon erasure codes onto the Cell/B.E. and GPU multicore architectures. A method for systematically mapping the computation kernels of the encoding/decoding algorithms onto the Cell/B.E. architecture is proposed. This method takes into account the properties of the architecture at all three levels of its parallel processing hierarchy. The performance results are shown to be very promising. The possibility of using GPUs is also studied, based on the Cauchy version of Reed-Solomon codes.
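To make the encoding/decoding kernels concrete, here is a scalar Python reference sketch of systematic Cauchy Reed-Solomon coding over GF(2^8). The field polynomial 0x11d and the Cauchy points x_i = i, y_j = m + j are assumptions chosen for the example, not the paper's parameters. Because every square submatrix of a Cauchy matrix is nonsingular, any k of the k + m blocks suffice to rebuild the data. The innermost byte loops are the data-parallel kernels a Cell/B.E. or GPU mapping would vectorize; the original Cauchy Reed-Solomon formulation also replaces the table-based field multiplications with XORs via a binary matrix expansion, which this sketch omits.

```python
# GF(2^8) log/antilog tables for the primitive polynomial x^8+x^4+x^3+x^2+1 (assumed).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def cauchy_matrix(k, m):
    """m x k Cauchy matrix 1 / (x_i + y_j); addition in GF(2^8) is XOR."""
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(data, m):
    """Return m parity blocks for k equal-length data blocks."""
    k, c, length = len(data), cauchy_matrix(len(data), m), len(data[0])
    parity = []
    for i in range(m):
        block = bytearray(length)
        for j in range(k):
            for t in range(length):  # data-parallel byte loop
                block[t] ^= gf_mul(c[i][j], data[j][t])
        parity.append(bytes(block))
    return parity


def gf_invert_matrix(mat):
    """Gauss-Jordan inversion over GF(2^8)."""
    n = len(mat)
    aug = [list(row) + [1 if r == c else 0 for c in range(n)] for r, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v ^ gf_mul(f, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def decode(survivors, k, m):
    """Rebuild the k data blocks from any k surviving blocks given as {index: bytes}."""
    c = cauchy_matrix(k, m)
    idx = sorted(survivors)[:k]
    # Rows of the systematic generator [I; C] that produced the surviving blocks.
    rows = [[1 if j == i else 0 for j in range(k)] if i < k else c[i - k] for i in idx]
    inv = gf_invert_matrix(rows)
    length = len(survivors[idx[0]])
    data = []
    for r in range(k):
        block = bytearray(length)
        for s, i in enumerate(idx):
            for t in range(length):
                block[t] ^= gf_mul(inv[r][s], survivors[i][t])
        data.append(bytes(block))
    return data
```

For example, with three data blocks and two parity blocks, losing any two of the five blocks still allows `decode` to return the original data.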