ISBN: (print) 9783642281440; 9783642281457
Adaptive mesh refinement and iterative traversals of unknowns on such adaptive grids are fundamental building blocks for PDE solvers. We discuss an integrated approach for grid refinement and processing of unknowns that is based on recursively structured triangular grids and space-filling element orders. In earlier work, the approach was demonstrated to be highly memory- and cache-efficient. In this paper, we analyse the cache efficiency of the traversal algorithms using the I/O model. Further, we discuss how the nested recursive traversal algorithms can be implemented efficiently. For that purpose, we compare the memory throughput of the implementations with simple stream benchmarks, and study how memory throughput and floating-point performance depend on the computational load per element.
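The abstract does not spell out the traversal, so the following is only a minimal sketch of one common way to obtain a space-filling element order on a recursively structured triangular grid: recursive bisection of a right triangle at its hypotenuse, visiting the two children in Sierpinski-curve order. The function name `sierpinski_order` and the uniform `depth` parameter are assumptions made for illustration; an adaptive grid would replace the depth test with a per-element refinement criterion.

```python
def sierpinski_order(a, b, c, depth):
    """Yield leaf triangles in Sierpinski order (illustrative sketch).

    (a, b) is the hypotenuse and c the right-angle vertex. The curve
    enters each triangle at a and leaves it at b.
    """
    if depth == 0:
        yield (a, b, c)
        return
    # Bisect at the midpoint of the hypotenuse.
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    # First child: enter at a, leave at c. Second child: enter at c, leave at b.
    yield from sierpinski_order(a, c, m, depth - 1)
    yield from sierpinski_order(c, b, m, depth - 1)


if __name__ == "__main__":
    for tri in sierpinski_order((0.0, 0.0), (2.0, 0.0), (1.0, 1.0), 2):
        print(tri)
```

Because consecutive elements in this order share an edge, data attached to neighbouring elements tends to be accessed close together in time, which is the kind of locality that a cache analysis in the I/O model looks at.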
A wide variety of optimization problems requires the combination of bioinspired and parallel computing to address the complexity needed to get optimal solutions in reduced times. The multicore era allows the researche...
ISBN: (print) 9783642297373; 9783642297366
Although the processor industry has spent some 10 years, more or less successfully, developing better and increasingly parallel multicore architectures, both the software community and educational institutions still appear to rely on the sequential computing paradigm as the primary mechanism for expressing functionality (which is very often inherently parallel to begin with), especially in general-purpose computing. In that respect, parallel programming has remained a hobby of highly educated specialists and is still too often considered too difficult for the average programmer. The excuses vary: lack of education, lack of suitable easy-to-use tools, overly architecture-dependent mechanisms, a huge existing base of sequential legacy code, steep learning curves, and inefficient architectures. It is important for the scientific community to analyze the situation and understand whether the problem lies with hardware architectures, software development tools and practices, or both. Although we would be tempted to answer this question (and actually try to do so elsewhere), there is a strong need for wider academic discussion of these topics and for presentation of research results in scientific workshops and conferences.
ISBN: (print) 9783642246685
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; Redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN: (print) 9781467308595
While simulated annealing is currently the most widely used method for performing FPGA placement, it does not scale to very large designs. Modern many-core architectures (including GPUs) offer a promising alternative to traditional multi-core processors for improving runtime performance. In this work, we propose a GPU-accelerated simulated-annealing variant for FPGA placement. Our approach uses the Star+ wire-length model along with a novel method of efficiently generating large sets of independent swap operations, providing a high level of parallelism. Speedups of 5.4x to 89.2x (median 20.2x) were achieved over a single-core CPU-only implementation.
ISBN: (print) 9781467308595
This paper investigates techniques to speed up HSSI bit-error rate (BER) and jitter testing. The proposed oversampling-based transmitter test scheme accelerates transmitter jitter and eye diagram testing by means of a multiphase bit-error rate test circuit (BERT). Parallel BERT elements digitize the jitter behavior of the input signal in a multiphase manner. We accurately extract the transmitter jitter in the time domain and finish the whole transmitter test within tens of milliseconds, faster than the current norm of 100 ms.
ISBN: (print) 9783642246494
The proceedings contain 79 papers. The topics discussed include: secure and energy-efficient data aggregation with malicious aggregator identification in wireless sensor networks; dynamic data race detection for correlated variables; distributed mining of constrained frequent sets from uncertain data; set-to-set disjoint-paths routing in recursive dual-net; Redflag: a framework for analysis of kernel-level concurrency; fault-tolerant routing based on approximate directed routable probabilities for hypercubes; adaptive resource remapping through live migration of virtual machines; anonymous communication over invisible mix rings; lightweight transactional arrays for read-dominated workloads; cascading multi-way bounded wait timer management for moody and autonomous systems; and world-wide distributed multiple replications in parallel for quantitative sequential simulation.
ISBN: (print) 9781467346498; 9780769549057
We discuss parallel and distributed algorithms for large-scale matrix completion on problems with millions of rows, millions of columns, and billions of revealed entries. We focus on in-memory algorithms that run on a small cluster of commodity nodes; even very large problems can be handled effectively in such a setup. Our DALS, ASGD, and DSGD++ algorithms are novel variants of the popular alternating least squares and stochastic gradient descent algorithms; they exploit thread-level parallelism, in-memory processing, and asynchronous communication. We provide some guidance on the asymptotic performance of each algorithm and investigate the performance of both our algorithms and previously proposed MapReduce algorithms in large-scale experiments. We found that DSGD++ outperforms competing methods in terms of overall runtime, memory consumption, and scalability. Using DSGD++, we can factor a matrix with 10B entries on 16 compute nodes in around 40 minutes.
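For orientation, the sketch below shows the plain sequential stochastic gradient descent baseline for matrix completion, which the distributed variants named in the abstract build on. It is not DALS, ASGD, or DSGD++: the function names `sgd_factorize` and `rmse`, the squared-error loss with L2 regularization, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def sgd_factorize(entries, n_rows, n_cols, rank=2, lr=0.02, reg=0.0,
                  epochs=500, seed=0):
    """Fit V ~ W H^T from revealed entries (i, j, value) by sequential SGD."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(entries)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, v in entries:
            err = v - sum(w * h for w, h in zip(W[i], H[j]))
            for k in range(rank):
                w, h = W[i][k], H[j][k]
                # Gradient step on the squared error of this single entry.
                W[i][k] += lr * (err * h - reg * w)
                H[j][k] += lr * (err * w - reg * h)
    return W, H


def rmse(entries, W, H):
    """Root-mean-square error over the given revealed entries."""
    sq = [(v - sum(w * h for w, h in zip(W[i], H[j]))) ** 2
          for i, j, v in entries]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    u, v = [1.0, 2.0, 1.5], [1.0, 0.5, 2.0, 1.0]
    data = [(i, j, u[i] * v[j]) for i in range(3) for j in range(4)]
    W, H = sgd_factorize(data, 3, 4)
    print("training RMSE:", rmse(data, W, H))
```

Each update touches only row i of W and row j of H, so updates on entries that share neither a row nor a column do not conflict. That structure is what makes parallel and distributed SGD variants possible.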
ISBN: (print) 9783642281440
Polynomial resultants are of fundamental importance in symbolic computations, especially in the field of quantifier elimination. In this paper we show how to compute the resultant res_y(f, g) of two bivariate polynomials f, g ∈ Z[x, y] on a CUDA-capable graphics processing unit (GPU). We achieve parallelization by mapping the bivariate integer resultant onto a sufficiently large number of univariate resultants over finite fields, which are then lifted back to the original domain. We point out that the commonly proposed special treatment for so-called unlucky homomorphisms is unnecessary, and show how this simplifies the parallel resultant algorithm. All steps of the algorithm are executed entirely on the GPU. Data transfer is only used for the input polynomials and the resultant. Experimental results show the considerable speedup of our implementation compared to host-based algorithms.
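The modular idea can be illustrated on the simpler univariate case. The sketch below is a sequential CPU illustration, not the paper's GPU algorithm: it computes the resultant of two integer polynomials in one variable modulo several primes with the Euclidean algorithm and lifts the images back to Z with the Chinese remainder theorem. The bivariate case described in the abstract additionally needs evaluation in x and interpolation, which is omitted here. All function names, the starting prime size, and the Hadamard-type bound used as a stopping rule are assumptions; `pow(x, -1, p)` requires Python 3.8 or later.

```python
import math


def primes_from(start):
    """Yield primes >= start (trial division; fine for a small sketch)."""
    n = max(start, 2)
    while True:
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            yield n
        n += 1


def poly_rem(f, g, p):
    """Remainder of f by g over GF(p); coefficients highest degree first."""
    f = f[:]
    inv = pow(g[0], -1, p)
    while len(f) >= len(g):
        q = f[0] * inv % p
        for i in range(len(g)):
            f[i] = (f[i] - q * g[i]) % p
        f.pop(0)  # the leading coefficient is now zero
    while f and f[0] == 0:
        f.pop(0)
    return f


def resultant_mod_p(f, g, p):
    """Resultant over GF(p); p must not divide either leading coefficient."""
    f, g = [c % p for c in f], [c % p for c in g]
    res = 1
    while len(g) > 1:
        m, n = len(f) - 1, len(g) - 1
        r = poly_rem(f, g, p)
        if not r:
            return 0  # common factor of positive degree
        k = len(r) - 1
        sign = -1 if (m * n) % 2 else 1
        # res(f, g) = (-1)^(mn) * lc(g)^(m-k) * res(g, r)
        res = res * sign * pow(g[0], m - k, p) % p
        f, g = g, r
    return res * pow(g[0], len(f) - 1, p) % p


def resultant_crt(f, g):
    """Integer resultant of nonzero f, g via modular images and CRT."""
    # Hadamard-type bound on res^2 from the Sylvester matrix rows.
    bound_sq = (sum(c * c for c in f) ** (len(g) - 1)
                * sum(c * c for c in g) ** (len(f) - 1))
    M, x = 1, 0
    for p in primes_from(10**4):
        if f[0] % p == 0 or g[0] % p == 0:
            continue  # prime would change a degree; skip it
        r = resultant_mod_p(f, g, p)
        x += M * ((r - x) * pow(M, -1, p) % p)  # incremental CRT
        M *= p
        if M * M > 4 * bound_sq:
            break
    return x - M if x > M // 2 else x  # symmetric lift to a signed integer


if __name__ == "__main__":
    # (y - 1)(y - 2) and (y - 3)
    print(resultant_crt([1, -3, 2], [1, -3]))
```

The images modulo different primes are independent of one another, which is what makes the modular approach attractive for massively parallel hardware.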
ISBN: (print) 9783642292309
The paper aims to analyze the concept of viability in fractal enterprises and IS architectures from a holistic, viable systems perspective. The methodology is based on the conceptual framework of the Viable Systems Approach (VSA), whereby the monitoring of fractal enterprise viability is put in place thanks to the "abilities" of government to manage the operative structure efficiently and to govern the system strategically. In particular, by means of systems viability monitoring, fractal enterprises are governed in terms of structure (i.e. component and relational consonance) and system (interaction and performance resonance).