ISBN:
(Print) 9783540565031
The proceedings contain 73 papers. The special focus in this conference is on Theoretical Aspects of Computer Science. The topics include: Causal and distributed semantics for concurrent processes; editorial note; alternation for two-way machines with sublogarithmic space; separating the lower levels of the sublogarithmic space hierarchy; locating P/poly optimally in the extended low hierarchy; measure, stochasticity, and the density of hard languages; halting problem of one binary Horn clause is undecidable; decidability and undecidability results for duration calculus; the complexity of logic-based abduction; treewidth of chordal bipartite graphs; on paths in networks with valves; scheduling interval ordered tasks in parallel; an O(√n)-worst-case-time solution to the granularity problem; the synthesis problem of Petri nets; general refinement and recursion operators for the Petri box calculus; on fairness in distributed automated deduction; divide-and-conquer algorithms on the hypercube; a first-order isomorphism theorem; splittings, robustness and structure of complete sets; defying upward and downward separation; counting, selecting, and sorting by query-bounded machines; enrichment by reduction; counting overlap-free binary words; the limit set of recognizable substitution systems; partially commutative Lyndon words; parallel architectures: design and efficient use; weighted closest pairs; rectilinear path queries in a simple rectilinear polygon; multi-list ranking: complexity and applications; a decomposition theorem for probabilistic transition systems; local automata and completion; efficient compression of wavelet coefficients for smooth and fractal-like data; and on the equivalence of two-way pushdown automata and counter machines over bounded languages.
ISBN:
(Print) 9781450315722
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support these cores are getting larger, deeper, and more complex. As a result, non-uniform memory access effects are now prevalent even on a single chip. To reduce execution time and energy consumption, data access locality should be exploited. This is especially important for task-based programming systems, where a scheduler decides when and where on the chip the code segments, i.e., tasks, should execute. Capturing locality for structured task parallelism has been done effectively, but the more difficult case, unstructured parallelism, remains largely unsolved: little quantitative analysis exists to demonstrate the potential of locality-aware scheduling, and to guide future scheduler implementations in the most fruitful direction. This paper quantifies the potential of locality-aware scheduling for unstructured parallelism on three different many-core processors. Our simulation results of 32-core systems show that locality-aware scheduling can yield up to 2.39x speedup over a randomized schedule, and 2.05x speedup over a state-of-the-art baseline scheduling scheme. At the same time, a locality-aware schedule reduces average energy consumption by 55% and 47%, relative to the random and the baseline schedule, respectively. In addition, our 1024-core simulation results project that these benefits will only increase: compared to 32-core executions, we see up to 1.83x additional locality benefits. To capture this potential in a practical setting, we also perform a detailed scheduler design space exploration to quantify the impact of different scheduling decisions. We also highlight the importance of locality-aware stealing, and demonstrate that a stealing scheme can exploit significant locality while performing load balancing. Over randomized stealing, our proposed scheme shows up to 2.0x speedup for stolen tasks.
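The flavor of locality-aware stealing described in this abstract can be sketched as follows. This is an illustrative toy, not the paper's scheduler: the "home core" hint on each task, the `Worker` class, and the mesh-distance heuristic are all assumptions, standing in for whatever locality signal a real runtime would track.

```python
from collections import deque

def core_distance(a, b, mesh_width=8):
    """Manhattan distance between two cores on a square mesh NoC (assumed topology)."""
    ax, ay = a % mesh_width, a // mesh_width
    bx, by = b % mesh_width, b // mesh_width
    return abs(ax - bx) + abs(ay - by)

class Worker:
    def __init__(self, core_id):
        self.core_id = core_id
        self.queue = deque()  # local task deque; tasks are dicts with a "home" core hint

    def steal_from(self, victims):
        """Locality-aware steal: take the queued task whose data home is nearest."""
        best, best_dist = None, None
        for victim in victims:
            for task in victim.queue:
                d = core_distance(self.core_id, task["home"])
                if best_dist is None or d < best_dist:
                    best, best_dist = (victim, task), d
        if best is None:
            return None  # nothing to steal anywhere
        victim, task = best
        victim.queue.remove(task)
        return task
```

A randomized stealer would pick a victim (and task) uniformly at random; the sketch instead biases the choice toward tasks whose inputs are likely cached nearby, which is the contrast the abstract's 2.0x stolen-task speedup quantifies.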
ISBN:
(Print) 9781509035090
Algorithms for computational imaging and computer vision are rapidly evolving, and hardware must follow suit: the next generation of image signal processors (ISPs) must be “programmable” to support new algorithms created with high-level frameworks. In this work, we compare flexible ISP architectures, using applications written in the Darkroom image processing language. We target two fundamental architecture classes: programmable in time, as represented by SIMD, and programmable in space, as typified by coarse-grained reconfigurable array (CGRA) architectures. We consider several optimizations on these two base architectures, such as register file partitioning for SIMD, bus-based routing and pipelined wires for CGRA, and line buffer variations. After these optimizations, CGRA provides on average 1.6x better energy efficiency and 1.4x better compute density versus a SIMD solution, and 1.4x the energy efficiency and 3.1x the compute density of an FPGA. However, the cost of providing general programmability is still high: compared to an ASIC, CGRA has 6x worse energy and area efficiency, and this ratio would be roughly 10x if memory-dominated applications were excluded.
ISBN:
(Print) 9781538659847
The proceedings contain 65 papers. The topics discussed include: FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud; PROMISE: an end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms; computation reuse in DNNs by exploiting input similarity; criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies; constructing a weak memory model; a hardware accelerator for tracing garbage collection; get out of the valley: power-efficient address mapping for GPUs; scheduling page table walks for irregular GPU applications; a case for richer cross-layer abstractions: bridging the semantic gap with expressive memory; non-speculative store coalescing in total store order; ProtoGen: automatically generating directory cache coherence protocols from atomic specifications; Spandex: a flexible interface for efficient heterogeneous coherence; Flexon: a flexible digital neuron for efficient spiking neural network simulations; space-time algebra: a model for neocortical computation; density tradeoffs of non-volatile memory as a replacement for SRAM based last level cache; ACCORD: enabling associativity for gigascale DRAM caches by coordinating way-install and way-prediction; scaling datacenter accelerators with compute-reuse architectures; FLIN: enabling fairness and enhancing performance in modern NVMe solid state drives; and lazy persistency: a high-performing and write-efficient software persistency technique.
The following topics were dealt with: multiple processor architectures; networks and grids; non-numerical algorithms including sorting and graph algorithms; computation models; numerical parallel algorithms; scheduling and performance evaluation including compiling, thread migration and metacomputing; and high performance computing applications including computational chemistry, command and control, and finance.
ISBN:
(Print) 9781450380706
Motivated by parallel network mapping, we provide efficient query complexity and round complexity bounds for graph reconstruction using distance queries, including a bound that improves a previous sequential complexity bound. Our methods use a high-probability parametric parallelization of a graph clustering technique of Thorup and Zwick, which may be of independent interest.
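For context, the trivial baseline that distance-query reconstruction methods must beat can be sketched in a few lines: with a shortest-path-distance oracle, any graph is reconstructible with one query per vertex pair (Θ(n²) queries), since two vertices are adjacent exactly when their distance is 1. The `query` callback and function name below are illustrative assumptions; the paper's contribution is achieving far better query and round complexity than this.

```python
from itertools import combinations

def reconstruct_naive(vertices, query):
    """Recover the edge set of a hidden graph from a distance oracle.

    Uses one query per unordered vertex pair -- the Theta(n^2) trivial
    bound -- exploiting that u, v are adjacent iff query(u, v) == 1.
    """
    edges = set()
    for u, v in combinations(vertices, 2):
        if query(u, v) == 1:  # adjacent exactly when shortest-path distance is 1
            edges.add(frozenset((u, v)))
    return edges
```

All Θ(n²) queries here are independent, so this baseline already runs in one parallel round; the interesting regime is cutting the total query count while keeping the round count small.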