The proceedings contain 23 papers from the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004). The topics discussed include: code generation in the polyhedral model is easier than you think; adding limited reconfigurability to superscalar processors; architectural support for enhanced SMT job scheduling; the energy impact of aggressive loop fusion; scalable high performance cross-module inlining; and fast paths in concurrent programs.
ISBN: 076952429X (print)
The proceedings contain 32 papers. The topics discussed include: variational path profiling; extended whole program paths; instruction based memory distance analysis and its application; maximizing CMP throughput with mediocre cores; an event-driven multithreaded dynamic optimization framework for helper threading on multi-core processors; compiler directed early register release; automatic selection of compiler options using non-parametric inferential statistics; efficient techniques for advanced data dependence analysis; optimizing compiler for the CELL processor; a distributed control path architecture for VLIW processors; performance analysis of system overheads in TCP/IP workloads; a simple divide-and-conquer approach for neural-class branch prediction; trace cache sampling filter; communication optimizations for fine-grained UPC applications; memory state compressor for giga-scale checkpoint/restore; and multiple page size modeling and optimization.
ISBN: 9798400710797 (print)
Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical because single-core performance has stagnated. PARENDI is an RTL simulator that addresses this challenge by exploiting the abundant fine-grained parallelism inherent in RTL simulation and efficiently mapping it onto the massively parallel Graphcore IPU (Intelligence Processing Unit) architecture. PARENDI scales up to 5888 cores on 4 Graphcore IPU sockets. It allows us to run large RTL designs up to 4x faster than the most powerful state-of-the-art x64 multicore systems. To achieve this performance, we developed new partitioning and compilation techniques and carefully quantified the synchronization, communication, and computation costs of parallel RTL simulation. The paper comprehensively analyzes these factors and details the strategies that PARENDI uses to optimize them.