ISBN: 9781450307130 (print)
Code positioning is a well-known compiler optimization that aims to improve instruction cache behavior. Mapping code fragments contiguously in memory avoids overlapping cache sets and thus decreases the number of cache conflict misses. We present a novel cache-aware code positioning optimization driven by worst-case execution time (WCET) information. For this purpose, we introduce a formal cache model based on a conflict graph which is able to capture a broad class of cache architectures. This cache model is combined with a formal WCET timing model, resulting in a cache conflict graph weighted with WCET data. This conflict graph is then exploited by heuristics for code positioning of both basic blocks and entire functions. Code positioning decreases the accumulated cache misses for a total of 18 real-life benchmarks by 15.5% on average for an automotive processor featuring a 2-way set-associative cache. These cache miss reductions translate to average WCET reductions of 6.1%. For direct-mapped caches, even larger savings of 18.8% (cache misses) and 9.0% (WCET) were achieved.
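The conflict-graph idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's model: the cache geometry (`LINE_SIZE`, `NUM_SETS`), the block addresses, and the edge weight (shared sets times the smaller worst-case execution count) are all hypothetical. The sketch also treats any shared cache set as a potential conflict, which fits a direct-mapped cache; a set-associative cache would tolerate some sharing.

```python
from itertools import combinations

LINE_SIZE = 16   # bytes per cache line (assumed for illustration)
NUM_SETS = 8     # number of cache sets (assumed for illustration)

def cache_sets(start, size):
    """Return the cache set indices touched by a code block at [start, start + size)."""
    first_line = start // LINE_SIZE
    last_line = (start + size - 1) // LINE_SIZE
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}

def conflict_graph(blocks, wcet_count):
    """Build a weighted conflict graph.

    blocks: name -> (start address, size in bytes)
    wcet_count: name -> execution count on the worst-case path (hypothetical input)
    Returns {(a, b): weight} for every pair of blocks that share a cache set.
    """
    graph = {}
    for a, b in combinations(sorted(blocks), 2):
        shared = cache_sets(*blocks[a]) & cache_sets(*blocks[b])
        if shared:
            # Hypothetical weight: shared sets scaled by the rarer block's count.
            graph[(a, b)] = len(shared) * min(wcet_count[a], wcet_count[b])
    return graph

blocks = {"f": (0, 32), "g": (128, 16), "h": (48, 16)}
wcet_count = {"f": 100, "g": 40, "h": 5}
graph = conflict_graph(blocks, wcet_count)

# Repositioning g so it no longer maps onto f's sets removes the edge.
moved = dict(blocks, g=(64, 16))
graph_after_move = conflict_graph(moved, wcet_count)
```

A positioning heuristic in this spirit would prefer layouts that remove the heaviest edges first, since those are the conflicts that matter most on the worst-case path.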
ISBN: 1595935436 (print)
The proceedings contain 42 papers. The topics discussed include: an accurate and efficient simulation-based analysis for worst case interruption delay; automatic performance model construction for the fast software exploration of new hardware designs; supporting precise garbage collection in Java Bytecode-to-C ahead-of-time compiler for embedded systems; adapting compilation techniques to enhance the packing of instructions into registers; a network agent for diagnosis and analysis of real-time Ethernet networks; memory optimization by counting points in integer transformations of parametric polytopes; incremental elaboration for run-time reconfigurable hardware designs; adaptive and flexible dictionary code compression for embedded applications; automated compile-time and run-time techniques to increase usable memory in MMU-less embedded systems; and scalable subgraph mapping for acyclic computation accelerators.
ISBN: 9781450307130 (print)
In the past decades, embedded system designers moved from simple, predictable system designs towards complex systems equipped with caches, branch prediction units and speculative execution. This step was necessary in order to fulfill increasing requirements on computational power. Static analysis techniques considering such speculative units had to be developed to allow the estimation of an upper bound of the execution time of a program. This bound is called worst-case execution time (WCET). Its knowledge is crucial to verify whether hard real-time systems satisfy their timing constraints, and the WCET is a key parameter for the design of embedded systems. In this paper, we propose a WCET-driven branch prediction aware optimization which reorders basic blocks of a function in order to reduce the number of jump instructions and mispredicted branches. We employed a genetic algorithm which rearranges basic blocks in order to decrease the WCET of a program. This enables a first estimation of the possible optimization potential at the cost of high optimization runtimes. To avoid time-consuming repetitive WCET analyses, we developed a new algorithm employing integer-linear programming (ILP). The ILP models the worst-case execution path (WCEP) of a program and takes branch prediction effects into account. This algorithm enables short optimization runtimes at the cost of slightly weaker optimization results. In a case study, the genetic algorithm is able to reduce the benchmarks' WCET by up to 24.7% whereas our ILP-based approach is able to decrease the WCET by up to 20.0%.
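To make the reordering objective concrete, here is a minimal sketch that swaps in exhaustive search for the genetic algorithm and the ILP described above. It is not the paper's method. The cost model is a hypothetical simplification: every control-flow edge that is not a fall-through in the chosen layout costs its worst-case frequency, and branch misprediction penalties are left out. The block names and frequencies are made up.

```python
from itertools import permutations

def layout_cost(order, edges):
    """Sum the worst-case frequencies of edges that need a jump in this layout.

    An edge (src, dst) is a free fall-through only if dst sits directly after src.
    """
    position = {block: i for i, block in enumerate(order)}
    return sum(freq for (src, dst), freq in edges.items()
               if position[dst] != position[src] + 1)

def best_layout(blocks, edges, entry):
    """Try every layout that keeps the entry block first and return the cheapest."""
    rest = [b for b in blocks if b != entry]
    candidates = ((entry,) + p for p in permutations(rest))
    return min(candidates, key=lambda order: layout_cost(order, edges))

# Hypothetical diamond-shaped CFG; the A -> C -> D path dominates the worst case.
blocks = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("A", "C"): 9, ("B", "D"): 1, ("C", "D"): 9}

original_cost = layout_cost(("A", "B", "C", "D"), edges)
order = best_layout(blocks, edges, "A")
optimized_cost = layout_cost(order, edges)
```

Exhaustive search is only feasible for a handful of blocks, which is why a practical approach needs a heuristic or an ILP formulation.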
ISBN: 9781605586267 (print)
The proceedings contain 31 papers. The topics discussed include: sustaining Moore's law in embedded computing through probabilistic and approximate design: retrospects and prospects; complete nanowire crossbar framework optimized for the multi-spacer patterning technique; exploiting residue number system for power-efficient digital signal processing in embedded processors; fast enumeration of maximal valid subgraphs for custom-instruction identification; hybrid multithreading for VLIW processors; spatial complexity of reversibly computable DAG; mapping stream programs onto heterogeneous multiprocessor systems; optimal loop parallelization for maximizing iteration-level parallelism; slicing based code parallelization for minimizing inter-processor communication; fine-grain performance scaling of soft vector processors; fine-grained parallel application specific computing for RNA secondary structure prediction using SCFGs on FPGA; and streaming FFT on REDEFINE-v2: an application-architecture design space exploration.
ISBN: 9781605589039 (print)
In this tutorial we discuss the impact of multicore architectures for embedded devices at different levels, ranging from heterogeneous/homogeneous ISAs to the organization and software development.
ISBN: 9781605589039 (print)
Caches are known to consume a large part of total microprocessor power. Traditionally, voltage scaling has been used to reduce both dynamic and leakage power in caches. However, aggressive voltage reduction causes process-variation-induced failures in cache SRAM arrays, which compromise cache reliability. We present Multi-Copy Cache (MC2), a new cache architecture that achieves significant reduction in energy consumption through aggressive voltage scaling, while maintaining high error resilience (reliability) by exploiting multiple copies of each data item in the cache. Unlike many previous approaches, MC2 does not require any error map characterization and therefore is responsive to changing operating conditions (e.g., Vdd-noise, temperature and leakage) of the cache. MC2 also incurs significantly lower overheads compared to other ECC-based caches. Our experimental results on embedded benchmarks demonstrate that MC2 achieves up to 60% reduction in energy and energy-delay product (EDP) with only 3.5% reduction in IPC and no appreciable area overhead.
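One generic way redundant copies can mask bit failures is bitwise majority voting. The sketch below assumes three copies per data item and a simple voter; the abstract does not say how many copies MC2 keeps or how it compares them, so this is an illustration of the general principle, not MC2's mechanism.

```python
def majority_vote(a, b, c):
    """Return the bitwise majority of three equally sized words.

    Each output bit is 1 when at least two of the three inputs have it set.
    """
    return (a & b) | (a & c) | (b & c)

stored = 0b10110010            # value originally written (hypothetical)
copy1 = stored ^ 0b00000100    # low-voltage fault flips one bit in the first copy
copy2 = stored                 # second copy is intact
copy3 = stored ^ 0b01000000    # a different bit flips in the third copy
recovered = majority_vote(copy1, copy2, copy3)
```

Voting recovers the value as long as no bit position is corrupted in more than one copy, and it needs no precomputed map of faulty cells.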
ISBN: 9781605589053 (print)
In 2010, a wave of consolidation swept over the Electronic System Level (ESL) design industry. It brought ESL providers together with mainstream EDA houses and created opportunities for new ESL ventures. This paper contains short summaries of presentations in a special session focusing on the future of ESL. The session has two goals: the first is to present the state of the art in ESL tools and practice, and the second is to share a vision of the technical challenges that the next generation of ESL companies should address. The session includes a mix of perspectives from both ESL solution vendors and end-users and touches on all four ESL use cases: software virtual platforms, performance analysis, high level synthesis and verification.
ISBN: 9781605589039 (print)
We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (spms), whose unit of transfer can be an individual instruction. Our algorithm captures a large fraction of the instruction reuse missed by coarse-grain placement algorithms, whose unit of transfer is restricted to loops or functions that fit within the capacity of the spms. Evaluation of L0 spms with our fine-grain algorithm in 17 applications shows that the energy consumed by the instruction storage hierarchy is reduced by 38% and 31% compared to that of L0 instruction caches and L0 spms with an ideal coarse-grain algorithm, respectively.