ISBN: (print) 9781457706530
Tuning cache architectures in platforms for embedded applications can dramatically reduce energy consumption. This paper presents an optimization mechanism based on the Design of Experiments (DoE) for adjusting two-level cache memory hierarchies in order to reduce the energy consumption of embedded applications. DoE is a technique for planning experiments, and in this work it was adapted to the architecture exploration problem. Preliminary results for 6 applications from the MiBench benchmark suite show an average reduction of about 6% in the energy consumption of data caches, and the mechanism proved simpler and computationally cheaper than existing heuristics.
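The abstract does not say how DoE was adapted, so the following is only a generic illustration of the underlying idea: a two-level full factorial design over cache parameters, with the main effect of each factor estimated from the runs. The factor names, their levels, and the `toy_energy` function are all hypothetical stand-ins (a real study would call a cache/energy simulator instead); none of them come from the paper.

```python
from itertools import product

# Hypothetical factors, each with a (low, high) level. Not taken from the paper.
FACTORS = {
    "l1_size_kb": (4, 16),
    "l1_assoc": (1, 4),
    "l2_size_kb": (64, 256),
}

def toy_energy(cfg):
    """Made-up energy estimate; a stand-in for a real simulator run."""
    miss_cost = 100.0 / cfg["l1_size_kb"] + 400.0 / cfg["l2_size_kb"]
    static_cost = (0.5 * cfg["l1_size_kb"]
                   + 0.02 * cfg["l2_size_kb"]
                   + 1.0 * cfg["l1_assoc"])
    return miss_cost + static_cost

def full_factorial(factors):
    """Yield every combination of low/high levels (2^k configurations)."""
    names = list(factors)
    for levels in product((0, 1), repeat=len(names)):
        yield {n: factors[n][lv] for n, lv in zip(names, levels)}

def main_effects(factors, response):
    """Main effect = mean response at high level minus mean at low level."""
    runs = [(cfg, response(cfg)) for cfg in full_factorial(factors)]
    effects = {}
    for name, (low, high) in factors.items():
        at_high = [y for cfg, y in runs if cfg[name] == high]
        at_low = [y for cfg, y in runs if cfg[name] == low]
        effects[name] = sum(at_high) / len(at_high) - sum(at_low) / len(at_low)
    return effects

def pick_levels(factors, effects):
    """Choose, per factor, the level that lowers the response."""
    return {n: factors[n][0] if effects[n] > 0 else factors[n][1]
            for n in factors}

if __name__ == "__main__":
    effects = main_effects(FACTORS, toy_energy)
    print(effects)
    print(pick_levels(FACTORS, effects))
```

A full factorial design needs 2^k runs for k factors, which is why DoE practice often turns to fractional designs when k grows; whether the paper does so is not stated in the abstract.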
The aim of the paper is to introduce techniques for tuning sequential in-core sorting algorithms in the context of two applications. The first application is parallel sorting when the processor speeds in the parallel system are not identical. The second application is the Zeta-Data Project [M. Koskas, A hierarchical database management algorithm, in: Annales 67 du Lamsade, vol. 2, 2004, pp. 277-317. [9]], whose aim is to develop novel algorithms for database issues. About 50% of the work done in building indexes is devoted to sorting sets of integers. We develop and compare algorithms built to sort with equal keys. The algorithms are variations of the 3Way-Quicksort of Sedgewick. To observe performance and to make full use of the memory system and the processor's functional units, we use hardware performance counters, which are available on most modern microprocessors. We also develop analytical results for one of our algorithms and compare the expected results with the measurements. For the two applications, we show, through fine-grained experiments on an Athlon processor (a three-way superscalar x86 processor), that L1 data cache misses are not the central problem; rather, good in-core sorting performance calls for a carefully balanced proportion of independent retired instructions. (C) 2006 Elsevier B.V. All rights reserved.
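For readers unfamiliar with three-way partitioning, here is a minimal sketch of the general idea, using the simple Dijkstra-style partition that splits the array into keys less than, equal to, and greater than the pivot. It is not one of the tuned variants the paper develops; the function name `quicksort_3way` and the random pivot choice are assumptions made for illustration.

```python
import random

def quicksort_3way(a, lo=0, hi=None):
    """Sort list `a` in place with 3-way partitioning; returns `a`.

    Keys equal to the pivot are gathered in the middle and never
    revisited, which helps on inputs with many duplicate keys.
    """
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Random pivot (an illustrative choice), moved to the front.
        p = random.randint(lo, hi)
        a[lo], a[p] = a[p], a[lo]
        pivot = a[lo]

        # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
        lt, i, gt = lo, lo + 1, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        # Recurse into the smaller side, loop on the larger one,
        # so the recursion depth stays logarithmic.
        if lt - lo < hi - gt:
            quicksort_3way(a, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort_3way(a, gt + 1, hi)
            hi = lt - 1
    return a

if __name__ == "__main__":
    data = [random.randint(0, 9) for _ in range(30)]  # many equal keys
    print(quicksort_3way(data))
```

With few distinct keys, each partitioning pass removes every copy of the pivot from further work, which is why this family of algorithms suits index building over integer sets with repeated values.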
The efficient utilization of a two-level directly executable memory system is investigated. After defining the time and space product resulting from static allocation of the most often referenced pages, from paging, a...