We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-Net with Intel's Concurrent Collections (CnC). As a coordination language S-Net a...
详细信息
ISBN:
(纸本)9781479941155
We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-Net with Intel's Concurrent Collections (CnC). As a coordination language S-Net achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data flow streaming network. We investigate the merits of S-Net and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition. We describe two alternative S-Net implementations of tiled Cholesky factorization and compare them with two CnC implementations, one with explicit performance tuning and one without, that have previously been used to illustrate Intel CnC. Our experiments on a 48-core machine demonstrate that S-Net manages to outperform CnC on this problem.
Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series was organized in 2011, again with support from the ARTIST DESIGN Network of Excellence. Following the practice established...
详细信息
ISBN:
(纸本)9781632663153
Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series was organized in 2011, again with support from the ARTIST DESIGN Network of Excellence. Following the practice established in the previous Challenges, the WCET Tool Challenge 2011 (WCC'11) defined two kinds of problems to be solved by the Challenge participants with their tools, WCET problems, which ask for bounds on the execution time, and flow-analysis problems, which ask for bounds on the number of times certain parts of the code can be executed. The benchmarks to be used in WCC'11 were debie1, PapaBench, and an industrial-strength application from the automotive domain provided by Daimler. Two default execution platforms were suggested to the participants, the ARM7 as "simple target" and the MPC5553/5554 as a "complex target," but participants were free to use other platforms as well. Ten tools participated in WCC'11: aiT, Astrée, Bound-T, FORTAS, METAMOC, OTAWA, SWEET, TimeWeaver, TuBound and WCA.
Dynamic branch predictors account for between 10% and 40% of a processor's dynamic power consumption. This power cost is proportional to the number of accesses made to that dynamic predictor during a program's...
详细信息
Dynamic branch predictors account for between 10% and 40% of a processor's dynamic power consumption. This power cost is proportional to the number of accesses made to that dynamic predictor during a program's execution. In this paper we propose the combined use of local delay region scheduling and profiling with an original adaptive branch bias measurement. The adaptive branch bias measurement takes note of the dynamic predictor's accuracy for a given branch and decides whether or not to assign a static prediction for that branch. The static prediction and local delay region scheduling information is represented as two hint bits in branch instructions. We show that, with the combined use of these two methods, the number of dynamic branch predictor accesses/updates can be reduced by up to 62%. The associated average power saving is very encouraging; for the example high-performance embedded architecture n average global processor power saving of 6.22% is achieved.
暂无评论