Computational biology application expression profiles are a critical performance metric in high-end genomic data processing. These profiles are compute-intensive and offer a wide range of computation patterns ranging f...
ISBN (print): 9781604233926
Next-generation multimedia mobile phones that use the high-bandwidth 3G cellular radio network consume more power. Multimedia algorithms such as speech and video transcodecs have very large instruction footprints and consequently stall on instruction cache misses. Conflicts in on-chip caches account for a large fraction of the CPU cycle penalty and hence increase power consumption. Many commercial tools have been developed to minimize such cache misses by carefully placing the frequently called procedures of an application. After profile extraction, these tools use cache-line coloring and post-compilation techniques to optimize cache hits. The major impediment to the optimal performance of such tools is their static layout profile, which does not account for the dynamic behavior of the application. We propose a methodology called DCP (dynamic code placement) that positions code at run time for good instruction cache performance, and we have implemented it on high-end processors. The dynamic application profile is completely transparent to the developer's code. The technique optimizes the memory layout of a program's code footprint, improving i-cache mapping to increase the number of cache hits and ultimately reduce CPU stalls. Our optimization draws on both static and detailed run-time profile information that captures the relevant temporal behavior of the application. Moreover, the effect of inter-procedural code positioning is also taken into account when mapping code into the instruction cache. The results also show an improvement over the Pettis and Hansen (PH) approach. Although most multimedia applications can be optimized by our framework, applications dominated by function pointers do not work correctly. The technique incurs low overhead and improves cache-hit architecture correlation. For a range of applications, we show that instruction miss rates can be reduced by 19-68%. Using a simple model, this corresponds to execution time re...
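The abstract measures DCP against the Pettis and Hansen (PH) procedure-ordering approach. As background for that baseline only, here is a simplified Python sketch of PH-style greedy procedure ordering; it is not the authors' DCP method, which places code at run time. The input format (a dict from caller/callee pairs to dynamic call counts), the function name `pettis_hansen_order`, and the toy profile are assumptions for illustration, and PH's basic-block positioning and finer chain-ordering heuristics are omitted. Heavily communicating procedures are merged into chains first, so hot caller/callee pairs land next to each other in memory and are less likely to evict each other from the instruction cache.

```python
from collections import defaultdict


def pettis_hansen_order(call_counts):
    """Simplified Pettis-Hansen style procedure ordering (sketch).

    call_counts: dict mapping (caller, callee) -> dynamic call count
    (hypothetical input format). Returns procedure names in layout order.
    """
    # Build an undirected weighted call graph: calls in either direction add up.
    weights = defaultdict(int)
    procs = set()
    for (caller, callee), n in call_counts.items():
        procs.update((caller, callee))
        if caller != callee:
            weights[frozenset((caller, callee))] += n

    # Each procedure starts in its own chain; members of a chain share one list.
    chain_of = {p: [p] for p in procs}

    # Process edges from heaviest to lightest (ties broken by name).
    for edge, _ in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        ca, cb = chain_of[a], chain_of[b]
        if ca is cb:
            continue  # already placed in the same chain
        # Try the four orientations; keep the one placing a and b closest together.
        best_dist, best_chain = None, None
        for left in (ca, ca[::-1]):
            for right in (cb, cb[::-1]):
                merged = left + right
                dist = abs(merged.index(a) - merged.index(b))
                if best_dist is None or dist < best_dist:
                    best_dist, best_chain = dist, merged
        for p in best_chain:
            chain_of[p] = best_chain

    # Emit each distinct chain once (deterministic order by member name).
    order, emitted = [], set()
    for p in sorted(procs):
        chain = chain_of[p]
        if id(chain) not in emitted:
            emitted.add(id(chain))
            order.extend(chain)
    return order


if __name__ == "__main__":
    profile = {("main", "A"): 10, ("A", "B"): 100, ("B", "C"): 50, ("main", "D"): 1}
    print(pettis_hansen_order(profile))  # ['D', 'main', 'A', 'B', 'C']
```

With this toy profile, the hottest pair (A, B) ends up adjacent and the rarely called D is pushed to the edge of the layout. Because the ordering is computed once from a fixed profile, it illustrates the static-layout limitation the abstract attributes to such tools.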
ISBN (print): 1424407265
Architectures are becoming increasingly difficult to utilize fully. The growing trend toward complex single-chip embedded systems with multiple peripherals has brought the energy consumption of compute- and data-intensive applications to the fore. Our experiments show that although high-performance applications tend to be more cycle-efficient, their energy efficiency is reduced by many factors, such as suboptimal architecture utilization and poor compiler optimization. Our methodology exploits the parallelism inherent in both multimedia DSP applications and multimedia DSP processors. The proposed techniques include a profile-based compilation approach that makes source-to-source transformation more energy-efficient. A profile monitor identifies slack in the application expression with respect to the underlying hardware architecture, so that different transformation schemes can be applied selectively according to the observed static and run-time profile and unnecessary optimization iterations can be filtered out. We also propose a stochastic filtering technique that further reduces the optimization search space, and hence the offline compilation overhead caused by the very large number of compiler optimization options. Our experiments show that the proposed techniques increase parallelism by close to 51% for a Viterbi decoder, 79% for MPEG-2, 32% for H-263, and 84% for MPEG-4 without losing performance benefits.
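The abstract does not specify how its stochastic filter works. One simple reading, assumed here purely for illustration, is to evaluate a random sample of optimization-flag combinations instead of all 2^n of them and keep only the best few for further iteration. The transformation names in `FLAGS`, the `toy_cost` model, and the function `stochastic_filter` are all hypothetical; in a real flow the cost would come from compiling and profiling each variant.

```python
import random


def stochastic_filter(flags, cost, samples=64, keep=5, seed=0):
    """Evaluate a random sample of flag subsets instead of all 2**len(flags).

    cost: callable scoring a frozenset of flags (lower is better); in practice
    this would be a profiling run, here it is a caller-supplied stand-in.
    Returns the `keep` best (cost, sorted flags) pairs found.
    """
    rng = random.Random(seed)  # seeded so a search can be reproduced
    seen = set()
    results = []
    for _ in range(samples):
        # Include each flag independently with probability 1/2.
        subset = frozenset(f for f in flags if rng.random() < 0.5)
        if subset in seen:
            continue  # skip duplicates instead of paying for a second evaluation
        seen.add(subset)
        results.append((cost(subset), sorted(subset)))
    results.sort()
    return results[:keep]


# Hypothetical transformation names and a toy cost model.
FLAGS = ["unroll", "inline", "software_pipeline", "if_convert",
         "loop_fusion", "strength_reduce", "tile", "vectorize"]


def toy_cost(subset):
    saving = {"unroll": 12, "inline": 7, "software_pipeline": 15, "if_convert": 4,
              "loop_fusion": 6, "strength_reduce": 3, "tile": 9, "vectorize": 11}
    cost = 100 - sum(saving[f] for f in subset)
    if "unroll" in subset and "inline" in subset:
        cost += 25  # toy interaction: combined code growth hurts the i-cache
    return cost


if __name__ == "__main__":
    # 40 samples out of 2**8 = 256 possible combinations.
    for c, chosen in stochastic_filter(FLAGS, toy_cost, samples=40, keep=3):
        print(c, chosen)
```

Sampling trades completeness for a bounded number of expensive evaluations; the survivors can then be handed to a more thorough search.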
ISBN (print): 9781604236750
The synergy of software and hardware leads to an efficient application expression profile (AEP), not only in terms of execution time and energy but also in terms of optimal architecture usage. We present an architecture-based parametric optimization of C source code for iterative compilation. Successive source-level code transformations are applied in order to evaluate an application expression profile on complex multimedia processors. The proposed code transformation methodology determines appropriate parameters for compiler optimization in order to satisfy user constraints on code size, energy, execution time, and optimal target architecture usage. The optimization is based on a multicriteria objective function whose constraints are formulated using a penalty method; a genetic algorithm then searches for solutions. We examined the performance improvement across a range of typical multimedia applications on the Philips TM1302 multimedia processor. Candidate applications include m100, m200, nlivq, MPEG-1, G-721, and H-264L. Experimental results show that our approach reduces cache misses by an average of 36% (maximum 71%), improves energy dissipation by up to 17%, and improves CPU performance by up to 60% for an H-264L video codec algorithm. However, code size tends to grow, which inevitably requires a larger memory. The approach is general and can easily be integrated into multimedia processor compilers.
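The combination described above (a multicriteria objective, constraints handled by a penalty method, and a genetic algorithm as the search engine) can be sketched as follows. Everything concrete here is an assumption: the two transformation parameters (unroll factor and tile size), the toy `cycles`, `energy`, and `code_size` models, the weights, and the penalty coefficient are hypothetical stand-ins for values that would come from profiling on the TM1302. The penalty shown is a quadratic exterior penalty, one common form of the penalty method.

```python
import random


# Hypothetical cost models for a parameter vector p = (unroll_factor, tile_size).
# In a real flow these numbers would come from compiling and profiling each variant.
def cycles(p):
    return 1000 / p[0] + 5 * abs(p[1] - 16)


def energy(p):
    return 50 + 2 * p[0] + abs(p[1] - 16)


def code_size(p):
    return 200 * p[0]


SIZE_LIMIT = 1200               # assumed user constraint on code size
W_CYCLES, W_ENERGY = 1.0, 1.0   # assumed objective weights
PENALTY = 10.0                  # assumed penalty coefficient


def fitness(p):
    """Weighted-sum objective plus a quadratic exterior penalty (lower is better)."""
    violation = max(0, code_size(p) - SIZE_LIMIT)
    return W_CYCLES * cycles(p) + W_ENERGY * energy(p) + PENALTY * violation ** 2


UNROLL, TILE = range(1, 17), range(1, 65)


def genetic_search(pop_size=30, generations=40, mutation_rate=0.3, seed=1):
    rng = random.Random(seed)
    pop = [(rng.choice(UNROLL), rng.choice(TILE)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])  # one-point crossover on the two-gene genome
            if rng.random() < mutation_rate:
                if rng.random() < 0.5:
                    child = (rng.choice(UNROLL), child[1])
                else:
                    child = (child[0], rng.choice(TILE))
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    best = genetic_search()
    print(best, round(fitness(best), 1), code_size(best))
```

The penalty term lets the GA explore infeasible parameter settings while strongly steering it toward variants that respect the code-size limit, which mirrors the code-size/performance tension the abstract reports.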
ISBN (print): 1424405556
Future ubiquitous computing will consist of set-top boxes and handheld devices with enormous computational power and the capability to handle multimedia application workloads. Efficient architecture utilization and an optimal application binary will be two important performance metrics for these embedded systems. In real-time systems, conventional system design methods cannot be used: they consider mainly cycle count and code size and ignore power dissipation entirely, which inevitably leads to expensive cooling mechanisms, increases overall system cost, and reduces reliability. An integrated approach that considers energy-cycle performance at the architecture level as well as the application level is essential for energy-efficient application development. This paper focuses on a distributed optimization paradigm based on vertical profiling. The proposal is based on enhanced local optimization and peer-to-peer (P2P) source code compilation across the application build flow. Earlier procedures incorporate additional pre- and post-profiling steps during the compilation, scheduling, and linking phases. This iterative activity is carried out by methods based on wait-for graphs or profile monitors. These methods introduce a centralized evaluation at the source-code level (for each transformation and each monitored parameter during successive approximation). In contrast, the proposal introduces asynchronous evaluation in the source-to-source (STS) process. As a result, code transformations are not trapped in local optima but instead search for a global optimum (for both energy and cycles). The methodology is readily adaptable to an iterative compilation environment, where application source code is optimized to satisfy user constraints on code size, energy, execution time, and optimal target architecture usage. Experimental results show that our approach enhances parallelism by up to 38% in the G-721 speech codec, increases architecture correlation by up to 17% in the video transcode...
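The argument that cycle count alone is an incomplete design metric can be made concrete with the basic relation E = P * t, where execution time t = cycles / f. The sketch below also computes the energy-delay product (EDP), a widely used single-number energy-performance metric; it is a swapped-in illustration, not the metric used in the paper. The 200 MHz clock and the power and cycle figures for the two binaries are hypothetical.

```python
def energy_joules(avg_power_w, cycles, clock_hz):
    """Energy E = P * t, where execution time t = cycles / f."""
    return avg_power_w * cycles / clock_hz


def energy_delay_product(avg_power_w, cycles, clock_hz):
    """EDP = E * t: one common single-number energy-performance trade-off metric."""
    t = cycles / clock_hz
    return energy_joules(avg_power_w, cycles, clock_hz) * t


if __name__ == "__main__":
    f = 200e6  # assumed 200 MHz clock
    # Hypothetical binaries: A is tuned for cycles only, B also for power.
    for name, power, cyc in [("A", 1.2, 8e6), ("B", 0.8, 10e6)]:
        print(name, energy_joules(power, cyc, f), energy_delay_product(power, cyc, f))
```

Here binary A finishes in fewer cycles (0.04 s versus 0.05 s) yet consumes more energy (0.048 J versus 0.040 J), which is exactly the kind of outcome a cycle-only design flow would not see.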
This paper focuses on architecture-aware source-level code transformation methods for low energy consumption in complex VLIW processors. Though the energy issue has long been addressed at the hardware level, a significant...
This paper focuses on a distributed optimization paradigm based on vertical profiling. The proposal is based on enhanced local optimization and peer-to-peer (P2P) source code compilation across the application build flow. Earlier procedures incorporate additional pre- and post-profiling steps during the compilation, scheduling, and linking phases. This iterative activity is carried out by methods based on wait-for graphs or profile monitors. These methods introduce a centralized evaluation at the source-code level (for each transformation and each monitored parameter during successive approximation). In contrast, the proposal introduces asynchronous evaluation in the source-to-source (STS) process. As a result, code transformations are not trapped in local optima but instead search for a global optimum (for both energy and cycles). The methodology is readily adaptable to an iterative compilation environment, where application source code is optimized to satisfy user constraints on code size, energy, execution time, and optimal target architecture usage. Experimental results show that our approach enhances parallelism by up to 38% in the G-721 speech codec, increases architecture correlation by up to 17% in the H-264L video transcodec, and improves power efficiency by as much as 43% for resVQ DSP algorithms. The technique incurs low overhead and enhances application-architecture correlation.
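Optimizing "for both energy and cycle" rather than for one metric alone can be made concrete with a Pareto filter over candidate code variants. This is a generic, swapped-in technique for illustration, not the asynchronous STS evaluation the paper proposes; the function name `pareto_front`, the variant names, and the (energy, cycles) numbers are hypothetical. A variant is discarded only if another variant is at least as good on both metrics and strictly better on at least one.

```python
def pareto_front(candidates):
    """Return candidates not dominated on (energy, cycles); lower is better for both.

    candidates: list of (name, energy, cycles) tuples.
    """
    front = []
    for name, e, c in candidates:
        dominated = any(
            e2 <= e and c2 <= c and (e2 < e or c2 < c)
            for _, e2, c2 in candidates
        )
        if not dominated:
            front.append((name, e, c))
    return front


if __name__ == "__main__":
    # Hypothetical transformed variants: (name, energy in mJ, cycles in thousands).
    variants = [("v0", 10, 100), ("v1", 8, 120), ("v2", 12, 90), ("v3", 11, 110)]
    print([name for name, _, _ in pareto_front(variants)])  # ['v0', 'v1', 'v2']
```

Variant v3 is dropped because v0 uses less energy and fewer cycles; the remaining three are genuine trade-offs, and keeping all of them avoids committing early to a locally optimal choice on a single metric.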
For handheld embedded systems, it is crucial to achieve high performance in execution time as well as efficient battery usage. Unfortunately, most compilation techniques for obtaining efficient binary code on complex multimedia processors do not produce code that is optimal in both energy and cycles. This work describes a compilation methodology for next-generation handheld devices, which will support data- and compute-intensive applications in small form factors. We designed and implemented an optimization framework that supports conventional code optimization schemes along with additional energy-efficiency benefits for multimedia applications such as the MPEG-2 and H-264L transcodecs. The optimization space is searched with a genetic algorithm. The overall scheme reduces energy both on a per-cycle basis and in terms of the total energy used over the lifetime of an application. The optimized G-728 audio codec meets real-time constraints on the Nexperia series of multimedia processors with low energy consumption. Furthermore, performance improves by a factor of 0.489 and energy consumption decreases by a factor of 0.203 over the baseline executable code.
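The distinction above between per-cycle energy and total energy over an application's run follows from a standard identity, written here generically rather than as the authors' energy model. With $N$ the number of cycles executed, $E_i$ the energy of cycle $i$, $\bar{E}_{\text{cyc}}$ the average energy per cycle, and primes denoting the optimized executable:

```latex
E_{\text{total}} = \sum_{i=1}^{N} E_i = N\,\bar{E}_{\text{cyc}},
\qquad
\frac{E'_{\text{total}}}{E_{\text{total}}}
  = \frac{N'}{N}\cdot\frac{\bar{E}'_{\text{cyc}}}{\bar{E}_{\text{cyc}}}
```

Total energy therefore falls when the cycle count falls, when the per-cycle energy falls, or both, provided neither factor rises enough to cancel the other.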
Bioinformatics application expression profiles are a critical performance metric in high-end genomic data processing. These profiles are compute-intensive and offer a wide range of computation patterns ranging from dat...