Efficient implementation of DSP applications is critical for many embedded systems. Optimizing compilers for application programs, written in C, largely focus on code generation and scheduling, which, with their growi...
We present our auto-tuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correctly setting the tuning factors can yield an average 37x speedup over a sequential baseline. Our best automated, machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
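For readers unfamiliar with the wavefront pattern named in this abstract, the following is a minimal sequential sketch of its dependency structure only. It is not the authors' abstraction and does no tuning. The function names `wavefront_sweep` and `lattice_paths` and the choice of zero for out-of-grid neighbours are assumptions made for illustration.

```python
def wavefront_sweep(rows, cols, cell):
    """Fill a rows x cols grid one anti-diagonal at a time.

    Each cell depends on its north, west and north-west neighbours, so all
    cells with the same i + j are independent of one another. That
    independence is what a parallel implementation would exploit.
    Neighbours outside the grid are passed as 0 (an assumption of this sketch).
    """
    grid = [[0] * cols for _ in range(rows)]
    for d in range(rows + cols - 1):          # d indexes the anti-diagonal
        i_lo = max(0, d - cols + 1)
        i_hi = min(rows - 1, d)
        for i in range(i_lo, i_hi + 1):       # independent iterations
            j = d - i
            north = grid[i - 1][j] if i > 0 else 0
            west = grid[i][j - 1] if j > 0 else 0
            northwest = grid[i - 1][j - 1] if i > 0 and j > 0 else 0
            grid[i][j] = cell(north, west, northwest, i, j)
    return grid


def lattice_paths(north, west, northwest, i, j):
    """Hypothetical cell function: number of monotone paths from (0, 0)."""
    return 1 if i == 0 and j == 0 else north + west


grid = wavefront_sweep(3, 3, lattice_paths)
print(grid[2][2])  # 6
```

The inner loop over `i` is the part a heterogeneous implementation could distribute across CPU cores or a GPU. How much work to assign to each device is the kind of tuning factor the abstract refers to.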
Visualisation of the activities which occur inside a computer is an important aspect of computer architecture education. At the University of Edinburgh we are using a hierarchical computer architecture design and simulation environment (HASE) to build a number of architectural models for use in research and teaching. A new facility within HASE, JavaHASE, allows models to be translated into applets which can be accessed via the WWW. JavaHASE applets are programmable simulation models in which the code and data memory contents can be altered, the simulation re-run in the applet and the results used to visualise the activities taking place within the model (data movements, state changes, register/memory content changes, etc). These applets are being used in various ways in teaching.
ISBN: 9781479901043 (print)
We present a new high speed cycle-approximate simulator, addressing an important, neglected category of multi-core systems: deeply-embedded cache-incoherent MPSoCs. We take advantage of the unique properties of these systems to increase the parallelism of the simulation. In doing so we achieve performance not possible using previous simulation techniques, without compromising the accuracy of the results. We present quantitative performance results across a large range of simulated NoC designs, comprising 1 to 64 cores. On average we simulate at 5.9 MIPS, with simulation speeds reaching 373 MIPS in the best case. Comparing against FPGA implementations we demonstrate that the simulator manages this with an average timing error of only 2.1%.
The outbreak and spreading of the COVID-19 pandemic have had a significant impact on transportation *** analyzing the impact of the pandemic on the transportation system, the impact of the pandemic on the social economy can be reflected to a certain extent, and the effect of anti-pandemic policy implementation can also be *** addition, the analysis results are expected to provide support for policy ***, most of the relevant studies analyze the impact of the pandemic on the overall transportation system from the macro perspective, while few studies quantitatively analyze the impact of the pandemic on individual spatiotemporal travel *** on the license plate recognition (LPR) data, this paper analyzes the spatiotemporal travel patterns of travelers in each stage of the pandemic progress, quantifies the change of travelers' spatiotemporal behaviors, and analyzes the adjustment of travelers' behaviors under the influence of the *** are three different behavior adjustment strategies under the influence of the pandemic, and the behavior adjustment is related to the individual's past travel *** paper quantitatively assesses the impact of the COVID-19 pandemic on individual travel *** the method proposed in this paper can be used to quantitatively assess the impact of any long-term emergency on individual micro travel behavior.
Different realistic humanoid motions can be used in various situations in animation. They also play an important role in virtual reality. In this paper, we propose a novel method to generate different realistic humanoid mo...
Customised processor performance generally increases as additional custom instructions are added. However, performance is not the only metric that modern systems must take into account; die area and energy efficiency are equally important. Resource sharing during synthesis of instruction set extensions (ISEs) can significantly reduce the die area and energy consumption of a customised processor. This may increase the number of custom instructions that can be synthesized within a given area budget. Resource sharing involves combining the graph representations of two or more ISEs which contain a similar sub-graph. This coupling of multiple sub-graphs, if performed naively, can increase the latency of the extension instructions considerably. And yet, as we show in this paper, an appropriate level of resource sharing provides a significantly simpler design with only modest increases in average latency for extension instructions. Based on existing resource-sharing techniques, this study presents a new heuristic that controls the degree of resource sharing between a given set of custom instructions. Our main contributions are the introduction of a parametric method for exploring the trade-offs that can be achieved between instruction latency and implementation complexity, and the coupling of design-space exploration with fast area-delay models for the operators comprising each ISE. We present experimental evidence that our heuristic exposes a broad range of design points, allowing advantageous trade-offs between die area and latency to be found and exploited.
This paper introduces an adaptive parallel pipeline pattern which follows the GRASP (grid-adaptive structured parallelism) methodology. GRASP is a generic methodology to incorporate structural information at compile time into a parallel program that enables it to adapt automatically to dynamic variations in resource performance. GRASP instruments the pipeline with a series of pragmatic rules, which depend on particular performance thresholds based on the computation/communication patterns of the program and the availability of resources in the grid. Our parallel pipeline pattern is implemented as a parameterisable C/MPI API using a variable-size input data vector and a stage function array. We have evaluated its efficiency using a numerical benchmark stage function in a non-dedicated computational grid environment.
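The abstract above describes a pipeline parameterised by an input data vector and a stage function array. The sketch below shows only the sequential meaning of that interface. It is not the authors' C/MPI API and contains none of the GRASP adaptation rules. The name `run_pipeline` and the two example stages are hypothetical.

```python
def run_pipeline(stages, inputs):
    """Push every item of `inputs` through each stage function in order.

    This is the sequential reference behaviour. A parallel pipeline would
    place each stage on its own process and stream items between them,
    but should produce the same results in the same order.
    """
    results = []
    for item in inputs:
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


def add_one(x):
    return x + 1


def double(x):
    return x * 2


print(run_pipeline([add_one, double], [1, 2, 3]))  # [4, 6, 8]
```

Because the input vector can have any length and the stage list any number of functions, the same call shape covers short and long pipelines. An adaptive version would additionally decide at run time which resource executes which stage.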
It is estimated that the amount of data coming out of an optical fibre is doubling every nine months and, thus, the growth rate in network bandwidth by far exceeds that of transistor density stated by Moore's law. This causes excessive strain on network infrastructure nodes such as routers which need to operate at line rate in order to keep up with the external bandwidth requirements. Consequently, manufacturers of network processors have developed a wide range of technologies including highly parallel and specialised architectures to cope with ever increasing processing demands. Software tool support, however, lags behind and most research in compiling for network processors has focused on improved sequential and parallel code generation. In this paper we show that not code, but data organisation is the key obstacle to overcome in order to achieve high performance on network infrastructure applications. We evaluate three specialised data transformations (structure splitting, array regrouping, and software caching) against the industrial EEMBC networking benchmarks and real-world data sets. We demonstrate that speedups of up to 2.62 can be achieved, but at the same time no single solution performs equally well across all network traffic scenarios. This clearly indicates that adaptive data transformation schemes are necessary to ensure optimal performance under varying network loads.
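Of the three data transformations named in this abstract, structure splitting is the simplest to illustrate. The sketch below shows only the layout change: frequently accessed ("hot") fields are moved out of per-record structures into contiguous per-field arrays, and the remaining ("cold") fields stay behind. The function name `split_structure`, the packet field names, and the choice of which fields are hot are all assumptions. The sketch does not reproduce the paper's implementation or its measured speedups.

```python
from array import array


def split_structure(records, hot_fields):
    """Split a list of records into hot per-field arrays and cold remainders.

    hot  : dict mapping each hot field name to a contiguous array of
           signed 64-bit integers, one entry per record
    cold : list of dicts holding the remaining fields, in record order
    """
    hot = {
        name: array("q", (rec[name] for rec in records))
        for name in hot_fields
    }
    cold = [
        {key: value for key, value in rec.items() if key not in hot_fields}
        for rec in records
    ]
    return hot, cold


# Hypothetical packet headers; integer values chosen only for illustration.
packets = [
    {"dst": 10, "ttl": 64, "payload_len": 1500},
    {"dst": 11, "ttl": 63, "payload_len": 40},
    {"dst": 10, "ttl": 62, "payload_len": 576},
]

hot, cold = split_structure(packets, ("dst", "ttl"))
print(hot["dst"].tolist())  # [10, 11, 10]
print(cold[1])              # {'payload_len': 40}
```

In a compiled language the benefit comes from a loop over `dst` touching only densely packed `dst` values rather than whole records. Which fields count as hot depends on the traffic, which is consistent with the abstract's point that no single transformation suits every scenario.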
Cloud providers are facing a complex problem in configuring software applications ready for deployment on their infrastructures. Hierarchical Task Network (HTN) planning can provide effective means to solve such deplo...