Due to their streaming nature, memory bandwidth is critical for most digital signal processing applications. To accommodate these bandwidth requirements, digital signal processors are typically equipped with dual me...
ISBN: (print) 9789639799028
Fault tolerance is an important issue in Grid computing, as the availability of Grid resources cannot be guaranteed. Effective scheduling methods must include fault-tolerant mechanisms to preserve the execution of DAG applications despite the presence of a processor failure. To address this, we designed the DAG rewinding mechanism, an event-driven process executed when a failure is detected at some rescheduling point. The rewinding mechanism preserves the execution of the application by recomputing and migrating those tasks whose loss would disrupt the forward execution of succeeding tasks. The mechanism rewinds the progress of the application to a previous state, thereby preserving the execution despite the failed processor(s). This paper extends our work in the area by adding the rewinding mechanism to our previous dynamic scheduling methods, GTP and GTP/c. We show how to integrate the rewinding mechanism within our dynamic execution models. Copyright 2007 ICST.
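The abstract does not give the algorithm itself, so the following is only a minimal sketch of the rewinding idea under a simplifying assumption: a completed task's output exists only on the processor that ran it. The function name `tasks_to_rewind` and its parameters are hypothetical and are not taken from the paper.

```python
def tasks_to_rewind(preds, done, placement, failed):
    """Return the completed tasks that must be re-run after a failure.

    preds:     dict mapping each task to a list of its predecessor tasks
    done:      set of tasks that had completed before the failure
    placement: dict mapping each completed task to the processor that ran it
    failed:    set of failed processors

    Assumption (hypothetical model): a task's output is lost exactly when
    the processor that produced it has failed.
    """
    def lost(task):
        return task in done and placement[task] in failed

    rewind = set()
    # Every pending task still has to run, so it needs all of its inputs.
    must_run = [t for t in preds if t not in done]
    while must_run:
        task = must_run.pop()
        for parent in preds[task]:
            if lost(parent) and parent not in rewind:
                # The parent's output is gone, so the parent must be
                # recomputed, which in turn requires the parent's inputs.
                rewind.add(parent)
                must_run.append(parent)
    return rewind


dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
finished = {"a", "b", "c"}
where = {"a": "p1", "b": "p2", "c": "p1"}
print(sorted(tasks_to_rewind(dag, finished, where, {"p1"})))
```

In this example, task `d` is still pending and needs the output of `c`, which was lost with `p1`; recomputing `c` in turn needs `a`, which was also on `p1`. Task `b` ran on a surviving processor and is left alone.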
Instruction set simulators (ISS) have many uses in embedded software and hardware development and are typically based on dynamic binary translation (DBT), where frequently executed regions of guest instructions are co...
ISBN: (print) 9781450302418
Much compiler-orientated work in the area of mapping parallel programs to parallel architectures has ignored the issue of external workload. Given that the majority of platforms will not be dedicated to just one task at a time, the impact of other jobs needs to be addressed. As mapping is highly dependent on the underlying machine, a technique that is easily portable across platforms is also desirable. In this paper we develop an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload. We achieve 93.7% of the maximum speedup available, which gives an average speedup of 1.66 on 4 cores, a factor of 1.24 better than the OpenMP compiler's default policy. We also develop an alternative cooperative model that minimizes the impact on external workload while still giving an improved average speedup. Finally, we evaluate our approach on a separate 8-core machine, giving an average 1.33 times speedup over the default policy and showing the portability of our approach. Copyright 2011 ACM.
ISBN: (print) 9781450328777
Region-based JIT compilation operates on translation units comprising multiple basic blocks and the control flow between them, which may be cyclic or conditional. It promises to reconcile aggressive code optimisation and low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated, it remains unclear how best to exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can have adverse effects on performance if not considered carefully. In this paper we present a complete code generation strategy for a region-based dynamic binary translator, which exploits branch type and control flow profiling information to improve code quality for the common case. We demonstrate that, using our code generation strategy, a competitive region-based dynamic compiler can be built on top of the LLVM JIT compilation framework. For the ARMv5T target ISA and SPEC CPU 2006 benchmarks we achieve execution rates of, on average, 867 MIPS and up to 1323 MIPS on a standard x86 host machine, outperforming state-of-the-art QEMU-ARM by delivering a speedup of 264%. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
Resource sharing can be applied during data-path synthesis of Instruction-Set Extensions (ISEs) in order to obtain flexibility and area efficiency. The design space of resource sharing solutions can be explored in ord...
ISBN: (print) 1565552687
HASE is a design and simulation environment that allows for rapid development and exploration of computer architectures at multiple levels of abstraction. The great flexibility of the graphical display has enabled the creation of models (Tomasulo's algorithm, DLX architecture, etc.) which have proved to be useful in their own right, particularly for teaching and demonstration purposes. In order to make the models widely accessible, two different ways of exporting them via the WWW have been investigated, WEBHASE and JAVAHASE. WEBHASE uses a viewer applet to visualise pre-run HASE simulations, whilst JAVAHASE allows existing simulation models to be translated into fully interactive simulation applets.
ISBN: (print) 9781622763511
A DSM (Distributed-Shared Memory) cluster is an attractive parallel computing platform for scientific research as it provides programming advantages within a scalable and cost-effective hardware solution. This benefit derives from the fact that a DSM system provides a shared-memory abstraction on top of a distributed-memory machine by caching data replicas locally. In this respect, a coherence protocol is a vital component responsible for assuring data consistency across all replicas. The design of coherence protocols impacts a DSM system in terms of both performance and accuracy. Performance is often measured via simulation, and various verification techniques have been proposed to deal with protocol accuracy. Nevertheless, integrating accuracy verification into a DSM cluster simulation to ensure correct simulation results is still an open issue. In this paper, we address three properties of a coherence protocol (safety, liveness, and inclusion) without which errors may occur in the simulation results. We propose a Specification-based Parameter-Model Interaction (SPMI) technique to detect these cases in a particular DSM cluster simulation called DSIMCLUSTER. Our experimental results demonstrate that with SPMI, DSIMCLUSTER can ensure the coherence protocol properties and provide a correct reflection in the simulation model of the memory characteristics of real shared-memory and distributed-shared-memory multiprocessors.
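The abstract does not define how safety is specified or checked in SPMI. As a rough illustration only, the sketch below checks one commonly used safety condition for coherence protocols, the single-writer/multiple-reader invariant, over MSI-style replica states. The state letters, the function name `is_safe`, and the data layout are assumptions made for this example, not details of DSIMCLUSTER.

```python
def is_safe(replica_states):
    """Check the single-writer/multiple-reader invariant for one data block.

    replica_states: dict mapping node id to one of
        "M" (modified, exclusive writer),
        "S" (shared, read-only copy),
        "I" (invalid, no usable copy).

    The block is considered safe if at most one node holds it in "M",
    and no node holds a shared copy while a writer exists.
    """
    states = list(replica_states.values())
    writers = states.count("M")
    readers = states.count("S")
    if writers > 1:
        return False
    if writers == 1 and readers > 0:
        return False
    return True


print(is_safe({"n0": "S", "n1": "S", "n2": "I"}))
print(is_safe({"n0": "M", "n1": "S", "n2": "I"}))
```

A simulator could evaluate such a predicate after every simulated protocol transition and flag the first state in which it fails; the second call above is an example of a violating state.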
Large scale parallel programming projects may become heterogeneous in both language and architectural model. We propose that skeletal programming techniques can alleviate some of the costs involved in designing and porting such programs, illustrating our approach with a simple program which combines shared memory and message passing code. We introduce Activity Graphs as a simple and practical means of capturing model independent aspects of the operational semantics of skeletal parallel programs. They are independent of low level details of parallel implementation and so can act as an intermediate layer for compilation to diverse underlying models. Activity graphs provide a notion of parallel activities, dependencies between activities, and the process groupings within which these take place. The compilation process uses a set of graph generators (templates) to derive the activity graph. We describe simple schemes for transforming activity graphs into message passing programs, targeting both MPI and BSP.
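To make the notion of activities and dependencies concrete, here is a minimal sketch that groups the activities of a small dependency graph into successive steps, where each step contains only activities whose dependencies have already completed. This loosely resembles the superstep structure of a BSP program, but it is an illustrative assumption rather than the compilation scheme described in the paper. The activity names and the function `supersteps` are hypothetical; the sketch relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def supersteps(dependencies):
    """Group activities into steps that respect their dependencies.

    dependencies: dict mapping each activity to the activities it depends on.
    Returns a list of steps; each step is a sorted list of activities that
    can run in parallel once all earlier steps have finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is cyclic
    steps = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        steps.append(ready)
        sorter.done(*ready)  # mark the whole step as finished
    return steps


# A hypothetical farm-like skeleton: scatter, two workers, gather.
graph = {
    "scatter": [],
    "work0": ["scatter"],
    "work1": ["scatter"],
    "gather": ["work0", "work1"],
}
for number, step in enumerate(supersteps(graph)):
    print(number, step)
```

Process groupings, which the abstract also lists as part of an activity graph, are left out of this sketch; they could be attached as an extra attribute on each activity.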
ISBN: (print) 9781605583952
Children and adolescents are at an age where they are beginning to gain autonomy over choosing the foods they eat, yet may not have adequate support or information to make informed choices. This paper describes the design of a heuristic-based health game called MunchCrunch to help this age group learn more about healthy and unhealthy foods to develop balanced eating habits. Copyright 2009 ACM.