Search Results - Inner Mongolia University Library

11th International Workshop on Software and Compilers for Embedded Systems, SCOPES 2008

Authors: Murray, Alastair; Franke, Björn (University of Edinburgh, School of Informatics, Institute for Computing Systems Architecture, United Kingdom)

ISBN: (print) 9781450378437

Due to their streaming nature, memory bandwidth is critical for most digital signal processing applications. To accommodate these bandwidth requirements, digital signal processors are typically equipped with dual memory banks that enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler-integrated approaches to data partitioning and memory bank assignment, however, have found little acceptance by DSP software developers. This is partly due to their inflexibility and inability to cope with certain manual data pre-assignments, e.g. due to I/O constraints. In this paper we present a different and more flexible approach, namely source-level dual memory assignment, where code generation targets DSP-C, a standardised C language extension widely supported by industrial C compilers for DSPs. Additionally, we present a novel partitioning algorithm based on soft colouring that is more efficient and scalable than the currently known best integer linear programming algorithm, whilst achieving competitive code quality. We have evaluated our scheme on an Analog Devices TigerSHARC DSP and achieved speedups of up to 1.57 on 13 UTDSP benchmarks. © 2008 by EDAA.

Keywords: Digital signal processing

Reliable DAG scheduling on grids with rewinding and migration

1st International Conference on Networks for Grid Applications, GridNets 2007

Authors: Hernandez, Israel; Cole, Murray (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9789639799028

Fault tolerance is an important issue in Grid computing as the availability of Grid resources cannot be guaranteed. Effective scheduling methods must include fault-tolerant mechanisms to preserve the execution of DAG applications, despite the presence of a processor failure. To address this, we designed the DAG rewinding mechanism, an event-driven process executed when a failure is detected at some rescheduling point. The rewinding mechanism preserves the execution of the application by recomputing and migrating those tasks which will disrupt the forward execution of succeeding tasks. The mechanism rewinds the progress of the application to a previous state, thereby preserving the execution despite the failed processor(s). This paper extends our work in the area by adding the rewinding mechanism to our previous dynamic scheduling methods GTP and GTP=c. We show how to integrate the rewinding mechanism within our dynamic execution models. Copyright 2007 ICST.

Keywords: Fault tolerance

Efficient asynchronous interrupt handling in a full-system instruction set simulator

17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2016

Authors: Spink, Tom; Wagstaff, Harry; Franke, Björn (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9781450343169

Instruction set simulators (ISS) have many uses in embedded software and hardware development and are typically based on dynamic binary translation (DBT), where frequently executed regions of guest instructions are compiled into host instructions using a just-in-time (JIT) compiler. Full-system simulation, which necessitates handling of asynchronous interrupts from e.g. timers and I/O devices, complicates matters as control flow is interrupted unpredictably and diverted from the current region of code. In this paper we present a novel scheme for handling of asynchronous interrupts, which integrates seamlessly into a region-based dynamic binary translator. We first show that our scheme is correct, i.e. interrupt handling is not deferred indefinitely, even in the presence of code regions comprising control flow loops. We demonstrate that our new interrupt handling scheme is efficient as we minimise the number of inserted checks. Interrupt handlers are also presented to the JIT compiler and compiled to native code, further enhancing the performance of our system. We have evaluated our scheme in an ARM simulator using a region-based JIT compilation strategy. We demonstrate that our solution reduces the number of dynamic interrupt checks by 73%, reduces interrupt service latency by 26% and improves throughput of an I/O bound workload by 7%, over traditional per-block schemes. © 2016 ACM.

Keywords: Simulators

A workload-aware mapping approach for data-parallel programs

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

Authors: Grewe, Dominik; Wang, Zheng; O'Boyle, Michael F. P. (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9781450302418

Much compiler-orientated work in the area of mapping parallel programs to parallel architectures has ignored the issue of external workload. Given that the majority of platforms will not be dedicated to just one task at a time, the impact of other jobs needs to be addressed. As mapping is highly dependent on the underlying machine, a technique that is easily portable across platforms is also desirable. In this paper we develop an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload. We achieve 93.7% of the maximum speedup available, which gives an average speedup of 1.66 on 4 cores, a factor of 1.24 better than the OpenMP compiler's default policy. We also develop an alternative cooperative model that minimizes the impact on external workload while still giving an improved average speedup. Finally, we evaluate our approach on a separate 8-core machine, giving an average 1.33 times speedup over the default policy and showing the portability of our approach. Copyright 2011 ACM.

Keywords: Mapping

Efficient code generation in a region-based dynamic binary translator

Proceedings of the 2014 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems

Authors: Spink, Tom; Wagstaff, Harry; Franke, Björn; Topham, Nigel (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9781450328777

Region-based JIT compilation operates on translation units comprising multiple basic blocks and, possibly cyclic or conditional, control flow between these. It promises to reconcile aggressive code optimisation and low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated, it remains unclear how to best exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can have adverse effects on performance if not considered carefully. In this paper we present a complete code generation strategy for a region-based dynamic binary translator, which exploits branch type and control flow profiling information to improve code quality for the common case. We demonstrate that using our code generation strategy a competitive region-based dynamic compiler can be built on top of the LLVM JIT compilation framework. For the ARM V5T target ISA and SPEC CPU 2006 benchmarks we achieve execution rates of, on average, 867 MIPS and up to 1323 MIPS on a standard X86 host machine, outperforming state-of-the-art QEMU-ARM by delivering a speedup of 264%. Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Quality control

Exploring the unified design-space of custom-instruction selection and resource sharing

International Conference on Embedded Computer Systems

Authors: Zuluaga, Marcela; Topham, Nigel (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9781424479382

Resource sharing can be applied during data-path synthesis of Instruction-Set Extensions (ISEs) in order to obtain flexibility and area efficiency. The design space of resource sharing solutions can be explored in order to find the trade-offs between area and instruction latency that suit the design goals. On the other hand, area is a proven global constraint that should be considered in the ISE selection process, since maximizing speedup as a unique goal assumes the availability of unlimited resources. Thus, a selection process should be aware of the area requirements of a subset of ISE candidates. However, when resource sharing is used for ISE data-path synthesis, the area and profitability of the subset cannot be known until resource sharing is attempted. This paper proposes a hardware/software partitioning framework in which the selection of ISEs interacts with the resource sharing process in order to drive the exploration of the selection design space towards implementation alternatives that are likely to increase the utilization of the given area resources. On the benchmarks analyzed in this paper, our techniques find solutions that, under a fixed area constraint, achieve speedups from 8% to 238% higher than previous selection techniques. Furthermore, unlike previous approaches, the proposed framework allows the exploration, at the selection level, of the design space of trade-offs between speedup and area that are available to the designer. ©2010 IEEE.

Keywords: Digital storage

JAVAHASE: Automatic generation of applets from HASE simulation models

2003 Summer Computer Simulation Conference, SCSC 2003

Authors: Mallet, F.; Ibbett, R.N. (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 1565552687

HASE is a design and simulation environment that allows for rapid development and exploration of computer architectures at multiple levels of abstraction. The great flexibility of the graphical display has enabled the creation of models (Tomasulo's algorithm, DLX architecture, etc.) which have proved to be useful in their own right, particularly for teaching and demonstration purposes. In order to make the models widely accessible, two different ways of exporting them via the WWW have been investigated, WEBHASE and JAVAHASE. WEBHASE uses a viewer applet to visualise pre-run HASE simulations, whilst JAVAHASE allows existing simulation models to be translated into fully interactive simulation applets.

Keywords: Computer architecture

Specification-based parameter-model interaction: Towards a correct reflection of memory characteristics in a DSM cluster simulation

Summer Computer Simulation Conference 2005, SCSC 2005, Part of the 2005 Summer Simulation Multiconference, SummerSim 2005

Authors: Marurngsith, Worawan; Ibbett, Roland N. (Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh, United Kingdom)

ISBN: (print) 9781622763511

A DSM (Distributed-Shared Memory) cluster is an attractive parallel computing platform for scientific research as it provides programming advantages within a scalable and cost-effective hardware solution. This benefit derives from the fact that a DSM system provides a shared-memory abstraction on top of a distributed-memory machine by caching data replicas locally. In this respect, a coherence protocol is a vital component responsible for assuring data consistency across all replicas. The design of coherence protocols impacts a DSM system in terms of both performance and accuracy. Performance is often measured via simulation and various verification techniques have been proposed to deal with protocol accuracy. Nevertheless, integrating accuracy verification into a DSM cluster simulation to ensure correct simulation results is still an open issue. In this paper, we address three properties of a coherence protocol (safety, liveness, and inclusion) without which errors may occur in the simulation results. We propose a Specification-based Parameter-Model Interaction (SPMI) technique to detect these cases in a particular DSM cluster simulation called DSIMCLUSTER. Our experimental results demonstrate that with SPMI, DSIMCLUSTER can ensure the coherence protocol properties and provides a correct reflection in the simulation model of the memory characteristics of real shared-memory and distributed-shared memory multiprocessors.

Keywords: Specifications

Coordinating heterogeneous parallel systems with skeletons and Activity Graphs

Journal of Systems Integration, 2001, vol. 10, no. 2, pp. 127-143

Authors: Cole, Murray; Zavanella, Andrea (Institute for Computing Systems Architecture, Division of Informatics, University of Edinburgh, United Kingdom; Dipartimento di Informatica, Università di Pisa, Pisa, Italy)

Large scale parallel programming projects may become heterogeneous in both language and architectural model. We propose that skeletal programming techniques can alleviate some of the costs involved in designing and porting such programs, illustrating our approach with a simple program which combines shared memory and message passing code. We introduce Activity Graphs as a simple and practical means of capturing model independent aspects of the operational semantics of skeletal parallel programs. They are independent of low level details of parallel implementation and so can act as an intermediate layer for compilation to diverse underlying models. Activity graphs provide a notion of parallel activities, dependencies between activities, and the process groupings within which these take place. The compilation process uses a set of graph generators (templates) to derive the activity graph. We describe simple schemes for transforming activity graphs into message passing programs, targeting both MPI and BSP.

Keywords: Parallel processing systems

MunchCrunch: A game to learn healthy-eating heuristics

8th International Conference on Interaction Design and Children, IDC 2009

Authors: Mansour, Anna; Barve, Mugdha; Bhat, Sushama; Do, Ellen Yi-Luen (Georgia Institute of Technology, School of Interactive Computing, United States; College of Architecture and Health Systems Institute)

ISBN: (print) 9781605583952

Children and adolescents are at an age where they are beginning to gain autonomy over choosing the foods they eat, yet may not have adequate support or information to make informed choices. This paper describes the design of a heuristic-based health game called MunchCrunch to help this age group learn more about healthy and unhealthy foods to develop balanced eating habits. Copyright 2009 ACM.

Keywords: Computer programming
