Author:
Babak Falsafi, Parallel Systems Architecture Laboratory
Institute of Computer and Communication Sciences, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
Agile hardware design is an approach to developing hardware systems that draws inspiration from the principles and practices of agile software *** emphasizes collaboration, flexibility, iterative development, and quick adaptation to changing *** agile hardware design, the focus is on delivering functional hardware systems in shorter development cycles while maintaining high quality and customer *** particular, agile hardware design is of great interest in the open-source hardware ***-source hardware development, such as RISC-V, is at the forefront of initiatives to democratize hardware and drive innovation in chip design *** design is instrumental for the RISC-V community because it supports rapid iteration, accommodates the evolving RISC-V standard and the addition of custom extensions, improves community collaboration and time-to-market, and addresses the design challenges associated with complex architectural features.
The modeling of atmospheric processes in the context of weather and climate simulations is an important and computationally expensive challenge. The temporal integration of the underlying PDEs requires a very large nu...
Efficient time integration schemes are necessary to capture the complex processes involved in atmospheric flows over long periods of time. In this work, we propose a high-order, implicit-explicit numerical scheme that co...
ISBN: 9781509036837 (print)
One of the most critical challenges that new high-performance systems face is the lack of system software support for these large-scale systems. Investment in system stack components is essential to the development, debugging, and optimization of the new emerging programming models. These emerging models have the promise to better utilize the vast hardware resources available in current and future systems. To aid in the development of applications and new system stacks, runtimes, as instances of their respective execution models, need to provide facilities to introspect their inner workings and allow an in-depth attribution of performance bottlenecks and computational patterns. In other words, the runtime systems need to reduce their opacity to observers so that users of a novel program execution model can adapt their designs to fit the intended model usage, regardless of the layer that they are working on. This design/development loop (akin to co-design) enables synergistic opportunities across the entire computational stack. This paper presents the design and implementation of a simple "gray" box performance attribution harness running inside a fine-grain runtime system: the Open Community Runtime (OCR). We showcase what such a framework can indicate regarding the runtime behavior while running at scale. To this end, we have designed a set of synthetic scenarios aimed to test the runtime at its best and worst cases. We present an analysis of the most important runtime features, properties, and idiosyncrasies that will affect the development of new runtime features, algorithmic selection, and application development.
ISBN: 9781509032068 (print)
The codelet model is a fine-grained, event-driven hybrid parallel model inspired by dataflow, whose performance depends on the scheduling policy. Designing an optimal codelet scheduling policy based on the features of tasks is important to the performance of codelet-based systems. In this paper, we propose an adaptive codelet scheduling policy that incorporates a "pure" genetic algorithm for tasks with complex dependencies. Experimental results verify that the policy is effective.
Prior mobile malware defenses are usually retroactive, which may either lead to high false-negative rates or make it hard to recover system state after malware activity. PreCrime is a proactive malware detection scheme...
Analysis of massive graphs has emerged as an important area for massively parallel computation. In this paper, it is shown how the Fresh Breeze trees-of-chunks memory model may be used to perform breadth-first search ...
The quadruped/biped reconfigurable walking robot with a parallel leg mechanism can perform both quadruped and biped walking. The conversion process from quadruped to biped includes lock...
Instruction-grain lifeguards monitor executing programs at the granularity of individual instructions to quickly detect bugs and security attacks, but their fine-grain nature incurs high monitoring overheads. This art...
Distributed shared-memory (DSM) multiprocessors provide a scalable hardware platform, but lack the necessary redundancy for mainframe-level reliability and availability. Chip-level redundancy in a DSM server faces a key challenge: the increased latency to check results among redundant components. To address performance overheads, we propose a checking filter that reduces the number of checking operations impeding the critical path of execution. Furthermore, we propose to decouple checking operations from the coherence protocol, which simplifies the implementation and permits reuse of existing coherence controller hardware. Our simulation results of commercial workloads indicate average performance overhead is within 4% (9% maximum) of tightly coupled DMR solutions.