synchronous dataflow graph (SDFG in short) is a formalism frequently considered in electronic design and software compilers to model communications between components with different rates. The development of efficient...
详细信息
synchronous dataflow graph (SDFG in short) is a formalism frequently considered in electronic design and software compilers to model communications between components with different rates. The development of efficient algorithms to evaluate the maximum throughput of SDFGs is a challenging question. This paper presents a mathematical framework to perform schedulability analysis and to compute the maximum throughput of SDFGs. This work focuses on strictly K-Periodic schedules for which a fixed set of execution times coupled with a period are associated with each task and define a schedule of every task executions. This class of schedules can always reach maximal throughput;we present an algorithm that computes the exact maximum throughput by iteratively generating K-periodic schedules until we reach optimality. The complexity of this iterative algorithm is studied by using the well-established benchmarking suite SDF3, and compared against the most common throughput analysis techniques. We show several orders of magnitude improvement over state-of-the-art both in terms of computation time, and size of the final schedules.
Mixed applications that gather real-time tasks and best effort jobs require a research effort in order to be effectively modeled and executed. Therefore, in this study we define a general and intuitive communication m...
详细信息
ISBN:
(纸本)9781450347877
Mixed applications that gather real-time tasks and best effort jobs require a research effort in order to be effectively modeled and executed. Therefore, in this study we define a general and intuitive communication model between multi-periodic real-time tasks. We first demonstrate that the communications between real-time tasks can be directly expressed as a "synchronous Data-flow graph". This modeling allows precise definition of the system latency. Accordingly, we develop an exact evaluation method to calculate the worst case latency of a system from a given input to a connected outcome. Then, we frame this value using two algorithms that compute its upper and lower bounds. Finally, we show that these bounds can be computed using a polynomial amount of computation time, while the time required to compute the exact value increases linearly according to the average repetition factor. Furthermore, the gap between the exact result and its upper (resp. lower) bound is evaluated between 10 and 15 % (resp. 20 and 30%).
The search of a mapping of a synchronous Data Flow graph (SDFG) on a distributed architecture that achieves a given throughput while satisfying memory constraints is a difficult challenge. Solving this problem calls f...
详细信息
ISBN:
(纸本)9781509028160
The search of a mapping of a synchronous Data Flow graph (SDFG) on a distributed architecture that achieves a given throughput while satisfying memory constraints is a difficult challenge. Solving this problem calls for evaluating throughput and buffer capacities associated to a mapping. Since the available mapping evaluation methods are not polynomial with respect to the SDFG description, mapping techniques using them are not scalable. This paper develops a polynomial method for the evaluation of any given SDFG mapping on a distributed architecture. The method is based on a simple transformation of the SDFG to model communications through a Network on Chip. The key result is that the size of the memory required in order to guarantee the liveness or a given throughput of an application may be evaluated in polynomial time. Experimentally, computing the memory size guaranteeing liveness of a mapping of a 670-node H264 graph on a 4-cluster architecture takes 70 ms on an Intel Core i5-660 processor and grows linearly with graph size.
In this article, we propose a novel technique that estimates a tight upper bound of the worst-case response time (WCRT) of a synchronousdataflow (SDF) graph when the SDF graph shares processors with other real-time t...
详细信息
In this article, we propose a novel technique that estimates a tight upper bound of the worst-case response time (WCRT) of a synchronousdataflow (SDF) graph when the SDF graph shares processors with other real-time tasks. When an SDF graph is executed at runtime under a self-timed or static assignment scheduling policy on a multi-processor system, static scheduling of the SDF graph does not guarantee the satisfaction of latency constraints since changes to the schedule may result in timing anomalies. To estimate the WCRT of an SDF graph with a given mapping and scheduling result, we first construct a task instance dependency graph that depicts the dependency between node executions in a static schedule. The proposed technique combines two techniques in a novel way: schedule time bound analysis and response time analysis. The former is used to consider the interference between task instances in the same SDF graph, and the latter is used to consider the interference from other real-time tasks. Through extensive experiments with synthetic examples and benchmarks, we verify the superior performance of the proposed technique compared to other existent techniques.
Multi-processor systems-on-chips are widely adopted in implementing modern streaming applications to satisfy the ever increasing computation requirements. To take advantage of this kind of platform, it is necessary to...
详细信息
Multi-processor systems-on-chips are widely adopted in implementing modern streaming applications to satisfy the ever increasing computation requirements. To take advantage of this kind of platform, it is necessary to map tasks of the application properly to different processors, so as to fully exploit the inherent task-level parallelism and satisfy the stringent timing requirements. We propose the Parallelism graph to capture the task-level parallelism of the application and transform the mapping problem to a graph partitioning problem. The graph partitioning problem is formulated as an Integer Linear Programming problem, which is solved optimally using the ILP solver. To reduce the complexity, a two-step local search algorithm, i.e., the greedy partition and refinement algorithm, is proposed. Since one-shot heuristics cannot guarantee the solution quality, evolutionary algorithms are widely used to search the solution space such that better results can be found. We also integrate the idea of parallelism enhancement into the genetic algorithm and propose a hybrid genetic algorithm to improve the performance. Sets of synthesized synchronous Data Flow graphs and some practical applications are used to evaluate the performance of the proposed algorithms. Experiment results demonstrate that the proposed algorithms outperform available algorithms. (C) 2016 Elsevier Inc. All rights reserved.
In this paper, we propose a new single appearance schedule for synchronousdataflow programs to minimize data memory and code memory size simultaneously. While a single appearance schedule promises only one appearance...
详细信息
ISBN:
(纸本)0780394518
In this paper, we propose a new single appearance schedule for synchronousdataflow programs to minimize data memory and code memory size simultaneously. While a single appearance schedule promises only one appearance of each node definition in the generated code, it requires significant amount of data memory overhead compared with a buffer optimal schedule allowing multiple appearance. The key idea of the proposed technique is to make a dynamic decision of loop count to make a schedule quasi-static. The proposed quasi-static schedule produces a single appearance schedule code with minimum data memory requirement. We prove that every buffer optimal schedule can be transformed to our single appearance schedule which requires optimal buffer size for arbitrary synchronous dataflow graphs. The only penalty for the proposed technique is slight performance overhead of computing loop counts dynamically. In order to minimize the overhead we propose optimization techniques. Experimental results show that the proposed algorithm reduces 20% total memory with less than 1% performance overhead compared with the previous single appearance schedule algorithms.
STT-MRAM has been recently researched to replace DRAM in order to reduce the cell size and save the leakage power consumption. Although the read operation in STT-MRAM is acceptable in terms of performance and energy c...
详细信息
STT-MRAM has been recently researched to replace DRAM in order to reduce the cell size and save the leakage power consumption. Although the read operation in STT-MRAM is acceptable in terms of performance and energy consumption, the write operation discourages the adoption of the STT-MRAM as main memory. A promising approach to overcome the poor write operation is to reduce the planar cell size which decreases the retention time, the write latency and the write energy consumption since the change of the cell size requires no additional manufacturing process. However, since refresh is required in the reduced retention time memory just like DRAM, the leakage energy consumption may increase compared with a traditional STT-MRAM with long retention time. This paper solves the buffer mapping problem onto a system with multiple retention time memories for a stream application to minimize the energy consumption. Experimental results show that a system with two or three different retention time STT-MRAMs reduces 45-75 % of write energy consumption compared with a single long retention time STT-MRAM.
This article proposes a lifetime aware buffer assignment method for streaming applications like multimedia specified in a synchronousdataflow (SDF) graph on a DRAM/PRAM hybrid memory in which the endurance of PRAM is...
详细信息
This article proposes a lifetime aware buffer assignment method for streaming applications like multimedia specified in a synchronousdataflow (SDF) graph on a DRAM/PRAM hybrid memory in which the endurance of PRAM is limited. We determine whether buffers are assigned to DRAM or PRAM to minimize the writing frequency of PRAM. To solve the problems, we formulate them using Answer Set Programming. Experimental results show that the proposed approach increases the PRAM lifetime by 63% compared with no optimization, and shows the tradeoff between PRAM and DRAM size to guarantee a lifetime constraint.
暂无评论