ISBN (print): 1581134622
Many computation-intensive or recursive applications commonly found in digital signal processing and image processing can be represented by data-flow graphs (DFGs). In our previous work, we proposed a new technique, extended retiming, which can be combined with minimal unfolding to transform a DFG into one which is rate-optimal. The result, however, is a DFG with split nodes, a concise representation for pipelined schedules. This model and the extraction of the pipelined schedule it represents have heretofore not been explored. In this paper, we demonstrate one scheduling algorithm for such graphs, and then discuss a way to reduce the hardware requirements of the resulting schedule. In the process, we state and prove a tight upper bound on the minimum number of processors required to execute the static schedule produced by our algorithms. Finally, we demonstrate our methods on a specific example. Copyright 2002 ACM.
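For context, a rate-optimal DFG is usually understood as one whose iteration period reaches the iteration bound: the maximum, over all cycles, of total computation time divided by total delay count. The following is a minimal Python sketch of that bound only, not of extended retiming or of the paper's scheduling algorithm. The function name `iteration_bound`, the example graph, and the brute-force cycle enumeration are all illustrative assumptions.

```python
from fractions import Fraction


def iteration_bound(times, edges):
    """Max over simple cycles of (sum of node times) / (sum of edge delays).

    times: dict mapping node -> integer computation time (hypothetical input format)
    edges: list of (src, dst, delays) tuples, delays being a non-negative integer
    Brute-force enumeration; only suitable for small graphs.
    """
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, []).append((v, d))

    # Each cycle is explored only from its lowest-ranked node, so it is counted once.
    rank = {n: i for i, n in enumerate(times)}
    best = None

    def dfs(start, node, t_sum, d_sum, visited):
        nonlocal best
        for nxt, d in adj.get(node, []):
            if nxt == start:
                total_d = d_sum + d
                if total_d == 0:
                    raise ValueError("zero-delay cycle: no valid schedule exists")
                ratio = Fraction(t_sum, total_d)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and rank[nxt] > rank[start]:
                dfs(start, nxt, t_sum + times[nxt], d_sum + d, visited | {nxt})

    for s in times:
        dfs(s, s, times[s], 0, {s})
    return best  # None if the graph has no cycle


# Illustrative graph: cycle A->B->C->A has ratio 4/2, cycle A->B->A has ratio 3/1.
times = {"A": 1, "B": 2, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2), ("B", "A", 1)]
print(iteration_bound(times, edges))
```

In this example the two-node cycle dominates, so no retiming or unfolding of the graph can achieve an average iteration period below 3 time units.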
FPGA-based configurable computing machines are evolving rapidly. They offer the ability to deliver very high performance at a fraction of the cost when compared to supercomputers. The first generation of configurable computers (those with multiple FPGAs connected using a specific interconnect) used statically reconfigurable FPGAs. On these configurable computers, computations are performed by partitioning an entire task into spatially interconnected subtasks. Such configurable computers are used in logic emulation systems and for functional verification of hardware. In general, configurable computers provide the ability to reconfigure rapidly to any desired custom form. Hence, the available resources can be reused effectively to cut down the hardware costs and also improve the performance. In this paper, we introduce the concept of temporal partitioning to partition a task into temporally interconnected subtasks. Specifically, we present algorithms for temporal partitioning and scheduling data flow graphs for configurable computers. We are given a configurable computing unit (RPU) with a logic capacity of S_RPU and a computational task represented by an acyclic data flow graph G = (V, E). Computations with logic area requirements that exceed S_RPU cannot be completely mapped on a configurable computer (using traditional spatial mapping techniques). However, a temporal partitioning of the dataflow graph followed by proper scheduling can facilitate the configurable computer based execution. Temporal partitioning of the dataflow graph is a k-way partitioning of G = (V, E) such that each partitioned segment will not exceed S_RPU in its logic requirement. Scheduling assigns an execution order to the partitioned segments so as to ensure proper execution. Thus, for each segment in {s_1, s_2, ..., s_k}, scheduling assigns a unique ordering s_i → j, 1 ≤ i ≤ k, 1 ≤ j ≤ k, such that the comp...
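The idea of splitting an acyclic data flow graph into capacity-bounded segments with a valid execution order can be illustrated with a simple greedy pass. This is a hedged sketch under stated assumptions, not the algorithm of the paper: it packs nodes in topological order, and the function name `temporal_partition`, the per-node `area` values, and the example graph are hypothetical.

```python
from collections import deque


def temporal_partition(area, edges, capacity):
    """Greedily split a DAG into ordered segments, each with total area <= capacity.

    area: dict mapping node -> logic area (hypothetical units)
    edges: list of (src, dst) dependency pairs
    capacity: logic capacity of the reconfigurable unit (S_RPU in the abstract)
    The position of a segment in the returned list is its execution order.
    """
    succ = {v: [] for v in area}
    indeg = {v: 0 for v in area}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    ready = deque(v for v in area if indeg[v] == 0)
    segments, current, used, placed = [], [], 0, 0
    while ready:
        v = ready.popleft()
        if area[v] > capacity:
            raise ValueError(f"node {v!r} alone exceeds the capacity")
        if used + area[v] > capacity:
            # Current segment is full: close it and open the next one.
            segments.append(current)
            current, used = [], 0
        current.append(v)
        used += area[v]
        placed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if current:
        segments.append(current)
    if placed != len(area):
        raise ValueError("graph contains a cycle")
    return segments


# Illustrative task graph with a capacity of 5 area units.
area = {"a": 3, "b": 2, "c": 4, "d": 1}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(temporal_partition(area, edges, capacity=5))
```

Because nodes are placed in topological order and segments are only ever appended, every edge runs from a segment to the same or a later segment, so executing the segments in list order respects all data dependencies. A greedy pass like this makes no attempt to minimize the number of segments or the data transferred between them.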
The need for considering BIST requirements during the scheduling and assignment stages of behavioral synthesis has been demonstrated in previous research, and techniques for reducing the BIST resources of a data path during these stages of synthesis have been developed. However, the degree of freedom that can be exploited during scheduling and assignment to minimize these resources is often limited by the data and control dependencies of a behavior. In this paper, we propose transforming a behavior before scheduling and assignment, namely by introducing redundant computations, such that the resulting data path is testable using few BIST resources. The transformation makes use of the spare capacity of modules to add redundancy that enables test paths to be shared among the modules. A technique for identifying potential BIST resource sharing problems in a behavior and resolving them by redundant computation is presented. Redundant computations are introduced without compromising the latency and functional resource requirements of the behavior.
ISBN (print): 0769509932; 0769509940
This paper introduces a heuristic to solve the combined scheduling, resource binding, and wordlength selection problem for multiple wordlength systems. The algorithm involves an iterative refinement of operator wordlength information, leading to a scheduled and bound data-flow graph. Scheduling is performed with incomplete wordlength information during the intermediate stages of this refinement process. Results show significant area savings over known alternative approaches.
One of the problems of software design is estimating the quality of the designed programs. Estimating program quality is also a major problem of software certification. Currently, in addition to the expert m...
We examine the problem of detecting negative cycles in a dynamic graph, which is a fundamental problem that arises in electronic design automation and systems theory. We introduce the concept of adaptive negative cycl...
This paper presents an application domain driven approach to the design of embedded systems on silicon, and it shows how this approach is used to design a chip for a multiwindow TV application. We discuss all major design steps in a logical order, starting with an application domain analysis. This leads to the choice of Kahn data flow graphs as the programming paradigm for high-throughput signal applications. Based on this analysis we designed a multiprocessor architecture which uses run-time reconfiguration. Finally, attention is given to the physical implementation and the deep-submicron problems we had to solve. The result is a chip that can manage up to 25 internal real-time video streams. The chip combines the flexibility of a programmable solution with the cost effectiveness of a consumer product.
ISBN (print): 0780364457; 0780364465
As more data processing functions are integrated into systems-on-chip, the data path is becoming a critical part of the whole VLSI design. However, traditional physical design methodology cannot satisfy the data path performance requirement because it has no knowledge of the data path's bit-sliced structure. In this paper, an Abstract Physical Model (APM) is proposed to extract bit-slice regularity information from the data flow graph (DFG), and it is used for interconnect and congestion planning. A two-step heuristic algorithm is introduced to optimize the linear placement of the APM to satisfy both the wire length and routing track budget.
ISBN (print): 3540679561
We reexamine the limits of parallelism available in programs, using run-time reconstruction of program data-flow graphs. While limits of parallelism have been examined in the context of superscalar and VLIW machines, we also wish to study the causes of observed parallelism by examining the structure of the reconstructed data-flow graph. One aspect of structure analysis that we focus on is the isolation of instructions involved only in address calculations. We examine how address calculations present in RISC instruction streams generated by optimizing compilers affect the shape of the data-flow graph and often significantly reduce available parallelism.
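A common way to quantify such a limit is to rebuild the data-flow graph from an instruction trace and divide the instruction count by the critical path length. The sketch below illustrates that measure only and is not the paper's methodology. It assumes unit latency, tracks only register read-after-write dependences (perfect renaming, no memory or control dependences), and uses a hypothetical trace format and function name.

```python
def available_parallelism(trace):
    """Return instruction count divided by critical path length.

    trace: list of (dest, sources) tuples in execution order, where dest is a
    register name or None and sources is a list of register names
    (hypothetical format). Every instruction is assumed to take one time unit.
    """
    depth_of_last_writer = {}  # register -> depth of the instruction that last wrote it
    critical_path = 0
    for dest, sources in trace:
        # An instruction can start one step after its deepest producer.
        depth = 1 + max((depth_of_last_writer.get(s, 0) for s in sources), default=0)
        if dest is not None:
            depth_of_last_writer[dest] = depth
        critical_path = max(critical_path, depth)
    return len(trace) / critical_path if critical_path else 0.0


# Illustrative trace: six instructions whose longest dependence chain has length 3.
trace = [
    ("r1", []),
    ("r2", []),
    ("r3", ["r1", "r2"]),
    ("r4", ["r3"]),
    ("r5", ["r1"]),
    ("r6", []),
]
print(available_parallelism(trace))
```

Under these assumptions, an address computation that feeds a long chain of dependent instructions lengthens the critical path and lowers the ratio, which is one way the structural effect described in the abstract would show up in such a measurement.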
Program instructions that consume and produce small operands can be executed in hardware circuitry of less than full size. We compare different proposed models of accounting for the usefulness of bit-positions in oper...