ISBN (print): 1581139373
This work concerns automatic hardware synthesis from a dataflow graph (DFG) specification for fast HW/SW cosynthesis. A node in the DFG represents a coarse-grain block, such as an FIR filter or a DCT, and a port of a block may consume multiple data samples per invocation; this distinguishes our approach from behavioral synthesis and complicates the problem. In the presented design methodology, a dataflow graph with a specified algorithm can be mapped to various hardware structures according to the resource allocation and schedule information. Compared with previous approaches, this simplifies managing the area/performance tradeoff in hardware design and widens the design space for hardware implementations of a dataflow graph. Experiments with several examples demonstrate the usefulness of the proposed technique.
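Because a port may consume or produce several samples per invocation, such a graph behaves like a multirate (synchronous) dataflow graph, and a synthesizer first needs to know how often each block fires per graph iteration. The sketch below assumes the standard SDF balance equations, q[src] * produced = q[dst] * consumed; the function name `repetition_vector`, the edge tuple format, and the example blocks are illustrative assumptions, not taken from the paper, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import gcd, lcm

def repetition_vector(actors, edges):
    """Smallest positive firing counts satisfying the SDF balance equations.

    actors: list of block names (graph assumed connected).
    edges:  list of (src, dst, produced, consumed) samples per invocation.
    """
    rate = {actors[0]: Fraction(1)}     # fix the first block's rate, propagate the rest
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, dst, p, c in edges:        # every edge must balance, else no periodic schedule
        if rate[src] * p != rate[dst] * c:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*counts.values())
    return {a: n // g for a, n in counts.items()}

# Hypothetical chain: FIR emits 2 samples, DCT consumes 3 and emits 1, SINK consumes 2.
print(repetition_vector(["FIR", "DCT", "SINK"],
                        [("FIR", "DCT", 2, 3), ("DCT", "SINK", 1, 2)]))
# {'FIR': 3, 'DCT': 2, 'SINK': 1}
```

Per iteration the FIR block fires three times for every two DCT firings, which fixes buffer sizes and bounds how the blocks can be scheduled onto shared hardware.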
This paper proposes an architecture and a model for business process support (BPS) systems that span organizational boundaries in e-government. Today, processes are mostly adapted to existing technologies; in contrast, the proposed architecture is designed to support typical e-government processes. The proposed system consists of three main IT components: a workflow system that administers and controls information flows; Web services that enable data transfer between computers; and a Web server that controls all applications used to communicate with the users in each organization. The proposed model of the workflow enactment server consists of two nodes: the rule base collection and the workflow engine. The workflow engine in turn consists of two components, A and B, for process execution and control.
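To make the split between the rule base and the workflow engine concrete, here is a minimal sketch under loudly stated assumptions: rules are modelled as a hypothetical mapping from (current state, activity) to the next state, and the engine executes activities while keeping an audit log for control. None of the names (`WorkflowEngine`, `execute`, the permit states) come from the paper, and the internal split into components A and B is not modelled.

```python
class WorkflowEngine:
    """Hypothetical workflow engine driven by a rule base of state transitions."""

    def __init__(self, rules):
        self.rules = rules      # rule base: (state, activity) -> next state
        self.cases = {}         # case id -> current state
        self.log = []           # audit trail for monitoring and control

    def start(self, case_id, state="received"):
        self.cases[case_id] = state

    def execute(self, case_id, activity):
        state = self.cases[case_id]
        nxt = self.rules.get((state, activity))
        if nxt is None:
            raise ValueError(f"{activity!r} is not allowed in state {state!r}")
        self.cases[case_id] = nxt
        self.log.append((case_id, state, activity, nxt))
        return nxt

# Hypothetical permit process.
rules = {
    ("received", "check_documents"): "checked",
    ("checked", "approve"): "approved",
    ("checked", "reject"): "rejected",
}
engine = WorkflowEngine(rules)
engine.start("permit-17")
engine.execute("permit-17", "check_documents")
print(engine.execute("permit-17", "approve"))   # approved
```

Keeping the transitions in a separate rule base means a process change edits only `rules`, not the engine, which mirrors the separation between the rule base collection and the workflow engine.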
Code motion (CM) is a program transformation technique aimed at removing statements that are redundant because they recompute previously produced values. Such transformations improve program performance and decrease program size, so redundancy elimination is a standard part of optimizing compilers. Most of the algorithms used in production compilers perform syntactic code motion [1], in the sense that they find and remove redundancy only between lexically identical expressions. This paper presents an improvement over the well-known SSAPRE algorithm, which was previously limited to syntactic code motion. The proposed modification strengthens the algorithm by capturing redundancy carried through assignment statements, including code patterns that no existing CM technique can optimize. The presented algorithm can be thought of as semantic code motion.
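The difference between syntactic and semantic redundancy can be shown with local value numbering, a simpler technique than SSAPRE that works within a single basic block (it is not the paper's algorithm). In the hypothetical three-address block below, `x = a + 1` and `y = b + 1` are lexically different, so a purely syntactic method misses the redundancy, yet after the copy `b = a` both compute the same value. The instruction format and the name `find_redundant` are assumptions made for this sketch.

```python
def find_redundant(block):
    """Local value numbering over one basic block.

    block: list of (dest, op, arg1, arg2); op "copy" means dest = arg1.
    Operands are variable names or literal strings such as "1".
    Returns destinations whose computation repeats an earlier value.
    """
    value_of = {}        # variable or literal -> value number
    expr_table = {}      # (op, vn1, vn2) -> value number
    counter = 0
    redundant = []

    def vn(name):
        nonlocal counter
        if name not in value_of:          # first use: a fresh, unknown value
            value_of[name] = counter
            counter += 1
        return value_of[name]

    for dest, op, a, b in block:
        if op == "copy":
            value_of[dest] = vn(a)        # a copy propagates the value, not the name
            continue
        key = (op, vn(a), vn(b))
        if key in expr_table:
            redundant.append(dest)        # same operator applied to the same values
            value_of[dest] = expr_table[key]
        else:
            expr_table[key] = counter
            value_of[dest] = counter
            counter += 1
    return redundant

block = [("b", "copy", "a", None),
         ("x", "+", "a", "1"),
         ("y", "+", "b", "1")]
print(find_redundant(block))   # ['y']
```

Actually deleting `y` would also require a variable that still holds the value at that point; the sketch only reports the redundancy.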
ISBN (print): 0769521975
Efficient program execution on multiprocessor computers requires both sufficient parallelism and good data locality. Recent research found that a combination of loop shifting, loop fusion, and array contraction can reduce the memory required to execute a sequence of serial loops, thereby improving cache locality. This paper studies how to extend such a memory-reduction scheme to a sequence of DOALL loops, which are executed in parallel on multiprocessors. Two methods are proposed to overcome difficulties caused by loop-carried dependences: data copy-in is performed to remove anti-dependences between different parallel threads, and computation duplication is performed to remove flow dependences. Experiments on a number of benchmark programs show that the proposed technique improves both cache locality and parallel execution speed for the DOALL loops. The scheme achieves an average speedup of 1.41 for 17 programs on a 4-processor SUN machine.
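The memory-reduction idea can be illustrated on two hypothetical loops: the first produces an array `a`, the second reads `a[i]` and `a[i + 1]`. Fusing them directly would read `a[i + 1]` before it is written, so the second loop is shifted by one iteration; after fusion only `a[i - 1]` is live, and the array contracts to a scalar. The chunked variant is our reading of the computation-duplication idea: each chunk recomputes the single boundary element it needs instead of waiting for its neighbour. All function names are illustrative, and the sketch runs serially.

```python
def original(b):
    n = len(b)
    a = [b[i] * 2 for i in range(n)]                  # loop 1: full temporary array
    return [a[i] + a[i + 1] for i in range(n - 1)]    # loop 2 reads a[i + 1]

def fused_shifted_contracted(b):
    n = len(b)
    c = [0] * max(n - 1, 0)
    prev = 0                                  # contracted array: holds a[i - 1] only
    for i in range(n):
        cur = b[i] * 2                        # loop 1, iteration i
        if i >= 1:
            c[i - 1] = prev + cur             # loop 2 shifted by one: its iteration i - 1
        prev = cur
    return c

def fused_chunk(b, lo, hi):
    """Fused, contracted loop over iterations [lo, hi); independent of other chunks."""
    out = {}
    prev = b[lo - 1] * 2 if lo >= 1 else 0    # duplicated computation of a[lo - 1]
    for i in range(lo, hi):
        cur = b[i] * 2
        if i >= 1:
            out[i - 1] = prev + cur
        prev = cur
    return out

b = [1, 2, 3, 4]
chunks = [fused_chunk(b, 0, 2), fused_chunk(b, 2, 4)]   # could run on two processors
merged = [v for part in chunks for _, v in sorted(part.items())]
print(original(b), fused_shifted_contracted(b), merged)
# [6, 10, 14] [6, 10, 14] [6, 10, 14]
```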
ISBN (print): 078038251X
In this paper, we propose an efficient variable-length FFT processor architecture suitable for multi-mode, multi-standard OFDM communication systems. The FFT processor is based on the radix-2² DIF FFT algorithm and also supports FFT lengths that are not powers of 4. The design contains an efficient processing element (PE) that can execute both radix-2² butterfly (BF) operations and radix-2 BF operations. Moreover, to achieve high-performance variable-length FFT operations and data accesses, an efficient variable-length address generator and twiddle factor generator are designed. The design offers low complexity and high-speed performance. It supports seven FFT lengths (64, 256, 512, 1024, 2048, 4096, and 8192 points), which cover all the FFT lengths required by 802.11a, 802.16a, DAB, DVB-T, VDSL, and ADSL.
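For reference, the decimation-in-frequency butterfly structure that such a processor pipelines can be sketched in software. This is a plain recursive radix-2 DIF FFT, not the radix-2² variant: radix-2² regroups the twiddle factors so that every second stage multiplies only by trivial factors such as -j, which is where the hardware savings come from. All seven supported lengths are powers of two, so the sketch accepts them; 512, 2048, and 8192 are not powers of 4, which is why the hardware also needs plain radix-2 butterflies. The names `fft_dif` and `dft` are illustrative.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # Butterfly: sums feed the even-indexed outputs, twiddled differences the odd ones.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n) for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bottom)
    return out

def dft(x):
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

signal = [float(i % 5) for i in range(64)]
print(max(abs(a - b) for a, b in zip(fft_dif(signal), dft(signal))) < 1e-9)   # True
```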
Whether context-sensitive program analysis is more effective than context-insensitive analysis is an ongoing discussion. There is evidence that context sensitivity matters in complex analyses such as pointer analysis or program slicing. Because empirical data show that context-sensitive program slicing is more precise, and under some circumstances even faster, than context-insensitive program slicing, one might think that the context itself matters. Based on some experiments, we will show that this is not the case. The experiment requires backward slices to return only to call sites specified by an abstract call stack. Such call stacks can be seen as a poor man's dynamic slicing: for a concrete execution, the call stack is captured, and static slices are restricted to the captured stack. The experiment shows no significant increase in precision for the restricted form of slicing compared with unrestricted traditional slicing. The reason is that a large part of an average slice is due to called procedures.
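The restriction can be sketched as backward reachability over a dependence graph in which edges that ascend from a callee's formal parameter to a caller's actual argument are labelled with their call site. This is a simplification, not the paper's slicer: it uses plain reachability rather than context-sensitive two-phase slicing, and it treats the captured stack as an unordered set of call sites. The graph encoding and the name `backward_slice` are assumptions.

```python
from collections import deque

def backward_slice(deps, criterion, allowed_calls=None):
    """Nodes reachable backwards from `criterion`.

    deps: node -> list of (pred, call_site); call_site is None for ordinary edges,
          or the id of the call site for an edge ascending from callee to caller.
    allowed_calls: optional set of call-site ids captured from a concrete execution;
          None means unrestricted slicing.
    """
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for pred, call_site in deps.get(node, []):
            if (call_site is not None and allowed_calls is not None
                    and call_site not in allowed_calls):
                continue                    # do not return to a call site off the stack
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen

# Hypothetical program: f(p) { r = p + 1 } is called as f(a) at c1 and as f(b) at c2.
deps = {
    "r": [("p", None)],
    "p": [("a", "c1"), ("b", "c2")],
    "a": [("read_a", None)],
    "b": [("read_b", None)],
}
print(sorted(backward_slice(deps, "r")))                        # ['a', 'b', 'p', 'r', 'read_a', 'read_b']
print(sorted(backward_slice(deps, "r", allowed_calls={"c1"})))  # ['a', 'p', 'r', 'read_a']
```

The restriction only prunes the ascent into callers; the nodes inside the called procedure (`p`, `r`) stay in the slice either way, which is consistent with the observation that much of an average slice comes from called procedures.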
ISBN (print): 0780385675
This paper presents a new approach to designing an adaptive multimode neuro-fuzzy chip (AMNFC) with on-chip learning and highly efficient resource utilization for a car-backing system. The design is carried out by high-level datapath synthesis based on an optimal scheduling and resource allocation algorithm. A novel dataflow graph (DFG) scheduling algorithm suitable for parallel-structure computation has been developed for designing the neuro-fuzzy chip. The proposed algorithm fulfills two major objectives. First, it simultaneously optimizes the schedule and the allocation of functional units, registers, and multiplexers with respect to minimal hardware resource cost and total execution time. Second, it implements an adaptive multimode neuro-fuzzy system with reconfiguration capability. Computer simulations and experimental results validate the effectiveness of the proposed design approach for a car-backing system.
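A common baseline for the scheduling half of this problem is resource-constrained list scheduling. The sketch below is that textbook heuristic, not the paper's algorithm (which also optimizes register and multiplexer allocation): each operation takes one cycle, ready operations are prioritized by their longest path to a sink, and at most `units[t]` operations of type `t` start per cycle. The data layout is an assumption; every operation type must have at least one unit and the DFG must be acyclic, or the loop will not terminate.

```python
def list_schedule(ops, deps, units):
    """ops: {name: op_type}; deps: {name: [predecessors]}; units: {op_type: count}.
    Returns {name: start_cycle}, assuming single-cycle operations."""
    succs = {o: [] for o in ops}
    for o, preds in deps.items():
        for p in preds:
            succs[p].append(o)

    height = {}
    def h(o):                                   # longest path (in ops) from o to a sink
        if o not in height:
            height[o] = 1 + max((h(s) for s in succs[o]), default=0)
        return height[o]

    schedule, cycle = {}, 0
    while len(schedule) < len(ops):
        used = {t: 0 for t in units}
        ready = [o for o in ops if o not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in deps.get(o, []))]
        for o in sorted(ready, key=lambda o: (-h(o), o)):   # critical path first
            if used[ops[o]] < units[ops[o]]:
                schedule[o] = cycle
                used[ops[o]] += 1
        cycle += 1
    return schedule

# Hypothetical DFG: three multiplies share one multiplier, two adds share one adder.
dfg_ops = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
dfg_deps = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}
print(list_schedule(dfg_ops, dfg_deps, {"mul": 1, "add": 1}))
# {'m1': 0, 'm2': 1, 'a1': 2, 'm3': 2, 'a2': 3}
```

Raising the multiplier count to three shortens the schedule to three cycles, which is the kind of area/time tradeoff a combined scheduling and allocation algorithm explores.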
ISBN (print): 0769521355
A graph G = (V, E) is said to be pancyclic if it contains cycles of all lengths from 4 to |V| in G. Let F_e be the set of faulty edges. In this paper, we show that an n-dimensional Möbius cube, n ≥ 1, contains a fault-free Hamiltonian path when |F_e| ≤ n - 1. We also show that an n-dimensional Möbius cube, n ≥ 2, is pancyclic when |F_e| ≤ n - 2. Since an n-dimensional Möbius cube is regular of degree n, both results are optimal in the worst case.
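The Möbius cube is small enough in low dimensions to check by brute force. The sketch assumes the standard definition as we understand it: for a vertex x1...xn and dimension i, if x(i-1) = 0 only bit i is flipped, otherwise bits i through n are all flipped, with x0 fixed to 0 (the 0-Möbius cube) or 1 (the 1-Möbius cube). It checks n-regularity and enumerates the simple cycle lengths of the 3-dimensional 0-Möbius cube; the optional `faulty` set removes edges, but the sketch does not reproduce the paper's proofs. Function names are illustrative.

```python
from itertools import product

def mobius_cube(n, x0=0):
    """Adjacency sets of the n-dimensional x0-Möbius cube; vertices are bit tuples."""
    adj = {}
    for x in product((0, 1), repeat=n):
        nbrs = set()
        for i in range(n):                      # dimension i + 1 in 1-based notation
            prev = x0 if i == 0 else x[i - 1]
            y = list(x)
            for j in (range(i, i + 1) if prev == 0 else range(i, n)):
                y[j] ^= 1                       # flip bit i alone, or bits i..n-1
            nbrs.add(tuple(y))
        adj[x] = nbrs
    return adj

def cycle_lengths(adj, faulty=frozenset()):
    """All simple cycle lengths, skipping edges listed in `faulty` (as frozensets)."""
    lengths = set()
    def dfs(start, v, visited):
        for w in adj[v]:
            if frozenset((v, w)) in faulty:
                continue
            if w == start and len(visited) >= 3:
                lengths.add(len(visited))
            elif w > start and w not in visited:   # start is the cycle's smallest vertex
                visited.append(w)
                dfs(start, w, visited)
                visited.pop()
    for s in adj:
        dfs(s, s, [s])
    return lengths

cube = mobius_cube(3)
print(all(len(nb) == 3 for nb in cube.values()))   # True
print(sorted(cycle_lengths(cube)))                 # [4, 5, 6, 7, 8]
```

Unlike the ordinary hypercube, which is bipartite, the Möbius cube has odd cycles (here of lengths 5 and 7), which is what makes pancyclicity possible at all.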
This paper deals with the computation of a single super-resolution image from a set of low-resolution images, where the motion fields are not constrained to be parametric. In our approach, the inversion process, in which the super-resolved image is inferred from the input data, is interleaved with the computation of a set of dense optical flow fields. The case of arbitrary motion presents several significant challenges. First, the super-resolution setting dictates that the optical flow computations be very precise. Furthermore, we have to consider the possibility that certain parts of the scene, which are visible in the super-resolved image, are occluded in some of the input images. Such occlusions must be identified and dealt with in the restoration process. We propose a Bayesian approach to tackle these problems. In this framework, the input images are regarded as subsampled, noisy versions of the unknown high-quality image. Also, the input data are considered incomplete, in the sense that we do not know which pixels of the evolving super-resolution image are occluded in particular images from the input set. This is modeled by introducing so-called visibility maps, which are treated as hidden variables. We describe an EM algorithm that iterates between estimating values for the hidden quantities and optimizing the flow fields and the super-resolution image. The approach is illustrated with a synthetic example and a challenging real-world example.
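The role of the visibility maps as hidden variables can be illustrated with a deliberately tiny EM loop. This toy assumes the input images are already registered (so the optical-flow estimation and subsampling of the paper are left out), models a visible observation as Gaussian around the current estimate and an occluded one as a flat outlier density, and uses hypothetical parameter names (`sigma`, `outlier_density`, `prior_visible`). The E-step computes each observation's posterior probability of being visible; the M-step re-estimates every pixel as a visibility-weighted mean.

```python
import math

def em_visibility(obs, sigma=1.0, outlier_density=0.01, prior_visible=0.8, iters=50):
    """obs[k][p]: value of pixel p in (pre-registered) input image k.
    Returns (estimate per pixel, visibility posterior per image and pixel)."""
    n_img, n_pix = len(obs), len(obs[0])
    # Initialize each pixel with the (upper) median of its observations.
    est = [sorted(obs[k][p] for k in range(n_img))[n_img // 2] for p in range(n_pix)]
    vis = [[1.0] * n_pix for _ in range(n_img)]
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    for _ in range(iters):
        # E-step: posterior probability that obs[k][p] is visible (not occluded).
        for k in range(n_img):
            for p in range(n_pix):
                r = (obs[k][p] - est[p]) / sigma
                lik_vis = prior_visible * norm * math.exp(-0.5 * r * r)
                lik_occ = (1 - prior_visible) * outlier_density
                vis[k][p] = lik_vis / (lik_vis + lik_occ)
        # M-step: visibility-weighted mean per pixel.
        for p in range(n_pix):
            w = sum(vis[k][p] for k in range(n_img))
            if w > 0:
                est[p] = sum(vis[k][p] * obs[k][p] for k in range(n_img)) / w
    return est, vis

# Four registered images of a 2-pixel scene; image 3 sees an occluder at pixel 0.
obs = [[10.0, 20.0], [10.2, 19.8], [9.8, 20.1], [50.0, 20.0]]
est, vis = em_visibility(obs)
print([f"{v:.1f}" for v in est])   # ['10.0', '20.0']
print(f"{vis[3][0]:.3f}")          # 0.000
```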
The program dependence graph (PDG) itself and the slices computed within it are results that, unless they are used in subsequent analyses, should be presented to the user in a comprehensible form. A graphical presentation would be preferable, as it is usually more intuitive than a textual one. This work describes how a layout for PDGs can be generated to enable an appealing presentation. However, experience shows that the graphical presentation is less helpful than expected and that a textual presentation is superior. Therefore, this work contains an approach to present slices of PDGs textually in the source code. The innovation of this approach is the fine-grained visualization of arbitrary node sets based on tokens, rather than on complete lines as in other approaches. Furthermore, a major obstacle to visualizing and comprehending slices is the loss of locality. Thus, this work presents a simple yet effective approach to limit the range of a slice. This approach enables a visualization of slices in which local effects stand out against more global effects. A second, more sophisticated approach visualizes the influence range of chops for variables and procedures. This enables a visualization of the impact of procedures and variables on the complete system.
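Token-level highlighting, as opposed to whole-line highlighting, can be sketched with a simple regular-expression tokenizer. The slice is assumed to be given as a set of hypothetical (line number, token index) pairs, and in-slice tokens are wrapped in brackets; a real tool would map PDG nodes back to tokens through its parser rather than through this crude tokenizer. The name `render_slice` is illustrative.

```python
import re

TOKEN = re.compile(r"\w+|[^\w\s]")   # identifiers/numbers, or single punctuation marks

def render_slice(source, in_slice):
    """Wrap tokens whose (1-based line, 0-based token index) is in `in_slice` in brackets."""
    out = []
    for ln, line in enumerate(source.splitlines(), start=1):
        parts, pos = [], 0
        for idx, m in enumerate(TOKEN.finditer(line)):
            parts.append(line[pos:m.start()])        # keep original spacing
            tok = m.group()
            parts.append(f"[{tok}]" if (ln, idx) in in_slice else tok)
            pos = m.end()
        parts.append(line[pos:])
        out.append("".join(parts))
    return "\n".join(out)

src = "x = a + b\ny = c * 2"
print(render_slice(src, {(1, 0), (1, 1), (1, 2), (2, 0)}))
# [x] [=] [a] + b
# [y] = c * 2
```

Only part of line 1 is marked, which a line-based presentation cannot express.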