ISBN (print): 0769509878
In high-performance systems, execution time is of crucial importance, which justifies advanced optimization techniques. Traditionally, optimization is based on static program analysis. The quality of program optimizations, however, can be substantially improved by utilizing runtime information. Probabilistic data-flow frameworks compute, from representative profile runs, the probability with which data-flow facts hold at a given program point. Advanced optimizations can use this information to produce highly efficient code. In this paper we introduce a novel optimization technique in the context of High Performance Fortran (HPF) that is based on probabilistic data-flow information. We consider statically undefined attributes, which play an important role in parallelization, and compute for each of them the probability that it holds a specific value at runtime. For the most probable attribute values, highly optimized, specialized code is generated. In this way, significantly better performance can be achieved. Our optimization is implemented in VFC, a source-to-source parallelizing compiler for HPF/F90.
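The core idea, generating specialized code for the most probable value of a statically unknown attribute and guarding it with a runtime check, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not VFC's implementation: the helper names (`value_probabilities`, `specialize`, `strided_sum`, `make_strided_sum`), the 0.5 probability threshold, and the strided-sum kernel are all hypothetical, and probabilities are estimated simply as value frequencies over profile runs rather than by a probabilistic data-flow framework.

```python
from collections import Counter
from typing import Callable, Iterable

Kernel = Callable[[list[float], int], float]

def value_probabilities(profile: Iterable[int]) -> dict[int, float]:
    """Estimate P(attribute == v) as the relative frequency of v in profile runs."""
    counts = Counter(profile)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def specialize(generic: Kernel,
               make_specialized: Callable[[int], Callable[[list[float]], float]],
               probs: dict[int, float],
               threshold: float = 0.5) -> Kernel:
    """Return a dispatcher with a fast path for the most probable attribute value."""
    best, p = max(probs.items(), key=lambda kv: kv[1])
    if p < threshold:
        return generic  # no value is likely enough to justify specialized code
    fast = make_specialized(best)

    def dispatch(data: list[float], attr: int) -> float:
        # Runtime guard: the specialized code is valid only when attr == best.
        return fast(data) if attr == best else generic(data, attr)
    return dispatch

# Hypothetical kernel: sum every stride-th element; the stride is unknown statically.
def strided_sum(data: list[float], stride: int) -> float:
    return sum(data[::stride])

def make_strided_sum(stride: int) -> Callable[[list[float]], float]:
    # For stride 1 the slicing disappears; other values just bind the constant.
    if stride == 1:
        return lambda data: sum(data)
    return lambda data: sum(data[::stride])

kernel = specialize(strided_sum, make_strided_sum, value_probabilities([1, 1, 1, 2]))
```

With a profile in which the stride is 1 in three of four runs, calls with `stride == 1` take the specialized path and every other value falls back to the generic kernel, so correctness never depends on the profile being accurate.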
ISBN (print): 0769508316
Array-intensive computations are characterized by multiple loops that process large arrays stored in external memory. Synthesizing these computations onto FPGAs involves automatically translating the behavioral description into clock-controlled state machines such that the execution time of the program as a whole is minimized and the area requirement does not exceed a predefined limit. The synthesis algorithm also needs to sequence the array accesses efficiently, taking into account memory access features such as pipelining. In this paper we present two algorithms, each with a specific emphasis, for this synthesis problem. Our heuristic algorithm generates good solutions in a very short time (less than a second), while our mixed integer linear programming (MILP) based algorithm can generate an optimal solution given sufficient time. Both try to minimize execution time and area. Our algorithms not only examine individual loops to exploit parallelism but also consider all loops together when choosing the clock. Thus the overall execution time is minimized, not just the number of cycles or the cycle time. They also synthesize memory accesses efficiently to fully exploit memory pipelining. We compare the relative strengths of the two algorithms.
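Why the clock must be chosen by overall execution time, rather than by cycle count or cycle time alone, can be illustrated with a toy model. This sketch is neither the paper's heuristic nor its MILP formulation: it assumes each loop body is a chain of operations with known delays in nanoseconds, that every operation occupies a whole number of cycles (no chaining), and it ignores area and memory pipelining. The loop data and candidate clock periods are hypothetical.

```python
def cycles_per_iteration(delays_ns: list[int], clock_ns: int) -> int:
    # Each operation occupies ceil(delay / clock) cycles; integer ceiling avoids float error.
    return sum(-(-d // clock_ns) for d in delays_ns)

def execution_time(loops: list[tuple[int, list[int]]], clock_ns: int) -> int:
    """Total time in ns; all loops share one clock, so they must be considered together."""
    return sum(iters * cycles_per_iteration(delays, clock_ns) * clock_ns
               for iters, delays in loops)

def best_clock(loops: list[tuple[int, list[int]]], candidates: list[int]) -> int:
    return min(candidates, key=lambda t: execution_time(loops, t))

# Hypothetical design: (iteration count, operation delays in ns) for each loop.
loops = [(1000, [10, 10, 4]), (500, [25, 6])]
for t in (4, 5, 10, 25):
    print(t, execution_time(loops, t))
print("best clock:", best_clock(loops, [4, 5, 10, 25]))
```

In this instance the shortest clock (4 ns) and the clock with the fewest cycles (25 ns) both lose to a 5 ns clock, which gives 42,500 ns in total.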
A parallel version of the evolutionary graph generation (EGG) system, called the distributed EGG (DEGG) system, was developed on a cluster of PCs using the Message Passing Interface (MPI). To demonstrate its capability, DEGG is applied to the optimal design of various multipliers. Experimental results show that DEGG consistently performs better than EGG and than known conventional designs.
ISBN (print): 0780372018
The visualization of scalar functions of two variables is a classic and ubiquitous application. We present a new method to visualize such data, based on a nonlinear mapping of the function to a height field followed by visualization as a shaded mountain landscape. The method is easy to implement and efficient, and it leads to intriguing and insightful images; the visualization is further enriched by adding ridges. Three types of applications are discussed: visualization of iso-levels, clusters (multivariate data visualization), and dense contours (flow visualization).
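A rough sketch of the two stages, a nonlinear mapping from function value to height followed by shading of the resulting landscape, is given below. The mapping `ridge_height` is a hypothetical choice made for illustration and is not necessarily the one used in the paper: adding a parabolic term of amplitude `a` that restarts at every iso-level `f = k * delta` creates a crease at each level, and when `a > delta` the slope of the height changes sign there, so each iso-level becomes a ridge. Shading is plain Lambertian lighting from finite-difference normals on a unit grid; the test function is also hypothetical.

```python
import math

def ridge_height(f: float, delta: float, a: float) -> float:
    """Hypothetical mapping h(f) = f + a * (frac(f / delta) - 1/2)**2.

    The parabola restarts at every iso-level f = k * delta, creating a crease there;
    for a > delta the slope of h changes sign across the crease, forming a ridge.
    """
    frac = (f / delta) % 1.0
    return f + a * (frac - 0.5) ** 2

def shade(height: list[list[float]], light=(1.0, 1.0, 1.0)) -> list[list[float]]:
    """Lambertian shading of a height field sampled on a unit grid (rows = y, cols = x)."""
    rows, cols = len(height), len(height[0])
    lx, ly, lz = light
    lnorm = math.sqrt(lx * lx + ly * ly + lz * lz)
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Central differences, falling back to one-sided differences at the borders.
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            dzdx = (height[i][j1] - height[i][j0]) / max(j1 - j0, 1)
            dzdy = (height[i1][j] - height[i0][j]) / max(i1 - i0, 1)
            # Normal of the surface z = h(x, y) is (-dz/dx, -dz/dy, 1), normalized below.
            nx, ny, nz = -dzdx, -dzdy, 1.0
            nnorm = math.sqrt(nx * nx + ny * ny + nz * nz)
            out[i][j] = max(0.0, (nx * lx + ny * ly + nz * lz) / (nnorm * lnorm))
    return out

# Hypothetical test function f(x, y) = 0.1 * x * y on a 20 x 20 grid, mapped and shaded.
f_grid = [[0.1 * x * y for x in range(20)] for y in range(20)]
h_grid = [[ridge_height(v, delta=2.0, a=3.0) for v in row] for row in f_grid]
image = shade(h_grid)
```

The resulting `image` holds brightness values in [0, 1] that can be written to any grayscale image format.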
Analytical approaches to dynamic traffic assignment have not, until now, considered the limited capacity of arcs. In this paper, a model and an algorithm are developed for this problem. The method was implemented, and computational results were obtained.
ISBN (print): 1581132972
We introduce the notions of required precision and information content of datapath signals and use them to define functionally safe transformations on dataflow graphs. These transformations reduce the widths of datapath operators and enhance their mergeability. Using efficient algorithms to compute the required precision and information content of signals, we define a new algorithm for partitioning a dataflow graph of datapath operators into mergeable clusters. Experimental results indicate that using our clustering algorithm for operator-merging-based synthesis of datapath-intensive designs can significantly improve the delay and area of the implementation.
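The two notions can be sketched for unsigned add and multiply nodes as a forward and a backward pass over a topologically ordered dataflow graph. This is a simplified illustration under stated assumptions, not the paper's algorithm: information content is taken as an upper bound on the number of possibly nonzero low-order bits (an n-bit plus m-bit sum needs at most max(n, m) + 1 bits, a product at most n + m), and required precision relies on the fact that the low r bits of a sum or product depend only on the low r bits of its operands. An operator can then be narrowed to the smaller of the two values. The `Node` type and the example graph are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str                # "in", "add", or "mul"; binary, unsigned arithmetic assumed
    args: tuple[str, ...]  # operand node names
    width: int             # declared bit width of the node's output signal

def information_content(g: dict[str, Node]) -> dict[str, int]:
    """Forward pass: upper bound on the number of possibly nonzero low-order bits."""
    info: dict[str, int] = {}
    for name, n in g.items():  # g is assumed to be in topological order
        if n.op == "in":
            bits = n.width
        elif n.op == "add":
            bits = max(info[a] for a in n.args) + 1  # carry-out adds one bit
        elif n.op == "mul":
            bits = sum(info[a] for a in n.args)      # n-bit x m-bit fits in n + m bits
        else:
            raise ValueError(f"unsupported op {n.op!r}")
        info[name] = min(bits, n.width)
    return info

def required_precision(g: dict[str, Node], outputs: dict[str, int]) -> dict[str, int]:
    """Backward pass: how many low-order bits of each signal some output depends on."""
    req = {name: 0 for name in g}
    req.update(outputs)
    for name in reversed(list(g)):
        r = min(req[name], g[name].width)
        req[name] = r
        # Low r bits of a sum or product depend only on the low r bits of its operands.
        for a in g[name].args:
            req[a] = max(req[a], r)
    return req

def effective_widths(g: dict[str, Node], outputs: dict[str, int]) -> dict[str, int]:
    """Width each operator can be narrowed to without changing the required outputs."""
    info, req = information_content(g), required_precision(g, outputs)
    return {name: min(info[name], req[name]) for name, n in g.items() if n.op != "in"}

# Hypothetical graph: p = (a + b) * c with 32-bit operators and narrow inputs.
g = {
    "a": Node("in", (), 8),
    "b": Node("in", (), 8),
    "s": Node("add", ("a", "b"), 32),
    "c": Node("in", (), 4),
    "p": Node("mul", ("s", "c"), 32),
}
print(effective_widths(g, {"p": 16}))  # assume only the 16 low bits of p are consumed
```

Here the 32-bit adder can shrink to 9 bits (limited by information content) and the 32-bit multiplier to 13 bits; if only 8 bits of `p` were required, both would shrink to 8 bits (limited by required precision).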
ISBN (print): 0769512062
This paper considers reconfigurable computing for application-specific systems, with particular reference to mixed-technology chips. A VLIW "core" is augmented with reconfigurable functional units (RFUs) and register files implemented in FPGA fabric on the same chip. The application is analyzed to extract segments of computation that could usefully be collapsed into complex instructions decoded and executed by the RFUs. We focus on the problem of selecting the optimum extension to the native instruction set, that is, the "best" segments of the computation to become complex instructions. In particular, a genetic algorithm approach is introduced to analyze the population of candidates; the classic genetic operators are modified to take into account the peculiarities of our problem. Applying the proposed methodology to some significant applications has validated the overall approach.
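A genetic search over candidate segments can be sketched as a 0/1 selection problem: each chromosome marks which segments become complex instructions, and fitness is the total cycle saving subject to an RFU area budget. This is an assumption-laden illustration, not the paper's method: it uses the classic operators (tournament selection, one-point crossover, bit-flip mutation, elitism) rather than the modified operators the paper introduces, and the savings, areas, and budget are hypothetical numbers.

```python
import random

def fitness(bits: list[int], savings: list[int], areas: list[int], budget: int) -> int:
    """Total cycle saving of the selected segments; infeasible selections score below zero."""
    area = sum(a for b, a in zip(bits, areas) if b)
    if area > budget:
        return -area  # any selection over budget ranks below every feasible one
    return sum(s for b, s in zip(bits, savings) if b)

def evolve(savings, areas, budget, pop_size=30, generations=100, p_mut=0.05, seed=0):
    """Classic GA over bit-string selections; pop_size must be at least 3 for the tournament."""
    rng = random.Random(seed)
    n = len(savings)

    def fit(c: list[int]) -> int:
        return fitness(c, savings, areas, budget)

    # Seed the population with the empty selection so a feasible chromosome always exists.
    pop = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [max(pop, key=fit)]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fit)  # tournament selection of size 3
            p2 = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fit)

savings = [10, 7, 4, 3, 9]  # hypothetical cycles saved per candidate segment
areas = [5, 4, 2, 1, 6]     # hypothetical RFU area per candidate segment
best = evolve(savings, areas, budget=10)
print(best, fitness(best, savings, areas, 10))
```

Because the empty selection is feasible and elitism keeps the best chromosome, the returned selection always respects the area budget; on an instance this small the result can be checked against exhaustive enumeration (the optimum here is a saving of 20 within area 10).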
ISBN (print): 0780372255
Many current research and development activities make significant contributions to the quality of some particular implementation approach. We describe the Waveform Description Language (WDL), which exploits the best characteristics of a variety of distinct programming approaches so that standard implementation-domain practices can be applied in the specification domain. A single WDL specification may be refined to support semi-automated conversion to a variety of implementations. A WDL specification avoids the ambiguities and contradictions characteristic of many conventional specifications, thanks to an underlying formality that remains accessible and familiar to programmers.
This article presents a methodology to cope with the simultaneous optimization of multiple competing objectives and the different sources of heterogeneity in embedded system design.
The NA48 experiment at the CERN SPS aims to measure the parameter Re(ε'/ε) of direct CP violation in the neutral kaon system with an accuracy of 2 × 10⁻⁴. Based on the requirements of high event rates (up to 10 kHz) with negligible dead time; support for a variety of detectors with a very wide variation in the number of readout channels; data rates of up to 150 MByte/s sustained over the beam burst; and level-3 filtering and remote data logging in the CERN computer center, the collaboration has designed and built a modular, pipelined data flow system with a 40 MHz sampling rate. The architecture combines custom-designed components with commercially available hardware for cost effectiveness and flexibility. To increase the available data bandwidth and to add filtering and monitoring capabilities, the original custom-built event builder hardware was replaced, during the shutdown between the 1997 and 1998 data-taking periods, by a farm of 24 Intel Pentium II based PCs running the Linux operating system. During the 1998 data-taking period, the system was operated successfully, recording about 70 terabytes of data.