Multiply and accumulate are the two basic operations in FFT and digital filtering algorithms. In high-speed applications, the multiplier is crucial to performance: it requires either a large chip area if a parallel implementation is used or a large amount of time if a serial architecture is used. In this paper, the design of a basic FFT arithmetic element and of FIR filters that use barrel shifters and accumulators (BSAC) to perform the multiplications is proposed and studied. The resulting architecture is completely programmable and allows a variable number of basic cells per coefficient. The throughput rate of such an architecture is determined only by the delay of a single cell and hence can be of the order of 100 MHz or higher.
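To make the shift-and-add idea concrete, here is a minimal Python sketch of the principle behind a barrel-shifter/accumulator multiplier: the coefficient is recoded into canonical signed digits, and each nonzero digit corresponds to one shift-then-accumulate cell. The function names and 16-bit width are illustrative, not the paper's actual cell design.

```python
def csd_digits(coeff, width=16):
    """Recode an integer coefficient into canonical signed-digit (CSD) form.

    Returns a list of (shift, sign) pairs; each pair corresponds to one
    barrel-shifter/accumulator cell, so fewer nonzero digits means fewer cells.
    """
    digits = []
    k = 0
    while coeff != 0 and k < width:
        if coeff & 1:
            d = 2 - (coeff & 3)  # +1 if bits end in ...01, -1 if ...11
            digits.append((k, d))
            coeff -= d           # remaining value is now even
        coeff >>= 1
        k += 1
    return digits

def bsac_multiply(x, coeff):
    """Multiply x by coeff using only shifts and adds (one add per cell)."""
    acc = 0
    for shift, sign in csd_digits(coeff):
        acc += sign * (x << shift)   # barrel shift, then accumulate
    return acc

assert bsac_multiply(37, 121) == 37 * 121   # 121 needs only 3 cells: +1 -8 +128
```

Because each cell performs a single shift and add, the per-cell delay, not the coefficient width, bounds the throughput, which is the basis for the 100 MHz claim above.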
In this paper we describe the approach taken by the ASKALON Grid application development and computing environment to scalability and overhead analysis of scientific applications in the Austrian Grid. We present a technique, imported from parallel processing, for the overhead and scalability analysis of Grid applications based on speedup and efficiency metrics, required primarily for tuning the middleware services and improving executions. We present experimental results that validate our techniques on five real-world applications in the Austrian Grid environment.
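For reference, the speedup and efficiency metrics imported from parallel processing can be stated in a few lines; the numbers below are illustrative, not taken from the paper's experiments.

```python
def speedup(t_seq, t_grid):
    """Classical speedup: sequential time divided by parallel (Grid) time."""
    return t_seq / t_grid

def efficiency(t_seq, t_grid, n_procs):
    """Efficiency: speedup normalized by the number of processors used."""
    return speedup(t_seq, t_grid) / n_procs

# Overhead analysis attributes the gap between ideal and measured time
# to middleware sources (illustrative numbers).
t_seq, n = 1200.0, 16
t_ideal = t_seq / n                       # 75 s if the Grid added no overhead
t_measured = 110.0
total_overhead = t_measured - t_ideal     # 35 s to be broken down by source
print(speedup(t_seq, t_measured))         # ~10.9 instead of the ideal 16
print(efficiency(t_seq, t_measured, n))   # ~0.68
```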
We present a dynamic programming technique for solving the multiple supply voltage scheduling problem in both non-pipelined and functionally pipelined data-paths. The scheduling problem refers to the assignment of a supply voltage level to each operation in a dataflow graph so as to minimize the average energy consumption under computation time constraints, throughput constraints, or both. The energy model is accurate and accounts for input-pattern dependencies, dependencies induced by re-convergent fanout, and the energy cost of level shifters. Experimental results on a number of standard benchmarks show that, using four supply voltage levels, an average energy saving of 53% (with a computation time constraint of 1.5 times the critical path delay) can be obtained compared to using one fixed supply voltage level.
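As a sketch of the idea, not the paper's actual algorithm (which handles general dataflow graphs, pattern-dependent energies, and level-shifter costs), the following dynamic program assigns one of four voltage levels to each operation of a simple dependence chain under a latency constraint. The energy ratios follow E ∝ V²; the delay values are assumed for illustration.

```python
# Hypothetical voltage levels: (Vdd, relative delay, relative energy).
# Energy scales ~ Vdd^2; delay grows as Vdd drops toward threshold.
LEVELS = [(5.0, 1, 1.00), (3.3, 2, 0.44), (2.4, 3, 0.23), (1.5, 5, 0.09)]

def schedule_chain(n_ops, deadline):
    """Min-energy voltage assignment for a chain of n_ops dependent ops.

    dp[t] = least energy to finish the operations considered so far in
    exactly t time steps. A chain keeps the recurrence simple; a real
    dataflow graph needs a DP over graph structure, as in the paper.
    """
    INF = float("inf")
    dp = {0: 0.0}
    for _ in range(n_ops):
        nxt = {}
        for t, e in dp.items():
            for _vdd, d, cost in LEVELS:
                if t + d <= deadline and e + cost < nxt.get(t + d, INF):
                    nxt[t + d] = e + cost
        dp = nxt
    return min(dp.values()) if dp else None   # None if infeasible

# 10 operations, deadline of 1.5x the all-high-voltage critical path (10):
print(schedule_chain(10, deadline=15))   # 7.2: five ops dropped to 3.3 V
```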
Optical flow computation has been extensively used for object motion estimation in image sequences. However, with most optical flow techniques, accuracy comes at a high computational cost because of the large amount of data involved. A new strategy for image sequence processing has been developed: pixels of the image sequence that change significantly fire the execution of the operations of the image processing algorithm. The data reduction achieved with this strategy allows a significant speed-up of the optical flow computation. Furthermore, FPGAs allow the implementation of a custom dataflow architecture especially suited to this strategy. The bases of change-driven image processing are presented, as well as the custom hardware implementation.
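A minimal sketch of the change-driven strategy, assuming a simple per-pixel threshold as the firing condition; the threshold value and helper names are illustrative, not the paper's design.

```python
import numpy as np

def changed_pixels(prev, curr, thresh=12):
    """Return coordinates of pixels whose change 'fires' further processing."""
    return np.argwhere(np.abs(curr.astype(int) - prev.astype(int)) > thresh)

def change_driven_step(prev, curr, process, thresh=12):
    """Run per-pixel work only where the image changed.

    `process` stands in for the pixel-level algorithm stage; in a mostly
    static scene only a small fraction of pixels fire, which is where
    the speed-up comes from.
    """
    for y, x in changed_pixels(prev, curr, thresh):
        process(y, x)

# Illustrative use with synthetic frames:
prev = np.zeros((64, 64), np.uint8)
curr = prev.copy(); curr[30:34, 40:44] = 200    # a small moving object
fired = []
change_driven_step(prev, curr, lambda y, x: fired.append((y, x)))
print(len(fired), "of", prev.size, "pixels processed")   # 16 of 4096
```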
In high performance processors, the design of on-chip memory hierarchies is crucial for performance and energy efficiency. Current processors rely on large shared Non-Uniform Cache Architectures (NUCA) to improve performance and reduce data movement. Multiple solutions exploit information available at the microarchitecture level or in the operating system to optimize NUCA performance. However, existing methods have not taken advantage of the information captured by task dataflow programming models to guide the management of NUCA caches. In this paper we propose TD-NUCA, a hardware/software co-designed approach that leverages information present in the runtime system of task dataflow programming models to efficiently manage NUCA caches. TD-NUCA identifies the data access and reuse patterns of parallel applications in the runtime system and guides the operation of the NUCA caches in the hardware. As a result, TD-NUCA achieves a 1.18x average speedup over the baseline S-NUCA while requiring only 0.62x the data movement.
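The abstract does not spell out TD-NUCA's interface, but the underlying idea can be sketched: because a task dataflow runtime knows each task's input and output regions before the task runs, it can emit bank-placement hints ahead of the accesses. Everything below is an assumed, illustrative model, not the paper's mechanism.

```python
class TaskRuntime:
    """Toy model of runtime-guided NUCA placement (names are illustrative)."""

    def __init__(self, n_banks):
        self.n_banks = n_banks
        self.placement = {}          # data region -> NUCA bank

    def submit(self, task, inputs, outputs, core):
        # The dataflow runtime sees each task's accessed regions up
        # front, so it can map a task's data to the bank nearest the
        # core that will run it, before the first access happens.
        local_bank = core % self.n_banks
        for region in inputs + outputs:
            self.placement[region] = local_bank
        # ... dispatch `task` to `core` ...

rt = TaskRuntime(n_banks=16)
rt.submit(task="fft_row", inputs=["A[0:4096]"], outputs=["B[0:4096]"], core=3)
print(rt.placement)   # both regions pinned near core 3's bank
```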
Program slicing is a viable method to restrict the focus of a task to specific sub-components of a program. Examples of applications include debugging, testing, program comprehension, restructuring, downsizing, and parallelization. The paper discusses different statement-deletion-based slicing methods, together with algorithms and applications to software engineering.
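Once a program dependence graph is available, statement-deletion slicing reduces to backward reachability. The sketch below assumes the dependence edges are given; a real slicer must first compute the data and control dependences.

```python
def backward_slice(dependences, criterion):
    """Statement-deletion slicing as graph reachability.

    `dependences` maps each statement to the statements it depends on
    (the data and control dependence edges of a program dependence
    graph); the slice is everything reachable backwards from the
    criterion. Statements outside the returned set can be deleted.
    """
    slice_, stack = set(), [criterion]
    while stack:
        s = stack.pop()
        if s not in slice_:
            slice_.add(s)
            stack.extend(dependences.get(s, ()))
    return slice_

# 1: x = read()   2: y = read()   3: z = x + 1   4: print(z)
deps = {4: [3], 3: [1]}
print(sorted(backward_slice(deps, 4)))   # [1, 3, 4]: statement 2 is deleted
```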
FPGA devices have often found use as higher-performance alternatives to programmable processors for implementing a variety of computations. Applications successfully implemented on FPGAs have typically contained high levels of parallelism and have often used simple statically-scheduled control and modest arithmetic. Recently introduced computing devices such as coarse-grain reconfigurable arrays, multi-core processors, and graphics processing units (GPUs) promise to significantly change the computational landscape for the implementation of high-speed real-time computing tasks. One reason for this is that these architectures take advantage of many of the same application characteristics that fit well on FPGAs. One real-time computing task, optical flow, is difficult to apply in robotic vision applications in practice because of its high computational and data rate requirements, and so is a good candidate for implementation on FPGAs and other custom computing architectures. In this paper, a tensor-based optical flow algorithm is implemented on both an FPGA and a GPU, and the two implementations are discussed. The two implementations had similar performance, but the FPGA implementation required 12× more development time. Other comparison data for these two technologies is then given for three additional applications taken from a MIMO digital communication system design, providing additional examples of the relative capabilities of these two technologies.
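A rough sketch of the tensor-based approach, assuming a Lucas-Kanade-style solve over neighborhood-averaged structure tensor components; the paper's exact formulation, windowing, and hardware pipeline may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def tensor_flow(f0, f1, win=5):
    """Per-pixel flow (u, v) from smoothed structure tensor components.

    Solves  [Jxx Jxy] [u]     [Jxt]
            [Jxy Jyy] [v] = - [Jyt]   at each pixel.
    """
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    It = f1.astype(float) - f0.astype(float)
    # Neighborhood-averaged tensor components (win x win box filter).
    Jxx = uniform_filter(Ix * Ix, win); Jxy = uniform_filter(Ix * Iy, win)
    Jyy = uniform_filter(Iy * Iy, win); Jxt = uniform_filter(Ix * It, win)
    Jyt = uniform_filter(Iy * It, win)
    det = Jxx * Jyy - Jxy ** 2
    det[np.abs(det) < 1e-9] = np.inf     # untextured regions: flow -> 0
    u = (-Jyy * Jxt + Jxy * Jyt) / det
    v = ( Jxy * Jxt - Jxx * Jyt) / det
    return u, v

# Illustrative check: a textured pattern translated one pixel to the right.
yy, xx = np.mgrid[0:64, 0:64] * 0.4
f0 = np.sin(xx) + np.cos(yy)
f1 = np.roll(f0, 1, axis=1)
u, v = tensor_flow(f0, f1)
print(u[32, 32], v[32, 32])   # approximately 1.0 and 0.0 in the interior
```

The regular gradient, multiply, and box-filter structure of this pipeline is exactly the kind of statically schedulable, highly parallel dataflow that maps well to both FPGAs and GPUs, which is why the two implementations land at similar performance.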
Agent-oriented software engineering (AOSE) has become an active area of research in recent years. We look at the use of agent-oriented concepts for software analysis. Using agent-oriented analysis may offer benefits even if the system is implemented without an agent-based language or framework (e.g. using an object-oriented detailed design and language). We examine the software analysis components of a number of existing agent-oriented methodologies. We discuss the benefits that can be gained by using agent-oriented concepts, and where the concepts require further development. Based on this analysis, we present the agent-oriented methodology that we are developing, and describe an example of how it may be applied for software analysis.
An overview is presented of a model for describing data and control flow associated with the execution of large-grained, decision-free algorithms in a special distributed computer environment. The ATAMM (Algorithm-To-Architecture Mapping Model) model provides a basis for relating an algorithm to its execution in a dataflow multicomputer environment. The ATAMM model features a marked graph Petri net description of the algorithm behavior with regard to both data and control flow. The model provides an analytical basis for calculating performance bounds on throughput characteristics which are demonstrated here.
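The throughput bound follows from classical marked-graph analysis: the iteration period can be no smaller than the worst ratio of cycle delay to cycle token count. A minimal sketch, with the cycles and their numbers assumed for illustration rather than taken from the paper.

```python
def min_cycle_time(cycles):
    """Lower bound on the achievable iteration period of a marked graph.

    Each cycle is given as (total_transition_delay, token_count).
    Marked-graph theory gives period >= max over cycles of delay/tokens,
    i.e. throughput <= min over cycles of tokens/delay.
    """
    return max(delay / tokens for delay, tokens in cycles)

# Algorithm graph with two feedback cycles (illustrative numbers):
#   cycle 1: 30 time units of computation, 2 tokens in flight
#   cycle 2: 12 time units, 1 token
cycles = [(30.0, 2), (12.0, 1)]
tbo = min_cycle_time(cycles)          # 15.0: cycle 1 is the bottleneck
print("throughput bound:", 1.0 / tbo)
```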
A cooperative integration of stereopsis and optic flow computation is presented. Central to our approach is the modelling of the visual processes as a sequence of coupled Markov random fields, with suitable interprocess interactions defined on the basis of some natural constraints. The integration makes each of the individual processes better constrained and more reliable. Further, as a result of the integration, it becomes possible to accurately preserve the discontinuities in both the flow and disparity fields, along with the regions of stereo occlusion. Some results, on both noisy synthetic image data and real images, are presented.
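As a toy illustration of the coupling idea, consider two 1D fields whose robust smoothness terms allow discontinuities and whose interaction term penalizes a discontinuity in one field that the other does not share, reflecting the observation that motion and depth boundaries tend to coincide. All potentials and weights below are illustrative, not the paper's actual interaction terms.

```python
import numpy as np

def coupled_mrf_energy(flow, disp, flow_data, disp_data,
                       lam=1.0, mu=0.5, t=1.0):
    """Toy 1D energy for two coupled MRFs (an illustrative sketch)."""
    def smoothness(f):
        # Truncated quadratic: large jumps pay a bounded cost, so
        # discontinuities are allowed rather than smoothed away.
        return np.sum(np.minimum((f[1:] - f[:-1]) ** 2, t))
    def edges(f):
        return ((f[1:] - f[:-1]) ** 2 >= t).astype(float)
    data = np.sum((flow - flow_data) ** 2) + np.sum((disp - disp_data) ** 2)
    smooth = lam * (smoothness(flow) + smoothness(disp))
    # Coupling: penalize a discontinuity present in one field but
    # absent in the other.
    coupling = mu * np.sum(np.abs(edges(flow) - edges(disp)))
    return data + smooth + coupling

x = np.linspace(0, 1, 100)
flow_obs = np.where(x < 0.5, 0.0, 2.0)   # motion boundary at x = 0.5
disp_obs = np.where(x < 0.5, 1.0, 3.0)   # depth boundary at the same place
print(coupled_mrf_energy(flow_obs, disp_obs, flow_obs, disp_obs))
# 2.0: two shared (aligned) discontinuities, no coupling penalty
```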