ISBN (print): 088986392X
High-level design tools are needed to reduce the design time of complex chips consisting of millions of transistors. Such chips exploit a great deal of fine-grain parallelism to achieve high performance, and with rapidly advancing technology and greatly increased integration density and clock frequency, power consumption is becoming ever more important. The PACT HDL compiler addresses two problems: (1) it allows users to develop algorithms in a high-level language, namely C, and synthesize hardware designs onto FPGAs while exploiting fine-grain parallelism; (2) it explicitly addresses low-power issues during the high-level synthesis stages. This paper presents an approach to optimizing power while exploiting fine-grain parallelism on FPGAs. We describe a high-level power optimization algorithm called ETAIP, used in the PACT HDL compiler together with an IP core binding methodology. By introducing an IP core library, both the flexibility and the robustness of the design are improved. The ETAIP algorithm is capable of handling multi-cycle operators as well as multi-cycle memory read and write operations. Experimental results are reported for the optimization algorithm on a benchmark suite of four signal and image processing kernels mapped onto the Xilinx XCV400 Virtex FPGA.
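To make the IP core binding idea concrete, here is a minimal sketch, assuming a toy core library and a greedy latency/energy trade-off; the core names, the cost numbers, and the selection strategy are illustrative assumptions, not the published ETAIP algorithm (which also handles multi-cycle memory operations).

# Illustrative sketch only: toy library and greedy binding, NOT the ETAIP
# algorithm itself. Core names and energy/latency numbers are made up.
from dataclasses import dataclass

@dataclass(frozen=True)
class IPCore:
    name: str
    op: str          # operation implemented, e.g. "mul"
    cycles: int      # multi-cycle latency
    energy: float    # estimated energy per invocation (illustrative units)

LIBRARY = [
    IPCore("mul_fast", "mul", cycles=1, energy=9.0),
    IPCore("mul_slow", "mul", cycles=3, energy=4.0),
    IPCore("add_fast", "add", cycles=1, energy=2.0),
]

def bind(ops, cycle_budget):
    # Start from the minimum-energy core for every operation.
    binding = [min((c for c in LIBRARY if c.op == op), key=lambda c: c.energy)
               for op in ops]
    total_cycles = lambda b: sum(c.cycles for c in b)
    # While the (assumed sequential) schedule misses its budget, upgrade the
    # operation whose faster variant costs the least extra energy per cycle saved.
    while total_cycles(binding) > cycle_budget:
        best = None
        for i, cur in enumerate(binding):
            for alt in LIBRARY:
                if alt.op == cur.op and alt.cycles < cur.cycles:
                    rate = (alt.energy - cur.energy) / (cur.cycles - alt.cycles)
                    if best is None or rate < best[0]:
                        best = (rate, i, alt)
        if best is None:
            raise ValueError("cycle budget infeasible with this library")
        binding[best[1]] = best[2]
    return binding

if __name__ == "__main__":
    for core in bind(["mul", "mul", "add"], cycle_budget=5):
        print(core.name, core.cycles, core.energy)

Trading the low-energy 3-cycle multiplier for the 1-cycle one only where the budget forces it is the same latency/energy tension a binding-based power optimizer must resolve when mapping operators to IP cores.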
ISBN (print): 0889863415
RHiNET is a network that combines the advantages of Local Area Networks (LANs) and System Area Networks (SANs). In this paper, a preliminary performance evaluation of RHiNET-2 on a small prototype system is described. We use two benchmark applications, IS from the NAS Parallel Benchmarks and LU from the SPLASH-2 benchmark suite, on the SCore Cluster System Software. On a system with four nodes, performance improves by a factor of 3.12 for IS and 3.64 for LU.
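As a quick sanity check (the parallel-efficiency formula E = S/p is standard, not from the paper), the reported four-node speedups correspond to

\[ E = \frac{S}{p}: \qquad E_{\mathrm{IS}} = \frac{3.12}{4} = 0.78, \qquad E_{\mathrm{LU}} = \frac{3.64}{4} = 0.91, \]

i.e., about 78% and 91% of linear scaling; IS is generally regarded as communication-intensive, which is consistent with its lower efficiency.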
ISBN (print): 0889863415
The approach described in this paper addresses fast prototyping of complex media processing systems. The method provides a high-level means to emulate the streaming subsystems of complex consumer electronics systems, including connection management for high-throughput signal processing elements, data buffering, and routing. Moreover, it offers a high-level abstraction for the configuration and control of such streaming subsystems. As a result, the essential characteristics of stream processing can be modeled and analyzed while, at the same time, complex middleware and application software can be developed and tested independently of the underlying streaming technology. Prototyping is sped up by using standard implementation technology such as PCs with off-the-shelf PCI cards, Ethernet, and TCP/IP networking. All low-level media processing components have software interfaces that can remain the same when the components are implemented in real embedded products.
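The following is a minimal sketch of such a streaming abstraction, assuming a toy push-based model; the Element/Graph names and the FIFO-based buffering are illustrative assumptions, not the paper's API.

# Illustrative sketch only: a toy push-based streaming graph with explicit
# connection management and FIFO buffering. Not the paper's actual API.
from collections import deque

class Element:
    """A signal-processing element that transforms one item at a time."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

class Graph:
    def __init__(self):
        self.edges = {}            # producer -> (FIFO, consumer)

    def connect(self, src, dst, depth=16):
        # Connection management: wire src's output to dst through a
        # bounded FIFO that models data buffering.
        self.edges[src] = (deque(maxlen=depth), dst)

    def push(self, element, item):
        # Run an element and route its output downstream, draining FIFOs.
        out = element.fn(item)
        if element in self.edges:
            fifo, nxt = self.edges[element]
            fifo.append(out)
            while fifo:
                self.push(nxt, fifo.popleft())

if __name__ == "__main__":
    g = Graph()
    decode = Element("decode", lambda x: x * 2)   # stand-in for a decoder
    scale  = Element("scale",  lambda x: x + 1)   # stand-in for a scaler
    render = Element("render", print)             # stand-in for a renderer
    g.connect(decode, scale)
    g.connect(scale, render)
    for sample in range(3):
        g.push(decode, sample)

In a real prototype the connect() plumbing would map onto PCI cards or TCP/IP links; the point of the abstraction is that middleware written against Graph/Element need not change when that transport does.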
ISBN (print): 0889863415
As information technology requirements for high availability and uptime become more demanding, it is of paramount importance to architect infrastructures and topologies that can comply. Infrastructure downtime results in application unavailability, frustration, and financial loss. To address the demand for highly reliable complex computer systems, a new concept of infrastructure architecture needs to be put in place. In this paper, we propose a model that can support any type of application in any configuration, such as client-server, distributed, parallel processing, or peer-to-peer, with any middleware and protocols. Our infrastructure can self-configure, self-optimise, self-protect, self-manage, and self-heal. The building methodology employs new types of Unix-based clustered systems using large application/middleware groupings, each with a master cluster controller. Each controlling engine consists of self-healing intelligent entities that can adapt to and compensate for a variety of software or hardware problems, and it has access to a pool of hardware and software resources that it can allocate on demand, automatically, transparently, and mostly without service interruption. This design forces the infrastructure to make full use of its capacity and to be highly fault-tolerant. We also present evaluation results from our pilot implementation, which has been running for more than a year and a half in the production environment of a mobile phone/Internet service provider.
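A minimal sketch of the master-controller behavior described above, assuming a simple on-demand pool and an explicit failure event; all names and the failure model are illustrative assumptions, not the paper's design.

# Illustrative sketch only: on-demand allocation from a resource pool and
# automatic reassignment after a node failure. Names are made up.
class ClusterController:
    def __init__(self, pool):
        self.pool = list(pool)       # spare hardware resources
        self.assignments = {}        # node -> list of services

    def deploy(self, service):
        # Allocate a node from the pool on demand, transparently.
        if not self.pool:
            raise RuntimeError("resource pool exhausted")
        node = self.pool.pop()
        self.assignments.setdefault(node, []).append(service)
        return node

    def heal(self, failed_node):
        # Self-heal: move services from a failed node onto spare capacity.
        for service in self.assignments.pop(failed_node, []):
            replacement = self.deploy(service)
            print(f"{service}: {failed_node} -> {replacement}")

if __name__ == "__main__":
    ctl = ClusterController(pool=["node-a", "node-b", "node-c"])
    n = ctl.deploy("billing-middleware")
    ctl.heal(n)   # simulate a hardware fault on the node running billing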
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search that refines multiple different initial schedules simultaneously on different workstations. The workstations communicate periodically to exchange the best solutions found so far, directing the search toward more promising regions of the search space. Heterogeneity of machines is exploited by biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to the other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our ongoing development of SSI middleware for a Sun workstation cluster.
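A minimal single-process sketch of the CBS search pattern, assuming a toy cost model (independent tasks on machines of different speeds); the makespan objective, the parameters, and the simple adopt-the-best exchange are illustrative assumptions, not the published algorithm, which runs across workstations and uses biased partitioning.

# Illustrative sketch only: several searchers refine different initial
# schedules and periodically adopt the best one found so far.
import random

random.seed(0)                                     # deterministic toy instance
TASKS = [random.randint(1, 9) for _ in range(20)]  # task work units
SPEEDS = [1.0, 1.5, 2.0]                           # heterogeneous machine speeds

def makespan(schedule):
    load = [0.0] * len(SPEEDS)
    for task, machine in zip(TASKS, schedule):
        load[machine] += task / SPEEDS[machine]
    return max(load)

def neighbor(schedule):
    # Random neighborhood move: reassign one task to a random machine.
    s = schedule[:]
    s[random.randrange(len(s))] = random.randrange(len(SPEEDS))
    return s

def cbs(workers=4, rounds=10, steps=200):
    # Each "workstation" refines its own random initial schedule ...
    schedules = [[random.randrange(len(SPEEDS)) for _ in TASKS]
                 for _ in range(workers)]
    for _ in range(rounds):
        for w in range(workers):
            for _ in range(steps):
                cand = neighbor(schedules[w])
                if makespan(cand) < makespan(schedules[w]):
                    schedules[w] = cand
        # ... and a periodic exchange phase spreads the best solution found
        # so far, directing everyone toward promising regions.
        best = min(schedules, key=makespan)
        schedules = [best[:] for _ in schedules]
    return min(schedules, key=makespan)

if __name__ == "__main__":
    print("best makespan:", round(makespan(cbs()), 2))

Fault tolerance in the real system amounts to dropping a failed worker's slot and redistributing its share of the search; the random moves after each exchange keep the workers from collapsing onto a single point in the search space.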
Self-simulation algorithms have recently been developed to execute algorithms on a reconfigurable mesh (RM) of size smaller than the size those algorithms assume. Optimal slowdown in self-simulation has been achieved, with the compromise that the resulting algorithms fail to remain AT^2 optimal. In this paper, we introduce, for the first time, the idea of adaptive algorithms that run on RMs of variable size without compromising AT^2 optimality. We support the idea by developing adaptive algorithms for sorting items and for computing the contour of the maximal elements of a set of planar points on an RM.
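To see why optimal slowdown alone is not enough (a standard area-time accounting argument, not taken from the paper): a constant-time algorithm on an $n \times n$ RM has

\[ A\,T^2 = n^2 \cdot O(1)^2 = O(n^2), \]

while self-simulating it on a smaller $p \times p$ mesh at the optimal slowdown of $\Theta(n^2/p^2)$ gives

\[ A'\,T'^2 = p^2 \cdot O\!\left(\frac{n^2}{p^2}\right)^{2} = O\!\left(\frac{n^4}{p^2}\right), \]

which exceeds $O(n^2)$ for every $p < n$. An adaptive algorithm must instead keep $A\,T^2$ bounded on every mesh size it runs on.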