Previous work (J.S. McCaskill et al., 1996; 1997) has shown the power of massively parallel configurable hardware (NGEN) in conjunction with dataflow architectures for the simulation of evolving populations. NGEN is flexible computer hardware for rapid custom-circuit simulation of fine-grained physical processes via a massively parallel architecture of, e.g., 144 hardware-configurable field-programmable gate arrays (FPGAs; Xilinx XC4008). NGEN is optimized to implement dataflow architectures and systolic algorithms for large problems and is equipped with high-speed distributed SRAM (144*8*256 kBit, 15 ns access time) on the chip-to-chip interconnect. Microconfigurable FPGAs allow a further step towards closing the gap between microelectronics and biology in the area of information processing. A design for a massively parallel microconfigurable computer (POLYP) is presented. It is designed to allow online evolution in hardware with significant locally controllable memory resources. It is also designed for high-throughput dataflow applications with large problem sizes. Additionally, an evolvable interface between high-rate measurement devices is provided to allow adaptive processing coupled with real-time experimental environments. The computer represents the next logical step beyond the massively parallel computer NGEN towards evolvable hardware interacting with biology.
ISBN: 0818681594 (print)
The square root operation is hard to implement on FPGAs because of the complexity of the algorithms. In this paper, we present a non-restoring square root algorithm and two very simple single-precision floating-point square root implementations on FPGAs based on that algorithm. One is a low-cost iterative implementation that uses a traditional adder/subtracter. Its operation latency is 25 clock cycles and its issue rate is 24 clock cycles. The other is a high-throughput pipelined implementation that uses multiple adder/subtracters. Its operation latency is 15 clock cycles and its issue rate is one clock cycle. This means that the pipelined implementation is capable of accepting a square root instruction on every clock cycle.
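The abstract does not spell out the algorithm, so the following is only a minimal sketch of the textbook non-restoring integer square root, not the authors' floating-point FPGA design. The function name `non_restoring_isqrt` and the `bits` parameter are assumptions made for illustration. Each iteration consumes two radicand bits and performs a single add or subtract, chosen by the sign of the running remainder, which is why one adder/subtracter per stage is enough.

```python
def non_restoring_isqrt(d: int, bits: int = 32) -> tuple[int, int]:
    """Return (q, r) with q = floor(sqrt(d)) and r = d - q*q.

    Hypothetical helper: a software model of the generic non-restoring
    scheme, assuming an unsigned radicand of an even bit width.
    """
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    if d < 0 or d >> bits:
        raise ValueError("radicand does not fit in the given width")

    q = 0  # partial root, grows by one bit per iteration
    r = 0  # partial remainder, allowed to go negative (never restored)
    for i in range(bits // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 0b11  # next two radicand bits
        if r >= 0:
            # Remainder non-negative: subtract the trial value (q, 0, 1).
            r = r * 4 + pair - ((q << 2) | 1)
        else:
            # Remainder negative: add (q, 1, 1) instead of restoring first.
            r = r * 4 + pair + ((q << 2) | 3)
        # The new root bit is 1 exactly when the remainder is non-negative.
        q = (q << 1) | (1 if r >= 0 else 0)

    if r < 0:
        # One final correction yields the true remainder.
        r += (q << 1) | 1
    return q, r


if __name__ == "__main__":
    root, rem = non_restoring_isqrt(10, bits=4)
    print(root, rem)
```

In a pipelined variant of this sketch, each loop iteration would map to its own stage with its own adder/subtracter, so a new operand could enter on every clock cycle.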
By strictly separating reconfigurable logic from the host processor, current custom computing systems suffer from a significant communication bottleneck. In this paper we describe Chimaera, a system that overcomes this bottleneck by integrating reconfigurable logic into the host processor itself. With direct access to the host processor's register file, the system enables the creation of multi-operand instructions and a speculative execution model key to high-performance, general-purpose reconfigurable computing. It also supports multi-output functions and utilizes partial run-time reconfiguration to reduce reconfiguration time. Combined, these features can provide speedups of a factor of two or more for general-purpose computing, and speedups of 160 or more are possible for hand-mapped applications.
This paper presents a new static logic implication algorithm. An improved implication procedure that takes full advantage of the special context of static implication, the iterative method, and set algebra is described. At low cost, the algorithm discovers many indirect implications that dynamic learning cannot discover without tremendous time cost. The experimental results show that a very large number of indirect implications are found by our algorithm. The static implication procedure has many useful applications, one of which is static redundancy identification. Using the static implications obtained from the algorithm in static redundancy identification for the ISCAS85 combinational circuits identified a larger number of redundant faults than previous methods.
We propose three synchronous parallel algorithms for scalable parallel test-set-partitioned fault simulation. The algorithms are based on a new two-stage approach to parallelizing fault simulation for sequential VLSI circuits in which the test set is partitioned among the available processors. The test set partitioning inherent in the algorithms overcomes the good-circuit logic simulation bottleneck that exists in traditional fault-partitioned approaches to parallel fault simulation. The implementations were done on a shared-memory multiprocessor and on a network of workstations. Two of the algorithms show a small degree of pessimism in a few cases with respect to fault coverage, compared with a uniprocessor run, while the third algorithm provides the same results as a uniprocessor run. All algorithms provide excellent speedups and perform much better than a traditional fault-partitioned approach on both shared- and distributed-memory parallel platforms.
We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes for evaluating data parallel compilers, appropriate for any target parallel architecture with shared or distributed memory. The codes are provided in basic, optimized and several library versions. The functionality of the benchmarks covers collective communication functions, scientific software library functions, and application kernels that reflect the computational structure and communication patterns in fluid dynamics simulations, fundamental physics, and molecular studies in chemistry or biology. The DPF benchmark suite assumes the language model of High Performance Fortran and provides performance evaluation metrics of busy and elapsed times and FLOP rates, FLOP count, memory usage, communication patterns, local memory access, and arithmetic efficiency, as well as operation and communication counts per iteration. An instance of the benchmark suite was fully implemented in CM-Fortran and tested on the CM-5.
ISBN: 0818674822 (print)
This paper introduces a new architecture for a fault-tolerant computer system which connects high-end PCs or workstations by a high-speed network. To achieve platform independence, coupling is based on the widely used PCI bus. In contrast to commercially available fault-tolerant systems, we strongly emphasize mechanisms for tolerating transient and intermittent faults. To keep hardware costs low, the system is built with off-the-shelf computers, and their extensions are kept as small as possible. To reduce operational costs, the system can be dynamically adapted to different fault tolerance demands on a program-by-program basis. Adaptation is done by the operating system, transparently to the application software. We use a commercially available real-time operating system with a POSIX-compliant UNIX interface. The range of fault tolerance extends from a non-redundant system of stand-alone computers, through a master/checker configuration, to a TMR system. The high-performance network also allows the system to operate as a parallel multicomputer.
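The abstract names the master/checker and TMR (triple modular redundancy) configurations without describing how results are compared. As a rough illustration of the general idea only, and not of the paper's PCI-based mechanism, the sketch below models both on plain integers. The function names `master_checker_agree` and `tmr_vote` are hypothetical.

```python
def master_checker_agree(master: int, checker: int) -> bool:
    """Master/checker: two replicas can detect a fault but not mask it."""
    return master == checker


def tmr_vote(a: int, b: int, c: int) -> int:
    """TMR: bitwise majority of three replica outputs.

    Each result bit is 1 when at least two of the three inputs have
    that bit set, so a fault confined to one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1010
    faulty = 0b0110  # one replica disturbed by a transient fault
    print(master_checker_agree(good, faulty))
    print(bin(tmr_vote(good, good, faulty)))
```

The contrast shows why a system might pick the redundancy level per program: two replicas are enough to detect a disagreement, while three are needed to outvote a single faulty result.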
ISBN: 0818675829 (print)
We discuss the emerging Web-based distributed environments for HPCC on the NII, with a focus on Java as an enabling technology. We start with a review of the past, present and near-term future of the 'Java phenomenon', set against the background of some related previous approaches towards a distributed interpretative virtual machine architecture.
ISBN: 0818674822 (print)
Rapidly expanding cellular communication technology, wireless LANs and satellite services have made it possible for mobile users to access information anywhere and at any time. In a mobile computing environment, replication can be considered an essential technique for providing reliability, increased throughput and data availability. This paper addresses replica control protocols with an emphasis on workstation mobility issues. The modifications that must be made to the primary copy method for replicated database management in order to address the effect of mobility on existing replica control protocols are analysed and proposed. A variation of the primary copy algorithm, called the virtual primary copy method, is proposed, and it is shown that this method is well suited to the distributed mobile computing environment. The performance of the virtual primary copy method relative to the traditional primary copy method is analysed using computer simulation.
ISBN: 9780897918008 (print)
Interactive home video entertainment is an actively developing application of technology. Major bottlenecks in this architecture are the limited number of broadcast channels and/or the number of movies that the server can transmit concurrently. This paper investigates the on-line video-on-demand problem, namely having to accept or reject a request for a movie without knowledge of future requests. The study uses a randomized scheduling method that follows the principle of refusal by choice with delayed notification.