Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grain...
详细信息
Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also in- troduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of an RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors.
Many proposed P2P networks are based on traditional interconnection topologies. Given a static topology, the maintenance mechanism for node join/departure is critical to designing an efficient P2P network. Kautz graph...
详细信息
Many proposed P2P networks are based on traditional interconnection topologies. Given a static topology, the maintenance mechanism for node join/departure is critical to designing an efficient P2P network. Kautz graphs have many good properties such as constant degree, low congestion and optimal diameter. Due to the complexity in topology maintenance, however, to date there have been no effective P2P networks that are proposed based on Kautz graphs with base ~ 2. To address this problem, this paper presents the "distributed Kautz (D-Kautz) graphs", which adapt Kautz graphs to the characteristics of P2P networks. Using the D-Kautz graphs we further propose SKY, the first effective P2P network based on Kautz graphs with arbitrary base. The effectiveness of SKY is demonstrated through analysis and simulations.
The Internet-based Virtual Computing Environment (iVCE) provides on-demand aggregation and autonomic collaboration mechanisms to facilitate the utilization of autonomous and dynamic Internet resources. Load balancing ...
详细信息
Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file sharing service and the overlay construction. Several study use the Hyperbolic space to model t...
详细信息
Proximity ranking according to end-to-end network distances (e.g., Round-Trip Time, RTT) can reveal detailed proximity information, which is important in network management and performance diagnosis in distributed sys...
详细信息
As an infrastructure for data distribution, overlay networks have to feature efficient routing and adequate robustness to achieve fast and accurate data distribution in the environment with node churn. Considering tha...
详细信息
Jamming style Denial-of-Service attack is the transmission of radio signals that disrupt communications by decreasing the signal to noise ratio. This kind of attack can be easily launched by jammer through either bypa...
详细信息
In the field of RNA secondary structure prediction, the CYK (Coche-Younger-Kasami) algorithm is a most popular methods using SCFG (stochastic context-free grammars) model. However, general purpose parallel computers i...
详细信息
ISBN:
(纸本)9781605586267
In the field of RNA secondary structure prediction, the CYK (Coche-Younger-Kasami) algorithm is a most popular methods using SCFG (stochastic context-free grammars) model. However, general purpose parallel computers including SMP multiprocessors or cluster systems exhibit low parallel efficiency and they are too expensive to be used easily for many research institutes. FPGA chips provide a new approach to accelerate the CYK algorithm by exploiting fine-grained custom design. The CYK algorithm shows complicated data dependence, in which the dependence distance is variable, and the dependence direction is also across two dimensions. We propose a systolic array structure including one master PE and multiple slave PEs for fine grain hardware implementation on FPGA. We partition tasks by columns and assign tasks to PEs for load balance. We exploit data reuse schemes to reduce the need to load matrix from external memory. To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete CYK/inside algorithm. The experimental results show a factor of more than 14 speedup over the Infernal-0.55 software running on a PC platform with Pentium 4 2.66GHz CPU. The computational power of our platform with FPGA accelerator is comparable to a PC cluster consisting of 20 Intel-Xeon CPUs for RNA secondary structure prediction using SCFGs, but the hardware cost and power consumption is only about 15% and 10% of the latter respectively. Copyright 2009 ACM.
Consistency and responsiveness are two important factors in providing the sense of reality in distributed Virtual Environment (DVE). However, it is not easy to optimize both aspects because of the trade-off between th...
详细信息
With the advancement of peer-to-peer technology, media streaming applications become more and more popular in the Internet. However, the traditional development methods for this kind of applications need developers no...
详细信息
暂无评论