We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared with related work, while also increasing the clock frequency. Moreover, the memory requirement can be reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
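A minimal software sketch of the blocking idea, assuming plain Python lists and a hypothetical function name `block_matmul`. It shows only how a dense product of arbitrary size decomposes into S x S tiles (with smaller edge tiles when the dimensions are not multiples of S); it does not model the paper's PE array or its O(S) memory scheme.

```python
def block_matmul(A, B, S):
    """Compute C = A * B by iterating over S x S tiles (hypothetical sketch).

    A is n x k and B is k x m (lists of lists). Edge tiles are simply
    smaller, so n, k and m need not be multiples of the block size S.
    """
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, S):            # tile row of C
        for j0 in range(0, m, S):        # tile column of C
            for p0 in range(0, k, S):    # accumulate over the shared dimension
                for i in range(i0, min(i0 + S, n)):
                    a_row, c_row = A[i], C[i]
                    for p in range(p0, min(p0 + S, k)):
                        a, b_row = a_row[p], B[p]
                        for j in range(j0, min(j0 + S, m)):
                            c_row[j] += a * b_row[j]
    return C
```

For example, `block_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)` returns `[[19, 22], [43, 50]]`, the same result as an unblocked product; only the order in which partial sums are accumulated changes.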
To perform LU decomposition of large matrices efficiently on FPGAs with limited local memory, the original algorithm must be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), that implements our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
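A minimal sketch of a right-looking blocked LU factorization, assuming no pivoting (the abstract does not say how pivoting is handled) and a hypothetical function name `block_lu`. Each block step factors an S-column panel, solves for the matching row block of U, and then updates the trailing submatrix; the matrix size need not be a multiple of S.

```python
def block_lu(A, S):
    """Blocked right-looking LU without pivoting (hypothetical sketch).

    Returns the packed factors: L below the diagonal (unit diagonal
    implied) and U on and above it. n need not be a multiple of S.
    """
    n = len(A)
    LU = [[float(x) for x in row] for row in A]
    for k0 in range(0, n, S):
        k1 = min(k0 + S, n)
        # 1. Factor the panel (columns k0..k1-1, rows k0..n-1) with unblocked LU.
        for k in range(k0, k1):
            pivot = LU[k][k]
            for i in range(k + 1, n):
                LU[i][k] /= pivot
                for j in range(k + 1, k1):
                    LU[i][j] -= LU[i][k] * LU[k][j]
        # 2. Row block of U: solve L11 * U12 = A12 by forward substitution.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    LU[i][j] -= LU[i][k] * LU[k][j]
        # 3. Trailing update: A22 -= L21 * U12 (the bulk of the arithmetic).
        for i in range(k1, n):
            for k in range(k0, k1):
                l_ik = LU[i][k]
                for j in range(k1, n):
                    LU[i][j] -= l_ik * LU[k][j]
    return LU
```

Step 3 dominates the arithmetic and is the natural candidate for a PE array; this sketch makes no claim about how the paper maps it to hardware.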
Existing routing protocols for Wireless Mesh Networks (WMNs) are generally optimized with statistical link measures and do not address the intrinsic uncertainty of wireless links. We show evidence that, given transient link uncertainties at the PHY and MAC layers, a pseudo-deterministic routing protocol that relies on average or historical statistics can hardly exploit the full potential of a multi-hop wireless mesh. We study optimal WMN routing using probing-based online anypath forwarding, with explicit consideration of transient link uncertainties. We show the underlying connection between WMN routing and the classic Canadian Traveller Problem (CTP). Inspired by a stochastic recoverable version of CTP (SRCTP), we develop a practical SRCTP-based online routing algorithm under link uncertainties. We study how dynamic next-hop selection can be done at low cost, and derive a systematic selection order that minimizes transmission delay. We conduct simulation studies to verify the effectiveness of the SRCTP algorithms under diverse network configurations. In particular, compared with deterministic routing, we observe a reduction in end-to-end delay of 51.15% to 73.02% and an improvement in packet delivery ratio (99.76%).
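To illustrate why the order of candidate next hops matters, here is a minimal sketch of the expected-cost recursion commonly used in anypath routing. It is not the SRCTP algorithm itself: it assumes per-neighbor delivery probabilities are already known (rather than probed online), one unit of cost per transmission, and hypothetical function names `anypath_cost` and `best_forwarder_set`.

```python
import math

def anypath_cost(forwarders):
    """Expected cost to the destination when broadcasting to a forwarder set.

    forwarders: list of (p, c) sorted by c ascending, where p is the delivery
    probability of one transmission to that forwarder and c is that
    forwarder's own remaining cost. Each transmission costs one unit; the
    lowest-cost forwarder that received the packet takes it on.
    """
    numer, miss = 1.0, 1.0        # miss = P(no forwarder received it so far)
    for p, c in forwarders:
        numer += c * p * miss      # c counts only if all better forwarders missed
        miss *= 1.0 - p
    if miss >= 1.0:
        return math.inf
    return numer / (1.0 - miss)    # retransmit until some forwarder receives it

def best_forwarder_set(neighbors):
    """Try every prefix of the cost-sorted neighbor list; keep the cheapest."""
    ordered = sorted(neighbors, key=lambda pc: pc[1])
    best_cost, best_set = math.inf, []
    for k in range(1, len(ordered) + 1):
        cost = anypath_cost(ordered[:k])
        if cost < best_cost:
            best_cost, best_set = cost, ordered[:k]
    return best_cost, best_set
```

With two neighbors at delivery probability 0.5 and remaining costs 1 and 2, using both as forwarders gives an expected cost of 8/3, lower than the cost of 3 obtained by forwarding only through the best single neighbor; adding a third neighbor with cost 10 would raise the cost again.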
ISBN: (print) 9781424435432
Single-electron transistors (SETs) are considered attractive candidates for post-CMOS VLSI due to their ultra-small size and low power consumption. Because single-island SETs generally cannot operate at room temperature, a growing number of researchers are studying SETs with one-dimensional chains of multiple islands. This paper introduces a new simulation method, nSET. Compared with other methods, nSET can simulate SET devices with one-dimensional multiple islands quickly and accurately. A comparison with the classical Monte Carlo (MC) simulator shows that nSET is both accurate and fast, making it very useful for the ASIC design of SET devices.
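For context on what nSET is compared against, here is a minimal sketch of the core step of a classical kinetic Monte Carlo SET simulator: given the current tunneling rates of all candidate events, draw an exponentially distributed waiting time and pick one event in proportion to its rate. The function name `mc_step` is hypothetical, and the rates are assumed to be computed elsewhere (e.g. from orthodox theory).

```python
import math
import random

def mc_step(rates, rng):
    """One kinetic Monte Carlo step over candidate tunneling events.

    rates[i] is the (non-negative) rate of event i in 1/s. Returns the index
    of the event that fires and the waiting time before it, or (None, inf)
    when no event is possible (e.g. full Coulomb blockade).
    """
    total = sum(rates)
    if total <= 0.0:
        return None, math.inf
    # Exponentially distributed waiting time; 1 - random() lies in (0, 1].
    dt = -math.log(1.0 - rng.random()) / total
    # Pick event i with probability rates[i] / total.
    target = rng.random() * total
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            return i, dt
    # Guard against floating-point rounding: fall back to the last possible event.
    return max(i for i, r in enumerate(rates) if r > 0.0), dt
```

In a full simulator, the chosen event updates the island charges, all rates are recomputed, and the step repeats; averaging many such steps is what makes the MC approach accurate but slow.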
In this paper, we introduce a generic model for the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we argue that there are three methods (event-oriented, subscription-oriented, and hybrid) for making every matching (event, subscription) pair meet somewhere in the system. By theoretically analyzing the inherent problems of both the event-oriented and subscription-oriented methods, we propose PEM (Popularity-based Event Matching), a variant of the hybrid method. PEM achieves a better trade-off between a system's event processing load and its subscription storage load. PEM has been verified through both mathematical analysis and simulation-based evaluation.
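The trade-off between the three methods can be made concrete with a toy load model. This is a hypothetical sketch, not PEM: it assumes an ordered overlay where node v owns attribute value v, range subscriptions, point events, and a hypothetical hybrid rule that replicates subscriptions only onto "hot" (popular) values. The function name `matching_loads` is also hypothetical.

```python
def matching_loads(subs, events, hot):
    """Toy load model for rendezvous-based matching on an ordered overlay.

    Node v owns attribute value v. A subscription is an inclusive range
    (lo, hi); an event is a single value. `hot` is a set of popular values.
    Returns {strategy: (stored subscription copies, event deliveries)}.
    """
    # Subscription-oriented: copy each subscription to every node in its range,
    # so each event needs only the node owning its value.
    sub_oriented = (sum(hi - lo + 1 for lo, hi in subs), len(events))
    # Event-oriented: store each subscription once (at node lo); an event with
    # value v must visit nodes 0..v because any of them may hold a match.
    evt_oriented = (len(subs), sum(v + 1 for v in events))
    # Hypothetical hybrid: replicate only onto hot nodes, keep one copy at lo
    # for the cold part of the range, and route hot events to a single node.
    storage = 0
    for lo, hi in subs:
        hot_in_range = sum(1 for v in hot if lo <= v <= hi)
        cold_in_range = (hi - lo + 1) - hot_in_range
        storage += hot_in_range + (1 if cold_in_range > 0 else 0)
    deliveries = sum(1 if v in hot else v + 1 for v in events)
    return {"subscription": sub_oriented, "event": evt_oriented,
            "hybrid": (storage, deliveries)}
```

With subscriptions (0, 3) and (2, 5), events [2, 2, 2, 5], and value 2 marked hot, the subscription-oriented method stores 8 copies but delivers each event once (4 deliveries), the event-oriented method stores 2 copies but needs 15 deliveries, and the hybrid rule lands in between with 4 copies and 9 deliveries.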
In large-scale asynchronous distributed virtual environments (DVEs), one of the difficult problems is delivering concurrent events in a consistent order at every node. Generally, the previous consistency control app...
The multi-island single-electron transistor (MISET) is a kind of single-electron transistor (SET) with the advantage of operating at room temperature. A novel semi-empirical compact model for MISETs is proposed. The new approach combines the orthodox theory of single-electron tunneling for a single Coulomb island with a novel empirical analysis for a chain of Coulomb islands. The model is verified against the Monte Carlo method in the SIMON simulator and is much faster than traditional multi-island SET simulators, which makes it advantageous for large-scale multi-island SET circuit simulation.
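The orthodox theory mentioned above gives the tunneling rate through a single junction as Γ = ΔF / (e² R_T (1 − exp(−ΔF / k_B T))), where ΔF is the decrease in free energy caused by the event and R_T is the tunnel resistance. A minimal sketch of that formula follows; the function name `tunnel_rate` is hypothetical, and the paper's empirical chain model is not reproduced here.

```python
import math

E = 1.602176634e-19    # elementary charge, C
K_B = 1.380649e-23     # Boltzmann constant, J/K

def tunnel_rate(delta_f, r_t, temp):
    """Orthodox-theory tunneling rate (1/s) through one junction.

    delta_f: decrease in free energy caused by the tunneling event, in J
             (positive means energetically favourable).
    r_t:     tunnel resistance of the junction, in ohms.
    temp:    temperature, in K.
    """
    if temp == 0.0:
        return max(delta_f, 0.0) / (E * E * r_t)
    x = delta_f / (K_B * temp)
    if x == 0.0:
        # Limit of x / (1 - exp(-x)) as x -> 0 is 1.
        return K_B * temp / (E * E * r_t)
    if x < -700.0:
        return 0.0  # exp(-x) would overflow; the rate is negligible
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small |x|.
    return delta_f / (E * E * r_t * -math.expm1(-x))
```

The forward and reverse rates for the same |ΔF| differ by the factor exp(ΔF / k_B T), so at 0 K energetically unfavourable events are fully blocked (Coulomb blockade), while at finite temperature they occur with an exponentially suppressed rate.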
ISBN: (print) 9781424459421
With the wide adoption of multi-core processor architectures in high-performance computing, fault tolerance for shared-memory parallel programs has become a research hot spot. For years, checkpointing has been the dominant fault tolerance technology in this field, and much recent research has focused on it. However, for programs that process large amounts of data, checkpointing may induce massive I/O transfers, which adversely affect scalability. To address this problem, this paper proposes a redundancy-based fault tolerance approach for shared-memory parallel programs. Our scheme avoids saving and restoring computational state during execution and therefore involves no I/O operations, which gives it a clear scalability advantage over checkpointing. In this paper, we describe our approach and the associated compiler tool in detail, and present experimental evaluation results.
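As a generic illustration of fault tolerance through redundancy rather than checkpointing, the sketch below runs the same computation as several thread replicas and majority-votes on their results, so no state is ever saved to or restored from storage. This is a swapped-in technique (triple modular redundancy with voting), not the paper's compiler-based scheme, and the function name `run_redundant` is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_redundant(task, args, replicas=3):
    """Run `task(replica_id, *args)` on several threads and majority-vote.

    Results must be hashable so they can be counted. A replica that raises
    is treated as failed and simply casts no vote.
    """
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        futures = [pool.submit(task, rid, *args) for rid in range(replicas)]
        results = []
        for fut in futures:
            try:
                results.append(fut.result())
            except Exception:
                pass  # failed replica: no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if 2 * votes > replicas:
            return value
    raise RuntimeError("no majority among replicas")
```

If one of three replicas returns a corrupted value or crashes, the other two still outvote it; if a majority fails, the function raises instead of silently returning a wrong answer.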
Reputation systems provide a promising way to build trust relationships between users in distributed cooperation systems, such as file sharing, streaming, distributed computing, and social networks, allowing a user to distinguish good services or users from malicious ones and cooperate with the good ones. However, most reputation models evaluate the quality of services along a single dimension and pay little attention to the differing preferences of users. This paper proposes a personalized reputation model that gives each user a personalized view of others' trustworthiness according to that user's preferences. In our approach, we aggregate users' preferences with a collaborative filtering method and quantify them as user similarity, which is integrated into the computation of reputation values. The experimental results suggest that our model can efficiently resist various kinds of malicious behavior.
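A minimal sketch of similarity-weighted (personalized) reputation in the collaborative-filtering style described above. It assumes cosine similarity over co-rated targets and a simple weighted average; the paper's exact similarity measure and aggregation may differ, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings over the targets both rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in common))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def personalized_reputation(viewer, target, ratings):
    """Reputation of `target` as seen by `viewer`.

    ratings maps user -> {target: score}. Each other user's score for the
    target is weighted by that user's similarity to the viewer; users with
    non-positive similarity are ignored. Returns None if nobody qualifies.
    """
    num = den = 0.0
    for rater, scores in ratings.items():
        if rater == viewer or target not in scores:
            continue
        sim = cosine_similarity(ratings[viewer], scores)
        if sim > 0.0:
            num += sim * scores[target]
            den += sim
    return num / den if den > 0.0 else None
```

Because a rater whose past ratings disagree with the viewer's receives a small weight, a group of colluding raters has limited influence on the reputation values seen by users whose preferences differ from theirs.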