To efficiently perform large matrix LU decomposition on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
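The abstract does not spell out the blocking scheme or how it maps onto the PE array, so the following is only a minimal software sketch of a textbook right-looking block LU decomposition without pivoting, not the authors' hardware algorithm. The function name `block_lu` and the block-size parameter `bs` are hypothetical; the point is to show how the work splits into a panel factorization, a row-block solve, and a trailing update, and how a final partial block handles matrices whose size is not a multiple of the block size.

```python
def block_lu(a, bs):
    """Blocked LU (Doolittle, no pivoting) of a square matrix, in place.

    `a` is a list of row lists; `bs` is the block size (an assumed parameter).
    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle, including the diagonal, holds U.
    """
    n = len(a)
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)  # the last block may be smaller than bs

        # 1. Panel factorization: unblocked LU on columns k0..k1-1,
        #    producing L11, U11 and the column block L21.
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]

        # 2. Row-block solve: U12 = inv(L11) * A12 by forward substitution.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]

        # 3. Trailing update: A22 -= L21 * U12 (the matrix-multiply bulk).
        for i in range(k1, n):
            for k in range(k0, k1):
                lik = a[i][k]
                for j in range(k1, n):
                    a[i][j] -= lik * a[k][j]
    return a
```

Because there is no pivoting, the sketch assumes every leading principal submatrix is nonsingular. Step 3 dominates the operation count for large matrices, which is why blocked formulations are attractive when fast local memory is limited.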
ISBN (print): 9781424468553; 9781424468584
Ship area networks (SANs) have recently attracted attention as a means of guaranteeing the safety of sea travelers and marine transportation. The wireless sensor network (WSN) is one of the most important components of a SAN because crew members must learn promptly of any damage to or malfunction of ship parts. The safety systems are mostly real-time applications, so delay is the most important QoS requirement in a SAN. Meanwhile, energy consumption is a traditionally important metric in WSNs. Therefore, the goal of this paper is to find the minimum possible transmission power of each sensor node under the condition that it can meet the delay constraint. Assuming uniformly distributed sensor nodes, the proposed method first suggests a way to compute the average per-hop advancement with a single transmission. The difference from the actual simulation is less than 1%, although the method uses only nodal density information. Based on this per-hop advancement, the minimum possible transmission power is calculated, which can guarantee delay QoS for a predetermined ratio of connections in WSNs. The error-prone channel is also considered, since packet transmission may frequently fail because of the many pieces of equipment on board and the steel ship body.
In light of its powerful computing capacity and high energy efficiency, the GPU (graphics processing unit) has become a focus of research in HPC (high performance computing). CPU-GPU heterogeneous parallel systems have become a new development trend for supercomputers. However, the inherent unreliability of GPU hardware degrades the reliability of supercomputers. We have studied fault-tolerance (FT) techniques for CPU-GPU heterogeneous parallel systems and introduced a new checkpointing mechanism for such systems, hierarchical application-level checkpointing. The basic idea of this mechanism is to checkpoint at two independent levels, the CPU level and the GPU level, to tolerate CPU and GPU faults respectively. Based on this idea, we have also designed and implemented a hierarchical application-level checkpointing tool, "HiAL-Ckpt". Using this tool, programmers can insert two kinds of directives, CPU directives and GPU directives, into a program, and the compiler transforms the directives into CPU or GPU checkpointing code according to their nature. Through a case study of SWIM, a test program from the SPEC2000 benchmark suite, we have demonstrated the validity of the hierarchical application-level checkpointing technique. The experimental results show that the fault-tolerance time overhead of HiAL-Ckpt for SWIM is only 2.25% relative to the execution time of SWIM without any FT work.
ISBN (print): 9781424444779
Unsupervised data exploration techniques are used for extracting relational and structural information from large volumes of data. In this paper we explore a collection of ACM transactions and IEEE conferences related by the subject of high performance distributed computing, with a scientific computing flavor, in order to understand how they relate to each other. The interpreted result is shown to be reasonable and is defined by groups of conferences and transactions that can be scaled by their degree of abstractness and physical realization.
ISBN (print): 9781424436927
In this paper, we consider the performance of serial distributed detection in wireless sensor networks (WSNs) under the assumption that the local decisions made by sensors are transmitted over noisy channels. Unlike parallel fusion, where the local results are sent directly to the fusion center, in serial fusion the local results are transmitted to the fusion center through multi-hop, short-range communications. We derive a fusion decision rule for serial signal detection that takes the channel noise into account. Simulation results show that the performance of serial distributed detection is inferior to that of parallel distributed detection, especially when the number of sensors is large. However, serial distributed detection uses short-range, multi-hop transmission, so it can serve as an energy-efficient distributed detection scheme for wireless sensor networks.
ISBN (print): 9781424439355
Computing is being transformed to a model consisting of services that are commoditised and delivered in a manner similar to utilities such as water, electricity, gas, and telephony. In such a model, users access services based on their requirements without regard to where the services are hosted. Several computing paradigms have promised to deliver this utility computing vision, including Grid computing, P2P computing, and more recently Cloud computing. The latter term denotes the infrastructure as a "Cloud" in which businesses and users are able to access applications from anywhere in the world on demand. Hence, Cloud computing can be classed as a new paradigm for the dynamic creation of next-generation Data Centers by assembling services of networked Virtual Machines (VMs). Thus, the computing world is rapidly transforming towards developing software for millions to consume as a service rather than creating software for millions to run on their PCs.
In this paper, a new signal detection scheme using both parallel interference cancellation (PIC) and equalization for the efficient joint distributed space-time coding is proposed to suppress the impact of imperfect s...
ISBN (print): 9781424437511
Nowadays, common systems in the area of high performance computing exhibit highly hierarchical architectures. As a result, achieving satisfactory application performance demands an adaptation of the respective parallel algorithm to such systems. This, in turn, requires knowledge about the actual hardware structure even at the application level. However, the prevalent Message Passing Interface (MPI) standard (at least in its current version 2.1) intentionally hides heterogeneity from the application programmer in order to assure portability. In this paper, we introduce the MPIXternal library, which tries to circumvent this obvious semantic gap within the current MPI standard. For this purpose, the library offers the programmer additional features that should help to adapt applications to today's hierarchical systems in a convenient and portable way.
ISBN (digital): 9783642103315
ISBN (print): 9783642103308
In this work we describe a parallel implementation of the Poisson Surface Reconstruction algorithm based on multigrid domain decomposition. We compare implementations using different models of data-sharing between processors and show that a parallel implementation with distributed memory provides the best scalability. Using our method, we are able to parallelize the reconstruction of models from one billion data points on twelve processors across three machines, providing a ninefold speedup in running time without sacrificing reconstruction accuracy.
Cooperative transmission is an efficient technique for realizing diversity gain in wireless fading channels in a distributed manner. In this paper, we consider a wireless network composed of a source, two parallel relays a...