ISBN: 9781424453337 (print)
In this paper, closed-loop quasi-orthogonal space-time block coding (QO-STBC) is exploited within a four-relay-node transmission scheme to achieve full rate and to increase the diversity gain beyond that provided by earlier two-relay approaches. The problem of imperfect synchronization between relay nodes is overcome by applying a parallel interference cancellation (PIC) detection scheme at the destination node. Bit error rate simulations confirm the advantages of the proposed methodology across a range of synchronization-error levels and show that only a small number of iterations is necessary within the PIC detection.
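A minimal sketch of generic iterative PIC detection, assuming a real-valued linear model y = Hx + n with BPSK symbols and an effective channel matrix H known at the destination. It does not model the QO-STBC code structure or the relay timing offsets studied in the paper; `pic_detect` and `n_iter` are hypothetical names.

```python
def sign(v):
    """Hard BPSK decision: map a real value to +1.0 or -1.0."""
    return 1.0 if v >= 0 else -1.0


def pic_detect(H, y, n_iter=3):
    """Iterative parallel interference cancellation for y = H x + n.

    H is a list of rows (len(y) x K) and x holds BPSK symbols (+1/-1).
    Every iteration re-decides all K symbols in parallel, each time using
    the previous estimates of the *other* symbols to cancel their
    interference. Assumes no column of H is all zeros.
    """
    rows, K = len(H), len(H[0])
    cols = [[H[r][k] for r in range(rows)] for k in range(K)]
    energy = [sum(c * c for c in col) for col in cols]

    # Stage 0: plain matched-filter decisions (no cancellation yet).
    x_hat = [sign(sum(cols[k][r] * y[r] for r in range(rows))) for k in range(K)]

    for _ in range(n_iter):
        new = []
        for k in range(K):
            # Subtract the estimated contribution of every symbol j != k.
            residual = [
                y[r] - sum(H[r][j] * x_hat[j] for j in range(K) if j != k)
                for r in range(rows)
            ]
            # Normalised matched filter on the cleaned signal, then decide.
            metric = sum(cols[k][r] * residual[r] for r in range(rows)) / energy[k]
            new.append(sign(metric))
        x_hat = new  # parallel update: all symbols use the same old estimates
    return x_hat
```

For example, with H having 1 on the diagonal and 0.2 elsewhere (4 x 4) and x = [1, -1, -1, -1], the matched-filter stage alone (`n_iter=0`) mis-detects the first symbol in this noiseless case, and a single PIC iteration recovers x.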
ISBN: 9781424438914 (print)
Many FPGA implementations of QR decomposition have been studied for small-scale matrices, and each has been presented individually. However, to the best of our knowledge, there is no FPGA-based accelerator for large-scale QR decomposition. In this paper, we propose a unified FPGA accelerator structure for large-scale QR decomposition. To exploit the computational potential of FPGAs, we introduce a fine-grained parallel algorithm for QR decomposition. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 15 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 on our self-designed board. Experimental results show that a speedup factor of 4 and a maximum power-performance of 60.9 can be achieved compared to a Pentium Dual CPU with double SSE threads.
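The abstract does not say which QR algorithm the PEs implement. As a software reference point, here is a sketch of QR by Givens rotations, one common choice for hardware arrays because each rotation updates only two rows; this is an assumption for illustration, not necessarily the authors' fine-grained algorithm, and `givens_qr` is a hypothetical name.

```python
import math


def givens_qr(A):
    """QR decomposition by Givens rotations (pure Python, for illustration).

    A is an m x n list of rows. Returns (Q, R) with A = Q R, where Q is an
    m x m orthogonal matrix and R is m x n upper triangular.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # working copy that becomes R
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j below the diagonal, working from the bottom up.
        for i in range(m - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            if b == 0.0:
                continue
            r = math.hypot(a, b)
            c, s = a / r, b / r
            # Rotate rows i-1 and i of R; only these two rows change,
            # so independent rotations could run on separate PEs.
            for k in range(n):
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k], R[i][k] = c * x + s * y, -s * x + c * y
            # Accumulate Q = G1^T G2^T ... by rotating columns i-1 and i.
            for k in range(m):
                x, y = Q[k][i - 1], Q[k][i]
                Q[k][i - 1], Q[k][i] = c * x + s * y, -s * x + c * y
    return Q, R
```

For instance, `givens_qr([[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]])` returns factors whose product reproduces the input to within rounding error.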
In this paper, we propose a new versatile network, called a recursive dual-net (RDN), as a potential candidate for the interconnection network of next-generation supercomputers. The RDN is based on the recursive dual-construction of a base network. A $k$-level recursive dual-construction for $k > 0$ creates a network containing $(2m)^{2^k}/2$ nodes with node-degree $d + k$, where $m$ and $d$ are the number of nodes and the node-degree of the base network, respectively. The RDN is node- and edge-symmetric if the base network is node- and edge-symmetric. The RDN can contain a huge number of nodes, each with a small node-degree, while keeping the diameter short. For example, we can construct a symmetric RDN connecting more than 3 million nodes with only 6 links per node and a diameter of 22. We investigate the topological properties of the RDN and compare them with those of other networks, including the 3D torus, WK-recursive network, hypercube, cube-connected cycles, and dual-cube. We also present efficient routing and broadcasting algorithms for the RDN.
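The size and degree formulas can be evaluated directly. A minimal sketch (the helper name `rdn_size` is hypothetical) that computes the node count $(2m)^{2^k}/2$ and node-degree $d + k$ from the base network's $m$ and $d$:

```python
def rdn_size(m, d, k):
    """Node count and node-degree of a k-level recursive dual-net (k > 0).

    m and d are the node count and node-degree of the base network.
    Implements the stated formulas: (2m)^(2^k) / 2 nodes, degree d + k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    nodes = (2 * m) ** (2 ** k) // 2  # (2m)^(2^k) is even, so // is exact
    return nodes, d + k
```

`rdn_size(25, 4, 2)` returns `(3125000, 6)`: a base network with 25 nodes of degree 4 (for instance a 5×5 torus; this choice of base network is an assumption, not taken from the paper) yields just over 3 million nodes with 6 links each after two levels, consistent with the example above. The diameter is not computed here.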
An efficient GPU-based sorting algorithm is proposed in this paper, together with a merging method for graphics devices. The proposed sorting algorithm is optimized for modern GPU architectures and can sort elements represented as integers, floats, and structures, while the new merging method merges two ordered lists efficiently on the GPU without slow atomic functions or uncoalesced memory reads. Adaptive strategies are used for sorting disordered or nearly sorted lists, and large or small lists. The current implementation uses NVIDIA CUDA with multi-GPU support and is being migrated to the newly released Open Computing Language (OpenCL). Extensive experiments demonstrate that our algorithm outperforms previous GPU-based sorting algorithms and can support real-time applications.
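One well-known way to merge two sorted lists in parallel without atomics is to compute each element's output index independently: its index in its own list plus its rank in the other list, found by binary search. The sketch below shows that pattern sequentially in Python; it is an illustration of the general technique, not necessarily the authors' merging method, and `parallel_style_merge` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def parallel_style_merge(a, b):
    """Merge sorted lists a and b by computing every element's final slot.

    Output index = index in own list + rank in the other list. Each index
    depends only on read-only inputs, so on a GPU one thread per element
    could write its result with no atomics and no write conflicts.
    Ties: elements of `a` precede equal elements of `b` (bisect_left vs.
    bisect_right), which keeps the merge stable and the slots distinct.
    """
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):  # conceptually: one thread per element of a
        out[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):  # conceptually: one thread per element of b
        out[j + bisect_right(a, y)] = y
    return out
```

For example, `parallel_style_merge([1, 3, 5, 7], [2, 3, 6])` returns `[1, 2, 3, 3, 5, 6, 7]`.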
ISBN: 9780791843277 (print)
Agent-Based Modeling has recently been recognized as a method for in-silico multi-scale modeling of biological cell systems. Agent-Based Models (ABMs) allow results from experimental studies of individual cell behaviors to be scaled up to the macro-behavior of interacting cells in complex cell systems or tissues. Current-generation ABM simulation toolkits are designed to run on serial von Neumann architectures, which scale poorly; the best systems can barely handle tens of thousands of agents in real time. Because some models exhibit significantly different emergent behaviors at mega-scale populations than at smaller population sizes, it is important to be able to simulate such large-scale models in real time. In this paper we present a new framework for simulating ABMs on programmable graphics processing units (GPUs). Novel algorithms and data structures have been developed for agent-state representation, agent motion, and replication. As a test case, we have implemented an abstracted version of the Systemic Inflammatory Response System (SIRS) ABM. Compared to the original implementation on the NetLogo system, our implementation can handle an agent population over three orders of magnitude larger at close to 40 updates per second. We believe that our system is the only one of its kind capable of efficiently handling realistic problem sizes in biological simulations.
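The abstract does not describe its data structures, so the following is only a sketch of two common data-parallel patterns for agent motion and replication, under stated assumptions: agents are stored as a structure of arrays (one list per attribute), they move on a hypothetical toroidal grid, and each dividing agent spawns one child whose slot comes from an exclusive prefix sum instead of an atomic counter. The SIRS rules are not modeled; `move` and `replicate` are hypothetical names.

```python
from itertools import accumulate


def move(xs, ys, dxs, dys, width, height):
    """Data-parallel motion step: every agent moves by (dx, dy) on a torus."""
    return ([(x + dx) % width for x, dx in zip(xs, dxs)],
            [(y + dy) % height for y, dy in zip(ys, dys)])


def replicate(xs, ys, divide):
    """Append one child per dividing agent using an exclusive prefix sum.

    divide[i] is 1 if agent i replicates this step, else 0. The exclusive
    scan gives each dividing agent a unique output slot, the usual way to
    grow arrays in parallel on a GPU without atomics. Children start at
    their parent's position.
    """
    n = len(xs)
    offsets = [0] + list(accumulate(divide))[:-1]  # exclusive prefix sum
    total = offsets[-1] + divide[-1] if n else 0   # number of new children
    new_xs, new_ys = xs + [0] * total, ys + [0] * total
    for i in range(n):  # conceptually: one thread per agent
        if divide[i]:
            new_xs[n + offsets[i]] = xs[i]
            new_ys[n + offsets[i]] = ys[i]
    return new_xs, new_ys
```

For example, `replicate([1, 2, 3], [4, 5, 6], [1, 0, 1])` returns `([1, 2, 3, 1, 3], [4, 5, 6, 4, 6])`: agents 0 and 2 each gain a child, at slots 3 and 4.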
Tsunami simulation consists of fluid dynamics, numerical computations, and visualization techniques. Nonlinear shallow water equations are often used to model the tsunami propagation. By adding the friction slope to t...
ISBN: 9783642038686 (print)
Auto-tuners automate the performance tuning of parallel applications. Three major drawbacks of current approaches are that 1) they mainly focus on numerical software; 2) they typically do not attempt to reduce the large search space before search algorithms are applied; and 3) the means of providing an auto-tuner with additional information to improve tuning are limited. Our paper tackles these problems in a novel way by focusing on the interaction between an auto-tuner and a parallel application. In particular, we introduce Atune-IL, an instrumentation language that uses new types of code annotations to mark tuning parameters, blocks, permutation regions, and measuring points. Atune-IL allows a more accurate extraction of meta-information to help an auto-tuner prune the search space before employing search algorithms. In addition, Atune-IL's concepts target parallel applications in general, not just numerical programs. Atune-IL has been successfully evaluated in several case studies with parallel applications differing in size, programming language, and application domain; one case study employed a large commercial application with nested parallelism. On average, Atune-IL reduced search spaces by 78%. In two corner cases, 99% of the search space could be pruned.
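To illustrate why meta-information about program structure can shrink a tuning search space, here is a hypothetical example (not Atune-IL's syntax or its pruning algorithm): if annotations reveal that tuning blocks do not influence each other, each block can be tuned separately, so the number of configurations becomes a sum of per-block products rather than one product over all parameters. The function `search_space` and its inputs are invented for illustration.

```python
from math import prod


def search_space(blocks, independent):
    """Number of configurations an exhaustive tuner would have to try.

    blocks: list of tuning blocks, each a list of per-parameter value
    counts. If the blocks are known to be independent (hypothetical
    meta-information of the kind annotations could supply), each block is
    tuned on its own and the costs add; otherwise every combination across
    all blocks must be explored and the costs multiply.
    """
    per_block = [prod(counts) for counts in blocks]
    return sum(per_block) if independent else prod(per_block)
```

In a made-up example, `search_space([[4, 8], [2, 16]], independent=False)` gives 1024 configurations, while `independent=True` gives 64, a 93.75% reduction.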
ISBN: 9783642040696 (print)
KMKE provides a knowledge engineering approach to integrating knowledge management activities (such as knowledge modeling, knowledge verification, knowledge storage, and knowledge querying) into a systematic framework. In this paper, we develop the KMKE knowledge management system based on design patterns and parallel processing. First, several design patterns are applied to the KMKE system to enhance its flexibility and extensibility, which helps it cope with the continuous changes that originate in knowledge. Second, Java programs and CLIPS programs are bound together to give the KMKE system knowledge inference capability; knowledge verification and knowledge querying can then be performed by executing CLIPS rules. Finally, we propose parallel CLIPS to shorten the execution time of the KMKE system. Since a large amount of knowledge may increase the execution time substantially, parallelizing the execution of CLIPS rules on a cluster system can effectively reduce the search space of the CLIPS inference engine.
ISBN: 9781615673742 (print)
Micrometer-order diamond photonic crystals composed of alumina were fabricated by micro-stereolithography using computer-aided design and manufacturing (CAD/CAM) systems, followed by powder sintering. Twinned structures with mirror-symmetric diamond lattices were designed to introduce defect interfaces parallel to the (100) and (111) planes. Electromagnetic wave properties were measured by terahertz time-domain spectroscopy. A perfect photonic band gap was observed in the terahertz frequency range and showed good agreement with theoretical calculations using the plane wave expansion method. Transmission peaks of localized modes formed in the band gaps of the twinned diamond structures. In electric-field intensity distributions simulated with the transmission line modeling method, incident waves resonated and were strongly localized through multiple reflections between the twinned diffraction lattices. A stronger localized mode, with a higher transmission-peak intensity, was obtained through structural modifications of the twinned lattice patterns.
ISBN: 9783642041457 (print)
Many machine vision applications involve depth estimation in a scene. Disparity map recovery from a stereo image pair has been extensively studied by the computer vision community. Previous methods are mainly restricted to software-based techniques on general-purpose architectures and have relatively long execution times due to the computationally complex algorithms involved. In this paper, a new hardware module for real-time disparity map computation is realized. It is an occlusion-aware, parallel-pipelined design implemented on a single FPGA device with a typical operating frequency of 511 MHz. It provides accurate disparity map computation at 768 frames per second for a stereo image pair with a disparity range of 80 pixels and a spatial resolution of 640×480 pixels. The proposed method allows a fast disparity map computation module to be built, making it suitable for real-time stereo vision applications.
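The stated figures imply a demanding throughput: 640 × 480 × 768 = 235,929,600 pixels per second, or about 2.17 clock cycles per pixel at 511 MHz; assuming one cost evaluation per candidate disparity, about 80 candidates per pixel means roughly 37 evaluations per cycle on average, which is why a parallel-pipelined design is needed. For reference, the sketch below is the plain software baseline of winner-take-all block matching with a sum of absolute differences (SAD) cost; it is not the paper's occlusion-aware hardware design, and `sad_disparity` and its `radius` parameter are hypothetical.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-take-all SAD block matching on a rectified grayscale pair.

    left and right are equal-sized lists of rows. For each left pixel
    (x, y), the (2r+1) x (2r+1) window is compared with the right-image
    window shifted left by d, for d in [0, max_disp]; the d with the lowest
    sum of absolute differences wins (ties go to the smaller d). Border
    pixels whose window would leave the image keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_d, best_cost = 0, None
            # Only shifts that keep the right-image window inside the image.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = sum(
                    abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                    for dy in range(-radius, radius + 1)
                    for dx in range(-radius, radius + 1)
                )
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

The cost of this baseline grows as width × height × disparities × window area, which is the workload a hardware pipeline has to absorb at video rates.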