ISBN: 9783540929895 (print)
Low-Density Parity-Check (LDPC) codes are among the best error-correcting codes known and have recently been adopted by data transmission standards, such as the second generation of Satellite Digital Video Broadcasting (DVB-S2) and WiMAX. LDPC codes are based on sparse parity-check matrices and use message-passing algorithms, also known as belief propagation, which demand very intensive computation. For that reason, dedicated VLSI architectures have been proposed in the past few years to achieve real-time processing. This paper proposes a new flexible and programmable approach for LDPC decoding on a heterogeneous multicore Cell Broadband Engine (Cell/B.E.) architecture. Very compact data structures were developed to represent the bipartite graph for both regular and irregular LDPC codes. They are used to map the irregular behavior of the Sum-Product Algorithm (SPA) used in LDPC decoding into a computing model that expresses parallelism and locality of data by decoupling computation and memory accesses. This model can be used in general for exploiting the capabilities of modern multicore architectures. For the Cell/B.E., in particular, stream-based programs were developed for simultaneous multicodeword LDPC decoding by using SIMD features and a low-latency DMA-based data communication mechanism between processors. Experimental results show significant throughputs that compare well with state-of-the-art VLSI-based solutions.
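As a rough illustration of what a compact bipartite-graph representation can look like, the sketch below stores the edges of a parity-check graph in two flat arrays and uses them to compute a syndrome. This is not the data structure from the paper: the names `CHECK_PTR`, `BIT_IDX`, and `syndrome` are hypothetical, and the example graph is the (7,4) Hamming code, picked only because it is small.

```python
# Flat edge-list layout of a parity-check (Tanner) graph.
# Assumption: this layout is illustrative only, not the paper's structure.
# The edges of check node c are BIT_IDX[CHECK_PTR[c]:CHECK_PTR[c + 1]],
# so regular and irregular codes are stored the same way.

# Hypothetical example graph: the (7,4) Hamming parity-check matrix,
# chosen only because it is small (it is not actually low-density).
CHECK_PTR = [0, 4, 8, 12]
BIT_IDX = [0, 1, 3, 4,  0, 2, 3, 5,  1, 2, 3, 6]


def syndrome(bits, check_ptr=CHECK_PTR, bit_idx=BIT_IDX):
    """Return one parity value per check node (all zeros for a codeword)."""
    result = []
    for c in range(len(check_ptr) - 1):
        parity = 0
        # Walk the contiguous slice of edges that belongs to check node c.
        for e in range(check_ptr[c], check_ptr[c + 1]):
            parity ^= bits[bit_idx[e]]
        result.append(parity)
    return result


codeword = [1, 0, 1, 1, 0, 1, 0]
corrupted = codeword.copy()
corrupted[2] ^= 1  # flip one bit

print(syndrome(codeword))
print(syndrome(corrupted))
```

Because each check node reads one contiguous slice of the edge array, the memory access pattern is known before any computation starts, which is the kind of decoupling between computation and memory access that the abstract describes.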
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
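The conversion from packet rate to line rate quoted above can be checked with a short calculation. The function name `line_rate_bps` is hypothetical; the only inputs are the figures stated in the abstract (10 billion packets per second, 40-byte packets).

```python
def line_rate_bps(packets_per_second: float, packet_bytes: int) -> float:
    """Convert a lookup rate into a line rate in bits per second."""
    return packets_per_second * packet_bytes * 8


# Figures taken from the abstract: 10 billion lookups/s, 40-byte packets.
rate = line_rate_bps(10e9, 40)
print(f"{rate / 1e12:.1f} Tbps")  # prints "3.2 Tbps"
```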
In this paper, we propose a new versatile network, called a recursive dual-net (RDN), as a potential candidate for the interconnection network of next-generation supercomputers. The RDN is based on recursive dual-construction of a base network. A k-level recursive dual-construction for k > 0 creates a network containing $(2m)^{2^k}/2$ nodes with node-degree d + k, where m and d are the number of nodes and the node-degree of the base network, respectively. The RDN is node and edge symmetric if the base network is node and edge symmetric. The RDN can contain a huge number of nodes, each with small node-degree, while keeping a short diameter. For example, we can construct a symmetric RDN connecting more than 3 million nodes with only 6 links per node and a diameter of 22. We investigate the topological properties of the RDN and compare them to those of other networks including the 3D torus, WK-recursive network, hypercube, cube-connected-cycle, and dual-cube. We also establish efficient routing and broadcasting algorithms for the RDN.
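The size formula can be evaluated directly. The sketch below is a minimal helper (the name `rdn_size` is hypothetical) that returns the node count and node degree for a given base network. The base parameters in the example, 25 nodes of degree 4, are an assumption chosen because they reproduce the "more than 3 million nodes with 6 links per node" figure; the abstract does not say which base network the authors used, and the diameter is not computed here.

```python
from typing import Tuple


def rdn_size(m: int, d: int, k: int) -> Tuple[int, int]:
    """Return (node count, node degree) of a k-level recursive dual-net.

    m and d are the node count and node degree of the base network.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    nodes = (2 * m) ** (2 ** k) // 2  # (2m)^(2^k) / 2
    return nodes, d + k


# Assumed base network: 25 nodes, degree 4 (not stated in the abstract).
print(rdn_size(25, 4, 2))  # prints (3125000, 6)
```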
ISBN: 9783642019692 (print)
This paper presents two discrete computational geometry algorithms designed for execution on Graphics Processing Units (GPUs). The algorithms are parallelized versions of sequential algorithms intended for application in geographical data mining. The first algorithm finds clusters of m points, called m-clusters, in the rasterized plane. The second algorithm complements the first by identifying outliers, those points which are not members of any m-cluster. The use of a raster representation of coordinates provides an ideal data stream environment for efficient GPU utilization. The parallel algorithms have low memory demands and require only a limited amount of inter-process communication. Initial performance analysis indicates the algorithms are scalable, both in problem size and in the number of seeds, and significantly outperform commercial implementations.
An efficient GPU-based sorting algorithm is proposed in this paper together with a merging method on graphics devices. The proposed sorting algorithm is optimized for modern GPU architecture with the capability of sorting elements represented by integers, floats and structures, while the new merging method gives a way to merge two ordered lists efficiently on the GPU without using slow atomic functions or uncoalesced memory reads. Adaptive strategies are used for sorting disordered or nearly sorted lists, and large or small lists. The current implementation is on NVIDIA CUDA with multi-GPU support, and is being migrated to the newly introduced Open Computing Language (OpenCL). Extensive experiments demonstrate that our algorithm has better performance than previous GPU-based sorting algorithms and can support real-time applications.
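The abstract does not describe how the merge avoids atomic operations. One common technique with that property is a rank-based merge, sketched below on the CPU as an illustration only; it should not be read as the authors' method, and it says nothing about memory coalescing. The name `merge_by_rank` is hypothetical. Each element finds its output position with a binary search in the other list, so every write goes to a distinct slot and a GPU version could assign one thread per element.

```python
from bisect import bisect_left, bisect_right


def merge_by_rank(a, b):
    """Merge two sorted lists by computing each element's final position.

    Illustrative technique, not taken from the paper. Every loop iteration
    is independent of the others, which is what allows a parallel version
    to run without atomic operations.
    """
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        # Position = own index + number of elements of b strictly below x.
        out[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):
        # Using <= here places ties from a before ties from b,
        # so no two elements ever target the same slot.
        out[j + bisect_right(a, y)] = y
    return out


print(merge_by_rank([1, 3, 3], [3, 4]))  # prints [1, 3, 3, 3, 4]
```

The asymmetric use of `bisect_left` and `bisect_right` is the design choice that matters: it breaks ties consistently, which keeps the merge stable and the output positions unique.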
ISBN: 9780769539294 (print)
Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, since the performance of the GPU is surpassing that of the contemporary CPU. This paper presents our study on how to efficiently recover the passwords for encrypted RAR files. Our research focuses on the AES key generation process, which is the most time-consuming stage in the whole RAR encryption/decryption process. The design and implementation of the password recovery are based on NVIDIA's CUDA (Compute Unified Device Architecture). A CPU-based version is also implemented as a reference, and its performance is compared with that of the GPU-based version. In addition, a modified model is proposed to estimate the performance by static analysis of the code and then further assist program optimization.
As Chip-Multiprocessor systems (CMP) have become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resourc...
ISBN: 9780791843277 (print)
Agent-Based Modeling has recently been recognized as a method for in-silico multi-scale modeling of biological cell systems. Agent-Based Models (ABMs) allow results from experimental studies of individual cell behaviors to be scaled into the macro-behavior of interacting cells in complex cell systems or tissues. Current-generation ABM simulation toolkits are designed to work on serial von Neumann architectures, which have poor scalability. The best systems can barely handle tens of thousands of agents in real time. Considering that there are models for which mega-scale populations have significantly different emergent behaviors than smaller population sizes, it is important to be able to simulate such large-scale models in real time. In this paper we present a new framework for simulating ABMs on programmable graphics processing units (GPUs). Novel algorithms and data structures have been developed for agent-state representation, agent motion, and replication. As a test case, we have implemented an abstracted version of the Systematic Inflammatory Response System (SIRS) ABM. Compared to the original implementation on the NetLogo system, our implementation can handle an agent population that is over three orders of magnitude larger with close to 40 updates/sec. We believe that our system is the only one of its kind that is capable of efficiently handling realistic problem sizes in biological simulations.
The concepts of artifact-as-organism and creator-in-a-box, and their autonomy, adaptation and evolution, are proposed as purely engineering motivations for the incorporation of the cognitive attributes of consciousness and self-awareness into robots, automata, machines and artifacts. These ideas are then used to create computational models of cognitive robots and machine consciousness that can be executed using modern parallel, distributed, many-core, and massively multi-core computer architectures.
Tsunami simulation consists of fluid dynamics, numerical computations, and visualization techniques. Nonlinear shallow water equations are often used to model the tsunami propagation. By adding the friction slope to t...