Search results - Inner Mongolia University Library

Parallel LDPC Decoding on the Cell/BE Processor

4th International Conference on High Performance Embedded Architectures and Compilers

Authors: Falcao, Gabriel; Sousa, Leonel; Silva, Vitor; Marinho, Jose (Univ Coimbra, Inst Telecomunicacoes, Polo 2, P-3030290 Coimbra, Portugal; Univ Tecn Lisboa, INESC ID, IST, P-1000129 Lisbon, Portugal)

ISBN: (print) 9783540929895

Low-Density Parity-Check (LDPC) codes are among the best error-correcting codes known and have recently been adopted by data transmission standards such as the second generation of Satellite Digital Video Broadcasting (DVB-S2) and WiMAX. LDPC codes are based on sparse parity-check matrices and use message-passing algorithms, also known as belief propagation, which demand very intensive computation. For that reason, dedicated VLSI architectures have been proposed in the past few years to achieve real-time processing. This paper proposes a new flexible and programmable approach for LDPC decoding on the heterogeneous multicore Cell Broadband Engine (Cell/B.E.) architecture. Very compact data structures were developed to represent the bipartite graph for both regular and irregular LDPC codes. They are used to map the irregular behavior of the Sum-Product Algorithm (SPA) used in LDPC decoding into a computing model that expresses parallelism and locality of data by decoupling computation and memory accesses. This model can be used in general for exploiting the capabilities of modern multicore architectures. For the Cell/B.E. in particular, stream-based programs were developed for simultaneous multicodeword LDPC decoding by using SIMD features and a low-latency DMA-based data communication mechanism between processors. Experimental results show significant throughputs that compare well with state-of-the-art VLSI-based solutions.

Keywords: Digital video broadcasting (DVB)

Sequence-preserving parallel IP lookup using multiple SRAM-based pipelines

Journal of Parallel and Distributed Computing, 2009, vol. 69, no. 9, pp. 778-789

Authors: Jiang, Weirong; Prasanna, Viktor K. (Univ So Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089, USA)

SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
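The throughput figure quoted in the abstract can be checked with a short calculation. The sketch below only converts a packet rate into a bit rate for fixed-size packets; the function name `throughput_bps` is a hypothetical helper introduced here, not something from the paper.

```python
def throughput_bps(packets_per_second, packet_bytes):
    """Bit rate obtained when every packet has the same size in bytes."""
    return packets_per_second * packet_bytes * 8  # 8 bits per byte


# Figures from the abstract: 10 billion packets per second, 40-byte packets.
rate = throughput_bps(10e9, 40)
print(f"{rate / 1e12:.1f} Tbps")  # 3.2 Tbps
```

With larger packets the same packet rate would correspond to a proportionally higher bit rate, which is why the abstract quotes the figure for minimum-size packets.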

Keywords: IP lookup; Pipeline; SRAM; Router

RECURSIVE DUAL-NET: A NEW VERSATILE NETWORK FOR SUPERCOMPUTERS OF THE NEXT GENERATION

9th International Conference on Algorithms and Architectures for Parallel Processing

Authors: Li, Yamin; Peng, Shietung; Chu, Wanming (Hosei Univ, Dept Comp Sci, Tokyo 1848584, Japan; Univ Aizu, Dept Comp Hardware, Aizu Wakamatsu, Fukushima 9658580, Japan)

In this paper, we propose a new versatile network, called a recursive dual-net (RDN), as a potential candidate for the interconnection network of next-generation supercomputers. The RDN is based on the recursive dual-construction of a base network. A k-level recursive dual-construction for k > 0 creates a network containing $(2m)^{2^k}/2$ nodes with node-degree d + k, where m and d are the number of nodes and the node-degree of the base network, respectively. The RDN is node and edge symmetric if the base network is node and edge symmetric. The RDN can contain a huge number of nodes, each with small node-degree, while keeping a short diameter. For example, we can construct a symmetric RDN connecting more than 3 million nodes with only 6 links per node and a diameter of 22. We investigate the topological properties of the RDN and compare them to those of other networks, including the 3D torus, WK-recursive network, hypercube, cube-connected-cycle, and dual-cube. We also establish efficient routing and broadcasting algorithms for the RDN.
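The size formula in the abstract is easy to evaluate directly. The sketch below computes the node count and node-degree of a k-level RDN from the base network's node count m and degree d. The abstract does not say which base network yields the 3-million-node example; the values m = 25 and d = 4 used here (for instance a 5 x 5 torus) are an assumption chosen because they reproduce the quoted node count and degree. The diameter is not computed.

```python
def rdn_size(m, d, k):
    """Return (node count, node-degree) of a k-level recursive dual-net.

    m and d are the node count and node-degree of the base network.
    """
    nodes = (2 * m) ** (2 ** k) // 2  # (2m)^(2^k) / 2, always an integer
    degree = d + k
    return nodes, degree


# Assumed base network: 25 nodes of degree 4, with k = 2 levels.
print(rdn_size(25, 4, 2))  # (3125000, 6)
```

Setting k = 0 returns the base network itself, `(m, d)`, which is a quick consistency check on the formula.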

Keywords: parallel processing; interconnection network

Accelerated Discovery of Discrete M-Clusters/Outliers on the Raster Plane Using Graphical Processing Units

9th International Conference on Computational Science

Authors: Trefftz, Christian; Szakas, Joseph; Majdandzic, Igor; Wolffe, Gregory (Grand Valley State Univ, Sch Comp, Allendale, MI 49401, USA; Univ Maine, Comp Informat Syst Dept, Augusta, ME 04330, USA)

ISBN: (print) 9783642019692

This paper presents two discrete computational geometry algorithms designed for execution on Graphics Processing Units (GPUs). The algorithms are parallelized versions of sequential algorithms intended for application in geographical data mining. The first algorithm finds clusters of m points, called m-clusters, in the rasterized plane. The second algorithm complements the first by identifying outliers, those points which are not members of any m-cluster. The use of a raster representation of coordinates provides an ideal data stream environment for efficient GPU utilization. The parallel algorithms have low memory demands and require only a limited amount of inter-process communication. Initial performance analysis indicates the algorithms are scalable, both in problem size and in the number of seeds, and significantly outperform commercial implementations.

Keywords: GPU algorithms; Geographical data mining; CUDA programming

AN EFFICIENT SORTING ALGORITHM WITH CUDA

9th International Conference on Algorithms and Architectures for Parallel Processing

Authors: Chen, Shifu; Qin, Jing; Xie, Yongming; Zhao, Junping; Heng, Pheng-Ann (Chinese Univ Hong Kong, Chinese Acad Sci, Shenzhen Inst Adv Integrat Technol, Hong Kong, Hong Kong, Peoples R China; Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China; Chinese PLA Gen Hosp & Postgrad Med Sch, Inst Med Informat, Beijing, Peoples R China)

An efficient GPU-based sorting algorithm is proposed in this paper, together with a merging method for graphics devices. The proposed sorting algorithm is optimized for modern GPU architectures and can sort elements represented by integers, floats and structures, while the new merging method gives a way to merge two ordered lists efficiently on the GPU without using slow atomic functions or uncoalesced memory reads. Adaptive strategies are used for sorting disordered or nearly sorted lists, and large or small lists. The current implementation is on NVIDIA CUDA with multi-GPU support, and is being migrated to the newly introduced Open Computing Language (OpenCL). Extensive experiments demonstrate that our algorithm has better performance than previous GPU-based sorting algorithms and can support real-time applications.
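The abstract does not describe how the merging method avoids atomic functions. One well-known approach with that property, shown here only as an illustration and not as the authors' method, is rank-based merging: each element finds its final position independently by binary search in the other list, so no two elements ever write to the same slot. The sketch below is sequential Python; on a GPU each loop iteration would map to one thread. The name `merge_by_rank` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_by_rank(a, b):
    """Merge two sorted lists by computing each element's output index.

    Every iteration is independent of the others, so the loops could run
    as parallel threads without atomic operations.
    """
    out = [None] * (len(a) + len(b))
    for i, x in enumerate(a):
        # Elements of b strictly smaller than x come before it.
        out[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):
        # Elements of a smaller than or equal to y come before it,
        # so ties are resolved in favour of a and indices never collide.
        out[j + bisect_right(a, y)] = y
    return out


print(merge_by_rank([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 3, 5, 6]
```

Using `bisect_left` for one list and `bisect_right` for the other is what keeps equal keys from landing on the same output index.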

Keywords: parallel sorting; parallel merging; CUDA

Password Recovery for RAR Files Using CUDA

8th IEEE International Conference on Dependable, Autonomic and Secure Computing

Authors: Hu, Guang; Ma, Jianhua; Huang, Benxiong (Huazhong Univ Sci & Technol, Dept Electron & Informat, Wuhan 430074, Peoples R China; Hosei Univ, Fac Comp & Informat Sci, Koganei, Tokyo 1848584, Japan)

ISBN: (print) 9780769539294

Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, since the performance of the GPU is surpassing that of the contemporary CPU. This paper presents our study on how to efficiently recover the passwords of encrypted RAR files. Our research focuses on the AES key generation process, which is the most time-consuming stage in the whole RAR encryption/decryption process. The design and implementation of the password recovery are based on NVIDIA's CUDA (Compute Unified Device Architecture). A CPU-based version is also implemented as a reference for performance comparison with the GPU-based version. In addition, a modified model is proposed to estimate the performance by static analysis of the code and then further assist program optimization.

Keywords: GPU; CUDA; parallel computing; password recovery; AES

Bank-aware dynamic cache partitioning for multicore architectures

38th International Conference on Parallel Processing, ICPP-2009

Authors: Kaseridis, Dimitris; Stuecheli, Jeffrey; John, Lizy K. (Department of Electrical and Computer Engineering, University of Texas, Austin, TX, United States; IBM Corp., Austin, TX, United States)

ISBN: (print) 9780769538020

As Chip-Multiprocessor (CMP) systems have become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computational resources exposes the system to a mix of diverse and competing workloads. Cache is a resource of primary concern, as it can be dominant in controlling overall throughput. In order to prevent destructive interference between divergent workloads, the last level of cache must be partitioned. Many solutions have been proposed in the past, but most of them assume either simplified cache hierarchies with no realistic restrictions or complex cache schemes that are difficult to integrate in a real design. To address this problem, we propose a dynamic partitioning strategy based on realistic last-level cache designs of CMP processors. We used a cycle-accurate, full-system simulator based on Simics and GEMS to evaluate our partitioning scheme on an 8-core DNUCA CMP system. Results for an 8-core system show that our proposed scheme provides on average a 70% reduction in misses compared to non-partitioned shared caches, and a 25% reduction in misses compared to static equally partitioned (private) caches. © 2009 IEEE.

Keywords: Software architecture

DATA-PARALLEL TECHNIQUES FOR AGENT-BASED TISSUE MODELING ON GRAPHICS PROCESSING UNITS

ASME International Design Engineering Technical Conferences/Computers and Information in Engineering Conference

Authors: Richards, Ryan S.; Lysenko, Mikola; D'Souza, Roshan M.; An, Gary (Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931, USA)

ISBN: (print) 9780791843277

Agent-Based Modeling has recently been recognized as a method for in-silico multi-scale modeling of biological cell systems. Agent-Based Models (ABMs) allow results from experimental studies of individual cell behaviors to be scaled into the macro-behavior of interacting cells in complex cell systems or tissues. Current-generation ABM simulation toolkits are designed to work on serial von Neumann architectures, which have poor scalability. The best systems can barely handle tens of thousands of agents in real time. Considering that there are models for which mega-scale populations have significantly different emergent behaviors than smaller population sizes, it is important to be able to simulate such large-scale models in real time. In this paper we present a new framework for simulating ABMs on programmable graphics processing units (GPUs). Novel algorithms and data structures have been developed for agent-state representation, agent motion, and replication. As a test case, we have implemented an abstracted version of the Systemic Inflammatory Response Syndrome (SIRS) ABM. Compared to the original implementation on the NetLogo system, our implementation can handle an agent population that is over three orders of magnitude larger at close to 40 updates/sec. We believe that our system is the only one of its kind that is capable of efficiently handling realistic problem sizes in biological simulations.

Keywords: Graphics processing unit

Engineering computer architectures for cognitive robotics - the CR/SARAMA model

IEEE International Conference on Cognitive Informatics

Authors: John-Thones Amenyo (Department of Math & Computer Science, York College, City University of New York, New York, NY, USA)

The concepts of artifact-as-organism and creator-in-a-box, and their autonomy, adaptation and evolution, are proposed as purely engineering motivations for the incorporation of the cognitive attributes of consciousness and self-awareness into robots, automata, machines and artifacts. These ideas are then used to create computational models of cognitive robots and machine consciousness that can be executed using modern parallel, distributed, many-core, and massively multi-core computer architectures.

Keywords: Computer architecture; Cognitive robotics; Chromium; Cognition; Cognitive science; Humans; Artificial intelligence; Psychology; Problem-solving; Information processing

A GPU-based simulation of tsunami propagation and inundation

9th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2009

Authors: Liang, Wen-Yew; Hsieh, Tung-Ju; Satria, Muhammad T.; Chang, Yang-Lang; Fang, Jyh-Perng; Chen, Chih-Chia; Han, Chin-Chuan (Department of Computer Science and Information Engineering, Taiwan; Department of Electrical Engineering, National Taipei University of Technology, Taiwan; Department of Computer Science and Information Engineering, National United University, Taiwan)

ISBN: (print) 3642030947

Tsunami simulation combines fluid dynamics, numerical computation, and visualization techniques. Nonlinear shallow water equations are often used to model tsunami propagation. By adding the friction slope to the conservation of momentum, they can also model tsunami inundation. To solve these equations, we use the second-order finite difference MacCormack method. Since it is a finite difference method, it lends itself to parallelization. We use the parallelism provided by the GPU to speed up the computations. By loading data as textures in GPU memory, the computation processes can be written as shader programs and the operations are carried out by the GPU in parallel. The results show that, with the help of the GPU, the simulation achieves a significant improvement in execution time for each of the computation steps. © 2009 Springer Berlin Heidelberg.
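To illustrate why the MacCormack method parallelizes well, the sketch below applies its predictor-corrector update to a much simpler problem than the one in the paper: the 1D linear advection equation u_t + c u_x = 0 on a periodic grid. The equation, the periodic boundary, and the function name `maccormack_step` are assumptions made for illustration; the paper itself solves the nonlinear shallow water equations with shader programs.

```python
def maccormack_step(u, nu):
    """One MacCormack step for u_t + c*u_x = 0 on a periodic grid.

    u  is the list of cell values at the current time level.
    nu is the Courant number c*dt/dx.
    """
    n = len(u)
    # Predictor: forward difference using the current values.
    pred = [u[i] - nu * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # Corrector: backward difference on the predicted values, averaged
    # with the current values. pred[i - 1] wraps around at i == 0.
    return [0.5 * (u[i] + pred[i] - nu * (pred[i] - pred[i - 1]))
            for i in range(n)]


# With nu = 1 the scheme moves the profile exactly one cell per step.
print(maccormack_step([0.0, 0.0, 1.0, 0.0, 0.0], 1.0))
# [0.0, 0.0, 0.0, 1.0, 0.0]
```

In both stages, each cell is computed only from its own value and one neighbour at the previous stage, so all cells of a stage can be updated at the same time. This is the property that makes a one-thread-per-cell (or one-fragment-per-texel) mapping natural.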

Keywords: Graphics processing unit
