Search results - Inner Mongolia University Library

10th International Conference on Future Networks and Communications (FNC) / 12th International Conference on Mobile Systems and Pervasive Computing (MobiSPC)

Authors: Latif, Jawwad; Chaudhry, Hassan Nazeer; Azam, Sadia; Baloch, Naveed Khan. Affiliations: Univ Engn & Technol Taxila, 47080 Taxila, Pakistan; Tech Univ Darmstadt, Darmstadt, Germany

On-chip interconnection networks simplify the challenge of integrating a large number of processing elements. Routers are the backbone of these networks. The buffers and crossbar in a router consume a significant share of the network's area and power, yet reducing buffers can degrade network performance. The Dual Xbar router architecture combines buffered and bufferless features, using dual crossbars to reduce buffer read/write energy, while the switch folding technique was introduced to reduce wire density and decrease the number of muxes in the crossbar by increasing resource utilization. In this paper, we propose the Folded Dual Xbar architecture, which combines the Dual Xbar and folding techniques in order to get the advantages of both architectures. The performance of the architectures is evaluated using the OMNeT++ platform under different load conditions. Simulation results show a slight increase in throughput and an average 46% reduction in buffer read/write energy at high loads for the proposed 2-Folded Dual Xbar compared with the conventional architecture. The proposed 3-Folded Dual Xbar yields at least a 16.6% increase in throughput compared with the conventional architecture, with 43-45% lower buffer read/write energy but a slight increase in crossbar. The throughput of the 3-Folded Dual Xbar is only 5-7% lower than that of the Dual Xbar, with the advantage of distributed wire density. (C) 2015 The Authors. Published by Elsevier B.V.

Keywords: Network-on-Chip; Dual Xbar; Folding technique; Router architectures

Software Data Plane and Flow Switching Plane Separation in Next-Generation Router Architecture

10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC)

Authors: Gao Xianming; Wang Baosheng; Zhang Xiaozhe; Wang Xu'an. Affiliations: Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China; Engn Univ CAPF, Key Lab Network & Informat Secur, Xian, Peoples R China

ISBN: (print) 9781467394734

Routers have traditionally been architected as two elements, a forwarding plane and a control plane, connected through ForCES or other protocols. Each forwarding plane aggregates a fixed amount of computing, memory, and network interface resources to forward packets. Unfortunately, the tight coupling of packet-processing tasks with network interfaces has severely restricted service innovation and hardware upgrades. In this context, we explore functional separation within the forwarding plane and propose a next-generation router architecture which, if realized, promises both support for varied packet-processing tasks and flexible deployment, while addressing the problems above. Thus, we put forward an alternative construction in which the functional resources within a forwarding plane are disaggregated. A forwarding plane is instead separated into two planes: a software data plane (SDP) and a flow switching plane (FSP). The SDP is responsible for packet-processing tasks, and its expansibility is not restricted by the number and kinds of network interfaces. The FSP is in charge of packet receiving/transmitting tasks and can incrementally add switching elements, such as general switches or even specialized switches, to provide network interfaces for the SDP. Finally, we run an experiment on our platform, measuring bandwidth utilization rate, configuration delay, and the processing time of a simple router. Our experimental results show that the separation of SDP and FSP brings greater modularity to router architecture, allowing operators to optimize their deployments.

Keywords: router architecture; forwarding plane; functional separation; packet-processing task; packet receiving/transmitting task

Indexing GPU acceleration for solutions approximation of the Laplace equation

10th Colombian Computing Conference, 10CCC 2015

Authors: Monsalve, Manuel Alejandro Tamayo; Castrillon, Nubia Liliana Montes; Soto, Reinel Tabares; Osorio, Gustavo. Affiliations: Depto. de Ing. Eléctrica, Electrónica y Computación, Universidad Nacional de Colombia, Sede Manizales, Colombia; Depto. de Sistemas e Informática, Universidad de Caldas, Colombia

ISBN: (print) 9781467394642

This paper presents the use of the two-dimensional indexing available in graphics processing units (GPUs) to accelerate algorithms that approximate the solutions of systems of partial differential equations. These approximations use recurrence equations in which the dependence on neighboring data plays an important role in calculation speed. The calculations involve large amounts of data as well as frequent memory accesses. It is therefore convenient to use computational structures that perform operations in parallel and concurrently, so that the information is processed more quickly. The memory indexing capability also enables better acceleration. Three different architectures are compared and contrasted against the sequential process on a CPU. The results show that accelerations of up to 9x can be achieved for the Laplace equation in two dimensions. © 2015 IEEE.

Keywords: Graphics processing unit
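
Illustrative sketch (not taken from the paper): the recurrence that the abstract refers to can be pictured as a Jacobi sweep over a two-dimensional grid, where every interior cell (i, j) is replaced by the average of its four neighbours. On a GPU each (i, j) pair would typically map to one thread through the two-dimensional thread index; the version below is only the sequential CPU baseline in plain Python. The function names, the grid size, and the boundary condition are assumptions chosen for illustration.

```python
def make_grid(n, top=1.0):
    """Hypothetical test problem: n x n grid, top edge held at `top`, other edges at 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


def jacobi_step(grid):
    """One Jacobi sweep for the 2D Laplace equation; boundary cells stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):          # on a GPU: one thread per (i, j)
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve_laplace(grid, tol=1e-6, max_iters=10_000):
    """Iterate until the largest per-cell change drops below `tol`."""
    iterations = 0
    for iterations in range(1, max_iters + 1):
        new = jacobi_step(grid)
        delta = max(abs(new[i][j] - grid[i][j])
                    for i in range(len(grid)) for j in range(len(grid[0])))
        grid = new
        if delta < tol:
            break
    return grid, iterations


solution, iterations = solve_laplace(make_grid(5))
```

Because each new cell value depends only on the previous sweep, all interior cells can be updated independently, which is what makes the scheme a natural fit for two-dimensional GPU indexing.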

Effectiveness of Fast Fourier Transform Implementations on GPU and CPU

16th International Conference on Computational Problems of Electrical Engineering (CPEE)

Authors: Puchala, Dariusz; Stokfiszewski, Kamil; Yatsymirskyy, Mykhaylo; Szczepaniak, Bartlomiej. Affiliations: Lodz Univ Technol, Inst Comp Sci, Lodz, Poland; Lodz Univ Technol, Inst Appl Comp Sci, Lodz, Poland

ISBN: (print) 9786176078043

In this paper, we present the results of a comparison of the effectiveness of selected variants of radix-2 Fast Fourier Transform (FFT) algorithms implemented on both Graphics Processing Units (GPUs) and Central Processing Units (CPUs). The considered algorithms differ in memory consumption and in the arrangement of data-flow paths, which affects global memory coalescing and cache memory exploitation. The obtained results make it possible to indicate the variants of FFT algorithms that are best suited for GPU and CPU architectures, to confirm the advisability of GPU-oriented calculation of the FFT, and to formulate a guideline for implementations of fast algorithms for various linear transforms.

Keywords: fast Fourier transform; parallel computations; general purpose GPU computations
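
Illustrative sketch (not taken from the paper): for readers unfamiliar with the radix-2 family mentioned in the abstract, the block below shows one textbook variant, the recursive decimation-in-time FFT, in plain Python. It is not one of the GPU or CPU variants the authors compare; the function names are assumptions, and a naive DFT is included only as a reference for checking the result.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # transform of even-indexed samples
    odd = fft_radix2(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """O(n^2) reference DFT, used here only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

The way the even and odd halves are laid out in memory is one example of the data-flow arrangement the abstract mentions: different orderings of the same butterflies lead to different memory access patterns.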

A Design-flow for High-Level Synthesis and Resource Estimation of Reconfigurable Architectures

10th IEEE International Conference on Design and Technology of Integrated Systems in Nanoscale Era (DTIS)

Authors: Pasha, Muhammad Adeel; Siddiqui, Bilal; Farooq, Umer. Affiliations: LUMS, SBASSE, Dept Elect Engn, Lahore, Pakistan; COMSATS IIT, Dept Elect Engn, Lahore, Pakistan

ISBN: (print) 9781479919994

Field Programmable Gate Arrays (FPGAs), due to their programmability, have become a popular design choice for the control and processing blocks of an embedded system. However, this flexibility makes them larger, slower and less power-efficient than Application Specific Integrated Circuits (ASICs) and hinders their use in low-area and low-power applications. On the other hand, ASICs have their own inherent drawbacks, such as lack of programmability and inflexibility. The solution is reconfigurable architectures, which offer more flexibility than ASICs and better resource utilization than FPGAs. However, designing a reconfigurable architecture is a daunting task in itself due to the lack of high-level design-flow support. This paper proposes an automated design-flow for system-level synthesis and resource estimation for generic as well as custom reconfigurable architectures. The experimental results show that the generated reconfigurable architectures are 79% more area-efficient and 76% more power-efficient than generic academic FPGA-based implementations.

Keywords: Reconfigurable architectures; programmability; high level synthesis; Resource utilization; application specific integrated circuits; Field programmable gate arrays

Parallel version n-dimensional fast Fourier transform algorithm: Analog of the Cooley-Tukey algorithm

5th International Workshop on Image Mining. Theory and Applications, IMTA-5 2015 - In conjunction with the 10th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2015

Authors: Noskov, M.V.; Tutatchikov, V.S. Affiliations: Institute of Space and Information Technology, Siberian Federal University, Kirenskogo Street 26, Krasnoyarsk, Russia

ISBN: (print) 9789897580949

One-, two- and three-dimensional fast Fourier transform (FFT) algorithms have been widely used in digital processing. Because complexity and the amount of computation grow with the dimension of the signal, the multi-dimensional discrete Fourier transform is reduced to a combination of one-dimensional FFTs along all coordinates. This article provides a general analog of the Cooley-Tukey algorithm, which requires fewer complex addition and multiplication operations than the standard method and runs 1.5 times faster than its analogue in Matlab.

Keywords: Discrete Fourier transforms
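
Illustrative sketch (not taken from the article): the "standard method" that the abstract uses as its baseline, reducing a multi-dimensional transform to one-dimensional transforms along each coordinate, can be shown for the two-dimensional case in plain Python. This is the row-column decomposition, not the authors' Cooley-Tukey analog. The function names are assumptions, and a naive one-dimensional DFT stands in for a one-dimensional FFT to keep the block short.

```python
import cmath


def dft_1d(x):
    """Naive 1D DFT; any 1D FFT routine could be substituted here."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_2d_row_column(a):
    """2D DFT computed as 1D transforms over rows, then over columns."""
    rows = [dft_1d(row) for row in a]                  # transform each row
    cols = [dft_1d(list(col)) for col in zip(*rows)]   # transform each column
    return [list(r) for r in zip(*cols)]               # transpose back to [u][v]


def dft_2d_direct(a):
    """2D DFT straight from the definition, used only as a reference."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]
```

The row transforms are independent of one another, as are the column transforms, which is why this decomposition parallelizes easily; the cost is one full pass over the data per dimension.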

Nonbinary Multi-Parallel-Concatenated Single-Parity-Check (NB-MPCSPC) Codes over Partial-Response Channels

10th International Conference on Information, Communications and Signal Processing (ICICS)

Authors: Qin, Zhiliang; Kong, Anmin; Wang, Xueqiang. Affiliations: Data Storage Inst, Singapore 117608, Singapore

ISBN: (print) 9781467372183

In this paper, we propose a high-rate nonbinary multi-parallel-concatenated single-parity-check (NB-MPCSPC) code as a low-complexity coding scheme for data storage channels. The proposed scheme is composed of parallel branches of nonbinary SPC codes over a Galois Field (GF) and can be flexibly designed to achieve a wide range of code rates and codeword lengths. The encoding can be implemented directly from the parity-check matrix, while the decoding is simplified by using the first-order Maclaurin series to approximate the check-node operation. Compared with its binary counterpart, the proposed nonbinary coding scheme significantly improves bit-error-rate (BER) performance in the error-floor region. Simulation results show that a noticeable performance gain is obtained over conventional binary low-density parity-check (LDPC) codes when used in turbo equalization for partial-response channels.

Keywords: Multiple-Parallel-Concatenated (MPC) codes; nonbinary low-density parity-check (NB-LDPC) codes; single-parity-check (SPC) codes; symbol-level BCJR algorithm

TripleID: A low-overhead representation and querying using GPU for large RDFs

Communications in Computer and Information Science, 2016, vol. 613, pp. 400-415

Authors: Chantrapornchai, Chantana; Choksuchat, Chidchanok; Haidl, Michael; Gorlatch, Sergei. Affiliations: Department of Computer Engineering, Kasetsart University, Bangkok, Thailand; Department of Computing, Silpakorn University, Bangkok, Thailand; University of Münster, Münster, Germany

ISBN: (print) 9783319340982

Resource Description Framework (RDF) is a commonly used format for semantic web processing. It essentially contains strings representing terms and their relationships, which can be queried or inferred. RDF data is usually a large text file containing many millions of relationships. In this work, we propose a framework, TripleID, for processing queries over large RDF data. The framework utilises Graphics Processing Units (GPUs) to search RDF relations. The RDF data is first transformed into an encoded form suitable for storing in GPU memory. Parallel threads on the GPU then search for the required data. We show in the experiments that one GPU on a personal desktop can handle 100 million triple relations, while a traditional RDF processing tool can process up to 10 million triples. Furthermore, we can query sample relations within 0.18 s with the GPU in 7 million triples, while the traditional tool takes at least 6 s for 1.8 million triples. © Springer International Publishing Switzerland 2016.

Keywords: Graphics processing unit
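
Illustrative sketch (not the TripleID implementation): the two steps named in the abstract, encoding string triples into a compact form and then scanning them for matches, can be pictured with a small Python example. Everything here is an assumption made for illustration: the single shared dictionary, the reserved wildcard ID 0, and the function names. The scan is sequential; on a GPU each encoded triple would be checked by its own thread.

```python
WILDCARD = 0  # hypothetical reserved ID meaning "match anything"


def encode_triples(triples):
    """Map every term to an integer ID (starting at 1) and encode each triple."""
    ids = {}
    encoded = []
    for subject, predicate, obj in triples:
        row = tuple(ids.setdefault(term, len(ids) + 1)
                    for term in (subject, predicate, obj))
        encoded.append(row)
    return ids, encoded


def match(encoded, pattern):
    """Linear scan over integer triples; each comparison is independent."""
    return [row for row in encoded
            if all(q == WILDCARD or q == v for q, v in zip(pattern, row))]


def query(ids, encoded, s=None, p=None, o=None):
    """Look up a (subject, predicate, object) pattern; None acts as a wildcard."""
    pattern = []
    for term in (s, p, o):
        if term is None:
            pattern.append(WILDCARD)
        elif term in ids:
            pattern.append(ids[term])
        else:
            return []  # unknown term: nothing can match
    names = {v: k for k, v in ids.items()}
    return [tuple(names[i] for i in row) for row in match(encoded, tuple(pattern))]


triples = [("alice", "knows", "bob"),
           ("bob", "knows", "carol"),
           ("alice", "likes", "tea")]
ids, encoded = encode_triples(triples)
```

Comparing fixed-width integers instead of variable-length strings is what makes the encoded form suitable for a flat array in device memory.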

Fast Subgraph Matching on Large Graphs using Graphics Processors

20th International Conference on Database Systems for Advanced Applications (DASFAA)

Authors: Ha-Nguyen Tran; Kim, Jung-Jae; He, Bingsheng. Affiliations: Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

ISBN: (print) 9783319181202; 9783319181196

Subgraph matching is the task of finding all matches of a query graph in a large data graph, which is known to be an NP-complete problem. Many algorithms have been proposed to solve this problem using CPUs. In recent years, Graphics Processing Units (GPUs) have been adopted to accelerate fundamental graph operations such as breadth-first search and shortest path, owing to their parallelism and high data throughput. The existing subgraph matching algorithms, however, face challenges in mapping backtracking problems to GPU architectures. Moreover, previous GPU-based graph algorithms are not designed to handle intermediate and final outputs. In this paper, we present a simple and GPU-friendly method for subgraph matching, called GpSM, which is designed for massively parallel architectures. We show that GpSM outperforms the state-of-the-art algorithms and efficiently answers subgraph queries on large graphs.

Keywords: Graphics processing unit

Parallelized Software Implementation of Elliptic Curve Scalar Multiplication

10th China International Conference on Information Security and Cryptology (Inscrypt)

Author: Robert, Jean-Marc. Affiliations: Univ Perpignan, Team DALI, F-66025 Perpignan, France; Univ Montpellier 2, LIRMM, UMR 5506, Montpellier, France; CNRS, Montpellier, France

ISBN: (print) 9783319167459; 9783319167442

Recent developments in multicore architectures across various platforms (desktop computers and servers as well as embedded systems) challenge the classical approaches of sequential computation algorithms, in particular for elliptic curve cryptography protocols. In this work, we deploy different parallel software implementations of elliptic curve point scalar multiplication in order to improve performance in comparison with the sequential counterparts, taking into account multi-threading synchronization, scalar recoding and memory management issues. Two-thread and four-thread algorithms are tested on various curves over prime and binary fields; they provide an improvement ratio of around 15% in comparison with their sequential counterparts.

Keywords: Elliptic curve cryptography; parallel algorithm; Efficient software implementation
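
Illustrative sketch (not taken from the paper): the sequential operation that the abstract sets out to parallelize, scalar multiplication of a curve point, can be shown with the textbook double-and-add method on a toy curve over a prime field. The curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) is a small classroom example chosen as an assumption; it offers no security, and the function names are hypothetical. The paper's multi-threaded variants and scalar recoding are not reproduced here.

```python
# Toy parameters (assumed for illustration): y^2 = x^3 + A*x + B over GF(P_MOD).
A, B, P_MOD = 2, 2, 17
G = (5, 1)  # base point on the toy curve; None represents the point at infinity


def point_add(p1, p2, a=A, p=P_MOD):
    """Add two affine points on a short Weierstrass curve over GF(p)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = point at infinity
    if p1 == p2:
        # Tangent slope for doubling; inverse via Fermat's little theorem.
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        # Chord slope for adding two distinct points.
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k, point, a=A, p=P_MOD):
    """Right-to-left double-and-add: returns k * point for k >= 0."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


def on_curve(point, a=A, b=B, p=P_MOD):
    """Check the curve equation; the point at infinity counts as on the curve."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + a * x + b)) % p == 0
```

In this loop the doublings form a chain in which each step needs the previous result, while the additions only consume those doubled points; that dependency structure is the kind of thing a multi-threaded implementation has to work around.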
