On-chip interconnection networks simplify the challenge of integrating a large number of processing elements. Routers are the backbone of these networks, and the buffers and crossbar in a router consume a significant share of the network's area and power. Reducing buffers, however, can degrade network performance. The Dual Xbar router architecture combines buffered and bufferless features, using dual crossbars to reduce buffer read/write energy, while the switch folding technique was introduced to reduce wire density and the number of multiplexers in the crossbar by increasing resource utilization. In this paper, we propose the Folded Dual Xbar architecture, which combines Dual Xbar with folding to obtain the advantages of both architectures. The performance of the architectures is evaluated on the OMNeT++ platform under different load conditions. Simulation results show that, compared with the conventional architecture, the proposed 2-Folded Dual Xbar achieves a slight increase in throughput and reduces buffer read/write energy by 46% on average at high loads. The proposed 3-Folded Dual Xbar achieves at least a 16.6% increase in throughput over the conventional architecture with 43-45% lower buffer read/write energy, at the cost of a slight crossbar overhead. Compared with Dual Xbar, the throughput of 3-Folded Dual Xbar decreases by only 5-7%, while it gains the advantage of distributed wire density. (C) 2015 The Authors. Published by Elsevier B.V.
ISBN (print): 9781467394734
Routers have traditionally been architected as two elements, a forwarding plane and a control plane, connected through ForCES or other protocols. Each forwarding plane aggregates a fixed amount of computing, memory, and network interface resources to forward packets. Unfortunately, the tight coupling of packet-processing tasks with network interfaces has severely restricted service innovation and hardware upgrades. In this context, we explore functional separation within the forwarding plane and propose a next-generation router architecture that, if realized, could support diverse packet-processing tasks and flexible deployment while addressing these problems. Specifically, we put forward an alternative construction in which the functional resources within a forwarding plane are disaggregated: the forwarding plane is separated into two planes, a software data plane (SDP) and a flow switching plane (FSP). The SDP is responsible for packet-processing tasks, and its expansibility is not restricted by the number and kinds of network interfaces. The FSP handles packet receiving and transmitting and can incrementally add switching elements, such as general-purpose or even specialized switches, to provide network interfaces for the SDP. Finally, we evaluate our platform in terms of bandwidth utilization, configuration delay, and the processing time of a simple router. Our experimental results show that separating the SDP and FSP brings greater modularity to router architecture, allowing operators to optimize their deployments.
This paper presents the use of the two-dimensional indexing available in graphics processing units (GPUs) to accelerate algorithms that approximate the solutions of systems of partial differential equations. These approximation ...
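The abstract is truncated, so the authors' specific scheme is not known. As a hypothetical sketch of the idea, the Python code below applies Jacobi iteration to a 2D Poisson equation and emulates a GPU launch in which each (ix, iy) thread index addresses one grid point. The names `jacobi_kernel`, `launch_2d`, and `solve_poisson`, the choice of Jacobi iteration, and the test problem are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch: Jacobi iteration for the 2D Poisson equation
#   -(u_xx + u_yy) = f  on a uniform n x n grid with spacing h,
# written the way a GPU kernel addresses points through a 2D thread index.


def jacobi_kernel(ix, iy, u, f, h, u_new):
    """Update one grid point; (ix, iy) plays the role of the global 2D
    thread index blockIdx * blockDim + threadIdx."""
    n = len(u)
    if 0 < ix < n - 1 and 0 < iy < n - 1:  # boundary and out-of-range threads idle
        u_new[iy][ix] = 0.25 * (u[iy][ix - 1] + u[iy][ix + 1]
                                + u[iy - 1][ix] + u[iy + 1][ix]
                                + h * h * f[iy][ix])


def launch_2d(kernel, grid_dim, block_dim, *args):
    """Emulate a 2D kernel launch sequentially on the CPU."""
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    ix = bx * block_dim[0] + tx
                    iy = by * block_dim[1] + ty
                    kernel(ix, iy, *args)


def solve_poisson(u, f, h, iters, block_dim=(4, 4)):
    """Run `iters` Jacobi sweeps; boundary values of u stay fixed."""
    n = len(u)
    grid_dim = ((n + block_dim[0] - 1) // block_dim[0],
                (n + block_dim[1] - 1) // block_dim[1])
    for _ in range(iters):
        u_new = [row[:] for row in u]  # the copy carries boundary values over
        launch_2d(jacobi_kernel, grid_dim, block_dim, u, f, h, u_new)
        u = u_new
    return u


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n - 1)
    exact = [[(ix * h) ** 2 for ix in range(n)] for iy in range(n)]  # u = x^2
    f = [[-2.0] * n for _ in range(n)]                              # -(u_xx) = -2
    u0 = [[exact[iy][ix] if ix in (0, n - 1) or iy in (0, n - 1) else 0.0
           for ix in range(n)] for iy in range(n)]
    u = solve_poisson(u0, f, h, 500)
    print(max(abs(u[iy][ix] - exact[iy][ix]) for iy in range(n) for ix in range(n)))
```

Because every interior point reads only the previous iterate, all (ix, iy) updates within a sweep are independent, which is what lets a 2D index map cleanly onto GPU threads. The five-point stencil reproduces u = x^2 with f = -2 exactly, so the iteration converges to it.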
ISBN (print): 9786176078043
In this paper, we compare the effectiveness of selected variants of radix-2 fast Fourier transform (FFT) algorithms implemented on both graphics processing units (GPUs) and central processing units (CPUs). The considered algorithms differ in memory consumption and in the arrangement of data-flow paths, which affects global memory coalescing and cache utilization. The results identify the FFT variants best suited to GPU and CPU architectures, confirm the advisability of GPU-oriented FFT computation, and lead to a guideline for implementing fast algorithms for various linear transforms.
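For reference, one common radix-2 variant is the iterative decimation-in-time FFT with a bit-reversed input ordering and in-place butterflies. The Python sketch below is a generic textbook formulation, not one of the specific variants compared in the paper, and the function names are illustrative. The butterfly stride `half` doubles at every stage, which is the kind of data-flow arrangement that affects memory coalescing and cache behavior on real hardware.

```python
import cmath


def reverse_bits(i, bits):
    """Reverse the lowest `bits` bits of i (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input into bit-reversed index order.
    a = [complex(x[reverse_bits(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:  # log2(n) butterfly stages
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # Butterfly partners are `half` elements apart; the stride
                # doubles from one stage to the next.
                t = w * a[start + k + half]
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, -1, -2, -3]
    print(max(abs(p - q) for p, q in zip(fft_radix2(signal), dft(signal))))
```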
ISBN (print): 9781479919994
Field-programmable gate arrays (FPGAs), owing to their programmability, have become a popular design choice for the control and processing blocks of embedded systems. However, this flexibility makes them larger, slower, and less power-efficient than application-specific integrated circuits (ASICs), which hinders their use in low-area and low-power applications. ASICs, on the other hand, have inherent drawbacks such as lack of programmability and inflexibility. Reconfigurable architectures offer a solution, with better flexibility than ASICs and better resource utilization than FPGAs. However, designing a reconfigurable architecture is itself a daunting task because of the lack of high-level design-flow support. This paper proposes an automated design flow for system-level synthesis and resource estimation of both generic and custom reconfigurable architectures. Experimental results show that the generated reconfigurable architectures are 79% more area-efficient and 76% more power-efficient than generic academic FPGA-based implementations.
ISBN (print): 9789897580949
One-, two-, and three-dimensional fast Fourier transform (FFT) algorithms have been widely used in digital processing. Because the complexity and amount of computation grow with the dimensionality of the signal, a multidimensional discrete Fourier transform is reduced to a combination of one-dimensional FFTs along each coordinate. This article presents an analog of the general Cooley-Tukey algorithm that requires fewer complex additions and multiplications than the standard method and runs 1.5 times faster than its counterpart in MATLAB.
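The row-column reduction described in this abstract can be sketched as follows: a 2D DFT is obtained by applying a 1D transform to every row and then to every column. This Python sketch shows that standard baseline, not the authors' Cooley-Tukey analog; a direct DFT stands in for the 1D FFT to keep it self-contained, and all function names are hypothetical.

```python
import cmath


def dft_1d(x):
    """Reference 1D DFT; any 1D FFT could be substituted here."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft_2d_row_column(a):
    """2D DFT computed as 1D transforms of every row, then of every column."""
    rows = [dft_1d(r) for r in a]                 # transform along the second index
    cols = [dft_1d(list(c)) for c in zip(*rows)]  # transform along the first index
    return [list(r) for r in zip(*cols)]          # transpose back to X[r][s] layout


def dft_2d_direct(a):
    """Direct evaluation of X[r][s] = sum_p sum_q a[p][q] e^{-2 pi i (pr/M + qs/N)}."""
    m, n = len(a), len(a[0])
    return [[sum(a[p][q] * cmath.exp(-2j * cmath.pi * (p * r / m + q * s / n))
                 for p in range(m) for q in range(n))
             for s in range(n)]
            for r in range(m)]


if __name__ == "__main__":
    image = [[1, 2, 0, 3], [4, 0, 1, 1], [2, 2, 5, 0]]
    fast = dft_2d_row_column(image)
    slow = dft_2d_direct(image)
    print(max(abs(fast[r][s] - slow[r][s]) for r in range(3) for s in range(4)))
```

The separability of the kernel, e^{-2 pi i (pr/M + qs/N)} = e^{-2 pi i pr/M} e^{-2 pi i qs/N}, is what allows the double sum to be split into the two passes.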
ISBN (print): 9781467372183
In this paper, we propose a high-rate nonbinary multi-parallel-concatenated single-parity-check (NB-MPCSPC) code as a low-complexity coding scheme for data storage channels. The proposed scheme is composed of parallel branches of nonbinary SPC codes over a Galois field (GF) and can be flexibly designed to achieve a wide range of code rates and codeword lengths. Encoding can be implemented directly from the parity-check matrix, while decoding is simplified by using a first-order Maclaurin series to approximate the check-node operation. Compared with its binary counterpart, the proposed nonbinary scheme significantly improves bit-error-rate (BER) performance in the error-floor region. Simulation results show a noticeable performance gain over conventional binary low-density parity-check (LDPC) codes when used in turbo equalization for partial-response channels.
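To illustrate what a single nonbinary SPC branch computes, the Python sketch below encodes one parity symbol over GF(16) so that the weighted check sum of the codeword is zero. The field construction (primitive polynomial x^4 + x + 1), the check coefficients, and the function names are assumptions; the parallel concatenation of branches, any interleaving, and the Maclaurin-series check-node approximation described in the paper are not shown.

```python
# Minimal sketch of one nonbinary single-parity-check (SPC) branch over GF(16).
# Assumption: GF(2^4) is built from the primitive polynomial x^4 + x + 1.

PRIM_POLY = 0b10011  # x^4 + x + 1
FIELD_SIZE = 16


def gf_mul(a, b):
    """Multiply two GF(16) elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1
        if a & FIELD_SIZE:  # degree reached 4: reduce modulo the polynomial
            a ^= PRIM_POLY
    return r


def gf_inv(a):
    """Inverse via a^(q-2), since a^(q-1) = 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(16)")
    r = 1
    for _ in range(FIELD_SIZE - 2):
        r = gf_mul(r, a)
    return r


def check_sum(symbols, coeffs):
    """Weighted parity check sum_i h_i * c_i over GF(16)."""
    s = 0
    for h, c in zip(coeffs, symbols):
        s ^= gf_mul(h, c)
    return s


def spc_encode(info, coeffs):
    """Append parity p so that sum_i h_i c_i + h_p p = 0.
    `coeffs` holds one nonzero coefficient per information symbol, then h_p."""
    s = check_sum(info, coeffs[:-1])
    parity = gf_mul(gf_inv(coeffs[-1]), s)  # in characteristic 2, -s == s
    return info + [parity]


if __name__ == "__main__":
    h = [3, 7, 1, 12, 9]  # hypothetical nonzero check coefficients
    codeword = spc_encode([5, 0, 14, 2], h)
    print(codeword, check_sum(codeword, h))
```

Because GF(16) has no zero divisors, changing any single symbol makes the check sum nonzero, which is the single-error-detection property each SPC branch contributes.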
The Resource Description Framework (RDF) is a commonly used format for semantic web processing. It essentially consists of strings representing terms and their relationships, which can be queried or used for inference. RDF is usually a l...
ISBN (print): 9783319181202; 9783319181196
Subgraph matching is the task of finding all matches of a query graph in a large data graph, a problem known to be NP-complete. Many CPU-based algorithms have been proposed to solve it. In recent years, graphics processing units (GPUs) have been adopted to accelerate fundamental graph operations such as breadth-first search and shortest paths, owing to their parallelism and high data throughput. Existing subgraph matching algorithms, however, face challenges in mapping backtracking to GPU architectures. Moreover, previous GPU-based graph algorithms are not designed to handle intermediate and final outputs. In this paper, we present GpSM, a simple, GPU-friendly subgraph matching method designed for massively parallel architectures. We show that GpSM outperforms state-of-the-art algorithms and efficiently answers subgraph queries on large graphs.
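For context, the sketch below is a plain sequential backtracking matcher of the kind the abstract says is hard to map onto GPUs: it extends a partial mapping one query vertex at a time and undoes the choice when a branch fails. It is a generic CPU baseline written in Python for illustration, not GpSM; the function name and the graph representation (a dict of neighbor sets for undirected graphs) are assumptions.

```python
def subgraph_matches(query, data):
    """Enumerate all injective maps from query vertices to data vertices that
    preserve every query edge (non-induced subgraph isomorphism).
    Graphs are dicts: vertex -> set of neighbours (undirected)."""
    order = sorted(query, key=lambda v: -len(query[v]))  # high-degree first
    results = []
    mapping = {}
    used = set()

    def extend(depth):
        if depth == len(order):
            results.append(dict(mapping))  # record a complete match
            return
        u = order[depth]
        for v in data:
            # Degree filter: v needs at least as many neighbours as u.
            if v in used or len(data[v]) < len(query[u]):
                continue
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(depth + 1)
                del mapping[u]      # backtrack
                used.discard(v)

    extend(0)
    return results


if __name__ == "__main__":
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(len(subgraph_matches(triangle, k4)))
```

The data-dependent branching and variable recursion depth are what make this pattern a poor fit for GPU threads that execute in lockstep.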
ISBN (print): 9783319167459; 9783319167442
Recent developments in multicore architectures across various platforms (desktop computers and servers as well as embedded systems) challenge the classical approaches of sequential computation algorithms, in particular elliptic curve cryptography protocols. In this work, we deploy several parallel software implementations of elliptic curve point scalar multiplication to improve performance relative to their sequential counterparts, taking into account multithreading synchronization, scalar recoding, and memory management. Two-thread and four-thread algorithms are tested on various curves over prime and binary fields and provide an improvement of around 15% over their sequential counterparts.
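As a point of reference for the sequential counterparts, the Python sketch below implements left-to-right double-and-add scalar multiplication on a toy prime-field curve, plus a double-and-add/subtract variant driven by non-adjacent form (NAF) recoding, which is one common scalar recoding. The curve y^2 = x^3 + 2x + 3 over GF(97), the base point (3, 6), and all function names are illustrative assumptions; the parameters are far too small to be secure, binary-field curves are not covered, and the paper's two- and four-thread decompositions are not reproduced. `pow(x, -1, m)` requires Python 3.8 or later.

```python
# Toy illustration only: y^2 = x^3 + A*x + B over GF(P_MOD); insecure parameters.
P_MOD, A, B = 97, 2, 3
INF = None  # point at infinity (group identity)


def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def ec_add(p1, p2):
    """Affine point addition/doubling on a short Weierstrass curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P), including doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_neg(pt):
    return INF if pt is INF else (pt[0], (-pt[1]) % P_MOD)


def scalar_mult(k, pt):
    """Left-to-right binary double-and-add: one doubling per bit of k."""
    result = INF
    for bit in bin(k)[2:]:
        result = ec_add(result, result)
        if bit == "1":
            result = ec_add(result, pt)
    return result


def naf(k):
    """Non-adjacent form of k >= 0, least significant digit first,
    digits in {-1, 0, 1} with no two adjacent nonzero digits."""
    digits = []
    while k > 0:
        d = 2 - (k % 4) if k & 1 else 0
        k -= d
        digits.append(d)
        k //= 2
    return digits


def scalar_mult_naf(k, pt):
    """Double-and-add/subtract driven by the NAF recoding of k."""
    result, neg = INF, ec_neg(pt)
    for d in reversed(naf(k)):
        result = ec_add(result, result)
        if d == 1:
            result = ec_add(result, pt)
        elif d == -1:
            result = ec_add(result, neg)
    return result


if __name__ == "__main__":
    G = (3, 6)
    print(on_curve(G), scalar_mult(20, G), scalar_mult_naf(20, G), naf(7))
```

NAF recoding reduces the average number of point additions compared with plain binary, because at most every other digit is nonzero; point negation is cheap, so subtracting P costs the same as adding it.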