This paper proposes an iterative Non-Binary LDPC (NB-LDPC) decoder for non-binary codes constructed using the 5G base matrices. Motivated by the binary-to-non-binary replacement method, we construct NB-LDPC matrices derived directly from the 5G base matrices. Subsequently, we develop an iterative decoding scheme whose low complexity facilitates parallelism and whose fast convergence yields high performance. BER plots comparing the binary Min-sum decoder (over the 5G base matrices) with our proposed NB decoder reveal a performance gain of 0.5 dB in certain cases. Furthermore, hardware synthesis results obtained for a 45-nm ASIC technology are provided to quantify the throughput and area requirements of the proposed architecture. It is shown that the proposed decoding architecture, because of its independence of the lifting size factor, can offer a higher throughput than the binary ones for small codeword lengths and code rates. In addition, as the Galois Field (GF) order increases, the throughput increases too. Finally, the throughput-to-area results show that the proposed NB architecture is generally suitable for small lifting size factors.
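For orientation, the binary Min-sum baseline mentioned above can be sketched at the level of a single check node. This is a minimal illustration of the standard min-sum check-node rule, not the proposed NB decoder; the function name `min_sum_check_update` and the list-based message format are assumptions made for the example.

```python
def min_sum_check_update(incoming):
    """Standard min-sum update for one check node (illustrative sketch).

    incoming: LLR messages from the variable nodes attached to this check
    node (at least two). Returns one outgoing message per edge, computed
    from all *other* edges: product of signs times the minimum magnitude.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two neighbours")
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing
```

Each outgoing message excludes the edge it is sent on, which is what keeps the exchanged information extrinsic.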
ISBN: 9781728116440 (print)
Software applications for biological network analysis rely on graphs to model the structure of interactions. Many of them require searching for subgraphs in a target graph or in collections of graphs. Even though very efficient algorithms have been defined to solve this subgraph isomorphism problem, the complexity of current real biological networks makes their sequential execution time prohibitive. On the other hand, parallel architectures, from multi-core to many-core, have become pervasive in dealing with the problem of data size. Nevertheless, the sequential nature of graph searching algorithms makes their implementation on parallel architectures very challenging. This paper presents three different parallel solutions for the graph searching problem. The first two target exact search on multi-core CPUs and many-core GPUs, respectively. The third targets approximate search on GPUs and handles node, edge, and node label mismatches. The paper shows how different techniques have been developed in all three solutions to reduce the search space complexity, and reports the performance of the proposed solutions on representative biological networks containing antiviral chemical compounds and protein interaction networks.
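To make the exact search problem concrete, the following is a naive sequential backtracking sketch with simple degree pruning. It is a baseline illustration only, not any of the three parallel solutions described above; the function name `find_subgraph_matches` and the adjacency-dictionary representation are assumptions, and node labels are ignored.

```python
def find_subgraph_matches(pattern, target):
    """Enumerate injective node mappings that preserve every pattern edge.

    Both graphs are undirected and given as {node: set_of_neighbours}.
    Returns a list of dicts mapping pattern nodes to target nodes.
    """
    # Match high-degree pattern nodes first so that pruning happens early.
    order = sorted(pattern, key=lambda node: -len(pattern[node]))
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]
        for t in target:
            if t in mapping.values():
                continue  # keep the mapping injective
            if len(target[t]) < len(pattern[p]):
                continue  # degree pruning: t has too few neighbours
            # Every already-mapped neighbour of p must map to a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                extend(mapping)
                del mapping[p]

    extend({})
    return matches
```

The recursion explores one candidate at a time, which illustrates the sequential dependency that makes such searches hard to parallelize.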
Nowadays, intelligent mobile devices have become the most widespread mobile multimedia terminals. However, limited by performance and battery life, traditional audio architectures and algorithms cannot meet increasingly complex processin...
Viewshed analysis is an indispensable part of digital terrain analysis and is widely used in many application domains. High-resolution raster DEM data bring significant computational challenges to the existing viewshe...
ISBN: 9783030576752; 9783030576745 (print)
In this paper, we propose a novel fault-tolerant parallel matrix multiplication algorithm called 3D Coded SUMMA that achieves higher failure tolerance than replication-based schemes for the same amount of redundancy. This work bridges the gap between recent developments in coded computing and fault tolerance in high-performance computing (HPC). The core idea of coded computing is the same as that of algorithm-based fault tolerance (ABFT): weaving redundancy into the computation using error-correcting codes. In particular, we show that MatDot codes, an innovative code construction for parallel matrix multiplication, can be integrated into three-dimensional SUMMA (Scalable Universal Matrix Multiplication Algorithm [30]) in a communication-avoiding manner. To tolerate any two node failures, the proposed 3D Coded SUMMA requires approximately 50% less redundancy than replication, while the overhead in execution time is only about 5-10%.
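The MatDot idea referred to above can be sketched in a few lines: A is split into m column blocks and B into m row blocks, each worker multiplies polynomial-encoded operands at its own evaluation point, and the product A·B is the coefficient of x^(m-1) of the resulting matrix polynomial, recoverable from any 2m-1 worker outputs. The sketch below shows only this encoding and decoding step, not the integration into 3D SUMMA; all function names are hypothetical, and exact rational arithmetic is assumed in place of floating point to keep the interpolation exact.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def weighted_sum(blocks, weights):
    """Return sum_k weights[k] * blocks[k] for equally sized matrices."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(w * blk[r][c] for blk, w in zip(blocks, weights))
             for c in range(cols)] for r in range(rows)]

def matdot_encode(A, B, m, x):
    """Encoded operands pA(x), pB(x) for the worker with evaluation point x."""
    w, h = len(A[0]) // m, len(B) // m
    A_blocks = [[row[i * w:(i + 1) * w] for row in A] for i in range(m)]  # column blocks
    B_blocks = [B[j * h:(j + 1) * h] for j in range(m)]                    # row blocks
    x = Fraction(x)
    pA = weighted_sum(A_blocks, [x ** i for i in range(m)])
    pB = weighted_sum(B_blocks, [x ** (m - 1 - j) for j in range(m)])
    return pA, pB

def lagrange_coeff(xs, i, k):
    """Coefficient of x**k in the i-th Lagrange basis polynomial over nodes xs."""
    coeffs = [Fraction(1)]  # coeffs[d] multiplies x**d
    denom = Fraction(1)
    for j, xj in enumerate(xs):
        if j == i:
            continue
        # Multiply the running polynomial by (x - xj).
        coeffs = [(coeffs[d - 1] if d > 0 else 0)
                  - xj * (coeffs[d] if d < len(coeffs) else 0)
                  for d in range(len(coeffs) + 1)]
        denom *= xs[i] - xj
    return coeffs[k] / denom

def matdot_decode(results, m):
    """Recover A @ B from any 2m - 1 worker outputs {x: pA(x) @ pB(x)}."""
    xs = list(results)[:2 * m - 1]
    if len(xs) < 2 * m - 1:
        raise ValueError("need at least 2m - 1 surviving workers")
    weights = [lagrange_coeff(xs, i, m - 1) for i in range(len(xs))]
    return weighted_sum([results[x] for x in xs], weights)
```

With m = 2 and five workers, for instance, any three outputs suffice, so two failed workers can be ignored. The redundancy figures quoted in the abstract refer to the full 3D Coded SUMMA scheme and are not reproduced by this toy setting.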
Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explor...
In data analytics applications, join is a common and time-consuming operation. Optimizing join algorithms can benefit query processing significantly. The emergence of GPUs provides a massively parallel solution f...
ISBN: 9781450362955 (print)
In this paper, we investigate the performance of Parallel Discrete Event Simulation (PDES) on a cluster of many-core Intel KNL processors. Specifically, we analyze the impact of different Global Virtual Time (GVT) algorithms in this environment and contribute three significant results. First, we show that it is essential to isolate the thread performing MPI communications from the task of processing simulation events; otherwise the simulation is significantly imbalanced and performs poorly. This applies to both synchronous and asynchronous GVT algorithms. Second, we demonstrate that a synchronous GVT algorithm based on barrier synchronization is a better choice for communication-dominated models, while asynchronous GVT based on Mattern's algorithm performs better for computation-dominated scenarios. Third, we propose the Controlled Asynchronous GVT (CA-GVT) algorithm, which selectively adds synchronization to Mattern-style GVT based on simulation conditions. We demonstrate that CA-GVT outperforms both barrier and Mattern's GVT and achieves about 8% performance improvement on mixed computation-communication models. This is a reasonable improvement for a simple modification to a GVT algorithm.
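As background for the GVT algorithms compared above, GVT is commonly defined as the minimum over the local virtual times of all logical processes and the timestamps of messages still in transit; events older than GVT can never be rolled back. The sketch below shows only this definition and the resulting fossil collection, under the assumption that a consistent snapshot of clocks and in-flight messages is already available. It is not an implementation of barrier GVT, Mattern's algorithm, or CA-GVT, and the function names are hypothetical.

```python
def compute_gvt(local_virtual_times, in_transit_timestamps=()):
    """Minimum over LP clocks and timestamps of messages still in flight.

    After a global barrier that flushes all messages, in_transit_timestamps
    is empty; asynchronous schemes must account for in-flight messages.
    """
    candidates = list(local_virtual_times) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("no logical processes or messages to inspect")
    return min(candidates)

def fossil_collect(processed_timestamps, gvt):
    """Split processed events into committed (< gvt) and retained (>= gvt)."""
    committed = [t for t in processed_timestamps if t < gvt]
    retained = [t for t in processed_timestamps if t >= gvt]
    return committed, retained
```

The algorithms in the abstract differ in how they obtain the consistent snapshot, which is where the synchronization cost arises.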
Data compression plays an important role in the era of big data; however, such compression is typically one of the bottlenecks of a massive data processing system due to intensive computing and memory access. In this p...
The aim of this paper is to present a new high-performance implementation of Marsa-LFIB4, an example of a high-quality multiple recursive pseudorandom number generator. We propose a new algorithmic approach tha...