A load balancing-supported ID assignment method is the foundation for implementing and maintaining DHT overlays. Existing constant-degree DHTs usually adopt simple, purely centralized or purely distributed ID management strategies, which cannot resolve the contradiction between the cost of maintaining topology information and topology balance. By analyzing the tree structures common to these topologies, an ID assignment method named RFIDAM, based on the internal routing forest structure, is proposed; it periodically aggregates local balancing information to guide the joining of new nodes toward overall balance. Experimental results show that, with low maintenance and routing message overhead, the system's load balance is effectively ensured, with ID lengths differing by at most 2.
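The abstract does not spell out RFIDAM's internals, but its balance goal (ID lengths differing by only a small constant) can be illustrated with a toy trie-based assignment that always splits the shallowest leaf. This is a didactic sketch, not the RFIDAM algorithm: the routing-forest aggregation and distributed joining protocol are not modeled.

```python
import heapq

def assign_ids(n):
    """Assign binary-string IDs to n nodes by repeatedly splitting the
    shallowest leaf of an ID trie. Splitting the shallowest leaf keeps
    leaf depths (= ID lengths) tightly balanced, illustrating the kind
    of balance RFIDAM maintains via aggregated local information."""
    # Heap of trie leaves, keyed by (depth, id-prefix).
    heap = [(0, "")]
    while len(heap) < n:
        depth, prefix = heapq.heappop(heap)      # shallowest leaf
        heapq.heappush(heap, (depth + 1, prefix + "0"))
        heapq.heappush(heap, (depth + 1, prefix + "1"))
    return [node_id for _, node_id in heap]

ids = assign_ids(5)
lengths = [len(i) for i in ids]
# Lengths stay within 1 of each other in this toy version.
```

In a real DHT the split decision would be made by a joining node using balance summaries gathered from the routing forest rather than a global heap; the heap here only makes the balancing invariant easy to see.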
Memory-intensive applications often suffer from the poor performance of disk swapping when memory is inadequate. Remote memory sharing schemes, which provide a remote memory that is faster than the local hard disk, ar...
Scalability is a crucial factor determining the performance of massive heterogeneous parallel CFD applications on the multi-GPUs platforms, particularly after the single-GPU implementations have achieved optimal perfo...
In large-scale asynchronous distributed virtual environments (DVEs), one of the difficult problems is delivering concurrent events in a consistent order at each node. Previous consistency control approaches generally fall into two categories: causal order and timestamped order. However, causal order approaches can only preserve the cause-effect relation of events, and timestamped order approaches are intrinsically too complex for serverless large-scale asynchronous DVEs. In this paper, we propose a novel distributed algorithm to identify concurrent events and preserve their consistent order of delivery at different nodes. Simulation studies compare the performance of this algorithm with that of previous approaches. The results show that the new algorithm effectively delivers concurrent events in a consistent order at each node and is more efficient than previous algorithms in large-scale asynchronous DVEs.
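The paper's specific algorithm is not given in the abstract, but the underlying problem it addresses can be sketched with standard vector clocks: detect which events are concurrent (neither causally precedes the other), then impose a total order that every node can compute locally without a server. All names below are illustrative, not taken from the paper.

```python
def happened_before(vc_a, vc_b):
    """True if the event with vector clock vc_a causally precedes vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    """Two events are concurrent when neither causally precedes the other."""
    return not happened_before(vc_a, vc_b) and not happened_before(vc_b, vc_a)

def consistent_order(events):
    """Total delivery order computable locally at every node.

    The component sum of a vector clock strictly increases along any
    causal chain, so sorting by (sum, sender id) respects causality;
    the sender-id tie-break orders concurrent events deterministically,
    giving every node the same delivery order."""
    return sorted(events, key=lambda e: (sum(e["vc"]), e["sender"]))
```

A serverless DVE node would apply `consistent_order` to its delivery queue; because the sort key depends only on the events themselves, all nodes agree without exchanging extra ordering messages.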
With the rapid development of Internet technology, new network attack methods emerge one after another. SQL injection has become one of the most severe threats to Web applications and seriously threatens vario...
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
Polar codes are a class of codes that achieve the symmetric channel capacity. They have been adopted as the control channel code for the enhanced mobile broadband (eMBB) scenario of the fifth-generation (5G) standard. Although polar codes can be efficiently decoded by the successive cancellation algorithm with complexity O(N log N), its decoding performance is not good enough for short codewords. The successive cancellation list (SCL) decoder, investigated in many recent studies, has better frame error rate (FER) performance but poor latency and throughput. In this study, a parallel SCL decoder based on the graphics processing unit (GPU) is designed to reduce latency and improve decoding throughput. An efficient approach for sharing intermediate values among different decoding paths is introduced, which reduces computing complexity and decoding latency. A parallel non-recursive decoding algorithm further increases throughput significantly. For the typical case of code length N = 1024 and list size L = 4 with code rate R = 0.5, the parallel GPU decoder achieves a throughput of 49 Mbps on an Nvidia GTX 980 and 79 Mbps on an Nvidia Titan X, 240 and 392 times higher than a CPU-based decoder.
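The building block that both the successive cancellation (SC) and SCL decoders share is the recursive LLR update with the f (check-node) and g (variable-node) functions. The sketch below shows a plain recursive SC decoder using the min-sum approximation of f; it is a reference illustration only, not the paper's parallel GPU/SCL implementation, which parallelizes these updates across list paths and unrolls the recursion.

```python
import math

def f_minsum(a, b):
    """Check-node LLR update of SC decoding (min-sum approximation)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))

def g_update(a, b, u):
    """Variable-node LLR update; u is the already-decided partial-sum bit."""
    return b + (1 - 2 * u) * a

def sc_decode(llr, frozen):
    """Recursive SC decoder for a polar code of length N = 2^n,
    complexity O(N log N). `frozen[i]` marks frozen positions, decoded
    as 0. Returns the re-encoded codeword estimate (the partial sums
    at the root); the leaf decisions are the source-bit estimates."""
    n = len(llr)
    if n == 1:
        bit = 0 if (frozen[0] or llr[0] >= 0) else 1
        return [bit]
    half = n // 2
    # Left subtree: combine paired channel LLRs with f.
    left_llr = [f_minsum(llr[i], llr[i + half]) for i in range(half)]
    u_left = sc_decode(left_llr, frozen[:half])
    # Right subtree: g uses the bits already decided on the left.
    right_llr = [g_update(llr[i], llr[i + half], u_left[i]) for i in range(half)]
    u_right = sc_decode(right_llr, frozen[half:])
    # Combine partial sums upward.
    return [u_left[i] ^ u_right[i] for i in range(half)] + u_right
```

An SCL decoder would explore both leaf decisions for each information bit, keep the L most likely paths, and share the intermediate `left_llr`/`right_llr` arrays among paths, which is the sharing the abstract says the GPU design exploits.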
The development of a basic scalable preprocessing tool is the key routine to accelerate the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing ...
Meteorology Grid Computing aims to provide scientists with seamless, reliable, secure, and inexpensive access to meteorological resources. In this paper, we present a semantic-based meteorology grid service registry, ...
Stencil Computation has long been an omnipresent kernel of a wide range of scientific and engineering applications. There is much work investigating the stencil performance on x86 processors and accelerators such as G...