To address the problem of managing external storage in multi-virtual-machine environments, a system design scheme for external storage management is proposed, based on the idea of virtual memory management and prot...
ISBN: 9781479909735 (print)
When data-parallel programming languages such as CUDA and OpenCL are implemented on CPUs, synchronization must be simulated correctly. The basic method is thread-based: all threads must execute one instruction in turn before any of them executes the next one. In this paper, we propose function splitting, which treats synchronization in a coroutine style rather than a purely thread-based one. It splits the data-parallel function, represented in a low-level intermediate representation, into several parts at the simulated synchronization points. We evaluate our method by translating PTX kernels to multi-core CPUs; the results show that it improves performance by 15% compared with the thread-based method. Our main contribution is a general synchronization treatment that operates on low-level intermediate code given as a control flow graph in SSA form.
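The splitting idea can be illustrated with a minimal sketch, assuming a toy kernel that contains a single barrier. The function names (`part_before_barrier`, `part_after_barrier`, `run_split_kernel`) are hypothetical, and the code works on plain Python lists rather than on PTX or an SSA control flow graph, so it shows only the principle and not the paper's implementation.

```python
# Toy kernel: each logical thread writes shared[tid], waits at a barrier,
# then reads the value written by its neighbour.
# Hypothetical illustration only; the paper operates on low-level SSA code.

def part_before_barrier(tid, shared, data):
    # Region 1: everything the kernel does before the barrier.
    shared[tid] = data[tid] * 2

def part_after_barrier(tid, shared, out, n):
    # Region 2: everything after the barrier; reads another thread's write.
    out[tid] = shared[(tid + 1) % n]

def run_split_kernel(data):
    n = len(data)
    shared = [0] * n
    out = [0] * n
    # Running region 1 for every logical thread before any thread enters
    # region 2 gives the same ordering guarantee as the barrier.
    for tid in range(n):
        part_before_barrier(tid, shared, data)
    for tid in range(n):
        part_after_barrier(tid, shared, out, n)
    return out
```

Each region runs to completion for one logical thread before the next thread starts it, so threads only need to be interleaved at barriers rather than after every instruction.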
Existing OpenMP cost models do not give enough thought to the implementation details of OpenMP programs, so they cannot be applied widely to different types of parallel loops. To solve this problem, this study extend...
Chaos is a random-like process in a deterministic system that is highly sensitive to initial values. It is a behavior of nonlinear dynamical systems with built-in randomness. Weighing the advantages and disadvantages of present chaos encryption models, the paper proposes a chaotic stream cipher model based on chaos theory, which not only overcomes the finite precision effect but also improves the randomness of the chaotic system and its output sequence. The theoretical period of the sequence generated by the algorithm is at least 10^600, which fully satisfies the practical requirements of stream cipher systems.
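For readers unfamiliar with chaotic stream ciphers, the following is a minimal generic sketch, assuming a logistic map as the chaotic source. It is not the model proposed in the paper: the function names, the parameter `r = 3.99`, the seed value, and the byte extraction rule are all assumptions chosen for illustration. The sketch uses ordinary floating-point arithmetic, so it is subject to exactly the finite precision effect the paper sets out to overcome, and it is not secure.

```python
def logistic_keystream(x0, r, nbytes, burn_in=100):
    """Generate nbytes of keystream from the logistic map x -> r*x*(1-x).

    x0 (the seed, 0 < x0 < 1) and r act as the key; both are illustrative.
    """
    x = x0
    # Discard early iterates so the output depends less visibly on x0.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    out = bytearray()
    for _ in range(nbytes):
        x = r * x * (1.0 - x)
        # Quantize the state to one byte (an assumed extraction rule).
        out.append(int(x * 256) % 256)
    return bytes(out)

def xor_cipher(data, x0=0.3141592, r=3.99):
    # Stream cipher: XOR the data with the keystream.
    # Applying the function twice with the same key restores the input.
    keystream = logistic_keystream(x0, r, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))
```

Because encryption and decryption are the same XOR operation, `xor_cipher(xor_cipher(message))` returns the original `message` when the same seed and parameter are used.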
With the development of network security research, network attack modeling and analysis techniques have received increasing attention. A generalized stochastic colored Petri net (GSCPN) model is proposed. To each ...
ISBN: 9781479914449 (print)
While multicore processors increase throughput for multi-programmed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit the fine-grained pipeline parallelism latent in ordinary programs at the instruction level, in the presence of all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties as well as hardware support for a communication channel and two special instructions. In this paper, we propose an improved PS-DSWP algorithm based on OpenMP. It is implemented without relying on a particular CPU architecture by using a high-level intermediate representation. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built on basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built on instructions. OpenMP is employed in our algorithm to assign tasks and implement synchronization among threads while avoiding dependence on hardware support. We evaluate loops with complex memory access patterns and control flow, which traditional techniques cannot handle, on a multicore platform. With our algorithm, these loops can be parallelized and gain significant performance improvements. We obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
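The pipeline structure described above can be sketched as follows, assuming a loop that splits into a sequential traversal stage and a replicated (parallel) work stage. This is an analogy only: it uses Python threads and `queue.Queue` in place of OpenMP, the stage functions `traverse` and `work` are hypothetical, and because of Python's global interpreter lock it demonstrates the decoupling rather than any real speedup.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker; input items must not be None

def traverse(items, q, n_workers):
    # Sequential stage: the loop-carried dependence (e.g. pointer chasing
    # through a linked structure) stays in a single thread.
    for item in items:
        q.put(item)
    # One sentinel per worker so that every worker terminates.
    for _ in range(n_workers):
        q.put(SENTINEL)

def work(q, results, lock):
    # Parallel stage: iterations are independent, so the stage is replicated.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        value = item * item  # stand-in for the real loop body
        with lock:
            results.append(value)

def run_pipeline(items, n_workers=3):
    q = queue.Queue()  # software channel between the two stages
    results = []
    lock = threading.Lock()
    workers = [threading.Thread(target=work, args=(q, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    traverse(items, q, n_workers)
    for w in workers:
        w.join()
    # Workers finish in arbitrary order, so sort for a deterministic result.
    return sorted(results)
```

The queue plays the role of the communication channel between stages, which is the part the original PS-DSWP expected hardware to provide.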
When implementing SPMD programs on multi-core platforms, whole-function vectorization is an important optimization method. SPMD programs have the drawback that many instructions are redundant across threads, and this redundancy persists after vectorization. This paper proposes to alleviate this overhead by detecting scalar operations and extracting them from the vectorized instructions. An algorithm is designed to handle control flow and data flow together, in which convergence and invariance analysis is employed to statically identify convergent execution and invariant values or instructions. Our algorithm is effective for implementing SPMD programs on multi-core platforms. The experiments show that our method improves execution efficiency by 13.3%.
Traditional multicast protocols form multicast trees rooted at different sources to forward packets. If the multicast sources and receivers are in different domains, these trees produce a great number of multicast states in the backbone, resulting in poor scalability. Therefore, we propose WSCT-TC, an inter-domain multicast scheme based on one Wide-Sense Circuit Tree per traffic class, in which a Wide-Sense Circuit Tree (WSCT) is established for each class of multicast traffic. The WSCT is established in the backbone, and multicast packets are forwarded along it by label switching. The specification of a WSCT can be reconfigured according to the QoS (Quality of Service) requirements of multicast applications to provide better QoS. Simulation experiments show that WSCT-TC achieves better scalability.
ISBN: 9781479952465 (print)
By integrating a large number of simple cores on a chip to provide the desired performance and throughput, microprocessors have entered the many-core era. To fully exploit the capability of many-core processors, we propose speedup models for many-core architectures in this paper. Under the assumptions of the Hill-Marty model, we derive our formulas based on Gustafson's Law and Sun-Ni's Law. Then, in comparison with the Hill-Marty model, we theoretically analyze the best allocation under given resources. Furthermore, we apply the conclusions of our models to evaluate current many-core processors and predict concrete future architectures. Our results show that many-core architectures, especially heterogeneous ones, scale extensively and improve performance. By using simple analytical models, we provide a better understanding of architecture design, and our work complements existing studies.
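As background for the models named above, the sketch below evaluates two well-known baseline formulas: the Hill-Marty symmetric multicore speedup, which rests on Amdahl's Law, and the classic Gustafson's Law. It does not reproduce the formulas derived in the paper, which the abstract does not state. The function names are hypothetical; `f` is the parallel fraction, `n` the chip budget in base-core equivalents (BCEs), `r` the BCEs spent per core, and `perf(r) = sqrt(r)` is the performance assumption used by Hill and Marty.

```python
import math

def perf(r):
    # Hill-Marty assumption: a core built from r BCEs delivers sqrt(r)
    # times the sequential performance of a single-BCE core.
    return math.sqrt(r)

def hill_marty_symmetric(f, n, r):
    # Symmetric chip: n/r identical cores of r BCEs each (fixed problem size).
    # The serial part runs on one core, the parallel part on all n/r cores.
    serial_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (serial_time + parallel_time)

def gustafson(f, p):
    # Gustafson's Law: scaled speedup on p processors, where f is the
    # parallel fraction of the scaled workload.
    return (1.0 - f) + f * p
```

With `r = 1` the symmetric formula reduces to plain Amdahl's Law, which gives a quick sanity check. For example, `gustafson(0.5, 16)` evaluates to 8.5, while `hill_marty_symmetric(0.5, 16, 1)` stays below 2, which is the gap between scaled and fixed-size workloads that motivates revisiting the model.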
To achieve higher accuracy in estimating the embedding change rate of a stego object, an ensemble learning-based estimation method is presented. First of all, a framework of embedding change rate estimation b...