ISBN: 9781479909735 (print)
When implementing data-parallel programming languages such as CUDA and OpenCL on CPUs, synchronization must be simulated correctly. The basic method is thread-based: all threads must execute one instruction in turn before any of them executes the next. In this paper, we propose function splitting, which treats synchronization in a coroutine style rather than a purely thread-based one. It splits a data-parallel function, represented in a low-level intermediate representation, into several parts in order to simulate synchronization. We evaluate our method by translating PTX kernels to multi-core CPUs; the results show that it improves performance by 15% compared with the thread-based method. Our main contribution is a general synchronization treatment that operates on low-level intermediate code given as a control flow graph in SSA form.
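The idea of splitting a kernel around a synchronization point can be illustrated with a small sketch. It is not the paper's implementation: the kernel, the function names, and the `live` dictionary that carries values across the barrier are all hypothetical, and the sketch assumes a single barrier that every logical thread reaches.

```python
# Hypothetical kernel: each logical thread writes shared[tid], waits at a
# barrier, then reads its neighbour's slot. The kernel is split by hand
# into the part before the barrier and the part after it.

def part_before_barrier(tid, shared, live):
    shared[tid] = tid * 2
    # Values that must survive across the barrier are saved per thread.
    live[tid] = {"neighbour": (tid + 1) % len(shared)}

def part_after_barrier(tid, shared, live, out):
    out[tid] = shared[live[tid]["neighbour"]]

def run_split_kernel(num_threads):
    shared = [0] * num_threads
    live = [None] * num_threads
    out = [0] * num_threads
    # Run the first part for every logical thread ...
    for tid in range(num_threads):
        part_before_barrier(tid, shared, live)
    # ... so the barrier is satisfied once this loop ends, and the second
    # part can then run for every logical thread.
    for tid in range(num_threads):
        part_after_barrier(tid, shared, live, out)
    return out
```

For example, `run_split_kernel(4)` returns `[2, 4, 6, 0]`: every read after the barrier sees the writes made by all threads before it, without interleaving the threads instruction by instruction.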
Given that surfactants can control the shape and size of micro- and nanoparticles, they should also be able to direct the growth of macroscopic crystals. This Minireview summarizes recent developments in the use of surfactants to prepare new crystalline inorganic materials, including chalcogenides, metal-organic frameworks, and zeolite analogues. The role of the surfactants in the various reaction systems is discussed.
To address the problem of external storage management in environments with multiple virtual machines, a system design scheme for external storage management is proposed, drawing on the ideas of virtual memory management and prot...
In this work, we study the fractal and multifractal properties of a family of fractal networks introduced by Gallos et al (2007 Proc. Nat. Acad. Sci. USA 104 7746). In this fractal network model, there is a parameter ...
Emotion recognition at sentence level is one of the fundamental problems of textual emotion understanding. Based on the observation that the emotional focus of a sentence can be expressed by some of its clauses, this paper proposes to find the emotional focus for sentence emotion recognition. To overcome the problems caused by dependence on emotion lexicons, we first recognize word emotions in a sentence with a maximum entropy model. A homogeneous Markov model is then built for clause emotion recognition. After that, a strategy based on emotion selection is proposed for sentences with multiple clauses, and a genetic algorithm is used for clause selection by textual feature weighting. The experimental results show improvements of 9.1% and 3.6% over the baseline under two different evaluations. This demonstrates that finding the emotional focus by clause selection can significantly improve the performance of sentence emotion recognition.
Chaos is a seemingly random process in a deterministic system that is highly sensitive to initial values. It is a behavior of nonlinear dynamical systems with built-in randomness. Weighing the advantages and disadvantages of existing chaos encryption models, the paper proposes a chaotic stream cipher model based on chaos theory, which not only overcomes the finite precision effect but also improves the randomness of the chaotic system and the output sequence. The theoretical cycle length of the sequence generated by the algorithm exceeds 10^600, which fully satisfies the practical requirements of a stream cipher system.
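The general shape of a chaotic stream cipher can be shown with a toy sketch. It is not the model proposed in the paper: it assumes the logistic map with a hypothetical parameter r = 3.99 as the chaotic source, uses ordinary double-precision floats (so it is subject to exactly the finite precision effect the paper sets out to overcome), and is not suitable for real encryption.

```python
def logistic_keystream(x0, length, r=3.99, burn_in=100):
    """Derive `length` keystream bytes from the logistic map x -> r*x*(1-x).

    x0 in (0, 1) plays the role of the key; r and burn_in are illustrative
    choices, not values taken from the paper.
    """
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = r * x * (1.0 - x)
    stream = bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)   # quantize the state to one byte
    return bytes(stream)

def xor_cipher(data, x0):
    """Encrypt or decrypt by XOR-ing data with the keystream (symmetric)."""
    keystream = logistic_keystream(x0, len(data))
    return bytes(b ^ k for b, k in zip(data, keystream))
```

Applying `xor_cipher` twice with the same `x0` restores the plaintext, while a slightly different `x0` yields an unrelated keystream, which is the sensitivity to initial values that the abstract describes.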
Fuzz testing is an automated black-box testing technique that provides random data as input to a software system in the hope of finding vulnerabilities. In order to be effective, the fuzzed input must be common enough to pass ...
The paper presents an innovative design for a wheeled hopping robot. Our main objective was to obtain a stable and efficient robot that can jump over obstacles and reach a given ledge. The design allows the robot to jump a certain distance even when the wheels provide zero initial velocity. We also took into account the wind speed, which can prevent the robot from reaching the desired destination, and compensated for it with an initial velocity in the direction of movement. For testing, we designed a simulation that takes different initial conditions as inputs in order to test and present the jumping capability of the robot under different jumping conditions. Finally, we provide the jump area that the robot can reach for given initial conditions, so that the optimal one for reaching the target position can later be chosen.
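The relation between take-off conditions, wind, and reachable distance can be sketched with a simple ballistic model. This is an assumption-laden illustration, not the simulation from the paper: it assumes take-off and landing at the same height, no aerodynamic lift, and a headwind modelled as a constant horizontal deceleration `wind_decel`; all function and parameter names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def flight_time(v_jump, angle_deg):
    """Time in the air for a jump that lands at take-off height."""
    return 2.0 * v_jump * math.sin(math.radians(angle_deg)) / G

def jump_range(v_jump, angle_deg, v_wheel=0.0, wind_decel=0.0):
    """Horizontal distance covered; v_wheel is the speed given by the wheels."""
    t = flight_time(v_jump, angle_deg)
    vx = v_wheel + v_jump * math.cos(math.radians(angle_deg))
    return vx * t - 0.5 * wind_decel * t * t

def compensating_wheel_speed(v_jump, angle_deg, wind_decel):
    """Extra initial speed that cancels the distance lost to the headwind."""
    return 0.5 * wind_decel * flight_time(v_jump, angle_deg)
```

Under these assumptions the robot still covers a nonzero distance with `v_wheel=0.0`, and adding `compensating_wheel_speed(...)` as wheel speed restores the still-air range. Sweeping `v_jump` and `angle_deg` over a grid would give a rough picture of a reachable jump area.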
ISBN: 9781479914449 (print)
While multicore processors increase throughput for multi-programmed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit the fine-grained pipeline parallelism lurking in ordinary programs in the presence of all kinds of dependences, including arbitrary control dependences, at the instruction level. However, it requires knowledge of architectural properties as well as hardware support in the form of a communication channel and two special instructions. In this paper, we propose an improved PS-DSWP algorithm based on OpenMP. It is implemented independently of the CPU architecture by using a high-level intermediate representation. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built on basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built on instructions. OpenMP is employed in our algorithm to assign tasks and implement synchronization among threads while avoiding dependence on hardware support. We evaluate loops with complex memory patterns and control flow, which traditional techniques cannot handle, on a multicore platform. With our algorithm, these loops can be parallelized and gain significant performance improvements. We obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
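The pipeline structure that PS-DSWP produces (a sequential stage feeding a replicated parallel stage, followed by a sequential stage) can be sketched as follows. This is only an illustration of the stage layout: Python threads and `queue.Queue` stand in for the OpenMP threads and synchronization used in the paper, and `run_pipeline`, `work`, and `num_workers` are hypothetical names.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream between stages

def run_pipeline(items, work, num_workers=3):
    q_in = queue.Queue()
    q_out = queue.Queue()

    def producer():
        # Sequential first stage: e.g. a pointer-chasing traversal whose
        # loop-carried dependence prevents it from being replicated.
        for item in items:
            q_in.put(item)
        for _ in range(num_workers):
            q_in.put(SENTINEL)

    def worker():
        # Parallel stage: iterations are independent, so it is replicated.
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)
                return
            q_out.put(work(item))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Sequential last stage, run on the calling thread: a reduction.
    total = 0
    finished = 0
    while finished < num_workers:
        result = q_out.get()
        if result is SENTINEL:
            finished += 1
        else:
            total += result
    for t in threads:
        t.join()
    return total
```

For example, `run_pipeline(range(10), lambda x: x * x)` returns `285`. The queues decouple the stages so that each one proceeds as soon as its input is available, which is the role played by the communication channel in the original hardware-assisted scheme.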
When implementing SPMD programs on multi-core platforms, whole-function vectorization is an important optimization method. SPMD programs have the drawback that many instructions are redundant across threads, and this redundancy persists after vectorization. This paper proposes to alleviate this overhead by detecting scalar operations and extracting them from the vectorized instructions. An algorithm is designed to handle control flow and data flow together, in which convergence and invariance analyses are employed to statically identify convergent execution and invariant values or instructions. Our algorithm is effective for implementing SPMD programs on multi-core platforms. The experiments show that our method improves execution efficiency by 13.3%.
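The data-flow half of such an invariance analysis can be sketched on a toy intermediate representation. This is a simplified illustration rather than the paper's algorithm: it assumes straight-line code in SSA order, ignores control flow (and therefore the convergence analysis the abstract mentions), and uses a hypothetical instruction format of `(dest, op, operands)` tuples.

```python
def classify_uniform(instructions, varying_inputs):
    """Return the destinations whose values are the same in every thread.

    A value is treated as varying if it depends, directly or transitively,
    on a per-thread input such as the thread id; everything else is
    invariant across threads and could stay scalar after vectorization.
    """
    varying = set(varying_inputs)
    for dest, _op, operands in instructions:
        if any(operand in varying for operand in operands):
            varying.add(dest)
    return [dest for dest, _op, _operands in instructions
            if dest not in varying]

# Hypothetical kernel body; "n" is a kernel argument, "tid" the thread id.
kernel = [
    ("a", "mul", ["n", "4"]),    # depends only on n: invariant
    ("b", "add", ["tid", "a"]),  # depends on tid: varying
    ("c", "add", ["a", "1"]),    # depends only on a: invariant
    ("d", "mul", ["b", "c"]),    # depends on b: varying
]
```

Here `classify_uniform(kernel, {"tid"})` returns `["a", "c"]`, so only `b` and `d` would need vector instructions, while `a` and `c` could be computed once per vector.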