Although atomicity plays a key role in data operations on shared variables in parallel computation, researchers have not treated atomicity in Python in much detail. This study provides a novel approach to integrate the CPU-based atomic C APIs into Python shared variables via the C Foreign Function Interface for Python (CFFI) on all major platforms, and utilises Cython to optimise calculation in CPython. Evidence shows that the resulting product, Shared Atomic Enterprise (SAE), could accelerate data operations on shared data types to a large extent. These findings provide a solid evidence base for the wide utilisation of Python atomic operations in parallel computation and concurrent programming.
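The abstract contains no code, but the kind of hardware-level atomic read-modify-write that such a library exposes to Python is easy to illustrate directly. The sketch below is not the SAE or CFFI API; it is a minimal C++ analogue in which several threads increment a shared counter with `std::atomic::fetch_add`, so no updates are lost and no lock is needed.

```cpp
// Minimal illustration of a CPU-level atomic fetch-and-add on a shared
// variable; an analogue of the operations SAE wraps, not its API.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::atomic<long> counter{0};            // shared variable
    constexpr int kThreads = 8;
    constexpr long kIncrements = 100000;

    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([&counter] {
            for (long i = 0; i < kIncrements; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // lock-free RMW
        });
    for (auto& w : workers) w.join();

    // Always prints 800000: the hardware atomic guarantees no lost updates.
    std::cout << counter.load() << "\n";
}
```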
ISBN (print): 9781728174457
There is a wide range of implementation approaches to multi-threading. User-level threads are efficient because threads can be scheduled by a user-defined scheduling policy that suits the needs of the specific application. However, user-level threads are unable to handle blocking system calls efficiently. In contrast, kernel-level threads incur large overhead during context switching. Kernel-level threads are scheduled by the scheduling policy provided by the OS kernel, which is hard to customize to application needs. We propose a novel thread execution model, the bi-level thread, that combines the best aspects of the two conventional thread implementations. A bi-level thread can be either a kernel-level thread or a user-level thread at runtime. Consequently, the context-switching overhead of a bi-level thread is as low as that of user-level threads, but thread scheduling can be defined by user policies. Blocking system calls, on the other hand, can be executed as a kernel-level thread without blocking the execution of other user-level threads. Furthermore, the proposed bi-level thread is combined with an address-space-sharing technique which allows processes to share the same virtual address space. Processes sharing the same address space can be scheduled with the same technique as user-level threads, thus we call this implementation a user-level process. However, the main difference between threads and processes is that threads share most of the kernel state of the underlying process, such as the process ID and file descriptors, whereas different processes do not. A user-level process must guarantee that system calls always access the appropriate kernel information that belongs to the particular process. We call this system-call consistency. In this paper, we show that the proposed bi-level threads, implemented in an address-space-sharing library, can resolve the blocking system-call issue of user-level threads while retaining system-call consistency.
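None of the paper's mechanisms are reproduced here; the sketch below only illustrates the problem being solved and the general shape of the remedy. A toy cooperative scheduler runs compute tasks one after another on a single kernel thread, while a call that would block is handed off to its own kernel thread (via std::async) so the cooperative tasks keep making progress. `CoopScheduler` and `blocking_call_offloaded` are illustrative names, not the paper's API.

```cpp
// Toy illustration: cooperative (user-level) tasks keep running while a
// blocking call is serviced by a separate kernel-level thread.
#include <chrono>
#include <functional>
#include <future>
#include <iostream>
#include <queue>
#include <thread>

// A toy cooperative scheduler: tasks are plain callbacks run to completion,
// one after another, on a single kernel thread.
class CoopScheduler {
public:
    void spawn(std::function<void()> task) { ready_.push(std::move(task)); }
    void run() {
        while (!ready_.empty()) {
            auto task = std::move(ready_.front());
            ready_.pop();
            task();                  // cooperative: runs until it returns
        }
    }
private:
    std::queue<std::function<void()>> ready_;
};

// "Bi-level" escape hatch: a would-be blocking call is shipped to its own
// kernel thread so the cooperative tasks are not stalled.
std::future<int> blocking_call_offloaded() {
    return std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200)); // stands in for a blocking syscall
        return 42;
    });
}

int main() {
    CoopScheduler sched;
    auto result = blocking_call_offloaded();    // kernel-level thread handles the block
    for (int i = 0; i < 3; ++i)
        sched.spawn([i] { std::cout << "user-level task " << i << " still runs\n"; });
    sched.run();                                // other tasks make progress meanwhile
    std::cout << "blocking call returned " << result.get() << "\n";
}
```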
As the hardware found within data centers becomes more heterogeneous, it is important to allow for efficient execution of algorithms across architectures. We present RAPID, a high-level programming language and combined imperative and declarative model for functionally- and performance-portable execution of sequential pattern-matching applications across CPUs, GPUs, Field-Programmable Gate Arrays (FPGAs), and Micron's D480 AP. RAPID is clear, maintainable, concise, and efficient both at compile and run time. Language features, such as code abstraction and parallel control structures, map well to pattern-matching problems, providing clarity and maintainability. For generation of efficient runtime code, we present algorithms to convert RAPID programs into finite automata. Our empirical evaluation of applications in the ANMLZoo benchmark suite demonstrates that the automata processing paradigm provides an abstraction that is portable across architectures. We evaluate RAPID programs against custom, baseline implementations previously demonstrated to be significantly accelerated. We also find that RAPID programs are much shorter in length, are expressible at a higher level of abstraction than their handcrafted counterparts, and yield generated code that is often more compact.
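RAPID's compiler is not shown in the abstract, but the execution model it targets, a table-driven finite automaton stepped once per input symbol, can be sketched briefly. The pattern and transition table below are hypothetical; a real RAPID program would be compiled into much larger automata for an automata processor, GPU, or FPGA.

```cpp
// Table-driven DFA that reports every occurrence of the substring "abc".
// Illustrates the automaton execution model, not RAPID's compiler output.
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    // States 0..3; reaching state 3 means "abc" has just been read.
    // next[state][symbol] with symbols 0='a', 1='b', 2='c', 3=anything else.
    constexpr int next[4][4] = {
        {1, 0, 0, 0},   // state 0: nothing matched yet
        {1, 2, 0, 0},   // state 1: seen "a"
        {1, 0, 3, 0},   // state 2: seen "ab"
        {1, 0, 0, 0},   // state 3: report, then keep scanning
    };
    auto sym = [](char c) { return c == 'a' ? 0 : c == 'b' ? 1 : c == 'c' ? 2 : 3; };

    const std::string input = "xxabcaabcbb";
    int state = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        state = next[state][sym(input[i])];     // one table lookup per input symbol
        if (state == 3)
            std::cout << "match ending at position " << i << "\n";
    }
}
```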
Deployed through skeleton frameworks, structured parallelism yields a clear and consistent structure across platforms by distinctly decoupling computations from the structure of a parallel programme. Structured programming is a viable and effective means of providing the separation of concerns, as it subdivides a system into building blocks (modules, skids or components) that can be independently created and then used in different systems to drive multiple functionalities. Depending on its defined semantics, each building block wraps a unit of computing function, where the valid assembly of these building blocks forms a high-level structured parallel programming model. This paper proposes a grammar to build block components that execute computational functions on heterogeneous multi-core architectures. The grammar is validated against three different families of computing models: skeleton-based, general-purpose, and domain-specific. In conjunction with the protocol, the grammar produces fully instrumented code for an application suite using the skeletal framework FastFlow.
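To make "assembling building blocks into a parallel structure" concrete, the sketch below shows a three-stage pipeline written in the style of the public FastFlow tutorial: each stage is a node that wraps one unit of computation, and the `ff_Pipe` constructor is the assembly step. This follows the tutorial API as an assumption and is not the code produced by the proposed grammar.

```cpp
// FastFlow-style three-stage pipeline: generate numbers, square them, print them.
// Based on the FastFlow tutorial API (ff_node_t, ff_Pipe); details are assumptions.
#include <ff/ff.hpp>
#include <iostream>
using namespace ff;

struct Source : ff_node_t<long> {
    long* svc(long*) {
        for (long i = 1; i <= 10; ++i) ff_send_out(new long(i));  // emit a stream
        return EOS;                                               // end of stream
    }
};
struct Square : ff_node_t<long> {
    long* svc(long* t) { *t *= *t; return t; }                    // per-item computation
};
struct Sink : ff_node_t<long> {
    long* svc(long* t) { std::cout << *t << "\n"; delete t; return GO_ON; }
};

int main() {
    Source s; Square sq; Sink k;
    ff_Pipe<> pipe(s, sq, k);            // assembly of the building blocks
    if (pipe.run_and_wait_end() < 0) return 1;
    return 0;
}
```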
Structured parallel programs ought to be conceived as two separate and complementary entities: computation, which expresses the calculations in a procedural manner, and coordination, which abstracts the interaction and communication. By abstracting commonly used patterns of parallel computation, communication, and interaction, algorithmic skeletons enable programmers to code algorithms without specifying platform-dependent primitives. This article presents a literature review on algorithmic skeleton frameworks (ASKF), parallel software development environments furnishing a collection of parameterizable algorithmic skeletons, where the control flow, nesting, resource monitoring, and portability of the resulting parallel program are dictated by the ASKF rather than by the programmer. Consequently, the ASKF can be positioned as high-level structured parallel programming enablers, as their systematic utilization permits the abstract description of programs and fosters portability by focusing on the description of the algorithmic structure rather than on its detailed implementation. Copyright (C) 2010 John Wiley & Sons, Ltd.
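As a concrete illustration of what "parameterizable skeleton" means, the sketch below defines a tiny map skeleton: the caller supplies only the per-element function (the computation), while the skeleton owns the coordination (partitioning the range over threads and joining them). This is a generic illustration, not code from any of the surveyed frameworks.

```cpp
// A minimal "map" skeleton: coordination (thread creation, partitioning,
// joining) lives in the skeleton; the user supplies only the computation.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

template <typename T, typename F>
void parallel_map(std::vector<T>& data, F func, unsigned workers) {
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&data, func, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] = func(data[i]);
        });
    }
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<double> v(1000000, 2.0);
    // The programmer states *what* to compute; the skeleton decides *how*.
    parallel_map(v, [](double x) { return std::sqrt(x) * x; }, 4);
    std::cout << v.front() << " ... " << v.back() << "\n";
}
```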
Algorithmic skeletons abstract commonly used patterns of parallel computation, communication, and interaction. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique that allows the conceptual description of parallel programs while fostering platform independence and algorithm abstraction. This work presents a methodology to improve skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able to adapt to the dynamic resource conditions over time using its structural forecasting information, we have developed adaptive structured parallelism (ASPARA). ASPARA is a generic methodology to incorporate structural information at compilation into a parallel program, which will help it to adapt at execution. ASPARA comprises four phases: programming, compilation, calibration, and execution. We illustrate the feasibility of this approach and its associated performance improvements using independent case studies based on two algorithmic skeletons, the task farm and the pipeline, evaluated in a non-dedicated heterogeneous multi-cluster system. Copyright (C) 2010 John Wiley & Sons, Ltd.
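ASPARA is a methodology rather than a library, so the code below is only a hedged sketch of the task-farm skeleton it targets: a pool of workers pulls independent tasks from a shared queue, and the one "adaptive" touch shown here, choosing the worker count from the hardware detected at execution time, stands in for the much richer calibration and execution phases described in the paper.

```cpp
// Task-farm sketch: independent tasks drawn from a shared queue by a pool of
// workers whose size is chosen at run time (a stand-in for resource awareness).
#include <algorithm>
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    constexpr int kTasks = 1000;

    std::atomic<int> next_task{0};   // index into the shared task queue
    std::atomic<long> total{0};      // farm result (sum of task outputs)

    std::vector<std::thread> farm;
    for (unsigned w = 0; w < workers; ++w)
        farm.emplace_back([&] {
            for (int t; (t = next_task.fetch_add(1)) < kTasks;)
                total.fetch_add(static_cast<long>(t) * t);   // the "worker function"
        });
    for (auto& th : farm) th.join();

    std::cout << "farm of " << workers << " workers computed " << total << "\n";
}
```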
The actor model is developed as a foundation for concurrent object-oriented programming. The model provides for non-interference of state changes with multiple threads, inherent concurrency, reconfigurability, encapsulation...
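The abstract is truncated at this point, but the properties it lists (state changes isolated from other threads, inherent concurrency, encapsulation) can be illustrated with a minimal actor: private state, a thread-safe mailbox, and a single thread that processes messages one at a time. This is a generic sketch, not the paper's formal model.

```cpp
// Minimal actor: private state mutated only by the actor's own thread,
// driven by messages delivered through a thread-safe mailbox.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

class CounterActor {
public:
    CounterActor() : worker_([this] { run(); }) {}
    ~CounterActor() {
        send(std::nullopt);          // poison pill: ask the actor to stop
        worker_.join();
    }
    void add(int amount) { send(amount); }

private:
    void send(std::optional<int> msg) {
        { std::lock_guard<std::mutex> lock(m_); mailbox_.push(msg); }
        cv_.notify_one();
    }
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return !mailbox_.empty(); });
            auto msg = mailbox_.front();
            mailbox_.pop();
            lock.unlock();
            if (!msg) break;         // stop request
            count_ += *msg;          // state touched only by the actor's thread
        }
        std::cout << "final count: " << count_ << "\n";
    }

    int count_ = 0;                              // encapsulated state
    std::queue<std::optional<int>> mailbox_;
    std::mutex m_;
    std::condition_variable cv_;
    std::thread worker_;                         // declared last: starts after state is ready
};

int main() {
    CounterActor actor;
    std::thread a([&] { for (int i = 0; i < 1000; ++i) actor.add(1); });
    std::thread b([&] { for (int i = 0; i < 1000; ++i) actor.add(2); });
    a.join();
    b.join();
}   // actor's destructor drains the mailbox and prints 3000
```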