Correctly synchronizing multithreaded programs is challenging, and errors can lead to program failures (e.g., atomicity violations). Existing memory consistency models rule out some possible failures, but are limited by depending on subtle programmer-defined locking code and by providing unintuitive semantics for incorrectly synchronized code. Stronger memory consistency models assist programmers by providing them with easier-to-understand semantics with regard to memory access interleavings in parallel code. This dissertation proposes a new strong memory consistency model based on ordering-free regions (OFRs), which are spans of dynamic instructions between consecutive ordering constructs (e.g. barriers). Atomicity over ordering-free regions provides stronger atomicity than existing strong memory consistency models with competitive performance. Ordering-free regions also simplify programmer reasoning by limiting the potential for atomicity violations to fewer points in the program’s execution. This dissertation explores both software-only and hardware-supported systems that provide OFR serializability.
Married women who are also students face many challenges in managing their time. Because they hold multiple and even conflicting roles, their academic achievement or their family life may be at *** parallel planning: can this total time-management model, developed by the authors, improve their academic achievement or not? The model first tries to improve some skills and then brings all important tasks together. The main goal of this study was to determine the effectiveness of teaching and applying this model on the academic achievement of married women. To do so, a single-case, multiple-baseline (across subjects) design was selected. The sample comprised 5 married female subjects, selected by purposive sampling from among Payame Noor University students in 2013. The subjects' average age was 24.2 years. Each subject had at least 11 instructional, practical, and monitoring sessions over 18 weeks. The study had two phases: baseline and treatment (instruction). The subjects entered the instruction phase in the 4th, 5th, 6th, 7th, and 8th sessions, respectively. Across the baseline and instruction phases, each subject completed a total of 18 short-answer exams (with 20 questions) based on her thermic lesson design. Scores were reported on a 100-point scale, and graphs and visual analyses were prepared from the data. Comparing the baseline and instruction phases showed a clear improvement in each subject's scores. Based on these findings, parallel programming instruction was effective in improving academic achievement.
ISBN: 3540411283 (print)
Because of irregular and dynamic data structures, parallel programming in non-numerical fields often requires asynchronous messages whose number is not known in advance. Such programs are hard to write using MPI/Pthreads, and many new parallel languages, designed to hide messages under the runtime system, suffer from execution overhead. Thus, we propose a parallel programming language, Orgel, that enables brief and efficient programming. An Orgel program is a set of agents connected by abstract channels called streams. The stream connections and messages are declaratively specified, which prevents bugs due to parallelization and also enables effective optimization. The computation in each agent is described in an ordinary sequential language, so efficient execution is possible. The evaluation results show that the overhead of concurrent switching and communication in Orgel is only 1.2 and 4.3 times larger than that of Pthreads, respectively. In parallel execution, we obtained a 6.5-10 times speedup with 11-13 processors.
ISBN: 9781479974863 (print)
General purpose graphics processing units (GPGPUs) suitable for general purpose programming have become sufficiently affordable in the last three years to be used in personal workstations. In this paper we assess the usefulness of such hardware in the statistical analysis of simulation input and output data. In particular we consider the fitting of complex parametric statistical metamodels to large data samples where optimization of a statistical function of the data is needed and investigate whether use of a GPGPU in such a problem would be worthwhile. We give an example, involving loss-given-default data obtained in a real credit risk study, where use of Nelder-Mead optimization can be efficiently implemented using parallel processing methods. Our results show that significant improvements in computational speed of well over an order of magnitude are possible. With increasing interest in "big data" samples the use of GPGPUs is therefore likely to become very important.
ISBN: 9781538649756 (print)
The ability to teach parallel programming principles and techniques is becoming fundamental to prepare a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and networking" where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.
Linda is a coordination language invented by David Gelernter at Yale University [7], which, when combined with a computation language (like C) yields a high-level parallel programming language for MIMD machines. Linda...
ISBN: 9789897583728 (print)
In this paper, the cloud parallel programming system CPPS, under development at the Institute of Informatics Systems, is considered. The system is intended to be an interactive visual environment for functional and parallel programming that supports computer science teaching and learning. The system will support the development, verification, and debugging of architecture-independent parallel programs and their correct conversion into efficient code for parallel computing systems, for execution in clouds. In the paper, the CPPS system itself, its input functional language, and its internal graph representation of functional programs are described.
ISBN: 0818684275 (print)
This paper argues for the development of more general and user-friendly parallel programming models, independent of hardware structures and concurrency concepts of operating systems theory, leading to portable programs and easy to use languages. It then presents the BaLinda model, based on last in/first out threads that interact via a shared tuplespace, and argues that it is simple enough to be both general and easy to use. It also discusses the idea of using function-based objects as the basic unit of parallel execution and the hierarchical structure to partition tuplespaces.
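The shared tuplespace idea mentioned above can be illustrated with a minimal sketch. This is not BaLinda's interface: the class name `TupleSpace`, the use of `None` as a wildcard, and the `timeout` parameter are assumptions made for illustration, and the operation names follow the classic Linda `out`/`in`/`rd` convention (`in_` is used because `in` is a Python keyword).

```python
import threading


class TupleSpace:
    """Illustrative in-process tuple space (hypothetical, not BaLinda's API)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any threads waiting for a match."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # A pattern matches a tuple of the same length whose fields are
        # equal to the pattern fields; None in the pattern is a wildcard.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def _wait_for_match(self, pattern, timeout):
        # Must be called with the condition lock held.
        if not self._cond.wait_for(
            lambda: self._find(pattern) is not None, timeout
        ):
            raise TimeoutError("no matching tuple")
        return self._find(pattern)

    def in_(self, *pattern, timeout=None):
        """Block until a tuple matches, then remove and return it."""
        with self._cond:
            tup = self._wait_for_match(pattern, timeout)
            self._tuples.remove(tup)
            return tup

    def rd(self, *pattern, timeout=None):
        """Block until a tuple matches, then return it without removing it."""
        with self._cond:
            return self._wait_for_match(pattern, timeout)
```

In this sketch, threads never address each other directly; a worker might call `space.in_("task", None)` and later `space.out("result", value)`, while the producer only deposits and withdraws tuples.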
ISBN: 9781479953134 (print)
Research on high-level parallel programming approaches systematically evaluates the performance of applications written using these approaches and informally argues that high-level parallel programming languages or libraries increase programmer productivity. In this paper we present a methodology for evaluating the trade-off between programming effort and performance of applications developed using different programming models. We apply this methodology to several implementations of a function solving the all nearest smaller values problem. The high-level implementation is based on a new version of the BSP homomorphism algorithmic skeleton.
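For readers unfamiliar with the all nearest smaller values problem named above, here is a minimal sequential sketch of the standard stack-based solution. It is not the paper's BSP homomorphism implementation, which the abstract does not detail; the function name and the use of `None` for "no smaller value to the left" are assumptions made for illustration.

```python
def all_nearest_smaller_values(values):
    """For each element, return the nearest preceding element that is
    strictly smaller, or None if there is none (sequential sketch)."""
    result = []
    stack = []  # candidates, strictly increasing from bottom to top
    for v in values:
        # Discard candidates that are not smaller than the current value;
        # they can never be the answer for this or any later element.
        while stack and stack[-1] >= v:
            stack.pop()
        result.append(stack[-1] if stack else None)
        stack.append(v)
    return result
```

Each element is pushed and popped at most once, so the sequential version runs in linear time; a parallel formulation has to split the input and combine partial results, which is where a skeleton-based approach comes in.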
The continued miniaturization of the technology node increases not only chip capacity but also circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications. Multi-core processors are now prevalent, from mobile devices to laptops to desktops, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We will demonstrate Cpp-Taskflow’s programming model and interface, software architecture, and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature gives the graph model the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler that adaptively manages threads based on available parallelism. The new scheduler uses a simple and effective strategy which can not only prevent resources from being underutilized, but also mitigate resource over-subscription
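The task-graph programming model described above can be illustrated in general terms with a small sketch. This is not Cpp-Taskflow's API or scheduler (the library is C++, and the abstract does not show its interface); the function `run_task_graph` and its `tasks`/`deps` arguments are hypothetical names, and the sketch simply runs each task on a thread pool once all of its prerequisites have finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run callables in dependency order (illustrative sketch only).

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of prerequisite task names
    Returns the task names in the order their completion was observed.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}  # future -> task name
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose prerequisites have all completed.
            ready = [name for name, pre in remaining.items() if not pre]
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                raise ValueError("cycle or unknown dependency in task graph")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                completed.append(name)
                for pre in remaining.values():
                    pre.discard(name)
    return completed
```

Independent tasks (for example two tasks that both depend only on a common predecessor) may run concurrently, while a task that depends on both of them starts only after both have finished.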