ISBN (print): 9783319920405; 9783319920399
As chip multiprocessors (CMPs) become more and more complex, software solutions such as parallel programming models are attracting a lot of attention. Task-based parallel programming models offer an appealing approach to utilizing complex CMPs. However, the increasing number of cores on modern CMPs is pushing research towards the use of fine-grained parallelism. Task-based programming models need to be able to handle such workloads while offering performance and scalability. Using specialized hardware to boost the performance of task-based programming models is a common practice in the research community. Our paper makes the observation that task creation becomes a bottleneck when we execute fine-grained parallel applications with many task-based programming models. As the number of cores increases, the time spent generating the tasks of the application becomes more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX offers a solution for minimizing task creation overheads and relies both on the runtime system and on dedicated hardware. On the runtime system side, TaskGenX decouples task creation from the other runtime activities. It then transfers this part of the runtime to specialized hardware. We draw the requirements for this hardware in order to boost the execution of highly parallel applications. From our evaluation using 11 parallel workloads on both symmetric and asymmetric multicore systems, we obtain performance improvements of up to 15x, averaging 3.1x over the baseline.
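The decoupling idea can be illustrated with a small software-only sketch, assuming a task is just an integer argument and its body is a trivial computation: one dedicated thread does nothing but create tasks, while worker threads only execute them. TaskGenX moves the creation part to dedicated hardware, which this sketch does not model, and CPython threads will not show a real speedup here; the point is only the separation of roles. All names (`creator`, `worker`, `run`) are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of task creation


def creator(n_tasks, task_queue, n_workers):
    """Dedicated creator thread: it only generates task descriptors."""
    for i in range(n_tasks):
        task_queue.put(i)  # a "task" here is just its integer argument
    for _ in range(n_workers):
        task_queue.put(SENTINEL)  # one stop marker per worker


def worker(task_queue, results, lock):
    """Worker threads only execute tasks; they never create them."""
    while True:
        task = task_queue.get()
        if task is SENTINEL:
            break
        value = task * task  # stand-in for a fine-grained task body
        with lock:
            results.append(value)


def run(n_tasks=1000, n_workers=4):
    task_queue = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(task_queue, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    creator_thread = threading.Thread(target=creator,
                                      args=(n_tasks, task_queue, n_workers))
    creator_thread.start()
    creator_thread.join()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(sum(run()))
```

Because workers never block on task generation, the creator can run ahead of them; in the paper's setting, that creator role is the part offloaded to hardware.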
ISBN (print): 9781538693803
Recently, Smart Mobile Access Point (SMAP) based architectures have emerged as a promising solution for creating smart solutions that support the monitoring of special phenomena. SMAPs allow us to predict communication activities in a system using the information collected from the network, and to select the best approach to support the network at any given time. To improve network performance, SMAPs can autonomously change their positions. They communicate with each other and carry out distributed computing tasks, constituting a mobile fog-computing platform. However, the communication cost becomes a critical factor. In this paper, we propose a compound method to select a near-optimal placement of SMAPs, with the goal of maximizing monitoring coverage and minimizing communication cost. Our approach combines a parallel implementation of the Imperialist Competitive Algorithm (ICA) with Kruskal's algorithm.
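One plausible role for Kruskal's algorithm is sketched below, under the assumption (not stated in the abstract) that the communication cost of a candidate placement is the total Euclidean length of a minimum spanning tree linking all SMAPs. An optimizer such as ICA could call a function like this when scoring each candidate placement; the function names are hypothetical.

```python
import math


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def mst_cost(positions):
    """Kruskal's algorithm over the complete graph of SMAP positions.

    Returns the total edge length of a minimum spanning tree, used here
    as an assumed proxy for the placement's communication cost.
    """
    n = len(positions)
    edges = sorted(
        (math.dist(positions[i], positions[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    total, used = 0.0, 0
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            total += w
            used += 1
            if used == n - 1:
                break
    return total
```

For example, SMAPs at `(0, 0)`, `(3, 0)` and `(3, 4)` give an MST of the edges of length 3 and 4, so `mst_cost` returns `7.0`.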
ISBN (print): 9781538655559
The complexities involved in parallel programming encourage frameworks to shield programmers from these concerns via higher-level abstractions. The high-performance nature of parallel computing shifts the focus of these programming environments towards facilitating and safeguarding faster computations. Therefore, aspects such as asynchronous graphical user interfaces (GUIs) receive less emphasis, even though many applications today depend on concurrent human-computer interaction. The significance of this topic is growing: efficient management of asynchronous GUI operations is currently a virtue for parallel-programming frameworks, but will soon become a necessity. This paper discusses an unobtrusive, annotation-based approach for managing different types of asynchronous GUI operations within the layout of familiar sequential code. The proposed solution minimizes the restructuring of sequential code in order to simplify developing, testing and maintaining GUI-based applications. Furthermore, the paper presents an implementation of the concept for @PT, a parallel programming environment based on Java annotations. The evaluation discussed in this paper suggests that the proposed mechanism is valid and demonstrates timely and efficient handling of asynchronous GUI operations.
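The general pattern can be shown with a Python decorator standing in for a Java annotation. This is a hypothetical analogy, not @PT's API: the decorated function keeps its sequential shape, its body runs off the GUI thread, and the GUI update is posted back to a queue that the GUI thread drains. `async_gui`, `gui_events` and `pump_gui_events` are invented names, and a plain queue plays the role of a real toolkit's event loop.

```python
import queue
import threading
from functools import wraps

gui_events = queue.Queue()  # stands in for a GUI toolkit's event queue


def async_gui(on_done):
    """Hypothetical decorator: run the body off the GUI thread, then
    schedule on_done(result) to run on the GUI thread."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def background():
                result = func(*args, **kwargs)           # slow work, off the GUI thread
                gui_events.put(lambda: on_done(result))  # hand the update back
            worker = threading.Thread(target=background, daemon=True)
            worker.start()
            return worker  # the caller returns immediately
        return wrapper
    return decorate


def pump_gui_events():
    """What the GUI thread's event loop would do: run queued updates in order."""
    while True:
        try:
            update = gui_events.get_nowait()
        except queue.Empty:
            return
        update()


label = {"text": ""}  # stand-in for a GUI widget


def show(result):
    label["text"] = f"sum = {result}"


@async_gui(on_done=show)
def compute(n):
    return sum(range(n))


if __name__ == "__main__":
    compute(1_000_000).join()  # join only so the demo is deterministic
    pump_gui_events()
    print(label["text"])
```

The call site `compute(n)` looks exactly like sequential code, which mirrors the abstract's goal of minimizing code restructuring.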
ISBN (print): 9783800749577
In fault-tolerant systems, applications are replicated and executed to enable error detection and recovery. If one replica fails, another is able to take its place and provide the correct results. This concept can benefit from parallel execution on separate execution units. The rise of multicore platforms supports the development of parallel software by providing adequate hardware. However, this raises challenges regarding the synchronization of the redundant strands of execution. Replica determinism means that, given the same input, identical programs provide the same output. To ensure replica determinism, synchronization requirements can be split into two domains: data and time. This paper examines the state of the art of synchronization techniques for parallel replicated execution in the context of fault-tolerant systems. We analyze the synchronization requirements within the time and data domains and compare different hardware concepts (multicore, multiprocessor and multi-PCB) and software concepts (processes, threads).
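A minimal sketch of replicated execution with majority voting follows, assuming thread-based replicas and hashable outputs. It touches both synchronization domains in a simple way: every replica receives the same input (data), and the vote waits until all replicas have finished (time). The function name `run_replicated` is hypothetical, and real systems must also control nondeterminism inside the replicas, which this sketch ignores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_replicated(func, inp, replicas=3):
    """Run identical replicas of func on the same input in parallel, then vote.

    Data domain: every replica receives the same input value.
    Time domain: pool.map blocks until all replicas have produced output.
    """
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        outputs = list(pool.map(lambda _: func(inp), range(replicas)))
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= replicas // 2:
        # No strict majority: the error is detected but cannot be masked.
        raise RuntimeError(f"no majority among replica outputs: {outputs}")
    return value
```

With three replicas, a single divergent output is outvoted (for example, outputs `5, 5, 7` yield `5`), while three different outputs raise an error instead of returning a value.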
ISBN (digital): 9781510620766
ISBN (print): 9781510620766
Medical images are corrupted by different types of noise caused by the imaging equipment itself. Obtaining precise images is very important to facilitate accurate observations for a given application. Noise removal is now a very challenging issue in the field of medical image processing. This work studies noise-removal techniques for medical images using fast implementations of different digital filters, such as the average, median and Gaussian filters. Processing X-ray medical images takes significant time. Nowadays, modern hardware allows parallel technology to be used for image processing on both CPUs and GPUs. Using GPU processing technology, parallel implementations of the noise-reduction algorithms that exploit data parallelism were proposed. The experimental study was conducted on medical X-ray images in order to choose the best filters with respect to the medical task and processing time. The comparison between the fast filter implementations and the GPU implementations shows a great increase in performance. Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing. In the field of medical imaging, GPUs are in some cases crucial for enabling the practical use of computationally demanding algorithms.
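To see why these filters parallelize well, here is a plain-Python 3x3 median filter. It is a sequential reference sketch, not the paper's implementation, and it assumes a greyscale image stored as a list of lists with replicated borders. Each output pixel depends only on its own 3x3 input window, so every pixel can be computed independently. This data parallelism is what a GPU version exploits, typically by assigning one thread per output pixel.

```python
from statistics import median


def median_filter_3x3(img):
    """3x3 median filter on a 2-D list of grey levels; borders are replicated.

    Every out[y][x] reads only the input image, never other outputs, so the
    two loops could be replaced by one independent GPU thread per pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)  # 9 values, so this is the middle one
    return out
```

A single bright "salt" pixel in a dark region is removed entirely, which is why the median filter is a common choice for impulse noise.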
ISBN (print): 9781728137728
Memory transactions are becoming more popular as chip manufacturers build native support for their execution. Although current Intel and IBM microprocessors support transactions in their instruction set architectures, there is still room for improvement on the compiler and runtime fronts. The GNU Compiler Collection (GCC) has language support for transactions, although performance is still a hindrance to its wider use. In this paper, we perform an up-to-date study of GCC transactional code generation and highlight where the main performance losses come from. Our study indicates that one of the main sources of inefficiency is the read and write barriers inserted by the compiler. Most of this instrumentation is required because the compiler cannot determine, at compile time, whether a region of memory will be accessed concurrently. To overcome this limitation, we propose new language constructs that allow programmers to specify which memory locations should be free from instrumentation. Initial experimental results show a good speedup when barriers are elided using our proposed language support, compared to the original code generated by GCC.
ISBN (print): 9781728101903
Graphics Processing Units (GPUs) have become a vital hardware resource for the industry and research community due to their high computing capabilities. Despite this, GPUs have not been introduced into the undergraduate curriculum of Computer Engineering and are barely covered in graduate courses. Bridging the gap between university curriculum and industry requirements for GPU expertise is ongoing, but this process takes time. Offering an immediate opportunity for students to learn GPU programming is key for their professional growth. The Northeastern University Computer Architecture Research Lab offers a free GPU programming course to incentivize students from all disciplines to learn how to efficiently program a GPU. In this paper, we discuss the methods used to keep students engaged in a course with no academic obligations. After applying these strategies, we were able to retain more than 80% of the students who started the course. Moreover, the students gave positive feedback on these strategies.
ISBN (digital): 9781728126166
ISBN (print): 9781728126173
Algorithmic skeletons are patterns of parallel computation. Skeletal parallel programming eases parallel programming: a program is merely a composition of such patterns. Data-parallel skeletons operate on parallel data structures that often have sequential counterparts. In algorithmic skeleton approaches that offer a global view of programs, a parallel program therefore has a structure similar to that of a sequential program but operates on parallel data structures. PySke is such an algorithmic skeleton library for Python, used to program shared or distributed memory parallel architectures in a simple way. This paper presents an extension to PySke: new algorithmic skeletons on parallel lists. This extension is evaluated on an application.
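The global-view idea can be illustrated with a small sketch. This is a hypothetical `ParList` class, not PySke's actual API, and it uses threads instead of distributed processes. The user sees one list, while internally the data is split into one chunk per processor. The `map`, `reduce` and `scan` skeletons are each written as per-chunk work followed by a combination step.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce as seq_reduce
from itertools import accumulate


class ParList:
    """Hypothetical global-view parallel list (not PySke's API): the data is
    split into one chunk per processor, but the user manipulates one list."""

    def __init__(self, data, nprocs=4):
        self.nprocs = nprocs
        size = -(-len(data) // nprocs) or 1  # ceiling division, at least 1
        self.chunks = [data[i:i + size] for i in range(0, len(data), size)]

    def _from_chunks(self, chunks):
        result = ParList.__new__(ParList)
        result.nprocs, result.chunks = self.nprocs, chunks
        return result

    def map(self, f):
        # Each chunk is transformed independently, one task per chunk.
        with ThreadPoolExecutor(self.nprocs) as pool:
            return self._from_chunks(
                list(pool.map(lambda c: [f(x) for x in c], self.chunks)))

    def reduce(self, op):
        # Local reductions per chunk, then a reduction of the partial results.
        # op must be associative for the result to match a sequential fold.
        with ThreadPoolExecutor(self.nprocs) as pool:
            partials = list(pool.map(lambda c: seq_reduce(op, c), self.chunks))
        return seq_reduce(op, partials)

    def scan(self, op):
        # 1) inclusive scan inside each chunk (independent per chunk),
        # 2) scan of the chunk totals gives each chunk's offset,
        # 3) combine every element with its chunk's offset.
        local = [list(accumulate(c, op)) for c in self.chunks]
        totals = list(accumulate([c[-1] for c in local], op))
        offsets = [None] + totals[:-1]
        fixed = [c if off is None else [op(off, x) for x in c]
                 for c, off in zip(local, offsets)]
        return self._from_chunks(fixed)

    def to_list(self):
        return [x for c in self.chunks for x in c]
```

A program is then a composition of skeletons that reads like sequential code. For example, `ParList(list(range(1, 11)), nprocs=3).map(lambda x: x * x).reduce(lambda a, b: a + b)` computes the sum of squares of 1 to 10, which is 385.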
ISBN (digital): 9781728144849
ISBN (print): 9781728144856
PySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are higher-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the writing of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but instead offers users a global view of parallel programs. This approach aims to make scalable programs easy to write. In addition to the library, we present experiments performed on a high-performance computing cluster (distributed memory) on a set of example applications developed with PySke.
Multiparty session types (MST) are a well-established type theory that describes the interactive structure of a fixed number of components from a global point of view and type-checks the components through projection of the global type onto the participants of the session. They guarantee communication safety for a language of multiparty sessions (LMS), i.e., distributed, parallel components can exchange values without deadlocks and without unexpected message types. Several variants of MST and LMS have been proposed to study key features of distributed and parallel programming. We observe that all of the considered variants descend from a single ancestor, i.e., the original LMS/MST, and that there are overlapping traits between the features of the considered variants and those of the original. This hampers the evolution of session types and languages and their adoption in practice. This paper addresses the following question: What are the essential features of MST and LMS, and how can they be modelled with simple constructs? To the best of our knowledge, this is the first time this question has been addressed. We performed a systematic analysis of the features and constructs in MST, LMS, and the considered variants to identify the essential features. The variants are among the most influential (according to Google Scholar) and well-established systems, and cover a wide set of areas in distributed and parallel programming. We used classical techniques of formal models such as BNF, structural congruence, small-step operational semantics and typing judgments to build our language and type system. Lastly, the coherence of the operational semantics and the type system is proven by induction. This paper proposes a set of essential features, a language of structured interactions, and a type theory of comprehensive multiparty session types, including global types and a type system. The analysis removes overlapping features and captures the shared traits, thereby introducing the essential features. Th
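Projection, the step that links a global type to each participant, is illustrated below for a deliberately tiny fragment. This is a sketch, not the paper's calculus. It assumes a global type is a finite sequence of interactions `p -> q : l` with message labels only, and it omits branching, recursion and payload sorts. Projecting onto a role keeps the interactions that involve that role: sends are written `q!l` and receives `p?l`, following common session-type notation. `Msg` and `project` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """One interaction of a global type: sender -> receiver : label."""
    sender: str
    receiver: str
    label: str


def project(global_type, role):
    """Project a sequence of interactions onto one participant's local type."""
    local = []
    for m in global_type:
        if m.sender == m.receiver:
            raise ValueError(f"self-interaction not allowed: {m}")
        if m.sender == role:
            local.append(f"{m.receiver}!{m.label}")  # role sends to receiver
        elif m.receiver == role:
            local.append(f"{m.sender}?{m.label}")    # role receives from sender
        # interactions that do not involve role are skipped
    return local


G = [Msg("Buyer", "Seller", "title"),
     Msg("Seller", "Buyer", "quote"),
     Msg("Buyer", "Shipper", "address")]
```

Here `project(G, "Seller")` yields `["Buyer?title", "Buyer!quote"]`. Every send in one projection is matched by a receive in its partner's projection, which is the intuition behind the communication-safety guarantee. Full MST projection must additionally merge branches of a choice, which this fragment does not cover.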