ISBN: (print) 9781479941162
Multicore processors are now widespread across desktop, laptop, server, and even smartphone and tablet devices. The rise of such powerful execution environments calls for new parallel and distributed Description Logic (DL) reasoning algorithms. Many sophisticated optimizations have been explored and have considerably enhanced DL reasoning with light ontologies. Non-determinism remains a main source of complexity for implemented systems handling ontologies that rely on more expressive logics. In this work, we explore handling non-determinism in DL languages that enable qualified cardinality restrictions. We implement a fork/join parallel framework in our hybrid algebraic reasoner, which handles qualified cardinality restrictions and nominals using inequation solving. Preliminary evaluation shows encouraging results.
Experiments are a fundamental part of science. They are needed when the system under evaluation is too complex to be described analytically, and they serve to validate hypotheses empirically. This work presents ExCovery, an experimentation framework for dependability analysis of distributed processes. It provides concepts that cover the description, execution, measurement, and storage of experiments. These concepts foster transparency and repeatability of experiments for further sharing and comparison. ExCovery has been tried and refined in a wide range of dependability-related experiments during the last two years. A case study is provided that describes service discovery (SD) as an experiment process (EP). A working prototype for IP networks runs on the Distributed Embedded System (DES) wireless testbed at the Freie Universität Berlin.
Reading input from primary storage (i.e., the ingest phase) and aggregating results (i.e., the merge phase) are important pre- and post-processing steps in large batch computations. Unfortunately, today's data sets are so large that the ingest and merge job phases are now performance bottlenecks. In this paper, we mitigate the ingest and merge bottlenecks by leveraging the scale-up MapReduce model. We introduce an ingest chunk pipeline and a merge optimization that increase CPU utilization (50-100%) and yield job phase speedups (1.16x-3.13x) for the ingest and merge phases. Our techniques are based on well-known algorithms and scale-out MapReduce optimizations, but applying them to a scale-up computation framework to mitigate the ingest and merge bottlenecks is novel.
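The general idea of overlapping ingest with computation can be illustrated with a small sketch. This is not the authors' implementation: the function names, the chunk size, the in-memory `io.BytesIO` input, and the byte-counting "map" step are all assumptions chosen only to keep the example self-contained. A background thread reads fixed-size chunks into a bounded queue while the main thread processes the chunks that have already arrived.

```python
import io
import queue
import threading


def ingest_chunks(stream, chunk_size, out_queue):
    """Producer: read fixed-size chunks and hand them to the consumer."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out_queue.put(chunk)
    out_queue.put(None)  # sentinel: no more input


def pipelined_byte_count(stream, chunk_size=4):
    """Count byte values while the next chunk is read in the background."""
    chunks = queue.Queue(maxsize=2)  # bounded, so ingest cannot run far ahead
    reader = threading.Thread(
        target=ingest_chunks, args=(stream, chunk_size, chunks)
    )
    reader.start()

    counts = {}
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        # Hypothetical "map" work: tally each byte value in the chunk.
        for byte in chunk:
            counts[byte] = counts.get(byte, 0) + 1

    reader.join()
    return counts


result = pipelined_byte_count(io.BytesIO(b"abracadabra"))
```

The bounded queue is the design choice worth noting: it caps memory use and lets the reader block when computation falls behind, rather than buffering the whole input.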
Although some colleges and universities have access to parallel computing hardware, none that we are aware of can provide dedicated parallel computing hardware to each student. Instead, institutions often provide shared parallel computing equipment for students, if they can afford to provide any. It is difficult for students to understand the performance of their programs and how they scale when they use shared equipment on which other students or users may interfere with their work. The current emphasis on network security at some institutions also makes some shared resources that students could use difficult or impossible to access. We provide a parts list, information, and microSD card images for building a small, affordable compute cluster that each student in a parallel computing course can purchase in lieu of a textbook, so that each student has their own private compute cluster.
This paper presents our experience using a research-infused teaching approach in an undergraduate parallel programming course. The research-teaching nexus is applied at various levels, first through research-led teaching of core parallel programming concepts and of the latest developments from the affiliated research group. The bulk of the course, however, focuses on student-driven, research-based and research-tutored teaching approaches, in which students actively participate in groups on research projects; students are fully immersed in the learning activity of their respective project while also participating in discussions of wider parallel programming topics across other groups. This close affiliation between the undergraduate course and the research group results in a wide range of benefits for all those involved.
Large-scale graph structures are considered a keystone of many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing memory bandwidth. Our architecture adopts a custom graph representation based on the compressed sparse row (CSR) format, as well as a restructuring of the conventional BFS algorithm. By taking maximum advantage of available memory bandwidth, our architecture keeps its processing elements continuously active. Using a commercial high-performance reconfigurable computing system (the Convey HC-2), our results demonstrate a 5x speedup over previously published FPGA-based implementations.
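For readers unfamiliar with the standard CSR layout that the abstract builds on, the following sketch shows a plain software level-synchronous BFS over CSR arrays. It illustrates the conventional format and algorithm only, not the custom representation or the restructured BFS of the paper; the function names and the example graph are assumptions made for illustration.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) from a list of directed edges."""
    row_ptr = [0] * (num_vertices + 1)
    for u, _ in edges:
        row_ptr[u + 1] += 1
    # Prefix sum turns per-vertex out-degrees into starting offsets.
    for i in range(num_vertices):
        row_ptr[i + 1] += row_ptr[i]

    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies; next free slot per vertex
    for u, v in edges:
        col_idx[next_slot[u]] = v
        next_slot[u] += 1
    return row_ptr, col_idx


def bfs_levels(row_ptr, col_idx, source):
    """Level-synchronous BFS; returns the level of each vertex, -1 if unreachable."""
    level = [-1] * (len(row_ptr) - 1)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Neighbours of u occupy one contiguous slice of col_idx.
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]  # vertex 5 is isolated
row_ptr, col_idx = build_csr(6, edges)
levels = bfs_levels(row_ptr, col_idx, source=0)
```

Each vertex's neighbours are read as one contiguous range, but the `level[v]` lookups are scattered across memory, which is one reason BFS tends to be limited by memory bandwidth rather than arithmetic.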
A distributed algorithm is proposed to control block motion on a reconfigurable micro-electro-mechanical modular surface. The modular surface is designed to convey fragile and tiny micro-parts. The distributed algorithm solves a discrete trajectory optimization problem. In particular, the algorithm computes the shortest path between two points of the modular surface using a strategy based on minimum hop count. The proposed method, which is based on distributed asynchronous iterative elections, is scalable.
We propose a novel computational model for GPUs. Known parallel computational models such as the PRAM model are not appropriate for evaluating GPU algorithms. Our model, called AGPU, abstracts the essential features of current GPU architectures, such as global and shared memory, memory coalescing, and bank conflicts. We can therefore evaluate the asymptotic behavior of GPU algorithms more accurately than with known models, and we can develop algorithms that are efficient on many real architectures. As a showcase, we first analyze known comparison-based sorting algorithms using the AGPU model and show that they are not I/O optimal, that is, the number of global memory accesses is larger than necessary. We then propose a new algorithm that uses an asymptotically optimal number of global memory accesses and whose time complexity is also nearly optimal.
This paper proposes a parallel large neighborhood search-based heuristic for solving the Disjunctively Constrained Knapsack Problem (DCKP), which has an important impact on transportation issues. The proposed approach is designed using the Message Passing Interface (MPI). The effectiveness of MPI allows us to build a flexible message-passing model of parallel programming. Meanwhile, a large neighborhood search heuristic is introduced into the model in order to obtain an efficient resolution method yielding high-quality solutions. The results provided by the proposed method are compared to those reached by the Cplex solver and to those obtained by one of the best methods in the literature. As the experimental results show, the proposed model is able to provide high-quality solutions with fast runtimes on most benchmark instances from the literature.
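To make the problem and the search scheme concrete, here is a minimal sequential sketch of a destroy-and-repair large neighborhood search for a DCKP instance. It is not the authors' heuristic and omits the MPI layer entirely; the greedy ratio ordering, the rule that removed items are banned during the first repair pass, the parameter defaults, and the tiny instance are all assumptions made for illustration.

```python
import random


def is_feasible(selection, weights, capacity, conflicts):
    """Check the capacity limit and the pairwise (disjunctive) conflicts."""
    if sum(weights[i] for i in selection) > capacity:
        return False
    return not any(a in selection and b in selection for a, b in conflicts)


def greedy_fill(selection, profits, weights, capacity, conflicts, banned=()):
    """Add items by decreasing profit/weight ratio while staying feasible."""
    chosen = set(selection)
    order = sorted(
        range(len(profits)), key=lambda i: profits[i] / weights[i], reverse=True
    )
    for i in order:
        if i in chosen or i in banned:
            continue
        if is_feasible(chosen | {i}, weights, capacity, conflicts):
            chosen.add(i)
    return chosen


def total_profit(selection, profits):
    return sum(profits[i] for i in selection)


def large_neighborhood_search(profits, weights, capacity, conflicts,
                              iterations=200, destroy=2, seed=0):
    """Repeatedly remove a few items, repair greedily, and keep improvements."""
    rng = random.Random(seed)
    best = greedy_fill(set(), profits, weights, capacity, conflicts)
    for _ in range(iterations):
        removed = set(rng.sample(sorted(best), min(destroy, len(best))))
        partial = best - removed
        # First repair pass bans the removed items so the search diversifies.
        candidate = greedy_fill(partial, profits, weights, capacity,
                                conflicts, banned=removed)
        # Second pass tops up with any item that still fits.
        candidate = greedy_fill(candidate, profits, weights, capacity, conflicts)
        if total_profit(candidate, profits) > total_profit(best, profits):
            best = candidate
    return best


profits = [10, 9, 8, 7, 6]
weights = [5, 4, 4, 3, 3]
capacity = 10
conflicts = [(0, 1), (2, 3)]  # items in a pair may not be packed together

start = greedy_fill(set(), profits, weights, capacity, conflicts)
solution = large_neighborhood_search(profits, weights, capacity, conflicts)
```

One plausible way to parallelize such a scheme is to run independent searches with different seeds and exchange the best solutions between processes, but how the paper distributes the work over MPI is not stated in the abstract.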
Distributed search is important for finding solutions to hard problems in artificial intelligence. Building distributed search systems can be difficult because the steps required to solve these problems are interdependent. Fortunately, aspects of search systems exhibit commonalities that allow them to be distributed using several different paradigms. These paradigms can be used as the basis for libraries that make implementing new systems, or distributing existing systems, easier. DisSLib:CC is such a library. DisSLib:CC implements the distributed search paradigm of search with a central common search state. In this paradigm, agents collaborate to update a central search state. Systems built with DisSLib:CC require very little extra code compared to their standalone versions. These systems can also show improvements in wall-clock run-times, which can be improved further by varying the meta-parameters of the distribution paradigm. One such parameter is the number of steps (transitions) that the agents execute before consulting the common search state, and in this paper we show how varying this meta-parameter improves the efficiency of the systems.