ISBN (print): 9781479941162
Large-scale graph structures are considered a keystone for many emerging high-performance computing applications, in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing memory bandwidth. Our architecture adopts a custom graph representation based on the compressed sparse row (CSR) format, as well as a restructuring of the conventional BFS algorithm. By taking maximum advantage of the available memory bandwidth, our architecture keeps our processing elements continuously active. Using a commercial high-performance reconfigurable computing system (the Convey HC-2), our results demonstrate a 5x speedup over previously published FPGA-based implementations.
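As a plain illustration of the data layout the abstract refers to, here is a minimal, sequential sketch of level-synchronous BFS over a graph stored in compressed sparse row (CSR) form. This is a generic software sketch only; the paper's hardware architecture and its custom CSR variant are not reproduced here.

```python
from collections import deque

def bfs_csr(row_ptr, col_idx, source):
    """Level-synchronous BFS over a graph in CSR form.

    row_ptr[v]..row_ptr[v+1] delimits the neighbors of vertex v
    inside col_idx. Returns the BFS level of every vertex
    (-1 for unreachable vertices).
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        # Scan the contiguous neighbor slice of v -- the sequential
        # access pattern that makes CSR-based BFS memory-bandwidth-bound.
        for i in range(row_ptr[v], row_ptr[v + 1]):
            u = col_idx[i]
            if level[u] == -1:
                level[u] = level[v] + 1
                frontier.append(u)
    return level

# Undirected path graph 0-1-2-3 plus an isolated vertex 4.
row_ptr = [0, 1, 3, 5, 6, 6]
col_idx = [1, 0, 2, 1, 3, 2]
print(bfs_csr(row_ptr, col_idx, 0))  # [0, 1, 2, 3, -1]
```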
ISBN (print): 9781479941162
Two recent curriculum studies, the ACM/IEEE Curricula 2013 Report and the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing, argue that every undergraduate computer science program should include topics in parallel and distributed computing (PDC). Although not within the scope of these reports, there is also a need for students in computing-related general education courses to be aware of the role that parallel and distributed computing technologies play in the computing landscape. One approach to integrating these topics into existing curricula is to spread them across several courses. However, this approach requires the development of multiple instructional modules targeted to introduce PDC concepts at specific points in the curriculum. Such modules need to mesh with the goals of the courses for which they are designed in such a way that minimal material has to be removed from existing topics. At the same time, the modules should provide students with an understanding of, and experience employing, fundamental PDC concepts. In this paper we report on our experience developing and deploying such modules.
ISBN (print): 9781479951154
In this study, a control method is proposed to improve the harmonic suppression efficiency of a single-phase active power filter in a distorted power system environment. Here, we present a method that uses a self-tuning filter (STF) algorithm for the single-phase active power filter. The proposed method processes the grid voltage in order to provide a uniform reference grid current and increase the efficiency of the system. Simulation results are presented to verify the effectiveness of the proposed control technique.
ISBN (print): 9781479941162
This paper presents the development of a Hadoop MapReduce module that has been taught in a distributed computing course to upper-level undergraduate computer science students at Clemson University. The paper describes our teaching experiences and the feedback from students over the several semesters that have helped to shape the course. We provide suggested best practices for lecture materials, the computing platform, and the teaching methods. In addition, the computing platform and teaching methods can be extended to accommodate emerging technologies and modules for related courses.
ISBN (print): 9781479941162
Detecting similar pairs in large biological sequence collections is one of the most commonly performed tasks in computational biology. With the advent of high-throughput sequencing technologies, the problem has regained significance as data sets with millions of sequences became ubiquitous. This paper is an initial report on our parallel, distributed-memory, sketching-based approach to constructing large-scale sequence similarity graphs. We develop load balancing techniques, derived from multi-way number partitioning and work stealing, to manage computational imbalance and ensure scalability on thousands of processors. Our experimental results show that the method is efficient and can be used to analyze data sets with millions of DNA sequences within acceptable time limits.
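The load-balancing idea derived from multi-way number partitioning can be illustrated with the classic longest-processing-time-first (LPT) greedy heuristic: assign each task, heaviest first, to the currently lightest bin. This is a generic sketch of the heuristic family the abstract names, not the authors' actual scheduler.

```python
import heapq

def greedy_partition(weights, k):
    """LPT heuristic for multi-way number partitioning: place each
    weight (largest first) into the currently lightest of k bins.
    Returns (load, bin id, items) triples sorted by load."""
    heap = [(0, b, []) for b in range(k)]  # (load, bin id, items)
    heapq.heapify(heap)
    for w in sorted(weights, reverse=True):
        load, b, items = heapq.heappop(heap)  # lightest bin so far
        items.append(w)
        heapq.heappush(heap, (load + w, b, items))
    return sorted(heap)

# Five tasks split across 2 workers; loads end up 8 and 10.
for load, b, items in greedy_partition([5, 4, 3, 3, 3], 2):
    print(b, load, items)
```

In a distributed setting such a static pre-partition is typically paired with work stealing, as the abstract describes, to absorb the residual imbalance the heuristic leaves behind.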
ISBN (print): 9781479941162
Multicore processors are nowadays widespread across desktop, laptop, server, and even smartphone and tablet devices. The rise of such powerful execution environments calls for new parallel and distributed Description Logics (DLs) reasoning algorithms. Many sophisticated optimizations have been explored and have considerably enhanced DL reasoning with light ontologies. Non-determinism remains a main source of complexity for implemented systems handling ontologies that rely on more expressive logics. In this work, we explore handling non-determinism in DL languages that enable qualified cardinality restrictions. We implement a fork/join parallel framework in our hybrid algebraic reasoner, which handles qualified cardinality restrictions and nominals using in-equation solving. Preliminary evaluation shows encouraging results.
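The fork/join treatment of non-determinism can be sketched abstractly: each non-deterministic choice point forks the search into branches, and the branches are checked in parallel. The toy below only illustrates that control structure; it is not the authors' DL reasoner and performs no actual description-logic inference.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def explore(choice_points, consistent):
    """Fork/join exploration of non-deterministic branches.

    choice_points: list of alternatives at each choice point.
    consistent: predicate deciding whether one full branch
    (one pick per choice point) is acceptable.
    Returns the first acceptable branch in enumeration order,
    or None if every branch fails.
    """
    branches = list(product(*choice_points))  # fork: one branch per combination
    with ThreadPoolExecutor() as pool:
        # join: collect per-branch results, preserving branch order
        for branch, ok in zip(branches, pool.map(consistent, branches)):
            if ok:
                return branch
    return None

# Three binary choice points; accept any branch whose picks sum to 2.
print(explore([(0, 1)] * 3, lambda b: sum(b) == 2))  # (0, 1, 1)
```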
ISBN (print): 9780769552071
We present a new parallel-in-time method designed to reduce the overall time-to-solution of a patient-specific cardiovascular flow simulation. Using a modified parareal algorithm, our approach extends strong scalability beyond spatial parallelism with fully controllable accuracy and no decrease in stability. We discuss the coupling of spatial and temporal domain decompositions used in our implementation, and showcase the use of the method on a study of blood flow through the aorta. We observe an additional 40% reduction in overall wall clock time with no significant loss of accuracy, in agreement with a predictive performance model.
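The parareal structure the abstract builds on can be sketched on a scalar test ODE y' = λy. The coarse and fine propagators below (explicit Euler at two resolutions) are illustrative stand-ins, not the paper's cardiovascular flow solvers; the point is the corrector update U[n+1] = G(U[n]) + F_prev(U[n]) - G_prev(U[n]), whose F evaluations are the part that runs in parallel across time slices.

```python
def parareal(lmbda, y0, T, n_slices, n_iters, n_fine=100):
    """Parareal iteration for the scalar test ODE y' = lmbda * y on [0, T].

    Coarse propagator G: one explicit Euler step per time slice.
    Fine propagator F: n_fine Euler steps per slice.
    Returns the slice-boundary values U[0..n_slices].
    """
    dt = T / n_slices

    def G(y):  # cheap, serial
        return y * (1 + lmbda * dt)

    def F(y):  # expensive; in real parareal these run concurrently
        h = dt / n_fine
        for _ in range(n_fine):
            y = y * (1 + lmbda * h)
        return y

    # Initial serial coarse sweep.
    U = [y0]
    for _ in range(n_slices):
        U.append(G(U[-1]))

    for _ in range(n_iters):
        Fk = [F(U[n]) for n in range(n_slices)]  # parallelizable in n
        Gk = [G(U[n]) for n in range(n_slices)]
        U_new = [y0]
        for n in range(n_slices):
            U_new.append(G(U_new[-1]) + Fk[n] - Gk[n])
        U = U_new
    return U
```

After k iterations the first k slice values coincide with the serial fine solution, which is why a few iterations usually suffice and why the method adds time-parallel speedup on top of spatial parallelism.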
ISBN (print): 9781479941162
Recently, a variety of accelerator architectures has become available in the field of high-performance computing. Intel's MIC (Many Integrated Core) and both GPU architectures, NVIDIA's Kepler and AMD's Graphics Core Next, represent the latest innovations in the field of general-purpose computing accelerators. This paper explores several important characteristics of these architectures and investigates the impact of certain design factors on the achieved performance, using the uCLbench micro-benchmarks, the NAS Parallel Benchmark (NPB) suite, and diverse real-world applications from the field of physics. Based on the single unified programming interface OpenCL, we observe the run-time behavior of each test program on several test platforms. Major architectural discrepancies are studied and a higher-level examination is discussed in detail.
ISBN (print): 9781479941162
Experiments are a fundamental part of science. They are needed when the system under evaluation is too complex to be described analytically, and they serve to empirically validate hypotheses. This work presents ExCovery, an experimentation framework for dependability analysis of distributed processes. It provides concepts that cover the description, execution, measurement, and storage of experiments. These concepts foster transparency and repeatability of experiments for further sharing and comparison. ExCovery has been tried and refined in a variety of dependability-related experiments during the last two years. A case study is provided that describes service discovery (SD) as an experiment process (EP). A working prototype for IP networks runs on the Distributed Embedded System (DES) wireless testbed at the Freie Universität Berlin.
ISBN (print): 9781479941162
Reading input from primary storage (the ingest phase) and aggregating results (the merge phase) are important pre- and post-processing steps in large batch computations. Unfortunately, today's data sets are so large that the ingest and merge job phases are now performance bottlenecks. In this paper, we mitigate the ingest and merge bottlenecks by leveraging the scale-up MapReduce model. We introduce an ingest chunk pipeline and a merge optimization that increase CPU utilization (50-100%) and yield job-phase speedups of 1.16x-3.13x for the ingest and merge phases. Our techniques are based on well-known algorithms and scale-out MapReduce optimizations, but applying them to a scale-up computation framework to mitigate the ingest and merge bottlenecks is novel.
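The general shape of an ingest chunk pipeline, overlapping reading with computation so the CPU is not idle during I/O, can be sketched in a few lines. This toy uses a reader thread, a bounded queue, and in-memory input; it is a generic producer-consumer sketch, not the paper's scale-up MapReduce implementation.

```python
import queue
import threading

def chunked_pipeline(lines, chunk_size, process):
    """Toy ingest-chunk pipeline: a reader thread splits the input
    into fixed-size chunks while the main thread processes chunks as
    they arrive, overlapping ingest with computation."""
    q = queue.Queue(maxsize=4)  # bounded: reader cannot run far ahead

    def reader():
        chunk = []
        for line in lines:
            chunk.append(line)
            if len(chunk) == chunk_size:
                q.put(chunk)
                chunk = []
        if chunk:          # flush the final partial chunk
            q.put(chunk)
        q.put(None)        # end-of-stream sentinel

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (chunk := q.get()) is not None:
        results.append(process(chunk))  # runs while the reader keeps ingesting
    return results

# Ten one-line records in chunks of 3 -> chunk sizes 3, 3, 3, 1.
print(chunked_pipeline(["rec"] * 10, 3, len))  # [3, 3, 3, 1]
```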