ISBN: (print) 9781509036820
Analyzing large dynamic networks is an important problem with applications in a wide range of disciplines. A key operation is updating the network properties as its topology changes. In this paper, we present graph sparsification as an efficient abstraction for updating the properties of dynamic networks. We demonstrate the applicability of graph sparsification in updating the connected components in random and scale-free networks on shared-memory systems. Our results show that the updating is scalable (10X on 16 processors for larger networks). To the best of our knowledge, this is the first parallel implementation of graph sparsification. Based on these initial results, we discuss how the current implementation can be further improved and how graph sparsification can be applied to updating other network properties.
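As a rough illustration of the idea behind this abstract, the sketch below keeps a spanning forest as a sparse stand-in for the full graph and updates the connected components as edges arrive. It is a sequential, insertion-only simplification built on a union-find structure, not the parallel sparsification scheme the paper describes; the class name `SpanningForestCertificate` and its methods are hypothetical.

```python
class SpanningForestCertificate:
    """Insertion-only connectivity tracker (illustrative assumption, not the paper's code).

    A spanning forest has the same connected components as the full graph,
    so it can serve as a sparse summary when only connectivity matters.
    """

    def __init__(self, num_vertices):
        self.parent = list(range(num_vertices))  # union-find parent pointers
        self.size = [1] * num_vertices           # component sizes, valid at roots
        self.forest_edges = []                   # edges kept in the sparse summary
        self.num_components = num_vertices

    def find(self, v):
        # Locate the root of v's component.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def insert_edge(self, u, v):
        """Add edge (u, v); return True if it merged two components."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False  # edge is redundant for connectivity, so it is dropped
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru  # union by size: attach the smaller tree to the larger
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.forest_edges.append((u, v))
        self.num_components -= 1
        return True


cert = SpanningForestCertificate(5)
for edge in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    cert.insert_edge(*edge)
print(cert.num_components, cert.forest_edges)
```

Edge deletions are deliberately left out: handling them is where a sparsification hierarchy becomes useful, and that part is beyond this sketch.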
ISBN: (print) 9781424437511
The Petascale Cray XT5 system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) shares a number of system and software features with its predecessor, the Cray XT4 system, including the quad-core AMD processor and a multi-core aware MPI library. We analyze the performance of scalable scientific applications on the quad-core Cray XT4 system as part of the early system access, using a combination of micro-benchmarks and Petascale-ready applications. In particular, we evaluate the impact of key changes that occurred during the dual-core to quad-core processor upgrade on application behavior and provide projections for the next-generation massively parallel platforms with multi-core processors, specifically for the proposed Petascale Cray XT5 system. We compare and contrast the quad-core XT4 system features with the upcoming XT5 system and discuss strategies for improving scaling and performance for our target applications.
ISBN: (print) 0780378407
Airborne SAR remote sensing images involve large data volumes and a heavy computational burden, so processing them requires very large computer memory and strong computational capability. After introducing the SAR image processing procedure, this paper studies SAR image processing using parallel computing technology. The parallel processing mechanism is based on parallel computer cluster operation and large virtual shared memory technology. In the Client/Server-based SAR image parallel system, agent-based network communication plays an important role in computer performance monitoring and load distribution. Finally, the application of the SAR image parallel processing system in a disaster monitoring and assessment system is introduced. The application results illustrate the high efficiency of the system and the feasibility of our research.
ISBN: (print) 9781509036820
As the number of cores grows in HPC systems, so does the effect of system noise on applications running on these systems. With the knowledge that future large-scale parallel computer systems, including exascale systems, will operate under an overall power bound, we claim to have found a solution that can counter the effects of noise. We present two methods that estimate the effects of noise on an application and then optimally redistribute power among nodes, such that the effects of noise are "hidden".
ISBN: (print) 9780769549712
Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can be drawn from different application domains: networking, social networking, machine learning, electrical circuit modeling, discrete event simulation, compilers, and computational sciences. It has been shown that irregular applications based on large graphs do exhibit runtime parallelism; moreover, the amount of available parallelism tends to increase with the size of the datasets. In this work, we explore an implementation space for deploying a variety of graph algorithms on GPUs. We show that the dynamic nature of the parallelism that can be extracted from graph algorithms makes it impossible to find an optimal solution. We propose a runtime system able to dynamically transition between different implementations with minimal overhead, and investigate heuristic decisions applicable across algorithms and datasets. Our evaluation is performed on two graph algorithms: breadth-first search and single-source shortest paths. We believe that our proposed mechanisms can be extended and applied to other graph algorithms that exhibit similar computational patterns.
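The idea of switching implementations at runtime can be illustrated on the CPU with a breadth-first search that picks one of two frontier-expansion strategies per level. This is only an analogy under stated assumptions: the two strategies (top-down and bottom-up expansion), the `switch_fraction` threshold, and the function name `bfs_adaptive` are illustrative choices, not the GPU implementations or heuristics evaluated in the paper. The graph is assumed undirected, given as symmetric adjacency lists.

```python
def bfs_adaptive(adj, source, switch_fraction=0.25):
    """BFS that selects an expansion strategy per level from the frontier size.

    adj: list of neighbour lists for an undirected graph (assumed symmetric).
    Returns (dist, choices), where choices records the strategy used per level.
    """
    n = len(adj)
    dist = [-1] * n          # -1 marks an unvisited vertex
    dist[source] = 0
    frontier = [source]
    level = 0
    choices = []
    while frontier:
        level += 1
        if len(frontier) > switch_fraction * n:
            # Large frontier: every unvisited vertex looks for a parent in it.
            in_frontier = set(frontier)
            next_frontier = [
                v for v in range(n)
                if dist[v] == -1 and any(u in in_frontier for u in adj[v])
            ]
            choices.append("bottom_up")
        else:
            # Small frontier: frontier vertices push to their unvisited neighbours.
            next_frontier = []
            queued = set()
            for u in frontier:
                for v in adj[u]:
                    if dist[v] == -1 and v not in queued:
                        queued.add(v)
                        next_frontier.append(v)
            choices.append("top_down")
        for v in next_frontier:
            dist[v] = level
        frontier = next_frontier
    return dist, choices


# Star graph: vertex 0 in the centre, vertices 1..5 as leaves.
star = [[1, 2, 3, 4, 5], [0], [0], [0], [0], [0]]
print(bfs_adaptive(star, 0))
```

Both strategies compute the same distances; only the amount of work per level differs, which is why the choice can be made level by level.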
ISBN: (print) 9783540680673
Control systems are required to comply with certain safety and liveness correctness properties. In most cases, such systems have an intrinsic degree of complexity and it is not easy to formally analyze them, due to the resulting large state space. Also, exhaustive simulation and testing can easily miss system errors, whether they are life-critical or not. In this work, we introduce an interlocking control approach that is based on the use of so-called Distributed Signal Boxes (DSBs). The proposed control design is applied to a railway-interlocking problem and, more precisely, to the Athens underground metro system. Signal boxes correspond to the network's interlocking points and communicate only with their neighbor signal boxes. Communication takes place through rendezvous communication channels. This design results in a simple interlocking control approach that, compared to centralized solutions, produces a smaller and easier-to-analyze state space. Formal analysis and verification are performed with the SPIN model checker.
ISBN: (print) 9781509036820
High-level tools for analyzing and predicting the performance of GPU-accelerated applications are scarce, at best. Although performance modeling approaches for GPUs exist, their complexity makes them virtually impossible to use to quickly analyze the performance of real-life applications and obtain easy-to-use, readable feedback. This is why, although GPUs are significant performance boosters in many HPC domains, performance prediction is still based on extensive benchmarking, and performance bottleneck analysis remains a nonsystematic, experience-driven process. In this context, we propose a tool for bottleneck analysis and performance prediction for GPU-accelerated applications. Based on random forest modeling, and using hardware performance counter data, our method can be used to quickly and accurately evaluate application performance on GPU-based systems for different problem characteristics and different hardware generations. We illustrate the benefits of our approach with three detailed use cases: a simple step-by-step example on a parallel reduction kernel, and two classical benchmarks (matrix multiplication and sequence alignment). Our results so far indicate that our statistical modeling is a quick, easy-to-use method to grasp the performance characteristics of applications running on GPUs. Our current work focuses on tackling some of its applicability limitations (more applications, more platforms) and improving its usability (full automation from input to user feedback).
ISBN: (print) 0818675829
In this paper, we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. The approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh on a cluster of heterogeneous workstations.
ISBN: (print) 9780769548463
This paper describes a highly scalable X10-based agent simulation platform called XAXIS. XAXIS is designed to handle millions or billions of agents on recent highly distributed and parallel computing environments with hundreds of CPU cores or more. To make the runtime scalable on such environments, we need to redesign and implement the simulation middleware. In this paper, we propose the software design, its implementation in X10, a state-of-the-art PGAS language, and its application to large-scale traffic simulation. Using 192 CPU cores in a distributed-memory computing environment, we demonstrate performance scalability with a traffic simulation.
ISBN: (print) 9781665435741
Tree-shaped task graphs have become a paradigm used on distributed platforms in various computational domains, such as electronic structure calculations and the factorization of sparse matrices. However, the scheduling of tree-shaped task graphs has rarely been studied for the more realistic heterogeneous multiprocessor platform (HEMP). This paper proposes an efficient algorithm named Partition-Allocation (PA) for parallel computing on HEMP with limited memory. Algorithm PA consists of two stages: partitioning and allocation. In the partitioning stage, a task tree is split into several subtrees. In the allocation stage, these subtrees are assigned to different processors for execution. Algorithm PA can reduce makespan by prioritizing subtrees on the critical path, both in the partitioning and in the allocation. Based on randomly generated trees and a real-world dataset, experimental results show that the proposed PA is significantly better than the latest work in terms of average makespan. The proposed algorithm reduces the average makespan by up to 67.01% on the real-world dataset and 52.35% on randomly generated trees.
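The two-stage structure described in this abstract can be sketched with a much simpler stand-in: split the tree at the root's children, then place the heaviest subtrees first on whichever processor would finish them earliest. This is a generic greedy list-scheduling sketch, not Algorithm PA itself. It ignores the memory limit, uses total subtree work as a crude proxy for critical-path priority, and all names (`subtree_work`, `partition_and_allocate`, `speeds`) are hypothetical.

```python
def subtree_work(children, work, root):
    """Total work of the subtree rooted at `root` (iterative traversal)."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += work[node]
        stack.extend(children.get(node, []))
    return total


def partition_and_allocate(children, work, root, speeds):
    """Greedy two-stage sketch: returns (subtree root -> processor, makespan).

    children: dict mapping a node to its list of child nodes.
    work:     dict mapping a node to its amount of work.
    speeds:   work units per time unit for each processor (heterogeneous).
    """
    # Stage 1 (partitioning): one subtree per child of the root.
    subtrees = [(subtree_work(children, work, c), c) for c in children.get(root, [])]
    # Stage 2 (allocation): heaviest subtree first, earliest finish time wins.
    subtrees.sort(reverse=True)
    finish = [0.0] * len(speeds)
    assignment = {}
    for load, sub_root in subtrees:
        best = min(range(len(speeds)), key=lambda p: finish[p] + load / speeds[p])
        finish[best] += load / speeds[best]
        assignment[sub_root] = best
    # The root task depends on every subtree, so it starts once all are done;
    # here it is placed on the fastest processor.
    fastest = max(range(len(speeds)), key=lambda p: speeds[p])
    makespan = max(finish) + work[root] / speeds[fastest]
    return assignment, makespan


children = {0: [1, 2, 3], 1: [4]}
work = {0: 2, 1: 4, 2: 6, 3: 2, 4: 4}
assignment, makespan = partition_and_allocate(children, work, 0, speeds=[2.0, 1.0])
print(assignment, makespan)
```

In this small example the subtree rooted at node 1 (work 8) goes to the faster processor, the subtree rooted at node 2 goes to the slower one, and the light subtree rooted at node 3 fills the remaining gap on the faster processor.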