ISBN (digital): 9781665497473
ISBN (print): 9781665497480
GraphBLAS is a recent standard that allows the expression of graph algorithms in the language of linear algebra and enables automatic code parallelization and optimization. GraphBLAS operations are executed either in blocking or in non-blocking mode. Although multiple implementations of GraphBLAS exist for efficient blocking execution on both shared- and distributed-memory systems, none of them supports full non-blocking execution to improve data locality. In this paper, we present a preliminary evaluation for two algorithms, PageRank and Conjugate Gradient, that confirms the importance of non-blocking execution by showing promising speedups over the corresponding blocking execution.
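To make the linear-algebra formulation concrete, here is a minimal NumPy sketch of PageRank expressed as repeated matrix-vector products, the formulation that GraphBLAS captures with its semiring matrix-vector operations. This is not the GraphBLAS API itself, and the damping factor and iteration count are illustrative assumptions.

```python
# Minimal NumPy sketch of PageRank as repeated matrix-vector products, i.e.
# the linear-algebra formulation that GraphBLAS expresses (NOT the GraphBLAS
# C API). Damping factor and iteration count are illustrative assumptions.
import numpy as np

def pagerank(adj: np.ndarray, damping: float = 0.85, iters: int = 50) -> np.ndarray:
    """adj[i, j] = 1 if there is an edge i -> j."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                    # avoid division by zero for sink nodes
    transition = (adj / out_deg).T               # column-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        # One "blocking" step: each operation completes before the next starts.
        # A non-blocking runtime could defer and fuse these operations to
        # improve data locality, which is the point the paper evaluates.
        rank = damping * (transition @ rank) + (1.0 - damping) / n
    return rank

if __name__ == "__main__":
    A = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)
    print(pagerank(A))
```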
Author(s):
Cicirelli, Franco; Nigro, Libero
CNR, Inst High Performance Comp & Networking (ICAR), Natl Res Council, I-87036 Arcavacata di Rende (CS), Italy
Univ Calabria, DIMES Dept Informat Modelling Elect & Syst Sci, I-87036 Arcavacata di Rende (CS), Italy
ISBN (print): 9781665433266
This work aims at the development of tools for supporting the modelling and analysis of timed systems with Stochastic Reward Nets (SRN). In a first approach, a formal reduction of SRN onto Timed Automata (TA) was proposed and experimented with in the context of the popular Uppaal toolbox. The reduction has the merit of allowing both exhaustive model checking of an SRN model, useful for the assessment of qualitative properties (e.g., absence of deadlocks, occurrence of particular event sequences, etc.), and quantitative analysis through the statistical model checker, which is based on simulations. However, although Uppaal enables formal reasoning on the semantics of SRN, its practical usage suffers from scalability problems; that is, it can introduce severe limitations in time and space when studying complex models. To cope with this problem, this paper describes a Java implementation of the SRN operational core engine on top of the efficient, lock-free Theatre actor system, which permits the parallel simulation of large models. The realization can be used for functional property checking on an untimed version of a source SRN model and for quantitative estimation of measurables through simulations. The paper discusses the design and implementation of the SRN core engine on top of Theatre, together with the supported intuitive configuration process of an SRN model, and reports some experimental results using a scalable grid computing model. The experiments confirm that Theatre/SRN is capable of exploiting the potential of modern multi-core machines and can deliver good execution performance on large models.
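For readers unfamiliar with the operational core of a stochastic net engine, the following is a highly simplified, hypothetical sketch of the basic cycle it describes: detect enabled transitions, sample exponential firing delays from their rates (race semantics), fire the winner, and update the marking. It is written in plain Python for illustration only and is not the actor-based Theatre/SRN Java engine from the paper; all names are invented.

```python
# Hypothetical, highly simplified stochastic Petri/reward net core:
# enabled-transition detection, exponential delay sampling, firing, and
# marking update. Illustrative only; not the Theatre/SRN Java engine.
import random
from dataclasses import dataclass

@dataclass
class Transition:
    name: str
    inputs: dict    # place -> tokens required to fire
    outputs: dict   # place -> tokens produced on firing
    rate: float     # exponential firing rate

def enabled(marking, t):
    return all(marking.get(p, 0) >= k for p, k in t.inputs.items())

def simulate(marking, transitions, horizon):
    clock = 0.0
    while clock < horizon:
        candidates = [t for t in transitions if enabled(marking, t)]
        if not candidates:
            break                                   # dead marking, stop
        delays = [(random.expovariate(t.rate), t) for t in candidates]
        delay, winner = min(delays, key=lambda d: d[0])
        clock += delay
        for p, k in winner.inputs.items():
            marking[p] -= k
        for p, k in winner.outputs.items():
            marking[p] = marking.get(p, 0) + k
    return marking, clock

if __name__ == "__main__":
    net = [Transition("arrive", {}, {"queue": 1}, rate=2.0),
           Transition("serve", {"queue": 1}, {}, rate=3.0)]
    print(simulate({"queue": 0}, net, horizon=100.0))
```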
High-performance computing (HPC) is often perceived as a matter of making large-scale systems (e.g., clusters) run as fast as possible, regardless of the required programming effort. However, the idea of "bringing H...
ISBN (print): 9781665403696
Big Data technologies such as Cloud and parallel/distributed computing and storage are necessary to treat the volume of Earth Science data. Yet the great diversity of Earth Science data renders it nearly impossible to organize that data on scalable platforms without costly data movement or undesired interpolation that straitjackets scientific research. The SpatioTemporal Adaptive Resolution Encoding (STARE) is an alternative geolocation and indexing scheme for harmonizing data for integrative analysis on scalable systems. STARE uses a hierarchical, recursive partitioning of space and time in which the index or coordinates of each node are integers from the same index space, usually allowing quick comparison without floating-point calculation. STARE is well suited to provide a unifying geo-semantics for arranging data in databases. In this work, we outline the technical principles underlying STARE and, as an example, its application to SQLite. STARELite, a STARE-aware lightweight geo-database, can be used to catalogue diverse data for geographical querying and integration on local resources and in the Cloud.
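The following toy sketch illustrates the idea behind such hierarchical integer indexing in an SQLite table: encode a spatial cell as an integer so that containment becomes an integer-range test. The quadtree-style encoding below is a simple stand-in for illustration, not the actual STARE encoding, and the code is unrelated to the real STARELite implementation.

```python
# Toy illustration of the idea behind STARE/STARELite: encode hierarchical
# spatial cells as integers so containment becomes an integer-range test,
# stored and queried with SQLite. The encoding is a quadtree-style stand-in,
# NOT the actual STARE encoding.
import sqlite3

LEVELS = 10  # resolution levels in this toy encoding

def encode(lat: float, lon: float, levels: int = LEVELS) -> int:
    """Interleave successive halvings of lat/lon into one integer key."""
    key = 0
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    for _ in range(levels):
        lat_mid, lon_mid = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
        key = (key << 1) | (lat >= lat_mid)
        key = (key << 1) | (lon >= lon_mid)
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if lon >= lon_mid else (lon_lo, lon_mid)
    return key

def cell_range(key: int, level: int, levels: int = LEVELS):
    """All keys sharing the first `level` refinements fall in this range."""
    shift = 2 * (levels - level)
    prefix = key >> shift
    return prefix << shift, ((prefix + 1) << shift) - 1

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE obs (sid INTEGER, name TEXT)")
    for name, lat, lon in [("a", 40.7, -74.0), ("b", 40.8, -73.9), ("c", -33.9, 151.2)]:
        db.execute("INSERT INTO obs VALUES (?, ?)", (encode(lat, lon), name))
    lo, hi = cell_range(encode(40.7, -74.0), level=4)  # coarse cell around "a"
    print(db.execute("SELECT name FROM obs WHERE sid BETWEEN ? AND ?", (lo, hi)).fetchall())
```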
ISBN (digital): 9781665497473
ISBN (print): 9781665497480
The volume, veracity, and velocity of data generated by accelerators, colliders, supercomputers, light sources, and neutron sources have grown exponentially in the last decade. Data has fundamentally changed the scientific workflow running on high-performance computing (HPC) systems. It is necessary that we develop appropriate capabilities and tools to understand, analyze, preserve, share, and make optimal use of data. Intertwined with data are complex human processes, policies, and decisions that need to be accounted for when building software tools. In this talk, I will outline our work addressing data lifecycle challenges on HPC systems, including effective use of the storage hierarchy, managing complex scientific data processing, and enabling search on large-scale scientific data.
ISBN (print): 9781450382175
Deep learning (DL) is a popular technique for building models from large quantities of data such as pictures, videos, and messages generated from edge devices at a rapid pace all over the world. It is often infeasible to migrate large quantities of data from the edges to centralized data center(s) over WANs for training due to privacy, cost, and performance reasons. At the same time, training large DL models on edge devices is infeasible due to their limited resources. An attractive alternative for DL training on distributed data is to use micro-clouds: small-scale clouds deployed near edge devices in multiple locations. However, micro-clouds present the challenges of both computation and network resource heterogeneity as well as dynamism. In this paper, we introduce DLion, a new and generic decentralized distributed DL system designed to address the key challenges in micro-cloud environments, in order to reduce overall training time and improve model accuracy. We present three key techniques in DLion: (1) Weighted dynamic batching to maximize data parallelism for dealing with heterogeneous and dynamic compute capacity, (2) Per-link prioritized gradient exchange to reduce communication overhead for model updates based on available network capacity, and (3) Direct knowledge transfer to improve model accuracy by merging the best performing model parameters. We build a prototype of DLion on top of TensorFlow and show that DLion achieves up to a 4.2x speedup in an Amazon GPU cluster, and up to a 2x speedup and 26% higher model accuracy in a CPU cluster, over four state-of-the-art distributed DL systems.
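As a concrete illustration of the first idea, weighted dynamic batching, here is a toy Python sketch that assigns each worker a share of the global batch proportional to its recently observed throughput, so that heterogeneous workers finish a training step at roughly the same time. The worker names and numbers are invented for illustration; this is not DLion's actual TensorFlow-based implementation.

```python
# Toy sketch of the *idea* of weighted dynamic batching: give each worker a
# share of the global batch proportional to its measured throughput so that
# heterogeneous workers finish a step at roughly the same time. Illustrative
# only; not DLion's actual implementation.

def weighted_batch_sizes(global_batch: int, samples_per_sec: dict) -> dict:
    """samples_per_sec: worker name -> measured throughput (samples/s)."""
    total = sum(samples_per_sec.values())
    shares = {w: int(round(global_batch * tp / total)) for w, tp in samples_per_sec.items()}
    # Fix rounding drift so the shares still sum to the global batch size.
    drift = global_batch - sum(shares.values())
    fastest = max(shares, key=shares.get)
    shares[fastest] += drift
    return shares

if __name__ == "__main__":
    throughput = {"gpu-node": 900.0, "cpu-node-1": 150.0, "cpu-node-2": 120.0}
    print(weighted_batch_sizes(1024, throughput))
    # -> {'gpu-node': 788, 'cpu-node-1': 131, 'cpu-node-2': 105}
```

In a real system the throughput estimates would be refreshed every few steps, which is what makes the batching "dynamic" rather than a one-time static split.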
ISBN (print): 9781450392495
Query rewriting transforms a relational database query into an equivalent but more efficient one, which is crucial for the performance of database-backed applications. Such rewriting relies on pre-specified rewrite rules. In existing systems, these rewrite rules are discovered through manual insights and accumulate slowly over the years. In this paper, we present WETUNE, a rule generator that automatically discovers new rewrite rules. Inspired by compiler super-optimization, WETUNE enumerates all valid logical query plans up to a certain size and tries to discover equivalent plans that could potentially lead to more efficient rewrites. The core challenge is to determine which set of conditions (aka constraints) allows one to prove the equivalence between a pair of query plans. We address this challenge by enumerating combinations of "interesting" constraints that relate tables and their attributes between each pair of queries. We also propose a new SMT-based verifier to verify the equivalence of a query pair under different enumerated constraints. To evaluate the usefulness of rewrite rules discovered by WETUNE, we apply them to the SQL queries collected from the 20 most popular open-source web applications on GitHub. WETUNE successfully optimizes 247 queries that existing databases cannot optimize, resulting in substantial performance improvements.
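To give a flavor of SMT-based equivalence checking in this spirit, the snippet below uses the z3-solver Python bindings (an external stand-in, not WETUNE's own verifier) to check that two simple filter predicates admit exactly the same rows: if the negated equivalence is unsatisfiable, the rewrite is safe. WETUNE's real verifier works at the level of whole query plans under enumerated constraints, which this sketch does not attempt.

```python
# Much-reduced illustration of SMT-based equivalence checking in the spirit
# of WETUNE's verifier, using the z3-solver Python bindings (a stand-in, not
# WETUNE's own code). We check whether "x > 5 AND x > 3" and "x > 5" admit
# exactly the same rows, i.e. whether the rewrite is safe.
from z3 import Int, Solver, And, Not, sat

x = Int("x")
original = And(x > 5, x > 3)
rewritten = x > 5

solver = Solver()
solver.add(Not(original == rewritten))   # look for a counterexample row
if solver.check() == sat:
    print("not equivalent, counterexample:", solver.model())
else:
    print("equivalent: the rewrite is safe")
```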
ISBN (print): 9781450382175
Most widely-deployed parallel file systems (PFSs) implement POSIX semantics, which implies sequential consistency for reads and writes. Strict adherence to POSIX semantics is known to impede performance and thus several new PFSs with relaxed consistency semantics and better performance have been introduced. Such PFSs are useful provided that applications can run correctly on a PFS with weaker semantics. While it is widely assumed that HPC applications do not require strict POSIX semantics, to our knowledge there has not been systematic work to support this assumption. In this paper, we address this gap with a categorization of the consistency semantics guarantees of PFSs and develop an algorithm to determine the consistency semantics requirements of a variety of HPC applications. We captured the I/O activity of 17 representative HPC applications and benchmarks as they performed I/O through POSIX or I/O libraries and examined the metadata operations used and their file access patterns. From this analysis, we find that 16 of the 17 applications can utilize PFSs with weaker semantics.
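The sketch below gives a simplified, hypothetical flavor of this kind of trace analysis: scan captured I/O records for byte ranges that one process writes while another process reads or writes the same file, since only such conflicting accesses require the stronger POSIX-style consistency. The trace record format and field names are invented for illustration and are not the paper's actual tooling.

```python
# Simplified, hypothetical sketch of the kind of analysis the paper describes:
# flag byte ranges written by one process and accessed by another, since only
# such conflicts need strong POSIX-style consistency. The trace format is
# invented for illustration; this is not the paper's tooling.
from collections import defaultdict

# (process id, operation, file, offset, length)
TRACE = [
    (0, "write", "out.dat", 0,    4096),
    (1, "write", "out.dat", 4096, 4096),   # disjoint range: no conflict
    (2, "read",  "out.dat", 2048, 1024),   # overlaps rank 0's write: conflict
]

def overlaps(a_off, a_len, b_off, b_len):
    return a_off < b_off + b_len and b_off < a_off + a_len

def find_conflicts(trace):
    by_file = defaultdict(list)
    for rec in trace:
        by_file[rec[2]].append(rec)
    conflicts = []
    for records in by_file.values():
        for i, (p1, op1, _, off1, len1) in enumerate(records):
            for p2, op2, _, off2, len2 in records[i + 1:]:
                if p1 != p2 and "write" in (op1, op2) and overlaps(off1, len1, off2, len2):
                    conflicts.append((p1, op1, p2, op2, off1, off2))
    return conflicts

if __name__ == "__main__":
    found = find_conflicts(TRACE)
    print("needs strong consistency" if found else "weaker semantics suffice", found)
```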