ISBN: 9781479959198 (print)
Building correct distributed systems is challenging, and any attempt to provide a direct, global proof of correctness of a distributed system is bound to fail. An interesting alternative approach consists in starting from a specification or program of the system under construction, verifying all properties of interest on it (which has a much lower complexity than verification on a distributed implementation), and finally deriving a distributed implementation using some correct-by-construction approach. Note that this topic is related to distributed control, where the objective is to enforce in a distributed manner some global constraint on a plant. Deriving such a distributed controller directly is difficult, and the correctness of the resulting controller is difficult to prove. A more feasible approach in this context is to first construct a global controller, then transform it into a distributed one, again by means of a correct-by-construction approach.
ISBN: 0818676833 (print)
Workflow management systems automate the execution of business processes, allowing the concurrent execution of multiple process instances. Existing systems do not provide a mechanism to guarantee correct concurrent execution and, as a result, it is not possible to coordinate and synchronize different process instances. Part of the problem is that conventional techniques are not entirely suitable for workflow environments. In databases, locks are the basic mechanism. In operating systems, this is achieved using semaphores or monitors. Neither of these approaches is appropriate for workflow applications. In this paper a method is proposed to enforce correct interleavings and guarantee mutual exclusion, as defined by the user, between concurrent workflow processes. The proposed protocol takes advantage of the semantic constructs associated with workflow management to solve some complex problems like dealing with inherited restrictions and the coarse granularity of workflow specifications.
ISBN: 9781728174457 (print)
Graphs are increasingly important for modelling and analysing connected data sets. Traditionally, graph analytical tools targeted global fixed-point computations, while graph databases focused on simpler transactional read operations such as retrieving the neighbours of a node. However, recent applications of graph processing (such as financial fraud detection and serving personalized recommendations) often necessitate a mix of the two workload profiles. A potential approach to tackle these complex workloads is to formulate graph algorithms in the language of linear algebra. To this end, the recent GraphBLAS standard defines a linear algebraic graph computational model and an API for implementing such algorithms. To investigate its usability and efficiency, we have implemented a GraphBLAS solution for the "Social Media" case study of the 2018 Transformation Tool Contest. This paper presents our solution along with an incrementalized variant to improve its runtime for repeated evaluations. Preliminary results show that the GraphBLAS-based solution is competitive, but implementing it requires significant development effort.
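To make "graph algorithms in the language of linear algebra" concrete, here is a minimal sketch of breadth-first search written as repeated vector-matrix products over the Boolean (OR, AND) semiring. It uses plain Python lists rather than the GraphBLAS API, and the function name `bfs_levels` and the example matrix are assumptions made for illustration; it is not the solution described in the paper.

```python
def bfs_levels(adj, source):
    """BFS levels from `source`; adj[i][j] is truthy when edge i -> j exists.

    Each round computes next[j] = OR_i (frontier[i] AND adj[i][j]),
    masked so that already-visited vertices are not revisited.
    """
    n = len(adj)
    levels = [-1] * n  # -1 marks "not reached"
    frontier = [i == source for i in range(n)]
    depth = 0
    while any(frontier):
        # Record the depth of every vertex in the current frontier.
        for i in range(n):
            if frontier[i]:
                levels[i] = depth
        # Boolean vector-matrix product, masked by the unvisited vertices.
        frontier = [
            levels[j] == -1 and any(frontier[i] and adj[i][j] for i in range(n))
            for j in range(n)
        ]
        depth += 1
    return levels


# Hypothetical 5-vertex graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3; vertex 4 is isolated.
adj = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(bfs_levels(adj, 0))  # [0, 1, 1, 2, -1]
```

A GraphBLAS implementation would express the same loop with sparse matrices, a semiring, and a mask, which is where the efficiency comes from; the dense lists here only show the shape of the computation.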
ISBN: 9781479927845 (print)
Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging, as real-world datasets are characterized by data skew leading to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem, such as ones based on hash redistribution, often lead to load balancing problems, while duplication-based approaches incur significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.
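The load-balancing problem attributed to hash redistribution can be seen in a small sketch. It is not the QC algorithm, whose details the abstract does not give; the function `partition_load`, the worker count, and the key distributions are assumptions chosen for illustration, with `key % num_workers` standing in for a real hash function.

```python
from collections import Counter


def partition_load(keys, num_workers):
    """Count how many tuples each worker receives under hash redistribution.

    Assumption: integer join keys, with `key % num_workers` as the hash.
    """
    load = Counter(key % num_workers for key in keys)
    return [load[w] for w in range(num_workers)]


# Uniform keys: every worker receives the same share.
uniform = list(range(1000))
print(partition_load(uniform, 4))  # [250, 250, 250, 250]

# Skewed keys: one hot key (7) accounts for 70% of the tuples.
skewed = [7] * 700 + list(range(300))
print(partition_load(skewed, 4))  # [75, 75, 75, 775]
```

Because all tuples with the same key must land on the same worker, the worker that owns the hot key does roughly ten times the work of the others in the skewed case, and the join finishes only when that worker does.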
A term rewriting system (TRS) is a model of computation used in various applications such as algebraic specification. A TRS has inherent concurrency and is therefore suitable for parallel computing. We have already proposed BOB (Bundle Of Branches), a data management mechanism for parallel rewriting. We have proposed a model of parallel rewriting using BOB and implemented a TRS simulator based on this model on a shared memory parallel computer. Because the simulator depends fully on a feature of the shared memory architecture, namely that a process can access any memory element, it is hard to port it to a distributed memory parallel computer. In this paper, we propose the autonomous BOB model. This model is suitable for a distributed memory architecture, since processes communicate through a message-passing protocol and a load balancing method is provided. We implement a TRS simulator using this model on a distributed memory architecture, and it runs about 30 times faster on 64 processors than on a single processor.
ISBN: 9781728112466 (print)
Modern data generation is enormous; we now capture events at increasingly fine granularity, and require processing at rates approaching real-time. For graph analytics, this explosion in data volumes and processing demands has not been matched by improved algorithmic or infrastructure techniques. Instead of exploring solutions to keep up with the velocity of the generated data, most of today's systems focus on analyzing individually built historic snapshots. Modern graph analytics pipelines must evolve to become viable at massive scale, and move away from static, post-processing scenarios to support on-line analysis. This paper presents our progress towards a system that analyzes dynamic incremental graphs, responsive at single-change granularity. We present an algorithmic structure using principles of recursive updates and monotonic convergence, and a set of incremental graph algorithms that can be implemented based on this structure. We also present the required middleware to support graph analytics at fine, event-level granularity. We envision that graph topology changes are processed asynchronously, concurrently, and independently (without shared state), converging an algorithm's state (e.g. single-source shortest path distances, connectivity analysis labeling) to its deterministic answer. The expected long-term impact of this work is to enable a transition away from offline graph analytics, allowing knowledge to be extracted from networked systems in real-time.
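The idea of recursive updates with monotonic convergence can be sketched for single-source shortest paths under edge insertions. The sketch below is sequential and single-process, so it does not reproduce the asynchronous, shared-nothing processing the abstract envisions; the class name `IncrementalSSSP` and its methods are hypothetical, and edge deletions and negative weights are out of scope.

```python
import heapq
import math
from collections import defaultdict


class IncrementalSSSP:
    """Maintain shortest-path distances from `source` as edges are inserted.

    Distances only ever decrease (monotonic), so each insertion triggers
    a local repair instead of a recomputation over a full snapshot.
    """

    def __init__(self, source):
        self.out_edges = defaultdict(list)  # u -> [(v, weight), ...]
        self.dist = defaultdict(lambda: math.inf)
        self.dist[source] = 0.0

    def add_edge(self, u, v, w):
        """Insert directed edge u -> v with non-negative weight w."""
        self.out_edges[u].append((v, w))
        candidate = self.dist[u] + w
        if candidate >= self.dist[v]:
            return  # the new edge improves nothing
        self.dist[v] = candidate
        heap = [(candidate, v)]
        # Recursively propagate the improvement to downstream vertices.
        while heap:
            d, x = heapq.heappop(heap)
            if d > self.dist[x]:
                continue  # stale entry, a better distance was found since
            for y, wy in self.out_edges[x]:
                if d + wy < self.dist[y]:
                    self.dist[y] = d + wy
                    heapq.heappush(heap, (d + wy, y))


g = IncrementalSSSP("a")
g.add_edge("b", "c", 1)  # "b" is not reachable yet, so nothing changes
g.add_edge("a", "b", 5)  # b = 5, then c = 6 by propagation
g.add_edge("a", "c", 2)  # c improves to 2
g.add_edge("c", "b", 1)  # b improves to 3
print(g.dist["a"], g.dist["b"], g.dist["c"])  # 0.0 3.0 2.0
```

Because every update can only lower a distance and the result is the unique shortest-path solution, the final state does not depend on the order in which the insertions are applied, which is the property that makes asynchronous processing plausible.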
ISBN: 9781728112466 (print)
We present the design and implementation of Data Jockey, a data management system for HPC multi-tiered storage systems. As a centralized data management control plane, Data Jockey automates bulk data movement and placement for scientific workflows and integrates into existing HPC storage infrastructures. Data Jockey simplifies data management by eliminating the human effort of programming complex data movements and of laying out datasets across multiple storage tiers when supporting complex workflows, which in turn increases the usability of the multi-tiered storage systems emerging in modern HPC data centers. Specifically, Data Jockey presents a new data management scheme called "goal driven data management" that can automatically infer low-level bulk data movement plans from declarative high-level goal statements that come from the lifetime of iterative runs of scientific workflows. While doing so, Data Jockey aims to minimize data wait times by taking responsibility for datasets that are unused or to be used, and aggressively utilizing the capacity of the upper, higher-performing storage tiers. We evaluated a prototype implementation of Data Jockey under a synthetic workload based on a year's worth of Oak Ridge Leadership Computing Facility's (OLCF) operational logs. Our evaluations suggest that Data Jockey leads to higher utilization of the upper storage tiers while minimizing the programming effort of data movement compared to human-involved, per-domain ad-hoc data management scripts.
ISBN: 9781424437511 (print)
Stream processing systems have become important, as applications like media broadcasting, sensor network monitoring and on-line data analysis increasingly rely on real-time stream processing. Such systems are often challenged by the bursty nature of the applications. In this paper, we present BARRE (Burst Accommodation through Rate REconfiguration), a system to address the problem of bursty data streams in distributed stream processing systems. Upon the emergence of a burst, BARRE dynamically reserves resources dispersed across the nodes of a distributed stream processing system, based on the requirements of each application as well as the resources available on the nodes. Our experimental results over our Synergy distributed stream processing system demonstrate the efficiency of our approach.
ISBN: 9781665497473 (print)
The rapid growth of the Internet of Things (IoT) and autonomous systems has led to the deployment of edge devices close to the sensing data source for low-latency computation.
ISBN: 9781509036820 (print)
Distributed embedded systems are increasingly prevalent in numerous applications, and with pervasive network access within these systems, security is also a critical design concern. In this paper, we present a modeling and optimization framework for distributed reconfigurable embedded systems, which maps tasks on a distributed embedded system with the goal of optimizing latency, energy, and/or security across all computing and communication levels. The proposed modeling framework for dataflow applications integrates models for computational latency, security levels for inter-task and intra-task communication, communication latency, and power consumption. We evaluate the proposed methodology using a video-based object detection and tracking application.