This research paper presents an approach to clustering the prevalence of chronic conditions among children with public insurance in the United States. The data consist of prevalence estimates at the community level for 25 pediatric chronic conditions. We employ a spatial clustering algorithm to identify clusters of communities with similar chronic condition prevalences. The primary challenge is the computational effort needed to estimate the spatial clustering for all communities in the U.S. To address this challenge, we develop a distributed computing approach to spatial clustering. Overall, we found that the burden of chronic conditions in rural communities tends to be similar but with wide differences in urban communities. This finding suggests similar interventions for managing chronic conditions in rural communities but targeted interventions in urban areas.
This article describes the results of a study on processing and analyzing semi-structured data from Russian court decisions (almost 30 million) using a distributed cluster-computing framework and machine learning. Spark was used for data processing and decision trees were used for analysis. The results of automating the collection and structuring of court decisions are presented. Methods for extracting and structuring knowledge from semi-structured data in the field of justice, taking into account the specifics of Russian Federation legislation, have been developed. Using the fire safety law as an example, a machine learning method for assessing the effectiveness of changes in the law and predicting the consequences of changing the law is demonstrated. An association between lawmaking and law enforcement is also shown: regularities in law enforcement are associated with changes in the law. Connections between law enforcement and economic and social indicators across regions are identified. Judicial interpretations of the observations are also described, which supports the consistency of the results.
The rise of heterogeneous systems has given rise to great challenges for users, as they involve new concepts, restrictions, and frameworks. Their exploitation is further complicated in the context of distributed memory systems, which require additional programming paradigms and tools. In this paper, we propose a novel approach to programming heterogeneous clusters that is based on high-level abstractions such as tiles and hierarchical decomposition, combined with the powerful APIs that data types and embedded languages can provide in languages such as C++. Rather than building our proposal from scratch, we have implemented it as a natural integration of the existing Hierarchically Tiled Arrays (HTA) and Heterogeneous Programming Library (HPL) projects, the first being focused on distributed computing and the second on heterogeneous processing. The result, called Heterogeneous Hierarchically Tiled Arrays (H(2)TA), is very intuitive and easy to use thanks to the global view of the data and the single-threaded view of the execution that it provides at cluster level, together with the transparency it provides with respect to the management of the heterogeneous devices. An evaluation comparing our proposal with MPI-based implementations shows its large programmability advantages and the reasonable overhead incurred.
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "coded distributed computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of r (i.e., evaluating each function at r carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized. Finally, the coding techniques of CDC are applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by 1.97x to 3.39x for typical settings of interest.
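The inverse relationship described in this abstract can be illustrated numerically. The sketch below assumes the normalized load expressions usually quoted for the CDC scheme, which are not stated in the abstract itself: with K nodes and computation load r, an uncoded shuffle needs 1 - r/K of the intermediate values, and the coded shuffle needs that amount divided by r. The function names `uncoded_load` and `coded_load` are hypothetical.

```python
from fractions import Fraction


def uncoded_load(r: int, k: int) -> Fraction:
    """Normalized communication load without coding (assumed form: 1 - r/K)."""
    if not 1 <= r <= k:
        raise ValueError("computation load r must satisfy 1 <= r <= K")
    return 1 - Fraction(r, k)


def coded_load(r: int, k: int) -> Fraction:
    """Normalized communication load with coding (assumed form: (1/r)(1 - r/K))."""
    return uncoded_load(r, k) / r


if __name__ == "__main__":
    k = 10  # number of computing nodes (illustrative value)
    for r in range(1, k + 1):
        print(r, uncoded_load(r, k), coded_load(r, k))
```

Under these assumed expressions, mapping each file at r nodes cuts the coded load by exactly a factor of r relative to the uncoded load at the same r, which is the "same factor" reduction the abstract refers to.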
This paper presents a distributed computing architecture for solving a distribution optimal power flow (DOPF) model based on a smart grid communication middleware (SGCM) system. The system is modeled as an unbalanced three-phase distribution system, which includes different kinds of loads and various components of distribution systems. In this paper, fixed loads are modeled as constant impedance, current, and power loads, and neural network models of controllable smart loads are integrated into the DOPF model. A genetic algorithm is used to determine the optimal solutions for controllable devices, in particular load tap changers, switched capacitors, and smart loads, in the context of an energy management system for practical feeders, accounting for the fact that smart load consumption should not be significantly affected by network constraints. Since the number of control variables in a realistic distribution power system is large, solving the DOPF for real-time applications is computationally expensive. Hence, to reduce computational times, a decentralized system with parallel computing nodes based on an SGCM system is proposed. Using a "MapReduce" model, the SGCM system runs the DOPF model, communicates between master and worker computing nodes, and sends/receives data among different parts of the parallel computing system. Compared to a centralized approach, the proposed architecture is shown to yield better optimal solutions in terms of reducing energy losses and/or energy drawn from the substation within adequate practical run-times for a realistic test feeder.
Inspired by social networks and complex systems, we propose a core-periphery network architecture that supports fast computation for many distributed algorithms, is robust and uses a linear number of links. Rather than providing a concrete network model, we take an axiom-based design approach. We provide three intuitive and independent algorithmic axioms and prove that any network that satisfies all axioms enjoys an efficient algorithm for a range of tasks (such as MST, sparse matrix multiplication, and more). We also show the necessity of our axiom set: for networks that satisfy any subset of the axioms, the same efficiency cannot be guaranteed for any deterministic algorithm. (C) 2016 Elsevier Inc. All rights reserved.
ISBN: 9781538666111 (print)
Specialists in distributed computing seek to reduce the execution time of complex calculations. One of the tasks in increasing the efficiency of distributed computing is ensuring survivability, which consists in restoring the distributed program system as quickly as possible when some hardware resources fail. This article studies the issues of ensuring the functional reliability of software. A formalization of the rollback process for the survivability of distributed computing is presented. Rules of resource management are formulated. A method of providing survivability through memory dumps of software components was developed. This method can be implemented as separate middleware or embedded in distributed software, for example, as a library of functions.
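The idea of rolling back a software component from a memory dump can be sketched in a few lines. This is only an illustration of the general checkpoint-and-rollback pattern, not the method formalized in the article: the `Component` class, the `dump` and `rollback` helpers, and the use of Python's `pickle` module as the dump format are all assumptions made for the example.

```python
import pickle


class Component:
    """Hypothetical software component with in-memory state."""

    def __init__(self) -> None:
        self.state = {"step": 0, "partial_sum": 0}

    def work(self, value: int) -> None:
        # Advance the computation by one step.
        self.state["step"] += 1
        self.state["partial_sum"] += value


def dump(component: Component) -> bytes:
    """Serialize the component's state (the 'memory dump')."""
    return pickle.dumps(component.state)


def rollback(component: Component, snapshot: bytes) -> None:
    """Restore the component to the state captured in a snapshot."""
    component.state = pickle.loads(snapshot)


if __name__ == "__main__":
    c = Component()
    c.work(5)
    snapshot = dump(c)      # checkpoint after the first step
    c.work(7)               # progress that is lost in a simulated failure
    rollback(c, snapshot)   # recover from the last dump
    print(c.state)
```

Packaging `dump` and `rollback` as standalone functions mirrors the "library of functions" option mentioned in the abstract; a middleware variant would instead take snapshots on the component's behalf.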
ISBN: 9789811082375; 9789811082368 (print)
Imbalanced load between clusters is a key issue in distributed computing environments. All existing dynamic load balancing algorithms are post-active, as balancing activities start only after the system has entered an imbalanced state. A better approach is to design a pro-active load balancing algorithm that works together with the scheduling algorithm and helps it schedule incoming jobs in such a way that the system remains in a balanced state. The pluggable-to-scheduler dynamic load balancing algorithm (P2S_DLB), a pro-active dynamic load balancing algorithm, was designed and evaluated over a priority scheduling algorithm in our previous research work. In this paper, we measure and evaluate the performance of P2S_DLB over the First Come First Serve (FCFS), Shortest Job First (SJF), and Earliest Deadline First (EDF) scheduling algorithms. The experimental results show that the algorithm improves cluster utilization and decreases the imbalance level of the distributed computing environment for all three scheduling algorithms.
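For readers unfamiliar with the three schedulers named in this abstract, the sketch below shows how each one orders the same job queue. It is a simplified, non-preemptive illustration that assumes all jobs are already queued; it does not implement P2S_DLB, and the `Job` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int   # time the job entered the queue
    burst: int     # estimated execution time
    deadline: int  # absolute deadline


def fcfs(jobs: List[Job]) -> List[Job]:
    """First Come First Serve: order by arrival time."""
    return sorted(jobs, key=lambda j: j.arrival)


def sjf(jobs: List[Job]) -> List[Job]:
    """Shortest Job First: order by estimated execution time."""
    return sorted(jobs, key=lambda j: j.burst)


def edf(jobs: List[Job]) -> List[Job]:
    """Earliest Deadline First: order by deadline."""
    return sorted(jobs, key=lambda j: j.deadline)


if __name__ == "__main__":
    queue = [
        Job("A", arrival=0, burst=8, deadline=20),
        Job("B", arrival=1, burst=2, deadline=5),
        Job("C", arrival=2, burst=4, deadline=30),
    ]
    for policy in (fcfs, sjf, edf):
        print(policy.__name__, [j.name for j in policy(queue)])
```

Because each policy picks a different dispatch order for the same queue, a load balancer plugged into the scheduler sees a different stream of placements under each one, which is why the evaluation covers all three.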
ISBN: 9781728109626 (digital)
ISBN: 9781728109633 (print)
This paper studies the computation-communication tradeoff in a heterogeneous MapReduce computing system where each distributed node is equipped with a different computation capability. We first obtain an achievable communication load for any given computation load and any given function assignment at each node. The proposed file allocation strategy has two steps: first, the input files are partitioned into disjoint batches, each with a possibly different size and computed by a distinct node; then, each node computes additional files from its non-computed files according to its redundant computation capability. In the Shuffle phase, coded multicasting opportunities are exploited thanks to the repetitive file allocation among different nodes. Based on this scheme, we further propose the computation-aware and the shuffle-aware function assignments. We prove that, by using proper function assignments, our achievable communication load for any given computation load is within a constant multiplicative gap of the optimum in an equivalent homogeneous system with the same average computation load. Numerical results show that our scheme with shuffle-aware function assignment achieves a better computation-communication tradeoff than existing works in some cases.
Distributed computing platforms are evolving into heterogeneous ecosystems, with Clusters, Grids and Clouds introducing into their computing nodes processors with different core architectures and accelerators (e.g., GPUs, FPGAs), as well as different memories and storage devices, in order to achieve better performance with lower energy consumption. As a consequence of this heterogeneity, programming applications for these distributed heterogeneous platforms becomes a complex task. In addition to the complexity of developing an application for distributed platforms, developers must now also deal with the complexity of the different computing devices inside the node. In this article, we present a programming model that aims to facilitate the development and execution of applications on current and future distributed heterogeneous parallel architectures. This programming model is based on the hierarchical composition of the COMP Superscalar and Omp Superscalar programming models, which allow developers to implement infrastructure-agnostic applications. The underlying runtime enables applications to adapt to the infrastructure without the need to maintain different versions of the code. Our programming model proposal has been evaluated on real platforms in terms of heterogeneous resource usage, performance, and adaptation.