Big datasets have become commonplace in large-scale data analysis; they arise in social and information networks and in scientific mesh computations. These datasets are commonly stored and processed across multiple machines because of the limited capabilities (such as memory and CPU) of a single machine. However, many available analysis tools still cannot fully utilize existing distributed-memory architectures. Because these datasets are usually processed and analyzed in the form of graphs or meshes, in this dissertation we propose scalable and efficient approaches for graph and mesh computations on distributed-memory systems. Although graph and mesh computations are closely related in their parallelization approaches, some of their unique characteristics still need to be addressed separately. We therefore organize the dissertation into two parts: the first covers distributed graph computations, and the second covers distributed mesh computations. In the first part of the dissertation, we focus on graph computations. First, we study the Single-Source Shortest Path (SSSP) problem by analyzing and evaluating three well-known SSSP algorithms, i.e., Dijkstra's, Bellman-Ford, and $\Delta$-stepping. We implement these algorithms on distributed-memory systems based on a bulk synchronous parallel model, and we evaluate and compare their performance. Next, we propose our own SSSP algorithm, which combines the advantages of these algorithms and utilizes a two-dimensional (2D) graph layout for our graph data structures. We then extend our study of the 2D graph data structures and optimization approaches to other well-known graph algorithms, including breadth-first search, approximate diameter, connected components, and PageRank, on various real-world graphs. Our objective is to implement an efficient graph framework for distributed-memory systems that works well for many graph algorithms on various graph types. Finally, ...
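As a concrete illustration of one of the algorithms compared above, the following is a minimal, single-process Python sketch of the $\Delta$-stepping bucket structure: light edges (weight at most delta) are relaxed repeatedly within a bucket, heavy edges once per bucket phase. The graph encoding and the delta parameter here are our illustrative choices, not the dissertation's distributed 2D implementation.

    def delta_stepping(graph, source, delta):
        """graph: {u: [(v, w), ...]}; returns shortest distances from source."""
        dist = {source: 0.0}
        buckets = {0: {source}}                  # bucket index -> vertex set

        def relax(v, d):
            old = dist.get(v, float("inf"))
            if d < old:
                if old != float("inf"):          # move v out of its old bucket
                    buckets.get(int(old // delta), set()).discard(v)
                dist[v] = d
                buckets.setdefault(int(d // delta), set()).add(v)

        while buckets:
            i = min(buckets)                     # smallest non-empty bucket
            settled = set()
            while buckets.get(i):                # light edges may refill bucket i
                frontier = buckets.pop(i)
                settled |= frontier
                for u in frontier:
                    for v, w in graph.get(u, ()):
                        if w <= delta:
                            relax(v, dist[u] + w)
            buckets.pop(i, None)                 # drop an empty leftover bucket
            for u in settled:                    # heavy edges: once per phase
                for v, w in graph.get(u, ()):
                    if w > delta:
                        relax(v, dist[u] + w)
        return dist

    g = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.5)], 2: []}
    print(delta_stepping(g, 0, delta=2.0))       # {0: 0.0, 1: 1.0, 2: 2.5}

With delta at or below the minimum edge weight this degenerates toward Dijkstra's algorithm; with delta at or above the maximum weight it behaves like Bellman-Ford, which is why the parameter trades work efficiency against parallelism.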
Linear Temporal Logic (LTL) model checking is a very important and popular technique for the automatic verification of safety-critical hardware and software systems, aiming at ensuring their quality. However, it is well known that LTL model checking suffers from the state explosion problem, often leading to insurmountable scalability problems when applying it to real-world systems. While there has been work on distributed algorithms for explicit on-the-fly LTL model checking, these are neither sufficiently scalable nor capable of tolerating faults during computation, significantly limiting their usefulness in huge cluster environments. Moreover, implementing these algorithms is generally viewed as a very challenging, error-prone task. In this paper, we instead rely on Pregel, a simple yet powerful model for distributed computation on large graphs. Pregel has from the start been designed for efficient, scalable, and fault-tolerant operation on clusters of thousands of computers, including large cloud setups. To harness Pregel's power, we propose a new vertex-centric distributed algorithm for explicit LTL model checking of concurrent systems. Experimental results illustrate the feasibility and scalability of the proposed algorithm. Compared with other distributed algorithms, ours is more scalable, reliable, and efficient.
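To make the vertex-centric computation model concrete, here is a minimal sequential emulation of Pregel's superstep loop in Python, using reachability flooding from an initial state as a toy stand-in for state-space exploration. The Vertex class, compute function, and driver are our illustrative names under that assumption, not the paper's algorithm or Pregel's actual API.

    class Vertex:
        def __init__(self, vid, edges):
            self.id, self.edges = vid, edges
            self.value = False                   # here: "reachable from init?"

    def compute(v, messages, outbox):
        """One superstep for one vertex: absorb messages, maybe send."""
        if messages and not v.value:
            v.value = True
            for dst in v.edges:                  # flood to successor states
                outbox.setdefault(dst, []).append(True)
        # a vertex that sends nothing implicitly votes to halt

    def run(vertices, init_id, max_supersteps=100):
        inbox = {init_id: [True]}                # seed the initial state
        for _ in range(max_supersteps):
            if not inbox:                        # all halted, no messages left
                break
            outbox = {}
            for vid, msgs in inbox.items():      # only messaged vertices wake up
                compute(vertices[vid], msgs, outbox)
            inbox = outbox
        return {vid: v.value for vid, v in vertices.items()}

    vs = {i: Vertex(i, e) for i, e in {0: [1], 1: [2], 2: [0], 3: []}.items()}
    print(run(vs, init_id=0))                    # {0: True, 1: True, 2: True, 3: False}

In a real Pregel deployment each superstep runs the compute calls in parallel across workers, with the framework handling message routing, barriers between supersteps, and checkpoint-based fault tolerance.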
The reactor physics (neutronics) method of the Coarse Mesh Radiation Transport (COMET) code has been used to solve whole-core reactor eigenvalue and power distribution problems. COMET solutions are computed to Monte Carlo accuracy on a single processor with several orders of magnitude faster computational speed. However, to extend the method to include on-the-fly depletion and incident-flux response expansion function calculations via Monte Carlo, a parallel implementation of the deterministic COMET calculations has been developed. COMET involves inner and outer iterations; the inner iterations contain local (i.e., response data) calculations that can be carried out independently, making the algorithm amenable to parallelization. Taking advantage of this fact, a distributed-memory algorithm featuring domain decomposition was developed. To allow for an efficient parallel implementation of a distributed algorithm, changes to response data access and sweep order were made, along with considerations for communication between processors. These changes make the approach generalizable to many different problem types. A software implementation called COMET-MPI was developed and applied to several benchmark problems. Analysis of the computational performance of COMET-MPI yielded an estimated parallel fraction of 0.98 for the code, implying a high level of parallelism. In addition, wall-clock times on the order of minutes are achieved when the code is used to solve whole-core benchmark problems, showing vastly improved computational efficiency with the distributed-memory algorithm.
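The inner/outer structure described above maps naturally onto a message-passing program: inner response calculations are rank-local, and only the outer eigenvalue update needs communication. The following mpi4py sketch shows that shape under our own assumptions; local_response(), the cell decomposition, and the toy eigenvalue update are stand-ins, not COMET-MPI's actual code.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    N_CELLS = 100

    def local_response(cell, k):
        # placeholder for an independent per-cell response calculation
        return 1.0 / (cell + k)

    my_cells = range(rank, N_CELLS, size)    # cyclic domain decomposition
    k = 1.0                                  # initial eigenvalue guess
    for outer in range(50):                  # outer iterations couple domains
        # inner iterations: purely local, no communication needed
        local_sum = sum(local_response(c, k) for c in my_cells)
        total = comm.allreduce(local_sum, op=MPI.SUM)
        k_new = total / N_CELLS              # toy update standing in for the
        done = abs(k_new - k) < 1e-10        # real eigenvalue iteration
        k = k_new
        if done:
            break
    if rank == 0:
        print("converged k =", k, "after", outer + 1, "outer iterations")

Run with, for example, mpiexec -n 4 python sketch.py. Because only a scalar reduction crosses ranks per outer iteration while all inner work stays local, the communication fraction stays small, which is the property behind a parallel fraction as high as the 0.98 reported above.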
ISBN (print): 9781509036820
A recently proposed scenario decomposition algorithm for stochastic 0-1 programs finds an optimal solution by evaluating and removing individual solutions that are discovered by solving scenario subproblems. In this work, we develop an asynchronous, distributed implementation of the algorithm that has computational advantages over existing synchronous implementations. Improvements to both the synchronous and asynchronous algorithms are proposed. We test the results on well-known stochastic 0-1 programs from the SIPLIB test library and are able to solve one previously unsolved instance from the test set.
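The asynchronous pattern described above, evaluating each candidate solution as soon as its scenario subproblem returns rather than waiting for a synchronous round, can be sketched with a worker pool. Here solve_scenario() and evaluate() are toy stand-ins for the subproblem solver and the candidate evaluation; this is our illustration of the pattern, not the paper's implementation.

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def solve_scenario(s):
        # stand-in: returns (candidate 0-1 solution, its subproblem bound)
        return (s % 3, float(s))

    def evaluate(candidate):
        # stand-in: true objective of a candidate across all scenarios
        return 10.0 - candidate

    def main():
        best, removed = float("inf"), set()
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(solve_scenario, s) for s in range(8)]
            for fut in as_completed(futures):    # asynchronous: no barrier
                candidate, _bound = fut.result()
                if candidate in removed:
                    continue                     # already evaluated and removed
                best = min(best, evaluate(candidate))
                removed.add(candidate)           # exclude it from later rounds
        print("incumbent objective:", best)

    if __name__ == "__main__":
        main()

The advantage over a synchronous design is that a fast subproblem never waits on the slowest one before its candidate is evaluated and removed, which matters when scenario solve times vary widely.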
ISBN (print): 9781467365987
The Support Vector Machine (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm that has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed-memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, ranging from early (aggressive) to late (conservative) elimination of samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures so that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single versus multiple synchronization using an in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets, such as the UCI HIGGS Boson dataset and the Offending URL dataset.
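The elimination idea above can be illustrated on a simple linear-SVM trainer: samples whose margin is comfortably beyond 1 are periodically dropped from the working set, with a threshold controlling how aggressive or conservative the elimination is. The sub-gradient trainer and the slack threshold below are our stand-ins, not the paper's distributed heuristics.

    import numpy as np

    def train_with_shrinking(X, y, lam=0.01, epochs=50, slack=1.5):
        """X: (n, d) samples; y: (n,) labels in {-1, +1}."""
        n, d = X.shape
        w = np.zeros(d)
        active = np.arange(n)                    # current working set
        for epoch in range(1, epochs + 1):
            eta = 1.0 / (lam * epoch)
            margins = y[active] * (X[active] @ w)
            viol = active[margins < 1.0]         # margin violators drive updates
            grad = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
            w -= eta * grad
            # shrink: drop samples comfortably outside the margin; a smaller
            # slack is more aggressive, a larger one more conservative
            active = active[margins < slack]
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    w = train_with_shrinking(X, y)
    print("training accuracy:", (np.sign(X @ w) == y).mean())

An aggressive (small) threshold shrinks the working set quickly but risks discarding a sample that later becomes a support vector, which is the failure mode the paper's synchronization scheme is designed to correct.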