ISBN (print): 9781509042975
We live in an era of big data, and the analysis of these data is becoming a bottleneck in many domains, including biology and the internet. To make these analyses feasible in practice, we need efficient data reduction algorithms. The Singular Value Decomposition (SVD) is a data reduction technique that has been used in many different applications. For example, SVDs have been extensively used in text analysis. The best known sequential algorithms for computing the SVD take cubic time, which may not be acceptable in practice. As a result, many parallel algorithms have been proposed in the literature. There are two kinds of algorithms for the SVD, namely, QR decomposition and Jacobi iterations. Researchers have found that even though QR is sequentially faster than Jacobi iterations, QR is difficult to parallelize. As a result, most of the parallel algorithms in the literature are based on Jacobi iterations. For example, the Jacobi Relaxation Scheme (JRS) variant of the classical Jacobi algorithm has been shown to be very effective in parallel. In this paper we propose a novel variant of the classical Jacobi algorithm that is more efficient than the JRS algorithm. Our experimental results confirm this assertion. The key idea behind our algorithm is to select the pivot elements for each sweep appropriately. We also show how to efficiently implement our algorithm on such parallel models as the PRAM and the mesh.
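The abstract does not reproduce the proposed pivot-selection rule, so the sketch below shows only the baseline it improves on: a plain one-sided Jacobi SVD in Python, where each sweep visits every column pair and rotates the pair to orthogonality. Function name and parameters are illustrative.

    # A minimal one-sided Jacobi SVD sketch; the paper's contribution
    # (smarter pivot selection per sweep) is not reproduced here.
    import numpy as np

    def jacobi_svd(A, sweeps=30, tol=1e-12):
        """Return the singular values of A via one-sided Jacobi rotations."""
        A = A.astype(float).copy()
        n = A.shape[1]
        for _ in range(sweeps):
            converged = True
            for i in range(n - 1):          # a sweep: all column pairs
                for j in range(i + 1, n):
                    ai, aj = A[:, i], A[:, j]
                    a, b, c = ai @ ai, aj @ aj, ai @ aj
                    if abs(c) <= tol * np.sqrt(a * b):
                        continue            # pair already orthogonal
                    converged = False
                    theta = 0.5 * np.arctan2(2 * c, a - b)
                    cs, sn = np.cos(theta), np.sin(theta)
                    A[:, [i, j]] = A[:, [i, j]] @ np.array([[cs, -sn],
                                                            [sn, cs]])
            if converged:
                break
        return np.sort(np.linalg.norm(A, axis=0))[::-1]

    print(jacobi_svd(np.random.rand(6, 4)))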
ISBN (print): 9781509021406
Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, due to their inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds utilizing ~50,000 cores. We also demonstrate the computation of KNN for 19 billion queries in 12 seconds. We demonstrate almost linear speedup for both shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems.
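As background for the kernel being parallelized, here is a minimal serial sketch of kd-tree construction and k-NN querying with the standard plane-distance pruning rule; the paper's distributed construction, load balancing, and partitioning are far more involved and are not reproduced.

    import heapq

    def build(points, depth=0):
        """Build a kd-tree node: (point, axis, left, right)."""
        if not points:
            return None
        axis = depth % len(points[0])        # cycle through dimensions
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        return (points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

    def knn(node, q, k, heap=None):
        """k nearest neighbors of q; a max-heap keeps the best k found."""
        if heap is None:
            heap = []
        if node is None:
            return heap
        p, axis, left, right = node
        d2 = sum((a - b) ** 2 for a, b in zip(p, q))
        heapq.heappush(heap, (-d2, p))       # negate: max-heap of dist^2
        if len(heap) > k:
            heapq.heappop(heap)
        near, far = (left, right) if q[axis] < p[axis] else (right, left)
        knn(near, q, k, heap)
        # Visit the far side only if the splitting plane is closer than
        # the current k-th nearest distance (search-space pruning).
        if len(heap) < k or (q[axis] - p[axis]) ** 2 < -heap[0][0]:
            knn(far, q, k, heap)
        return sorted((-d, p) for d, p in heap)

    tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(knn(tree, (6, 3), k=2))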
ISBN (print): 9781509028238
In this paper, we present a bottom-up approach to parallel anisotropic mesh generation by building a mesh generator from first principles. Applications focusing on high-lift design or dynamic stall, as well as numerical methods and modeling test cases, still focus on two dimensions. Our push-button parallel mesh generation approach can generate high-fidelity unstructured meshes with anisotropic boundary layers for use in the computational fluid dynamics field. The anisotropy requirement adds a level of complexity to a parallel meshing algorithm by making computation depend on the local alignment of elements, which in turn is dictated by the geometric boundaries and the density functions. Our experimental results show 70% parallel efficiency over the fastest sequential isotropic mesh generator on 256 distributed memory nodes.
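To make the alignment dependence concrete: anisotropic meshing is commonly driven by a metric tensor field M(x) under which edge length is sqrt(e^T M e), so an acceptable element is long along the boundary and thin across it inside a boundary layer. The sketch below is a generic illustration of this mechanism, not the paper's generator; all parameter values are made up.

    import numpy as np

    def boundary_layer_metric(y, h_tan=0.1, h_norm0=0.001, growth=1.2):
        """Metric for a wall at y = 0: the normal spacing grows away from
        the wall; tangential spacing stays coarse (illustrative values)."""
        h_norm = min(h_norm0 * growth ** min(y / h_norm0, 50.0), h_tan)
        return np.diag([1.0 / h_tan ** 2, 1.0 / h_norm ** 2])

    def metric_length(p, q):
        """Edge length measured in the metric at the edge midpoint;
        values near 1 mean the edge is ideally sized."""
        e = np.asarray(q, float) - np.asarray(p, float)
        M = boundary_layer_metric(0.5 * (p[1] + q[1]))
        return float(np.sqrt(e @ M @ e))

    # Two edges of equal Euclidean length near the wall: the one running
    # along the wall is short in the metric, while the one crossing the
    # boundary layer is far too long and would be refined.
    print(metric_length((0.0, 0.001), (0.01, 0.001)))
    print(metric_length((0.0, 0.0), (0.0, 0.01)))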
ISBN (print): 9783319415611; 9783319415604
This paper describes an approach for the distributed execution of data mining algorithms and the use of this approach for building a Cloud for Data Mining. The suggested approach allows us to execute data mining algorithms in different parallel and distributed environments. Thus, the created Cloud for Data Mining can be used as an analytic service and as a platform for researching and debugging parallel and distributed data mining algorithms.
ISBN (print): 9783319413211; 9783319413204
Community detection is an important data clustering technique for studying graph structures. Many serial algorithms have been developed and well studied in the literature. As the problem size grows, research attention has recently been turning to parallelizing the technique. However, the conventional parallelization strategies that divide the problem domain into non-overlapping subdomains do not scale with problem size and the number of processes. The main obstacle lies in the fact that graph algorithms often exhibit a high degree of data dependency, which makes developing scalable parallel algorithms a great challenge. We present PMEP, a distributed-memory parallel community detection algorithm that adopts an unconventional data partitioning strategy. PMEP divides a graph into subgraphs and assigns each pair of subgraphs to one process. This method duplicates a portion of the computational workload among processes in exchange for a significantly reduced communication cost in the later stages. After data partitioning, each process runs MEP on the assigned subgraph pair. MEP is a community detection algorithm based on the idea of maximizing equilibrium and purity. Our data partitioning method effectively simplifies the communication required for combining the local results into a global one and hence allows us to achieve better scalability than existing parallel algorithms without sacrificing result quality. Our experimental results show a speedup of 126.95 on 190 MPI processes using synthetic data sets and a speedup of 204.22 on 1225 processes using a real-world data set.
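The pair-to-process mapping described above is easy to make concrete: with s subgraphs there are C(s, 2) pairs, one per process. The sketch below uses illustrative names (the paper's actual vertex partitioner is not shown; a hash is a stand-in), and it also makes clear why the reported process counts are 190 and 1225: C(20, 2) = 190 and C(50, 2) = 1225.

    # Assign each subgraph pair (i, j) to a distinct process rank.
    from itertools import combinations

    def pair_assignment(num_subgraphs):
        return {pair: rank
                for rank, pair in enumerate(
                    combinations(range(num_subgraphs), 2))}

    def owner_of_edge(u, v, num_subgraphs, assignment):
        """Cross-subgraph edges land on the process owning that pair;
        intra-subgraph data is duplicated across every pair containing
        its subgraph, which is the duplicated workload the abstract
        trades for reduced communication."""
        i, j = sorted((hash(u) % num_subgraphs, hash(v) % num_subgraphs))
        return assignment[(i, j)] if i != j else None

    print(len(pair_assignment(20)), len(pair_assignment(50)))  # 190 1225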
ISBN (print): 9781509007684
Frequent items in high-speed streaming data are important to many applications such as network monitoring and anomaly detection. To deal with the high arrival rate of streaming data, it is desirable that such systems be capable of supporting high processing throughput with tight guarantees on errors. In this paper, we address the problem of finding frequent and top-k items, and present a parallel version of the Space Saving algorithm in the context of an open-source distributed computing system. Our theoretical analysis shows that the errors in our algorithm are tightly bounded, and our parallel design achieves high throughput. Taking advantage of the distributed computing resources, our evaluation reveals that the design delivers linear speedup with remarkable scalability.
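For reference, the serial Space Saving kernel that the paper parallelizes fits in a few lines: keep m counters, and when a full table meets a new item, evict the minimum counter and let the newcomer inherit its count. That inherited count is exactly the overestimation error, and the minimum counter is at most n/m after n items, which is the source of the tight error bounds. A minimal sketch:

    def space_saving(stream, m):
        """Approximate item counts with at most m counters."""
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < m:
                counters[item] = 1
            else:
                victim = min(counters, key=counters.get)
                # The newcomer inherits the evicted minimum count, so its
                # counter overestimates by at most that minimum.
                counters[item] = counters.pop(victim) + 1
        return counters

    print(space_saving("abracadabra", m=3))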
ISBN (print): 9781509034840
The article presents an algorithmic model of sound propagation in rooms designed to run on parallel and distributed computer systems. This algorithm is used by the authors in an implementation of an adaptable high-performance computer system that simulates various fields and provides scalability across an arbitrary number of parallel central and graphics processors as well as distributed computer clusters. Many general-purpose computer simulation systems have limited usability when it comes to high-precision simulation involving large numbers of elementary computations, due to their lack of scalability on various parallel and distributed platforms. The greater the required adequacy of the model, the larger the number of steps in the simulation algorithms. Scalability permits the use of hybrid parallel computer systems and improves the efficiency of the simulation with respect to adequacy, time consumption, and the total cost of simulation experiments. The article covers such an algorithm, which is based on an approximate superposition of acoustic fields and provides adequate results as long as the equations of acoustics used are linear. The algorithm represents reflecting surfaces as sets of vibrating pistons and uses the Rayleigh integral to calculate their scattering properties. The article also provides a parallel form of the algorithm and an analysis of its properties in parallel and sequential forms.
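The piston building block described above is straightforward to sketch: discretize a reflecting surface into piston elements and approximate the Rayleigh integral p(r) = (j rho omega / 2 pi) * integral of v e^{-jkR}/R dS as a sum over the elements. All parameter values below are illustrative, not from the paper.

    import numpy as np

    def rayleigh_pressure(r, pistons, velocities, dS, freq,
                          rho=1.21, c=343.0):
        """Complex pressure at point r radiated by baffled piston elements."""
        omega = 2 * np.pi * freq
        k = omega / c                              # wavenumber
        R = np.linalg.norm(pistons - r, axis=1)    # element-to-point distances
        return (1j * rho * omega / (2 * np.pi)
                * np.sum(velocities * np.exp(-1j * k * R) / R * dS))

    # A 10 x 10 grid of pistons on a 1 m^2 wall, vibrating in phase:
    xs, ys = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
    pistons = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(100)])
    p = rayleigh_pressure(np.array([0.5, 0.5, 2.0]), pistons,
                          velocities=np.full(100, 1e-3), dS=0.01,
                          freq=500.0)
    print(abs(p))    # pressure magnitude in pascals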
ISBN (print): 9781509039364
With the explosive growth of data, we have entered the era of big data. In order to sift through masses of information, many data mining algorithms are being implemented using parallelization. Cluster analysis occupies a pivotal position in data mining, and the DBSCAN algorithm is one of the most widely used algorithms for clustering. However, when the existing parallel DBSCAN algorithms create data partitions, the original database is usually divided into several disjoint partitions; as the data dimensionality increases, the splitting and consolidation of the high-dimensional space consumes a great deal of time. To solve this problem, this paper proposes a parallel DBSCAN algorithm (S_DBSCAN) based on Spark, which can quickly partition the original data and combine the clustering results. It is divided into the following steps: 1) partitioning the raw data based on a random sample, 2) computing local DBSCAN algorithms in parallel, 3) merging the data partitions based on the centroid. Compared with the traditional DBSCAN algorithm, the experimental results show that the proposed S_DBSCAN algorithm provides better operating efficiency and scalability.
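The three steps enumerated above can be mocked up serially; the sketch below uses scikit-learn's DBSCAN for step 2 and simplified stand-in rules for the sample-based partitioning and the centroid-based merge (the paper runs step 2 as parallel Spark tasks, and its exact partition and merge rules are not given in the abstract).

    import numpy as np
    from sklearn.cluster import DBSCAN

    def s_dbscan_sketch(X, n_partitions=4, eps=0.3, min_samples=5):
        # 1) Partition the raw data based on a random sample: each point
        #    goes to its nearest sampled pivot.
        pivots = X[np.random.choice(len(X), n_partitions, replace=False)]
        owner = np.argmin(((X[:, None] - pivots) ** 2).sum(-1), axis=1)

        # 2) Run local DBSCAN on each partition (parallel tasks on Spark).
        clusters = []                       # (member indices, centroid)
        for part in range(n_partitions):
            idx = np.where(owner == part)[0]
            if len(idx) == 0:
                continue
            labels = DBSCAN(eps=eps,
                            min_samples=min_samples).fit_predict(X[idx])
            for lbl in set(labels) - {-1}:  # label -1 marks noise
                members = idx[labels == lbl]
                clusters.append((members, X[members].mean(axis=0)))

        # 3) Merge local clusters whose centroids are close enough.
        merged = []
        for members, cen in clusters:
            for g in merged:
                if np.linalg.norm(g["cen"] - cen) < eps:
                    g["members"] = np.concatenate([g["members"], members])
                    break
            else:
                merged.append({"members": members, "cen": cen})
        return merged

    print(len(s_dbscan_sketch(np.random.rand(1000, 2))))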
ISBN (print): 9781509036820
The main contribution of this paper is to present the Bitwise Parallel Bulk Computation (BPBC) technique for accelerating bulk computation, which executes the same algorithm for many instances in turn or in parallel. The idea of the BPBC technique is to simulate a combinational logic circuit for 32 inputs at the same time using the bitwise logic operators on 32-bit integers supported by most processing devices. We show that the BPBC technique works very efficiently on a CPU as well as on a GPU. As a simple example of the BPBC, we first show that the pairwise sums of many integers can be computed faster using the BPBC technique if the values of the input integers are not large. We also show that CKY parsing for context-free grammars can be implemented efficiently on the GPU using the BPBC technique. Experimental results using an Intel Core i7 CPU and a GeForce GTX TITAN X GPU show that the GPU implementation of CKY parsing can be more than 400 times faster than the CPU implementation.
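The abstract's first example, bulk pairwise sums, can be made concrete with a bit-sliced ripple-carry adder: bit-plane j packs bit j of all 32 instances into one 32-bit word, so each bitwise operation advances all 32 additions at once. A minimal sketch (the input width and the names are illustrative):

    import random

    BITS = 8                 # assumed small width of the input integers

    def to_planes(values, width):
        """Transpose 32 integers into bit-planes."""
        return [sum(((v >> j) & 1) << i for i, v in enumerate(values))
                for j in range(width)]

    def from_planes(planes, count=32):
        return [sum(((planes[j] >> i) & 1) << j
                    for j in range(len(planes)))
                for i in range(count)]

    def bulk_add(a, b):
        """Simulate a ripple-carry full adder on bit-planes: each line of
        bitwise ops computes that bit of the sum for all 32 instances."""
        carry, out = 0, []
        for aj, bj in zip(a, b):
            out.append(aj ^ bj ^ carry)               # sum bit
            carry = (aj & bj) | (carry & (aj ^ bj))   # carry-out
        out.append(carry)                             # final carry plane
        return out

    xs = [random.randrange(256) for _ in range(32)]
    ys = [random.randrange(256) for _ in range(32)]
    sums = from_planes(bulk_add(to_planes(xs, BITS), to_planes(ys, BITS)))
    assert sums == [x + y for x, y in zip(xs, ys)]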
ISBN (print): 9783319321493; 9783319321486
The induction of a minimal nondeterministic finite automaton (NFA) consistent with a given set of examples and counterexamples, which is known to be computationally hard, is discussed. The paper extends a novel approach that transforms the problem of NFA induction into an integer nonlinear programming (INLP) problem. An improved formulation of the problem is proposed, along with two parallel algorithms to solve it. Methods for the distribution of tasks among processors, along with distributed termination detection, are presented. Experimental results for selected benchmarks are also reported.
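The abstract does not spell out the formulation, but a rough sketch of the usual shape of such an INLP (a hedged reconstruction, not the paper's exact model) fixes a state set Q with initial state q_0, introduces binary transition and acceptance variables, and encodes acceptance as products along paths, which is what makes the program nonlinear:

    % Hedged sketch, not the paper's exact model. Binary variables:
    % x_{p a q} = 1 iff the NFA has transition p --a--> q,
    % y_q = 1 iff state q is accepting; q_0 is the initial state.
    % For each example w = a_1 ... a_m (must be accepted):
    \[
      \sum_{q_1,\dots,q_m \in Q}
        \Bigl(\prod_{t=1}^{m} x_{q_{t-1}\,a_t\,q_t}\Bigr)\, y_{q_m} \;\ge\; 1
    \]
    % For each counterexample w = a_1 ... a_m (must be rejected):
    \[
      \sum_{q_1,\dots,q_m \in Q}
        \Bigl(\prod_{t=1}^{m} x_{q_{t-1}\,a_t\,q_t}\Bigr)\, y_{q_m} \;=\; 0
    \]
    % with x, y binary and |Q| = k minimized by re-solving for growing k.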