ISBN:
(Print) 9781634396226
A good number of parallel and distributed frequent pattern mining algorithms have been proposed for large and/or distributed databases. Not only the occurrence frequency of a pattern but also its occurrence behavior (regularity) may be treated as an emerging area in data mining research. Some efforts have been made to mine regular patterns, but no suitable algorithm exists to mine frequent-regular patterns in a parallel and distributed environment. Therefore, in this paper we introduce a new method, the PFRP-method (Parallel Frequent-Regular Pattern method), to discover frequent-regular patterns in large databases using a vertical data format that requires only one database scan. Our method works in parallel at each local site in order to reduce I/O cost and inter-process communication, and generates all frequent-regular patterns in the final phase. Our experimental results show that the PFRP-method is highly efficient on large databases.
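To make the vertical data format concrete, here is a minimal single-machine sketch in Python (helper names such as frequent_regular_items are hypothetical) showing how tid-lists can be built in one database scan and how support and regularity can be read off them; the site-level parallel coordination of the PFRP-method itself is not reproduced.

# Minimal single-machine sketch (hypothetical names): frequent-regular
# items from a vertical (tid-list) layout built in a single database scan.

def vertical_format(transactions):
    """Map each item to the sorted list of transaction ids containing it."""
    tidlists = {}
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            tidlists.setdefault(item, []).append(tid)
    return tidlists

def regularity(tidlist, n_transactions):
    """Largest gap between consecutive occurrences, including the borders."""
    gaps = [tidlist[0]]
    gaps += [b - a for a, b in zip(tidlist, tidlist[1:])]
    gaps.append(n_transactions - tidlist[-1])
    return max(gaps)

def frequent_regular_items(transactions, min_sup, max_reg):
    n = len(transactions)
    tidlists = vertical_format(transactions)
    return {item: tids for item, tids in tidlists.items()
            if len(tids) >= min_sup and regularity(tids, n) <= max_reg}

# Example: 6 transactions; 'a' and 'b' are frequent and occur regularly.
db = [{'a', 'b'}, {'a', 'c'}, {'b'}, {'a', 'b'}, {'a'}, {'a', 'c'}]
print(frequent_regular_items(db, min_sup=3, max_reg=2))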
The process of discovering interesting patterns in large, possibly huge, data sets is referred to as data mining, and can be performed in several flavours, known as "data mining functions." Among these functions, outlier detection discovers observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding and currently requires high-performance computing facilities. We propose a family of parallel and distributed algorithms for graphics processing units (GPUs) derived from two distance-based outlier detection algorithms: BruteForce and SolvingSet. The algorithms differ in the way they exploit the architecture and memory hierarchy of the GPU and guarantee significant improvements with respect to the CPU versions, both in terms of scalability and exploitation of parallelism. We provide a detailed discussion of their computational properties and measure performance through extensive experimentation, comparing the several implementations and showing significant speedups.
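As a point of reference for the distance-based definition underlying BruteForce and SolvingSet, the following is a small CPU-only sketch, assuming the common "weight = sum of distances to the k nearest neighbours" outlier score; the GPU kernels and the SolvingSet pruning strategy of the paper are not shown.

import numpy as np

def brute_force_outliers(X, k=5, n=3):
    # All pairwise distances; a point never counts as its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = np.sort(d, axis=1)[:, :k]        # distances to the k nearest neighbours
    scores = knn.sum(axis=1)               # outlier weight of each point
    top = np.argsort(scores)[::-1][:n]     # indices of the top-n outliers
    return top, scores[top]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])  # one planted outlier
idx, sc = brute_force_outliers(X, k=5, n=3)
print(idx, sc)   # the planted point (index 100) should rank first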
Parallel Newton two-stage iterative methods for solving nonlinear systems are studied. These algorithms are based on both the multisplitting technique and two-stage iterative methods. Convergence properties of these methods are studied when the Jacobian matrix is either monotone or an H-matrix. Furthermore, in order to illustrate the performance of the algorithms studied, computational results for these methods on a distributed-memory multiprocessor are discussed.
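A simplified, sequential sketch of the Newton two-stage idea is given below: the Jacobian system at each Newton step is solved only approximately by an inner iteration built from a splitting J = M - N (here M is taken diagonal, so the second inner stage is trivial); the multisplitting and parallel aspects of the paper are omitted, and all names are illustrative.

import numpy as np

def newton_two_stage(F, J, x0, outer_iters=20, inner_iters=10, tol=1e-10):
    x = x0.astype(float)
    for _ in range(outer_iters):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        Jx = J(x)
        M = np.diag(np.diag(Jx))         # splitting J = M - N of the Jacobian
        N = M - Jx
        s = np.zeros_like(x)             # approximate solution of J s = -F(x)
        for _ in range(inner_iters):
            s = np.linalg.solve(M, N @ s - Fx)
        x = x + s
    return x

# Small nonlinear test system with solution (1, 2).
F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])
J = lambda x: np.array([[2 * x[0], 1.0], [1.0, 2 * x[1]]])
print(newton_two_stage(F, J, np.array([1.0, 1.0])))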
In this work, the Parallel Fast Condensed Nearest Neighbor (PFCNN) rule, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor classification rule, is presented. In order to cope with the communication overhead typical of distributed environments and to reduce memory requirements, different variants of the basic PFCNN method are introduced. An analysis of spatial cost, CPU cost, and communication overhead is carried out for all the algorithms. Experimental results, obtained on both synthetic and real very large data sets, reveal that these methods can be profitably applied to enormous collections of data. Indeed, they scale up well and are efficient in memory consumption, confirming the theoretical analysis, and achieve noticeable data reduction and good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training-set consistent subset for the nearest neighbor rule.
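For context, the following is a sketch of the classical sequential condensed nearest neighbour computation of a consistent subset, which is the object PFCNN computes in a distributed fashion; the fast and parallel variants of the paper are not reproduced, and the helper name is hypothetical.

import numpy as np

def condensed_subset(X, y):
    S = [0]                                  # start from a single seed point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            d = np.linalg.norm(X[S] - X[i], axis=1)
            nearest = S[int(np.argmin(d))]
            if y[nearest] != y[i]:           # misclassified by 1-NN in S: absorb it
                S.append(i)
                changed = True
    return np.array(S)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (200, 2)), rng.normal(2, 0.5, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
S = condensed_subset(X, y)
print(len(S), "of", len(X), "points kept")   # typically a small fraction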
We study whether iterated vector fields (vector fields composed with themselves) are conservative. We give explicit examples of vector fields for which this self-composition preserves conservatism. Notably, this includes gradient vector fields of loss functions associated to some generalized linear models. In the context of federated learning, we show that when clients have loss functions whose gradient satisfies this condition, federated averaging is equivalent to gradient descent on a surrogate loss function. We leverage this to derive novel convergence results for federated learning. By contrast, we demonstrate that when the client losses violate this property, federated averaging can yield behavior which is fundamentally distinct from centralized optimization. Finally, we discuss theoretical and practical questions our analytical framework raises for federated learning.
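A small numerical sketch of the objects discussed above: one round of federated averaging with several local gradient steps per client, applied to two quadratic client losses. The paper's result says that when the client gradients satisfy the stated condition, iterating such rounds coincides with gradient descent on a surrogate loss; the sketch only sets up the FedAvg update (with illustrative names and step sizes) and does not verify that condition.

import numpy as np

def fedavg_round(x, client_grads, lr=0.1, local_steps=5):
    """One FedAvg round: every client runs local gradient steps, the server averages."""
    local_iterates = []
    for grad in client_grads:
        z = x.copy()
        for _ in range(local_steps):
            z = z - lr * grad(z)
        local_iterates.append(z)
    return np.mean(local_iterates, axis=0)

# Two quadratic client losses f_i(x) = 0.5 * (x - c_i)^T A_i (x - c_i).
A1, c1 = np.diag([1.0, 2.0]), np.array([1.0, 0.0])
A2, c2 = np.diag([3.0, 1.0]), np.array([0.0, 1.0])
grads = [lambda x: A1 @ (x - c1), lambda x: A2 @ (x - c2)]

x = np.zeros(2)
for _ in range(50):
    x = fedavg_round(x, grads)
print("FedAvg fixed point:", x)   # in general not the minimiser of the average loss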
ISBN:
(Print) 1932415068
A new family of parameterized LU decomposition algorithms, JHU LU, is presented. For moderate-size matrices, JHU LU is twice as fast as vendor-tuned ScaLAPACK pdgetrf. In contrast to algorithms inspired by geometrically regular data distributions, JHU LU is based on an upper-diagonal work distribution. Comparatively coarse-grained parallelism is used to reduce the total number of communications between processors while exploiting the high per-processor performance available through BLAS subroutines. Another key departure from ScaLAPACK and other libraries is the requirement that the algorithms finish with the resulting factors L and U residing in a single processor's memory - in particular, the processor memory that initially stores the matrix A to be decomposed. During processing, JHU LU iteratively updates the data distribution, with processors dropping off successively, much like a pipeline discharging. This allows computation to continue while results are gathered. Algorithm parameters control the number of processors used, the problem size, and the block size. A new set of parameters controls the quantity of work done by each processor. JHU LU is implemented on an HP SuperDome system with 550 MHz processors and is compared to ScaLAPACK pdgetrf. Using 16 processors, the time to factor a 1000 x 1000 matrix is 195 ms for JHU LU and 390 ms for ScaLAPACK. In addition, for matrices between 500 x 500 and 3000 x 3000, JHU LU achieves faster execution.
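For orientation, the following is a sketch of the serial right-looking blocked LU factorization (without pivoting) that both pdgetrf and JHU LU parallelize; the upper-diagonal work distribution, pipeline-discharge behaviour, and result gathering described above are not modelled, and the function name is illustrative.

import numpy as np
from scipy.linalg import solve_triangular

def blocked_lu(A, nb=64):
    """Right-looking blocked LU without pivoting; returns L (unit lower) and U."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Unblocked LU of the diagonal block.
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        if e < n:
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            U11 = np.triu(A[k:e, k:e])
            # Panel solves and trailing-matrix update (the BLAS-3 work).
            A[k:e, e:] = solve_triangular(L11, A[k:e, e:], lower=True, unit_diagonal=True)
            A[e:, k:e] = solve_triangular(U11.T, A[e:, k:e].T, lower=True).T
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return np.tril(A, -1) + np.eye(n), np.triu(A)

A = np.random.default_rng(2).normal(size=(300, 300)) + 300 * np.eye(300)
L, U = blocked_lu(A, nb=64)
print(np.max(np.abs(L @ U - A)))   # small residual on this well-conditioned matrix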
ISBN:
(Print) 9783030461508; 9783030461492
Hierarchical clustering is a fundamental tool in data mining, machine learning, and statistics. Popular hierarchical clustering algorithms include top-down divisive approaches such as bisecting k-means, k-median, and k-center, and bottom-up agglomerative approaches such as single-linkage, average-linkage, and centroid-linkage. Unfortunately, only a few scalable hierarchical clustering algorithms are known, mostly based on the single-linkage algorithm. So, as datasets increase in size every day, there is a pressing need to scale other popular methods. We introduce efficient distributed algorithms for bisecting k-means, k-median, and k-center as well as centroid-linkage. In particular, we first formalize a notion of closeness for a hierarchical clustering algorithm, and then we use this notion to design new scalable distributed methods with strong worst-case bounds on the running time and the quality of the solutions. Finally, we show experimentally that the introduced algorithms are efficient and close to their sequential variants in practice.
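A sketch of the sequential bisecting k-means procedure that the distributed algorithms scale is given below: repeatedly split the largest cluster with 2-means until k leaves remain. The distributed implementation and its closeness guarantees are not reproduced, and the helper names and the choice of which cluster to split are illustrative.

import numpy as np

def two_means(X, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def bisecting_kmeans(X, k):
    clusters = [np.arange(len(X))]        # start with one cluster holding everything
    splits = []
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)             # split the largest cluster with 2-means
        labels = two_means(X[idx])
        left, right = idx[labels == 0], idx[labels == 1]
        splits.append((idx, left, right)) # records the top-down hierarchy
        clusters += [left, right]
    return clusters, splits

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in ((0, 0), (4, 0), (0, 4), (4, 4))])
leaves, tree = bisecting_kmeans(X, k=4)
print([len(c) for c in leaves])   # roughly 50 points per leaf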
ISBN:
(Print) 9780769550060
Graphs enjoy profound importance because of their versatility and expressivity. They can be effectively used to represent social networks, web search engines, and genome sequencing. The field of graph pattern matching has been of significant importance and has widespread applications. Conceptually, we want to find subgraphs that match a pattern in a given graph. Much work has been done in this field, with solutions like subgraph isomorphism and regular expression matching. With Big Data, scientists frequently run into massive graphs that have amplified the challenge this area poses. We study the speedup and communication behavior of three distributed algorithms for inexact graph pattern matching. We also study the impact of different graph partitionings on runtime and network I/O. Our extensive results show that the algorithms exhibit excellent scalability, and that min-cut partitioning can lead to improved performance under some circumstances and can drastically reduce network traffic as well.
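For reference, the following is a sketch of the exact baseline mentioned above: naive backtracking subgraph isomorphism on small adjacency-set graphs. The three distributed, inexact algorithms studied in the paper are not shown, and all names are illustrative.

def subgraph_matches(pattern, graph):
    """Yield injective mappings pattern-node -> graph-node that preserve pattern edges."""
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            yield dict(mapping)
            return
        u = p_nodes[len(mapping)]
        for v in graph:
            if v in mapping.values():
                continue
            # Every already-mapped pattern neighbour of u must map to a neighbour of v.
            if all(mapping[w] in graph[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]

    yield from extend({})

# Triangle pattern in a 4-node graph (adjacency sets, undirected).
pattern = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
graph = {'a': {'b', 'c'}, 'b': {'a', 'c', 'd'}, 'c': {'a', 'b', 'd'}, 'd': {'b', 'c'}}
print(sum(1 for _ in subgraph_matches(pattern, graph)))   # 12 ordered embeddings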
ISBN:
(Print) 9781728105543
Low-density parity-check (LDPC) codes have been extensively applied in mobile communication systems due to their excellent error-correcting capabilities. However, their broad adoption has been hindered by the high complexity of the LDPC decoder. Although dedicated hardware has to date been used to implement low-latency LDPC decoders, recent advances in the architecture of mobile processors have made it possible to develop software solutions. In this paper, we propose a multi-stream LDPC decoder designed for a mobile device. The proposed decoder uses the graphics processing unit (GPU) of a mobile device to achieve efficient real-time decoding. The proposed solution is implemented on an NVIDIA Tegra system-on-a-chip (SoC) board, and our results indicate that we can control the load on the central processing units through the multi-stream structure.
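To illustrate the kind of iterative check-node/variable-node computation that gets offloaded to the GPU, here is a sketch of a simple hard-decision bit-flipping decoder for a parity-check matrix H; the paper's soft-decision decoder and its multi-stream GPU structure are considerably more involved and are not reproduced, and the example code uses a small Hamming matrix purely for illustration.

import numpy as np

def bit_flip_decode(H, received, max_iters=50):
    x = received.copy()
    for _ in range(max_iters):
        syndrome = (H @ x) % 2                # which parity checks fail
        if not syndrome.any():
            return x, True                    # valid codeword reached
        votes = H.T @ syndrome                # failing checks each bit participates in
        x[votes == votes.max()] ^= 1          # flip the most-suspect bits
    return x, False

# (7,4) Hamming parity-check matrix; corrupt one bit of the all-zero codeword.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
received = np.zeros(7, dtype=int)
received[2] ^= 1
decoded, ok = bit_flip_decode(H, received)
print(ok, decoded)   # True, back to the all-zero codeword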