ISBN: 9781665404761 (Print)
This article is devoted to the analysis of the parallelization efficiency of algorithms for solving linear systems of equations. It presents an assessment of the quality of optimization of two algorithms for distributing the computational load between cluster nodes. The hardware is a computing cluster based on single-board computers running Debian 8 Linux as the operating system. The Hydra process manager together with the Slurm resource manager, which implements a queue for multi-user access to the system, is used to run and debug parallel tasks. The resulting system is planned to be used in the educational process of MEPhI (Moscow Engineering Physics Institute) for a laboratory workshop on parallel computing.
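The abstract does not describe the two load-distribution schemes that are compared, so the following is only a generic sketch of an obvious baseline: a block-row split of the system across MPI ranks with a replicated Jacobi iterate. All names, sizes, and parameters here are illustrative, not taken from the paper.

```python
# Hypothetical block-row distribution of a linear system across MPI ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 1024                                           # global system size
rows = np.array_split(np.arange(n), size)[rank]    # contiguous block of rows owned by this rank

rng = np.random.default_rng(rank)
A_local = rng.random((len(rows), n))
A_local[np.arange(len(rows)), rows] += n           # make the system diagonally dominant
b_local = rng.random(len(rows))

x = np.zeros(n)                                    # current iterate, replicated on every rank
for _ in range(50):                                # fixed number of Jacobi sweeps
    diag = A_local[np.arange(len(rows)), rows]
    x_local = (b_local - A_local @ x + diag * x[rows]) / diag
    x = np.concatenate(comm.allgather(x_local))    # re-assemble the full iterate everywhere

if rank == 0:
    print("local residual norm:", np.linalg.norm(A_local @ x - b_local))
```

Launched, for instance, with `mpiexec -n 4 python jacobi_sketch.py`; each rank owns one block of rows, and the `allgather` step is where the communication cost of the distribution scheme shows up.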
ISBN: 9781450383431 (Print)
Clustering multi-dimensional points is a fundamental task in many fields, and density-based clustering supports many applications because it can discover clusters of arbitrary shapes. This paper addresses the problem of Density-Peaks Clustering (DPC), a recently proposed density-based clustering framework. Although DPC already has many applications, its straightforward implementation requires time quadratic in the number of points in a given dataset and therefore does not scale to large datasets. To enable DPC on large datasets, we propose efficient algorithms for DPC. Specifically, we propose an exact algorithm, Ex-DPC, and two approximation algorithms, Approx-DPC and S-Approx-DPC. Under a reasonable assumption about a DPC parameter, our algorithms are sub-quadratic, i.e., they break the quadratic barrier. Moreover, Approx-DPC does not require any additional parameters and can return the same cluster centers as Ex-DPC, yielding an accurate clustering result. S-Approx-DPC requires an approximation parameter but can further speed up the computation. We also show that their efficiency can be improved by leveraging multicore processing. We conduct extensive experiments using synthetic and real datasets, and our experimental results demonstrate that our algorithms are efficient, scalable, and accurate.
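For context, here is a minimal sketch of the straightforward quadratic DPC baseline that the paper's algorithms improve on: local density within a cutoff d_c, distance to the nearest denser point, and centers picked by a rho*delta score. The cutoff and the center-selection rule are illustrative choices, not the paper's.

```python
import numpy as np

def dpc_naive(points: np.ndarray, d_c: float, n_centers: int) -> np.ndarray:
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # all O(n^2) distances

    rho = (dist < d_c).sum(axis=1) - 1              # local density: neighbours within d_c

    delta = np.full(n, dist.max())                  # distance to the nearest denser point
    parent = np.full(n, -1)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]
        if len(denser):
            j = denser[np.argmin(dist[i, denser])]
            delta[i], parent[i] = dist[i, j], j

    centers = np.argsort(rho * delta)[-n_centers:]  # points with both high rho and high delta
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in np.argsort(-rho):                      # assign in decreasing-density order
        if labels[i] < 0:
            labels[i] = labels[parent[i]]           # inherit the label of the nearest denser point
    return labels

labels = dpc_naive(np.random.rand(300, 2), d_c=0.1, n_centers=3)
```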
ISBN: 9781665408981 (Print)
Short text clustering deals with the problem of grouping together semantically similar documents of small length. Nowadays, huge amounts of text data are being generated by numerous applications such as microblogs, messengers, and services that generate or aggregate entitled entities. This large volume of high-dimensional and sparse information can easily overwhelm current serial approaches and render them inefficient, or even inapplicable. Although many traditional clustering algorithms have been successfully parallelized in the past, the parallelization of short text clustering algorithms is a rather overlooked problem. In this paper we introduce pVEPHC, a short text clustering method that can be executed in parallel on large computer clusters. The algorithm draws inspiration from VEPHC, a recent two-stage approach with solid performance on several diverse tasks. More specifically, in this work we employ the Apache Spark framework to design parallel implementations of both stages of VEPHC. During the first stage, pVEPHC generates an initial clustering by identifying and modelling common low-dimensional vector representations of the original documents. Subsequently, the initial clustering is improved in the second stage by applying cluster split and merge operations in a hierarchical fashion. We have evaluated our implementation on an experimental Spark cluster and report an almost linear improvement in the execution times of the algorithm.
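The published VEPHC/pVEPHC procedure is not reproduced here; the sketch below only illustrates, under assumed details, how a first-stage "group documents by a coarse low-dimensional signature" step maps onto Spark primitives (parallelize, map, reduceByKey). The embedding and signature functions are invented for the example.

```python
import hashlib
import numpy as np
from pyspark import SparkContext

DIM = 8                                            # illustrative low-dimensional representation size

def embed(text: str) -> np.ndarray:
    """Cheap hashed bag-of-words vector (a stand-in for VEPHC's representation)."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM] += 1.0
    return v

def signature(vec: np.ndarray) -> tuple:
    """Coarse sign pattern of the vector, used as an initial cluster key."""
    return tuple((vec > vec.mean()).astype(int).tolist())

if __name__ == "__main__":
    sc = SparkContext(appName="short-text-clustering-sketch")
    docs = sc.parallelize([
        "cheap flights to rome", "flights rome cheap", "nba finals score",
        "score of the nba finals", "weather in paris today",
    ])
    initial = (docs.map(lambda d: (signature(embed(d)), [d]))   # stage 1: coarse initial clusters
                   .reduceByKey(lambda a, b: a + b))
    for key, members in initial.collect():
        print(key, members)
    sc.stop()
```

The second-stage split/merge refinement would operate on these initial groups; it is omitted here because the abstract gives no detail to base a sketch on.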
The possibility of creating a flood wave in a river network depends on the geometric properties of the river basin. Among the models that try to forecast the Instantaneous Unit Hydrograph (IUH) of rainfall precipitation, the so-called Multifractal Instantaneous Unit Hydrograph (MIUH) rather successfully connects the multifractal properties of the river basin to the observed IUH. Such properties can be assessed through different types of analysis (fixed-size algorithm, correlation integral, fixed-mass algorithm, sandbox algorithm, and so on). The fixed-mass algorithm is the one that produces the most precise estimate of the properties of the multifractal spectrum that are relevant for the MIUH model. However, a disadvantage of this method is that it requires very long computational times to produce the best possible results. In a previous work, we proposed a parallel version of the fixed-mass algorithm which, by using the Message Passing Interface (MPI), a standard for distributed-memory clusters, drastically reduced the computational times almost proportionally to the number of Central Processing Unit (CPU) cores available on the machine. In the present work, we further improved the code to include the Open Multi-Processing (OpenMP) paradigm in order to ease execution and improve the computational speed-up on single-processor, multi-core workstations, which are much more common than multi-node clusters. Moreover, the assessment of the multifractal spectrum has also been improved through a direct computation method. Currently, to the best of our knowledge, this code represents the state of the art for a fast evaluation of the multifractal properties of a river basin, and it opens up a new scenario for effective flood forecasting in reasonable computational times.
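As a rough illustration of the quadratic kernel being parallelized (not the authors' MPI/OpenMP code), the sketch below measures, for every reference point, the radius enclosing a fixed number of neighbours ("mass") and averages the log-radii per mass, using a Python process pool as a stand-in for OpenMP threads. Point data and mass values are invented.

```python
from multiprocessing import Pool
import numpy as np

rng = np.random.default_rng(0)
POINTS = rng.random((5000, 2))           # stand-in for river-network pixels (x, y)
MASSES = [8, 16, 32, 64, 128]            # neighbour counts ("masses") at which radii are measured

def radii_for_point(i: int) -> list:
    """Distance from point i to its k-th nearest neighbour, for each mass k."""
    d = np.sort(np.linalg.norm(POINTS - POINTS[i], axis=1))
    return [d[k] for k in MASSES]

if __name__ == "__main__":
    with Pool() as pool:                 # one worker per CPU core, standing in for OpenMP threads
        radii = np.array(pool.map(radii_for_point, range(len(POINTS))))
    # the mean log-radius per mass is the raw ingredient of the fixed-mass scaling fit
    for k, mean_log_r in zip(MASSES, np.log(radii).mean(axis=0)):
        print(f"mass {k:4d}: <log r> = {mean_log_r:.3f}")
```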
ISBN: 9783030795269; 9783030795276 (Print)
The locality of a graph problem is the smallest distance T such that each node can choose its own part of the solution based on its radius-T neighborhood. In many settings, a graph problem can be solved efficiently with a distributed or parallel algorithm if and only if it has small locality. In this work we seek to automate the study of solvability and locality: given the description of a graph problem Π, we would like to determine whether Π is solvable and what the asymptotic locality of Π is as a function of the size of the graph. Put otherwise, we seek to automatically synthesize efficient distributed and parallel algorithms for solving Π. We focus on locally checkable graph problems; these are problems in which a solution is globally feasible if it looks feasible in all constant-radius neighborhoods. Prior work on such problems has brought primarily bad news: questions related to locality are undecidable in general, and even if we focus on the case of labeled paths and cycles, determining locality is PSPACE-hard (Balliu et al., PODC 2019). We complement prior negative results with efficient algorithms for the cases of unlabeled paths and cycles and, as an extension, for rooted trees. We study locally checkable graph problems from an automata-theoretic perspective by representing a locally checkable problem Π as a nondeterministic finite automaton M over a unary alphabet. We identify polynomial-time-computable properties of the automaton M that near-completely capture the solvability and locality of Π in cycles and paths, with the exception of one specific case that is co-NP-complete.
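A toy example (not the paper's construction) of the automata-theoretic view: for a locally checkable problem on directed cycles given by a set of allowed pairs of adjacent labels, take an automaton whose states are the labels and whose transitions are the allowed pairs; the n-cycle then admits a valid labelling exactly when the transition graph has a closed walk of length n, i.e. trace(A^n) > 0. The example problem and labels below are assumptions for illustration.

```python
import numpy as np

LABELS = [0, 1]
ALLOWED = {(0, 1), (1, 0)}               # example problem: proper 2-colouring of a cycle

# transition matrix of the unary automaton: states are labels, edges are allowed pairs
A = np.zeros((len(LABELS), len(LABELS)), dtype=np.int64)
for a, b in ALLOWED:
    A[a, b] = 1

def solvable_on_cycle(n: int) -> bool:
    """A directed cycle of n nodes admits a valid labelling iff trace(A^n) > 0."""
    return np.trace(np.linalg.matrix_power(A, n)) > 0

print([n for n in range(3, 13) if solvable_on_cycle(n)])   # 2-colouring: even lengths only
```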
We discuss parallel algorithms to compute the ghost layer in computational, distributed-memory, recursively adapted meshes. Its creation is a fundamental, necessary task in executing most parallel, element-based computer simulations. Common methods differ in that the ghost layer may either be an inherent part of the mesh data structure that is maintained and modified, or kept separate and constructed/deleted as needed. In this work, we present a design following the latter approach, which we chose for its modularity of algorithms and data structures. We target arbitrary adaptive, nonconforming forest-of-trees meshes of mixed element shapes, such as cubes, prisms, and tetrahedra, and restrict ourselves to ghost elements across mesh faces. Our algorithm has low code complexity and redundancy since we reduce it to generic codimension-1 subalgorithms that can be flexibly combined. We recover older algorithms for cubic elements as special cases and optimize further using recursive, amortized tree searches and traversals.
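The sketch below only illustrates the basic notion of a face-ghost layer on a uniform 2D grid partitioned into slabs; it makes no attempt at the adaptive forest-of-trees setting or the codimension-1 decomposition described in the paper, and the grid and partition are invented.

```python
NX, NY, RANKS = 8, 8, 4                   # uniform 8x8 grid of cells, 4 owning ranks

def owner(ix: int, iy: int) -> int:
    """Rank owning cell (ix, iy) under a horizontal slab partition."""
    return iy * RANKS // NY

def ghost_layer(rank: int) -> set:
    """Remote cells that share a face with a cell owned by `rank`."""
    ghosts = set()
    for ix in range(NX):
        for iy in range(NY):
            if owner(ix, iy) != rank:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # the four face neighbours
                jx, jy = ix + dx, iy + dy
                if 0 <= jx < NX and 0 <= jy < NY and owner(jx, jy) != rank:
                    ghosts.add((jx, jy))
    return ghosts

for r in range(RANKS):
    print(f"rank {r}: {len(ghost_layer(r))} ghost cells")
```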
ISBN: 9781728191843 (Print)
Recent technological advancements have enabled generating and collecting huge amounts of data on a daily basis. This data is used for different purposes that may impact us on an unprecedented scale. Understanding the data, including detecting its outliers, is a critical step before utilizing it. Outlier detection has been studied extensively in the literature, but the existing approaches fail to scale to these very large settings. In this paper, we propose DBSCOUT, an efficient exact algorithm for outlier detection with linear complexity that can run in parallel over multiple independent machines, making it a fit for settings with billions of tuples. Besides the theoretical analysis, our experimental results confirm orders-of-magnitude improvement over existing work, proving the efficiency, scalability, and effectiveness of our approach.
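DBSCOUT itself is not reproduced here; as background, the following is a generic cell-grid sketch of exact distance-based outlier detection (a point is an outlier if fewer than k other points lie within radius r), where each point only needs to inspect its own and adjacent cells. Parameters and data are invented.

```python
from collections import defaultdict
import numpy as np

def outliers(points: np.ndarray, r: float, k: int) -> list:
    """Indices of points with fewer than k other points within distance r."""
    grid = defaultdict(list)                         # cell index -> points inside that cell
    for i, p in enumerate(points):
        grid[tuple((p // r).astype(int))].append(i)

    dim = points.shape[1]
    offsets = np.array(np.meshgrid(*[[-1, 0, 1]] * dim)).T.reshape(-1, dim)
    result = []
    for i, p in enumerate(points):
        cell = (p // r).astype(int)
        count = 0
        for off in offsets:                          # any point within r lies in an adjacent cell
            for j in grid.get(tuple(cell + off), ()):
                if j != i and np.linalg.norm(points[j] - p) <= r:
                    count += 1
        if count < k:
            result.append(i)
    return result

pts = np.vstack([np.random.rand(500, 2), [[5.0, 5.0]]])   # one point far away from the cloud
print(outliers(pts, r=0.1, k=5))                          # the isolated point (index 500) is always reported
```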
ISBN: 9781450383431 (Print)
Fraud detection is a pressing challenge for most financial and commercial platforms. In this paper, we study the processing pipeline for fraud detection on TaoBao, a large e-commerce platform. Graph label propagation (LP) is a core component in this pipeline for detecting suspicious clusters in the user-interaction graph. Moreover, the run-time of the LP component accounts for 75% of the overhead of TaoBao's automated detection pipeline. To enable real-time fraud detection, we propose a GPU-based framework, called GLP, to support large-scale LP workloads in enterprises. We have identified two key challenges in integrating GPU acceleration into TaoBao's data processing pipeline: (1) programmability for evolving fraud detection logic; (2) demand for real-time performance. Motivated by these challenges, we offer a set of expressive APIs with which data engineers can customize and deploy efficient LP algorithms on GPUs with ease. We propose novel GPU-centric optimizations that leverage the community and power-law properties of large graphs. Extensive experiments confirm the effectiveness of our proposed optimizations. With a single GPU, GLP supports a real billion-scale graph workload from TaoBao's fraud detection pipeline and achieves an 8.2x speedup over the current in-house distributed solution running on high-end multicore machines.
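GLP's APIs and GPU-side optimizations are not shown here; the following is only a plain CPU sketch of a label-propagation kernel of the kind being accelerated, with labels of seed users spreading along interaction edges until no node changes. The graph and seed labels are invented for the example.

```python
from collections import Counter

def propagate(adj: dict, seeds: dict, max_iters: int = 20) -> dict:
    labels = dict(seeds)                   # node -> label; initially only the seed users
    for _ in range(max_iters):
        changed = False
        for v in adj:
            if v in seeds:                 # seed labels stay fixed
                continue
            nbr_labels = [labels[u] for u in adj[v] if u in labels]
            if not nbr_labels:
                continue
            top = Counter(nbr_labels).most_common(1)[0][0]   # most frequent neighbour label
            if labels.get(v) != top:
                labels[v], changed = top, True
        if not changed:
            break
    return labels

# users 0-2 interact heavily with each other, users 3-5 with each other; one weak link 2-3
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(propagate(graph, seeds={0: "fraud", 4: "benign", 5: "benign"}))
```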
ISBN: 9781665414906 (Print)
Broadcasting is the process of disseminating information in a network, where a particular message is sent starting from an originator. The ultimate objective is to inform all entities of the network as soon as possible. The theory of broadcasting has been used in a wide range of applications. Although the problem has been proven NP-complete for arbitrary graphs, it is tractable for a few families of networks, such as hypercubes, one of the most successful large-scale parallel architectures. In this paper, we propose a new heuristic for broadcasting in a hypercube of trees, in which each vertex of the hypercube can be the root of a tree. Not only does this heuristic have the same approximation ratio as the best-known algorithm and provide a good theoretical bound, but our numerical results also show its superiority in most experiments. Our heuristic outperforms the current upper bound in up to 90% of the situations, with an average speedup of 30%. Most importantly, our results show that it maintains its performance even as the size of the network grows, which makes the proposed heuristic practically useful.
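For context, the classical d-round broadcast on a plain d-dimensional hypercube (not the paper's heuristic for hypercubes of trees): in round i every informed node forwards the message across dimension i, so all 2^d nodes are informed after d rounds, which matches the log2(N) lower bound.

```python
def hypercube_broadcast(d: int, originator: int = 0) -> list:
    """Rounds of the classical binomial-tree broadcast on a d-dimensional hypercube."""
    informed = {originator}
    rounds = []
    for i in range(d):
        new = {v ^ (1 << i) for v in informed}       # every informed node sends across dimension i
        rounds.append(sorted(new - informed))
        informed |= new
    return rounds

for r, nodes in enumerate(hypercube_broadcast(3), start=1):
    print(f"round {r}: newly informed {nodes}")
# all 2^3 = 8 nodes are informed after 3 rounds
```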
Modern multicore computers can easily manipulate ordinary numbers; however, as numbers grow larger, the computation becomes more complex because the size of both CPU registers and buses is limited. As a result, arithmetic operations such as addition, subtraction, multiplication, and division become more complex for the CPU to perform. To solve the problem of computing with numbers of many digits, a number of algorithms have been developed. However, the existing algorithms are noticeably slow because they operate on bits individually and are designed to run on single-core computers only. In this paper, an AI model is presented that performs computation on tokens of 8-digit numbers to help boost CPU computation performance. (C) 2021 The Authors. Published by Elsevier B.V.
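The AI model itself is not reproduced; the sketch below only illustrates the token representation the abstract alludes to: big integers stored as base-10^8 limbs of 8 decimal digits, with schoolbook addition over whole limbs rather than individual digits. Function names are illustrative.

```python
BASE = 10 ** 8                               # one token holds 8 decimal digits

def to_tokens(number_str: str) -> list:
    """Split a decimal string into 8-digit tokens, least-significant token first."""
    digits = number_str.lstrip("0") or "0"
    return [int(digits[max(0, i - 8):i]) for i in range(len(digits), 0, -8)]

def add_tokens(a: list, b: list) -> list:
    """Schoolbook addition over 8-digit tokens with carry propagation."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(s % BASE)
        carry = s // BASE
    if carry:
        result.append(carry)
    return result

def to_string(tokens: list) -> str:
    head, *rest = reversed(tokens)
    return str(head) + "".join(f"{t:08d}" for t in rest)

x, y = "123456789012345678901234567890", "98765432109876543210"
print(to_string(add_tokens(to_tokens(x), to_tokens(y))))
print(int(x) + int(y))                        # cross-check with Python's built-in big integers
```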