In this paper, we consider mixed-integer global optimization problems and propose a parallel algorithm for solving problems of this class, based on the information-statistical approach to continuous global optimization. Within this algorithm, we suggest using a local tuning scheme that relies on the assumption that the multiextremality of the problem under discussion is weak. We also compare the sequential version of the algorithm with other similar methods. The effectiveness of parallelizing the algorithm has been confirmed by solving a series of mixed-integer global optimization problems on the Lobachevskii supercomputer.
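To make the information-statistical idea behind such methods concrete, the sketch below implements only the classical one-dimensional core of the approach (Strongin-style global search): trials are kept ordered, a Lipschitz-constant estimate m is maintained, each interval receives a characteristic R(i), and the next trial is placed inside the interval with the largest characteristic. The function name, the reliability parameter r and the tolerance eps are illustrative assumptions; the paper's mixed-integer handling, local tuning scheme and parallel trial selection are not reproduced here.

```python
import math

def information_statistical_minimize(f, a, b, r=2.0, eps=1e-4, max_trials=200):
    """Minimal 1-D information-statistical (Strongin-style) global search sketch."""
    xs, zs = [a, b], [f(a), f(b)]
    for _ in range(max_trials):
        # keep trials ordered by coordinate
        pairs = sorted(zip(xs, zs))
        xs, zs = [p[0] for p in pairs], [p[1] for p in pairs]
        # estimate the Lipschitz constant from the observed divided differences
        M = max(abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
        m = r * M if M > 0 else 1.0
        # characteristic R(i) of every interval; the next trial goes to the best one
        best_i, best_R = 1, -math.inf
        for i in range(1, len(xs)):
            d = xs[i] - xs[i - 1]
            R = m * d + (zs[i] - zs[i - 1]) ** 2 / (m * d) - 2.0 * (zs[i] + zs[i - 1])
            if R > best_R:
                best_i, best_R = i, R
        i = best_i
        if xs[i] - xs[i - 1] < eps:          # chosen interval is small enough: stop
            break
        # place the new trial strictly inside the chosen interval
        x_new = 0.5 * (xs[i] + xs[i - 1]) - (zs[i] - zs[i - 1]) / (2.0 * m)
        xs.append(x_new)
        zs.append(f(x_new))
    k = min(range(len(zs)), key=zs.__getitem__)
    return xs[k], zs[k]

# Example call on a multiextremal 1-D test function:
# x_star, f_star = information_statistical_minimize(
#     lambda x: math.sin(x) + math.sin(10.0 * x / 3.0), 2.7, 7.5)
```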
ISBN (print): 9781665440660
The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order-N input tensor and the first-contracted input matrix only once every (N - 1)/N sweeps. This algorithm reduces the leading-order computational cost by a factor of 2(N - 1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors of the Stampede2 supercomputer show that CP decomposition obtains a 1.25x speed-up from MSDT and a 1.94x speed-up from pairwise perturbation compared to state-of-the-art dimension-tree based CP-ALS implementations.
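For reference, the MTTKRP kernel itself can be written in a few lines of NumPy. The sketch below is the straightforward unfold-then-multiply formulation, not the dimension-tree or MSDT variants discussed in the paper; the function names are illustrative.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R)."""
    R = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, R)

def mttkrp(T, factors, n):
    """MTTKRP along mode n: mode-n unfolding of T times the Khatri-Rao product
    of all factor matrices except the n-th (C-order index convention)."""
    N = T.ndim
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)   # mode-n unfolding
    rest = [factors[m] for m in range(N) if m != n]
    KR = rest[0]
    for M in rest[1:]:
        KR = khatri_rao(KR, M)                          # last remaining mode varies fastest
    return Tn @ KR                                      # shape: (T.shape[n], R)

if __name__ == "__main__":
    # quick check against a direct einsum formulation for a third-order tensor
    T = np.random.rand(3, 4, 5)
    U = [np.random.rand(s, 2) for s in T.shape]
    ref = np.einsum("ijk,jr,kr->ir", T, U[1], U[2])
    assert np.allclose(mttkrp(T, U, 0), ref)
```

In CP-ALS, the mode-n factor update uses this product as the right-hand side of its normal equations; dimension trees (and MSDT) save work by reusing the partial contractions shared by the N MTTKRPs of a sweep.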
The enumeration of all cliques in a graph or finding the largest clique are important problems that unfortunately are computationally intensive. Another alternative is to select only the most important motifs (e.g., s...
ISBN (print): 9789082797060
We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a stochastic variation of the accelerated gradient algorithm. We experimentally test the effectiveness and the efficiency of our algorithm using both real-world and synthetic data. We develop a shared-memory implementation of our algorithm using the multi-threaded API OpenMP, which attains significant speedup. We believe that our approach is a very competitive candidate for the solution of very large nonnegative tensor completion problems.
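As a rough illustration of one alternating-optimization subproblem, the sketch below applies a Nesterov-accelerated projected-gradient update to the nonnegative factor of a masked matrix factorization; the random entry subsampling stands in for the stochastic variation mentioned in the abstract. Function and parameter names are hypothetical, and the step-size handling is deliberately simplified rather than the paper's scheme.

```python
import numpy as np

def nonneg_factor_step(X, W, A, B, iters=50, sample_prob=0.5, seed=0):
    """Accelerated projected-gradient update of A in
    min_{A >= 0} 0.5 * || W * (X - A @ B.T) ||_F^2,
    where W is the 0/1 mask of observed entries."""
    rng = np.random.default_rng(seed)
    L = np.linalg.norm(B.T @ B, 2)              # Lipschitz constant of the full gradient
    step = 1.0 / max(L, 1e-12)
    A_prev, Y, t_prev = A.copy(), A.copy(), 1.0
    for _ in range(iters):
        S = W * (rng.random(W.shape) < sample_prob)        # subsample observed entries
        grad = -(S * (X - Y @ B.T)) @ B
        A_new = np.maximum(Y - step * grad, 0.0)           # project onto the nonnegative orthant
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        Y = A_new + ((t_prev - 1.0) / t) * (A_new - A_prev)  # Nesterov extrapolation
        A_prev, t_prev = A_new, t
    return A_prev
```

Alternating optimization would call such an update for each factor in turn, swapping the roles of A and B and transposing X and W.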
ISBN (print): 9783030739720; 9783030739737
Subgraph isomorphism is one of the most challenging problems on graph-based representations. Although many efficient sequential algorithms have been proposed over the last decades, solving this problem on large graphs is still a time-demanding task. For this reason, there is growing interest in realizing effective parallel algorithms able to fully exploit the modern multi-core architectures commonly available on servers and workstations. We propose a comparison of four parallel algorithms derived from the state-of-the-art sequential algorithm VF3-Light; two of them were presented in previous works, while the other two are introduced in this paper. In order to evaluate the strong points and weaknesses of each algorithm, we performed a benchmark over six datasets of large and dense random graphs, both labelled and unlabelled, measuring memory usage, speed-up and efficiency. We also add a comparison with a different parallel algorithm, named Glasgow, that is not derived from VF3-Light.
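To illustrate the kind of search such algorithms parallelize, here is a minimal backtracking subgraph matcher that distributes the candidates of the first pattern vertex across worker processes. It is a toy monomorphism counter with a static matching order, not VF3-Light, Glasgow, or any of the compared algorithms; all names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

def extend(mapping, used, order, pattern_adj, target_adj):
    """Count completions of a partial pattern -> target mapping by backtracking."""
    if len(mapping) == len(order):
        return 1
    u = order[len(mapping)]
    count = 0
    for v in target_adj:
        if v in used:
            continue
        # every already-mapped pattern neighbour of u must land on a neighbour of v
        if all(mapping[w] in target_adj[v] for w in pattern_adj[u] if w in mapping):
            mapping[u] = v
            used.add(v)
            count += extend(mapping, used, order, pattern_adj, target_adj)
            used.discard(v)
            del mapping[u]
    return count

def count_from_root(args):
    v0, order, pattern_adj, target_adj = args
    return extend({order[0]: v0}, {v0}, order, pattern_adj, target_adj)

def parallel_subgraph_count(pattern_adj, target_adj, workers=4):
    """Distribute the candidates of the first pattern vertex across processes."""
    order = list(pattern_adj)    # static matching order (real matchers choose smarter orders)
    tasks = [(v0, order, pattern_adj, target_adj) for v0 in target_adj]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_from_root, tasks))

if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(parallel_subgraph_count(triangle, k4))   # 24 monomorphisms of a triangle into K4
```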
ISBN (print): 9781665410168
Lattice problems are a class of optimization problems that are notably hard. There are no classical or quantum algorithms known to solve these problems efficiently. Their hardness has made lattices a major cryptographic primitive for post-quantum cryptography. Several different approaches have been used for lattice problems, with different computational profiles; some suffer from super-exponential time, and others require exponential space. This motivated us to develop a novel lattice problem solver, CMAP-LAP, based on the clever coordination of different algorithms that run massively in parallel. With our flexible framework, heterogeneous modules run asynchronously in parallel on a large-scale distributed system while exchanging information, which drastically boosts the overall performance. We also implement full checkpoint-and-restart functionality, which is vital for high-dimensional lattice problems. CMAP-LAP facilitates the implementation of large-scale parallel strategies for lattice problems since all of its functions are designed to be customizable and abstract. Through numerical experiments with up to 103,680 cores, we evaluated the performance and stability of our system and demonstrated its high capability for future massive-scale experiments.
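The coordination pattern described above (heterogeneous solver modules running asynchronously, sharing incumbent solutions, with periodic checkpointing) can be sketched in a few lines. The toy below uses threads and a random-search placeholder solver; it is not CMAP-LAP's actual architecture or API, and every name in it is an assumption made for illustration only.

```python
import json
import random
import threading
import time

best = {"norm": None, "vector": None}      # shared incumbent (best solution found so far)
lock = threading.Lock()

def report(norm, vec):
    """Solver modules publish improvements; the coordinator keeps the incumbent."""
    with lock:
        if best["norm"] is None or norm < best["norm"]:
            best.update(norm=norm, vector=vec)

def random_search_solver(dim, iters, seed):
    """Placeholder module: a real system would run enumeration, sieving, etc."""
    rng = random.Random(seed)
    for _ in range(iters):
        v = [rng.randint(-5, 5) for _ in range(dim)]
        if any(v):
            report(sum(x * x for x in v) ** 0.5, v)

def checkpoint(path):
    """Persist the shared state so a crashed run can restart from it."""
    with lock, open(path, "w") as fh:
        json.dump(best, fh)

if __name__ == "__main__":
    workers = [threading.Thread(target=random_search_solver, args=(8, 50_000, s))
               for s in range(4)]           # heterogeneous modules would differ in strategy
    for w in workers:
        w.start()
    while any(w.is_alive() for w in workers):
        time.sleep(0.5)
        checkpoint("cmap_lap_checkpoint.json")   # periodic checkpoint-and-restart state
    for w in workers:
        w.join()
    checkpoint("cmap_lap_checkpoint.json")
    print(best)
```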
Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by only keeping an important subset of weights. Such graphs can be passed to graph clustering algorithms t...
ISBN (print): 9781611976465
We present a randomized parallel algorithm, in the Exclusive-Read Exclusive-Write (EREW) PRAM model, that computes a Maximal Independent Set (MIS) in O(log n) time and using O(m log² n) work, with high probability. Thus, MIS ∈ RNC¹. This time complexity is optimal and it improves on the celebrated O(log² n) time algorithms of Luby [STOC'85] and Alon, Babai, and Itai [JALG'86], which had remained the state of the art for the past 35 years.
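For context, the classical Luby-style scheme that this result improves upon is easy to state: in each synchronous round every live vertex draws a random priority, vertices that beat all live neighbours join the MIS, and they and their neighbours are removed; O(log n) rounds suffice with high probability, giving O(log² n) time overall. The sequential simulation below illustrates only that classical scheme, not the new EREW algorithm of the paper.

```python
import random

def luby_mis(adj, seed=0):
    """Luby-style randomized MIS, simulating the synchronous parallel rounds
    sequentially.  adj maps each vertex to the set of its neighbours."""
    rng = random.Random(seed)
    alive = set(adj)                # vertices not yet decided
    mis = set()
    while alive:
        # every live vertex draws a random priority (done in parallel on a PRAM)
        prio = {v: rng.random() for v in alive}
        # a vertex joins the MIS if it beats all of its live neighbours
        winners = {v for v in alive
                   if all(prio[v] < prio[u] for u in adj[v] if u in alive)}
        mis |= winners
        # winners and their neighbours are removed; O(log n) rounds suffice w.h.p.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & alive
        alive -= removed
    return mis

if __name__ == "__main__":
    cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}   # 6-cycle
    print(luby_mis(cycle))    # a maximal independent set of the cycle
```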
Community detection in social networks is the process of identifying cohesive groups of similar nodes. Detection of these groups can be helpful in many applications, such as finding networks of protein interaction in biological networks, finding like-minded users for ads and suggestions, finding a shared research field in collaborative networks, analyzing public health, predicting future links in social networks, analyzing criminology, and many more. However, with the increase in the number of profiles and content shared on social media platforms, the analysis is often time-consuming and exhaustive. In order to speed up and optimize the community detection process, parallel processing and shared/distributed memory techniques are widely used. Although community detection has widespread use in social networks, no attempt had previously been made to compile and systematically discuss research efforts on the emerging subject of parallel and distributed methods for community detection in social networks. Most existing surveys describe the serial algorithms used for community detection. Our survey work comes under the scope of new design techniques, exciting or novel applications, components or standards, and applications of an educational, transactional, and co-operational nature. This paper presents a systematic literature review of state-of-the-art research on the application of parallel processing and shared/distributed memory techniques to community detection for social network analysis. An advanced search strategy was applied to several digital libraries to extract studies for the review. The systematic search yielded 3220 studies, of which 65 relevant studies were selected for further review after several screening phases. The application of parallel computing, shared memory, and distributed memory to existing community detection methodologies is discussed thoroughly. More specifically,
ISBN (print): 9781665410168
Search trees are one of the most important and widely used data structures, and parallelization is an effective method to improve their performance. However, many existing parallel search trees incur high synchronization costs and low memory I/O efficiency, which limits their performance. We propose PPBT, a batched parallel search tree that minimizes synchronization by partitioning the tree using novel algorithms and minimizes I/O cost using buffering. We give a new sequential algorithm for batch processing on search trees with optimal I/O efficiency for insert and delete operations, and also present a fast parallel algorithm for joining disjoint search trees. We show experimentally that PPBT is over 6x faster than the state-of-the-art parallel tree in [1] and over 40x faster than the concurrent search tree in [7], and achieves a 21x speedup using 32 threads. PPBT's throughput on searches is lower due to reduced opportunities for buffering, but is still 1.3x that of [1]. In addition, PPBT has good response times for searches, for example completing 100K searches in under 1 ms in a tree with 10M elements.
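As a small illustration of the split/join-style batching that batched search trees rely on, the sketch below inserts a sorted batch into a treap by recursively partitioning both the batch and the tree; the two recursive calls work on disjoint subtrees and could run on separate threads. This is a generic treap example assuming distinct, previously absent keys, not PPBT or the algorithms of [1] or [7].

```python
import random

class Node:
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key, self.prio = key, random.random()
        self.left = self.right = None

def split(t, key):
    """Split treap t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        return t, r
    l, r = split(t.left, key)
    t.left = r
    return l, t

def join(l, r):
    """Join two treaps, assuming every key in l is smaller than every key in r."""
    if l is None:
        return r
    if r is None:
        return l
    if l.prio > r.prio:
        l.right = join(l.right, r)
        return l
    r.left = join(l, r.left)
    return r

def batch_insert(t, batch):
    """Insert a sorted batch of new keys.  The two recursive calls touch
    disjoint subtrees, so a parallel version could run them concurrently."""
    if not batch:
        return t
    mid = len(batch) // 2
    l, r = split(t, batch[mid])
    left = batch_insert(l, batch[:mid])
    right = batch_insert(r, batch[mid + 1:])
    return join(join(left, Node(batch[mid])), right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

t = batch_insert(None, [2, 5, 9])
t = batch_insert(t, [1, 3, 7])
print(inorder(t))   # [1, 2, 3, 5, 7, 9]
```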