The Tucker tensor decomposition is a natural extension of the singular value decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor decomposition algorithms by using randomization and parallelization. We present two algorithms that scale to large data and many processors, significantly reduce both computation and communication cost compared to previous deterministic and randomized approaches, and obtain nearly the same approximation errors. The key idea in our algorithms is to perform randomized sketches with Kronecker-structured random matrices, which reduces computation compared to unstructured matrices and can be implemented using a fundamental tensor computational kernel. We provide probabilistic error analysis of our algorithms and implement a new parallel algorithm for the structured randomized sketch. Our experimental results demonstrate that our combination of randomization and parallelization achieves accurate Tucker decompositions much faster than alternative approaches. We observe up to a 16X speedup over the fastest deterministic parallel implementation on 3D simulation data.
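To make the key idea concrete, here is a minimal NumPy sketch (our own illustration, not the paper's implementation: the function names, sketch sizes, and the dense test tensor are all hypothetical, and the paper's parallel TTM-based kernel is not reproduced). Contracting every mode except n with a small Gaussian is equivalent to multiplying the mode-n unfolding by a Kronecker-structured random matrix, but far cheaper than an unstructured sketch of the same size:

```python
import numpy as np

def mode_unfold(T, n):
    # Mode-n unfolding: mode n indexes the rows, all other modes are flattened.
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def kron_sketch_factor(T, n, sketch_sizes, rank, rng):
    """Randomized range finder for the mode-n unfolding of T. Contracting each
    mode m != n with a small Gaussian G_m is equivalent to right-multiplying
    the unfolding by the Kronecker product of the G_m's."""
    Y = T
    for m in range(T.ndim):
        if m == n:
            continue  # sketch_sizes[n] is ignored
        G = rng.standard_normal((sketch_sizes[m], Y.shape[m]))
        Y = np.moveaxis(np.tensordot(G, Y, axes=(1, m)), 0, m)
    Q, _ = np.linalg.qr(mode_unfold(Y, n))  # orthonormal basis of sketched range
    return Q[:, :rank]

rng = np.random.default_rng(0)
T = rng.standard_normal((60, 50, 40))  # toy dense tensor
U0 = kron_sketch_factor(T, n=0, sketch_sizes=(10, 10, 10), rank=5, rng=rng)
print(U0.shape)  # (60, 5)
```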
Most tensor decomposition algorithms were developed for in-memory computation on a single machine. There are a few recent exceptions that were designed for parallel and distributed computation, but these cannot easily...
We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general N-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an N-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with (N - 1) matrices (TTMc step), (ii) the product is then converted to a matrix, and (iii) a few leading left singular vectors of the resulting matrix are computed (TRSVD step) to update one of the matrices for the next TTMc step. We propose an efficient parallelization of these algorithms for current parallel platforms with multicore nodes. We discuss a set of preprocessing steps that take all computational decisions out of the main iteration of the algorithm and provide an intuitive shared-memory parallelism for the TTMc and TRSVD steps. We propose coarse-grain and fine-grain parallel algorithms in a distributed-memory environment, investigate data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the TRSVD step can be carried out efficiently following the TTMc step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report scalability results on up to 4096 cores on 256 nodes of an IBM BlueGene/Q supercomputer.
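As a rough illustration of the per-iteration structure described above (TTMc followed by TRSVD for each mode), here is a dense, serial NumPy sketch; the paper targets sparse tensors on distributed-memory platforms, and all names below are our own:

```python
import numpy as np

def ttmc(T, factors, skip):
    # TTMc step: multiply T with the transposes of all (N - 1) factor
    # matrices except the one for mode `skip`; factors[m] is I_m x r_m.
    Y = T
    for m, U in enumerate(factors):
        if m == skip:
            continue
        Y = np.moveaxis(np.tensordot(U.T, Y, axes=(1, m)), 0, m)
    return Y

def als_sweep(T, factors):
    # One alternating-least-squares sweep: for each mode, (i) TTMc,
    # (ii) matricize the product, (iii) keep its leading left singular
    # vectors (the TRSVD step) to update that mode's factor matrix.
    for n in range(T.ndim):
        Y = ttmc(T, factors, skip=n)
        Yn = np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)
        U, _, _ = np.linalg.svd(Yn, full_matrices=False)
        factors[n] = U[:, :factors[n].shape[1]]
    return factors

rng = np.random.default_rng(1)
T = rng.standard_normal((30, 20, 10))
ranks = (5, 4, 3)
factors = [np.linalg.qr(rng.standard_normal((d, r)))[0] for d, r in zip(T.shape, ranks)]
factors = als_sweep(T, factors)
```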
beta-skeletons, prominent members of the neighborhood graph family, have interesting geometric properties and various applications ranging from geographic networks to archeology. This paper focuses on computing the beta-spectrum, a labeling of the edges of the Delaunay triangulation DT(V), which makes it possible to quickly find the lune-based beta-skeleton of V for any query value beta in [1,2]. We consider planar n-point sets V with the L_p metric, 1 < p < infinity. We present an O(n log^2 n) time sequential, and an O(log^4 n) time parallel, beta-spectrum labeling. We also show a parallel algorithm which, for a given beta in [1,2], finds the lune-based beta-skeleton in O(log^2 n) time. The parallel algorithms use O(n) processors in the CREW-PRAM model.
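For reference, the lune-based beta-skeleton for beta in [1,2] can be stated in a few lines of brute-force code (our own O(n^3) illustration in the Euclidean metric; the paper's contribution is precisely avoiding this cost via the beta-spectrum labeling of DT(V)):

```python
import numpy as np
from itertools import combinations

def beta_skeleton(points, beta):
    """Lune-based beta-skeleton, beta in [1, 2]: edge (i, j) survives iff no
    third point lies strictly inside the lune, i.e. the intersection of the
    two disks of radius beta*|ab|/2 centered at (1 - beta/2)*a + (beta/2)*b
    and (beta/2)*a + (1 - beta/2)*b. beta = 1 gives the Gabriel graph,
    beta = 2 the relative neighborhood graph."""
    P = np.asarray(points, dtype=float)
    h = beta / 2.0
    edges = []
    for i, j in combinations(range(len(P)), 2):
        a, b = P[i], P[j]
        r = h * np.linalg.norm(b - a)
        c1, c2 = (1 - h) * a + h * b, h * a + (1 - h) * b
        if all(k in (i, j)
               or np.linalg.norm(P[k] - c1) >= r
               or np.linalg.norm(P[k] - c2) >= r
               for k in range(len(P))):
            edges.append((i, j))
    return edges

pts = np.random.default_rng(2).random((20, 2))
print(beta_skeleton(pts, beta=1.5))
```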
ISBN (Print): 9781509028245
The k-center problem is a classic NP-hard clustering question. For contemporary massive data sets, RAM-based algorithms become impractical. Although good algorithms exist for k-center, they are all inherently sequential. In this paper, we design and implement parallel approximation algorithms for k-center. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in a small number of MapReduce rounds; in practice, we find that two rounds suffice, leading to a 4-approximation. We find this parallel scheme to be about 100 times faster than the sequential Gonzalez algorithm, with barely any compromise in solution quality. We contrast this with an existing parallel algorithm for k-center that offers a 10-approximation. Our analysis reveals that this scheme is often slow, and that its sampling procedure runs only if k is sufficiently small relative to the input size. In practice it is slightly more effective than Gonzalez's approach, but slow. To trade off runtime against the approximation guarantee, we parameterize this sampling algorithm. We prove a lower bound on the parameter for effectiveness, and find experimentally that, even with values below the bound, the algorithm is not only faster but sometimes more effective.
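A serial simulation of the two-round scheme is easy to sketch (our own illustration; `np.array_split` stands in for the map phase's data partitioning, and all names are hypothetical):

```python
import numpy as np

def gonzalez(points, k):
    # Gonzalez's greedy 2-approximation: repeatedly add the point farthest
    # from the current set of centers.
    centers = [points[0]]
    dist = np.linalg.norm(points - centers[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dist))
        centers.append(points[far])
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return np.array(centers)

def two_round_kcenter(points, k, num_parts):
    # Round 1 (map): run Gonzalez on each partition independently.
    # Round 2 (reduce): run Gonzalez on the union of the local centers.
    # This composition yields a 4-approximation.
    chunks = np.array_split(points, num_parts)
    candidates = np.vstack([gonzalez(chunk, k) for chunk in chunks])
    return gonzalez(candidates, k)

pts = np.random.default_rng(3).random((10_000, 2))
centers = two_round_kcenter(pts, k=10, num_parts=8)
```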
ISBN (Print): 9781509058983
Mapping parallel algorithms to parallel computing platforms requires several activities: analyzing the parallel algorithm, defining the logical configuration of the platform, mapping the algorithm to that logical configuration, and implementing the source code. Applying this process from scratch for each parallel algorithm is usually time-consuming and cumbersome; moreover, for large platforms the overall process becomes intractable for a human engineer. To support systematic reuse, we propose adopting a model-driven product line engineering approach for mapping parallel algorithms to parallel computing platforms. Using model-driven transformation patterns, we support the generation of logical configurations of the computing platform and of the parallel source code that runs on the platform nodes. The overall approach is illustrated by mapping an example parallel algorithm to parallel computing platforms.
ISBN (Print): 9781509036837
Analyzing large dynamic networks is an important problem with applications in a wide range of disciplines. A key operation is updating the network properties as its topology changes. In this paper we present graph sparsification as an efficient abstraction for updating the properties of dynamic networks. We demonstrate the applicability of graph sparsification to updating the connected components in random and scale-free networks on shared-memory systems. Our results show that the updating is scalable (a 10X speedup on 16 processors for larger networks). To the best of our knowledge, this is the first parallel implementation of graph sparsification. Based on these initial results, we discuss how the current implementation can be further improved and how graph sparsification can be applied to updating other network properties.
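The core of the insertion-only case can be illustrated with a union-find that retains only a sparse certificate for connectivity, namely a spanning forest (a minimal serial sketch under our own naming; the paper's parallel shared-memory implementation and its handling of general topology changes are more involved):

```python
class UnionFind:
    # Union-find with path halving; one find/union per inserted edge.
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # endpoints already connected: edge is redundant
        self.parent[ry] = rx
        return True

def update_components(n, edge_stream):
    # Keep only the sparse certificate (a spanning forest): edges that do
    # not change the connected components are discarded on arrival.
    uf = UnionFind(n)
    forest = [e for e in edge_stream if uf.union(*e)]
    return uf, forest

uf, forest = update_components(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
print(forest)  # [(0, 1), (1, 2), (3, 4)] -- (0, 2) was redundant
```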
ISBN (Print): 9781509054121
Clustering is a popular data mining technique which discovers structure in unlabeled data by grouping objects together on the basis of a similarity criterion. Traditional similarity measures lose their meaning as the number of dimensions increases, and as a consequence distance- or density-based clustering algorithms become less meaningful. Shared Nearest Neighbor (SNN) is a solution to clustering high-dimensional data with the ability to find clusters of varying density; SNN assigns to a cluster objects that share a large number of their nearest neighbors. However, SNN is compute- and memory-intensive for data of large size and/or dimensionality. Nearest neighbor queries are responsible for a major proportion of the computation in SNN, resulting in lower efficiency for higher values of the number of nearest neighbors (k). The main motivation of this work is to improve the efficiency of SNN and to parallelize it so that it can be used for clustering large high-dimensional datasets and for large values of k, situations in which existing SNN algorithms become inefficient. In this paper, we present a new sequential SNN algorithm, R-SNN, which uses an R-tree for executing neighborhood queries efficiently and exploits spatial locality to minimize memory usage. R-SNN is benchmarked against the best available implementation of SNN and is found to be up to 77 times faster when tested on various real datasets. R-SNN is parallelized for distributed-memory, shared-memory, and hybrid systems. The significant speedup and scalability achieved can be attributed to parallelization, good load-balancing strategies, and the exploitation of spatial locality; experimental results demonstrate this for datasets of varying dimensionality and size. The maximum speedups achieved for the shared, distributed, and hybrid models are 427.19 using 48 threads, 394.24 using 32 processes, and 1380.69 on 32 nodes (each node spawning 4 threads), respectively. Super-linear speedup for some datasets is attributed
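As a point of reference for what the SNN neighborhood computation involves, here is a minimal serial sketch of one common mutual-k-NN formulation of SNN similarity (our own illustration: SciPy's cKDTree stands in for the paper's R-tree index, and this is not the R-SNN algorithm itself):

```python
import numpy as np
from scipy.spatial import cKDTree

def snn_similarity(points, k):
    """SNN similarity: points i and j get a nonzero score, the overlap of
    their k-NN lists, only if each appears in the other's k-NN list."""
    tree = cKDTree(points)               # spatial index (k-d tree here)
    _, idx = tree.query(points, k=k + 1)
    knn = [set(row[1:]) for row in idx]  # drop each point from its own list
    n = len(points)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:              # mutual-neighbor requirement
                sim[i, j] = sim[j, i] = len(knn[i] & knn[j])
    return sim

pts = np.random.default_rng(4).random((200, 8))  # 200 points in 8 dimensions
S = snn_similarity(pts, k=10)
```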
ISBN (Print): 9781467389204
A program that works inefficiently leads to inevitable losses in computer performance. These losses should be avoided, or at least minimized. To do this we need to apply established research and development techniques, together with equivalence-preserving algorithm transformations. In the present paper we propose a technique for modifying a parallel algorithm to improve its efficiency through balanced processor load. The technique consists in rearranging processes among processors and coarsening (enlarging) the algorithm's operations. All information dependencies of the algorithm are preserved, while its running time and the number of processors involved can only be reduced.
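The paper's specific transformations are not reproducible from the abstract, but the goal of balanced processor load can be illustrated with a standard greedy heuristic, longest-processing-time (LPT) scheduling (our own stand-in example, not the authors' technique):

```python
import heapq

def lpt_rebalance(task_costs, num_procs):
    # Longest-processing-time scheduling: assign the most expensive remaining
    # task to the currently least-loaded processor.
    heap = [(0.0, p, []) for p in range(num_procs)]
    heapq.heapify(heap)
    for cost in sorted(task_costs, reverse=True):
        load, p, tasks = heapq.heappop(heap)
        tasks.append(cost)
        heapq.heappush(heap, (load + cost, p, tasks))
    return sorted(heap, key=lambda entry: entry[1])  # (load, id, tasks) per processor

for load, proc, tasks in lpt_rebalance([7, 3, 3, 2, 2, 2, 1], num_procs=3):
    print(f"processor {proc}: load {load}, tasks {tasks}")
```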
Minimum cut/maximum flow (min-cut/max-flow) algorithms solve a variety of problems in computer vision, and significant effort has therefore been put into developing fast min-cut/max-flow algorithms. As a result, it is difficult to choose an ideal algorithm for a given problem. Furthermore, parallel algorithms have not been thoroughly compared. In this paper, we evaluate the state-of-the-art serial and parallel min-cut/max-flow algorithms on the largest set of computer vision problems yet. We focus on generic algorithms, i.e., for unstructured graphs, but also compare with the specialized GridCut implementation. When applicable, GridCut performs best. Otherwise, the two pseudoflow algorithms, Hochbaum pseudoflow and excesses incremental breadth-first search, achieve the overall best performance. The most memory-efficient implementation tested is the Boykov-Kolmogorov algorithm. Amongst generic parallel algorithms, we find the bottom-up merging approach by Liu and Sun to be best, but no method is dominant. Of the generic parallel methods, only the parallel preflow push-relabel algorithm is able to scale efficiently with many processors across problem sizes, yet no generic parallel method consistently outperforms the serial algorithms. Finally, we provide and evaluate strategies for algorithm selection to obtain good expected performance. We make our dataset and implementations publicly available for further research.
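For readers new to the problem, a minimal serial max-flow implementation makes the min-cut/max-flow setting concrete (Edmonds-Karp on a dense capacity matrix, our own illustrative choice; the solvers surveyed above are far faster on vision-scale graphs):

```python
from collections import deque

def max_flow(capacity, s, t):
    # Edmonds-Karp: BFS for a shortest augmenting path, push the bottleneck,
    # repeat until no augmenting path remains (the resulting flow value
    # equals the capacity of a minimum s-t cut).
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
print(max_flow(cap, s=0, t=3))  # 4
```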