Search Results - Inner Mongolia University Library

Synthesis and generalization of parallel algorithm for matrix-vector multiplication

IPSJ Transactions on System LSI Design Methodology, 2020, vol. 13, pp. 31-34

Authors: Miyasaka, Yukio; Goda, Akihiro; Mittal, Ashish; Fujita, Masahiro

Affiliations: University of Tokyo Bunkyo Tokyo 113-0032 Japan; Indian Institute of Technology Bombay Powai Mumbai Maharashtra 400076 India

Matrix-vector multiplication is needed more and more often because of the growing use of neural networks. We previously proposed a method that automatically synthesizes the optimal parallel algorithm for a given environment, and used it to synthesize an algorithm for matrix-vector multiplication of a specific-size matrix on 4 nodes connected in a one-way ring. This paper proposes a method to generalize the synthesized algorithm to matrices of any size. We generalized the algorithm synthesized for the 32 × 32 matrix to calculate N × N matrix-vector multiplication. © 2020 Information Processing Society of Japan

Keywords: parallel algorithms
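To make the setting in the abstract above concrete, here is a minimal Python sketch of a textbook ring scheme for matrix-vector multiplication. It is not the synthesized algorithm from the paper, whose details the abstract does not give. It assumes each node owns one block of rows of the matrix plus one slice of the vector, and that slices travel in one direction around the ring. The name `ring_matvec` and the sequential simulation of the nodes are illustrative assumptions.

```python
def ring_matvec(A, x, nodes=4):
    """Simulate y = A x on `nodes` workers connected in a one-way ring.

    Node k owns row block k of A and starts with slice k of x. In every
    step each node multiplies the matching sub-block of its rows by the
    slice it currently holds, then forwards that slice to node k + 1.
    """
    n = len(x)
    if n % nodes != 0:
        raise ValueError("this sketch assumes the size is divisible by the node count")
    b = n // nodes
    # held[k] = (slice index, slice values) currently stored on node k
    held = [(k, x[k * b:(k + 1) * b]) for k in range(nodes)]
    y = [0.0] * n
    for _ in range(nodes):
        for k in range(nodes):  # conceptually, all nodes do this at once
            s, xs = held[k]
            for i in range(k * b, (k + 1) * b):
                y[i] += sum(A[i][s * b + j] * xs[j] for j in range(b))
        # One-way ring: node k receives the slice that node k - 1 held.
        held = [held[(k - 1) % nodes] for k in range(nodes)]
    return y
```

After `nodes` steps every slice has visited every node, so each node holds its finished block of the result and no node ever needs the whole vector at once.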


Parallel Global Optimization for Non-convex Mixed-Integer Problems

5th Russian Supercomputing Days Conference (RuSCDays)

Authors: Barkalov, Konstantin; Lebedev, Ilya

Affiliations: Lobachevsky State Univ Nizhni Novgorod Nizhnii Novgorod Russia

ISBN: (print) 9783030365929; 9783030365912

The paper considers mixed-integer global optimization problems. A novel parallel algorithm for solving problems of this class is proposed, based on the index algorithm for continuous global optimization problems. A comparison of this algorithm with known analogs demonstrates the efficiency of the developed approach. The proposed algorithm allows efficient parallelization, including the use of graphics accelerators. The results of numerical experiments (solving a series of 100 multiextremal mixed-integer problems) confirm a good speedup of the algorithm when a GPU is used.

Keywords: Global optimization; Non-convex constraints; Mixed-integer problems; parallel algorithms


Parallel Subdomain Level DGTD Method with Automatic Load Balancing

Photonics and Electromagnetics Research Symposium - Fall (PIERS - Fall)

Authors: Ren, Qiang; Mi, Jiamei

Affiliations: Beihang Univ Sch Elect & Informat Engn Beijing 100191 Peoples R China

ISBN: (print) 9781728153049

In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method based on the Message Passing Interface (MPI) library is proposed for simulating complex structures or multiscale electromagnetic problems. The efficiency of a parallel algorithm is greatly affected by load distribution, so an automatic load balancing strategy is proposed to reduce the load difference between processes and bring out the advantages of parallel algorithms. First, the relationship between the time required to solve the system matrix and the degrees of freedom (DoF) of subdomains with tetrahedral or hexahedral elements is obtained from numerical experiments. Then the partition is adjusted using this relationship to balance the load and reduce the data exchange between processes, so that the parallel algorithm approaches linear speedup. Finally, several numerical cases are simulated to demonstrate the reliability and efficiency of the algorithm.

Keywords: parallel algorithms


CSF: An Efficient Parallel Deduplication Algorithm by Clustering Scattered Fingerprints

IEEE Int Conf on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking (ISPA/BDCloud/SocialCom/SustainCom)

Authors: Hao, Fan; Xu, Guangping; Zhang, Yi; Yuan, Liming; Xue, Yanbing

Affiliations: Tianjin Univ Technol Sch Comp Sci & Engn Tianjin Peoples R China; Minist Educ Key Lab Comp Vis & Syst Tianjin Peoples R China; Tianjin Key Lab Intelligence Comp & Novel Softwar Tianjin Peoples R China

ISBN: (print) 9781728143286

Deduplication is one of the most effective and efficient techniques to save memory space. It is widely used in data centers and cloud storage systems. Multi-stream concurrency is expected to increase the throughput of deduplication. However, multiple data streams hurt the locality of accessed data and weaken the benefit of data concurrency, which poses a challenge for data deduplication. Usually, an ordered index can reshape the locality of data streams, which can improve the cache hit rate during deduplication. In this paper, we first propose an efficient parallel deduplication algorithm by clustering scattered fingerprints, called CSF, to exploit data locality as much as possible. It tries to improve the utilization rate of each fingerprint page through the clustered fingerprints. Moreover, it retains the scattered fingerprints for the next round of fingerprint comparison by re-using the fingerprints on the same page. Thus the number of fingerprint pages to read is reduced. We further optimize the proposed algorithm with a scheduling strategy, which effectively schedules the tasks of some streams ahead while ensuring the overall performance. Finally, we evaluate the performance of our algorithm on various data sets. The experimental results show that our proposed algorithm achieves better performance than the state-of-the-art method.

Keywords: Data streams; Deduplication; parallel algorithms; Storage systems
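For readers unfamiliar with the term, the following minimal Python sketch shows what fingerprint-based deduplication means in general. It is not the CSF algorithm: the clustering of scattered fingerprints, the fingerprint pages, and the scheduling strategy from the abstract are not modeled. The names `deduplicate` and `restore`, the use of SHA-256 as the fingerprint, and the in-memory dictionary standing in for the fingerprint index are all assumptions made for illustration.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once and record how to rebuild the stream.

    Returns (store, recipe): `store` maps fingerprint -> chunk bytes, and
    `recipe` lists the fingerprints in original stream order.
    """
    store = {}
    recipe = []
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:  # index lookup: is this chunk new?
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

In a real system the index does not fit in memory, so every lookup may cost a page read. That cost is what locality-aware designs such as the one described in the abstract try to reduce.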


A Parallel MCMC Algorithm for the Balanced Graph Coloring Problem

12th IAPR-TC15 Workshop on Graph-Based Representations in Pattern Recognition (GbRPR)

Authors: Conte, Donatello; Grossi, Giuliano; Lanzarotti, Raffaella; Lin, Jianyi; Petrini, Alessandro

Affiliations: Univ Tours Comp Sci Lab LIFAT EA6300 64 Ave Jean Portalis F-37000 Tours France; Univ Milan Dipartimento Informat Via Celoria 18 I-20133 Milan Italy; Khalifa Univ Sci & Technol Dept Math Al Saada St POB 127788 Abu Dhabi U Arab Emirates

ISBN: (digital) 9783030200817

ISBN: (print) 9783030200817; 9783030200800

In the parallel computation domain, graph coloring is widely studied on its own and serves as a reference problem for scheduling parallel tasks. Unfortunately, common graph coloring strategies usually focus on minimizing the number of colors without any concern for the size of each color class, and thus produce highly skewed color class distributions. However, to guarantee efficiency in parallel computations, and also in other application contexts, it is important to keep the color classes highly balanced in size. In this paper we address this challenging issue for large-scale graphs, proposing a fast parallel MCMC heuristic for sparse graphs that randomly generates good balanced colorings provided that a sufficient number of colors is made available. We show its effectiveness through numerical simulations on random graphs.

Keywords: Balanced graph coloring; Markov Chain Monte Carlo method; Greedy colorer; parallel algorithms
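The notion of a balanced coloring can be illustrated with a small sequential greedy sketch in Python. This is not the parallel MCMC heuristic from the paper. It is a simple baseline that assumes the graph is given as an adjacency dictionary and that each vertex takes the currently least-used color not taken by its neighbors. The name `balanced_greedy_coloring` is hypothetical.

```python
def balanced_greedy_coloring(adj, num_colors):
    """Color vertices greedily, preferring the least-used admissible color.

    `adj` maps each vertex to an iterable of its neighbors. Returns
    (color, sizes), where `sizes[c]` is the number of vertices in class c.
    """
    sizes = [0] * num_colors
    color = {}
    for v in adj:
        taken = {color[u] for u in adj[v] if u in color}
        free = [c for c in range(num_colors) if c not in taken]
        if not free:
            raise ValueError("not enough colors for this vertex order")
        best = min(free, key=lambda c: sizes[c])  # smallest class wins
        color[v] = best
        sizes[best] += 1
    return color, sizes
```

A coloring is proper when no edge joins two vertices of the same color, and it is balanced when the entries of `sizes` differ as little as possible. The greedy rule guarantees only the first property.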


A Free Stale Synchronous Parallel Strategy for Distributed Machine Learning

International Conference on Big Data Engineering (BDE)

Authors: Shi, Hang; Zhao, Yue; Zhang, Bofeng; Yoshigoe, Kenji; Vasilakos, Athanasios V.

Affiliations: Shanghai Univ Sch Comp Engn & Sci Shanghai Peoples R China; Toyo Univ Fac Informat Networking Innovat & Design INIAD Tokyo Japan; Lulea Univ Technol Dept Comp Sci Elect & Space Engn Skelleftea Sweden

ISBN: (print) 9781450360913

As machine learning applications process larger and more complex data, people tend to use multiple computing nodes to execute machine learning tasks in a distributed way. In practice, however, a few nodes in the system often exhibit poor performance and drag down the efficiency of the whole system. In existing parallel strategies such as bulk synchronous parallel and stale synchronous parallel, these poorly performing nodes may not be monitored and identified in time. To address this problem, we propose a free stale synchronous parallel (FSSP) strategy to free the system from the negative impact of those nodes. Our experimental results on some classical machine learning algorithms and datasets demonstrate that the FSSP strategy outperforms other existing parallel computing strategies.

Keywords: Big data; distributed computing; parallel algorithms
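The two baseline strategies named in the abstract can be contrasted with a short Python sketch. It shows one common formulation of the stale synchronous parallel rule, with bulk synchronous parallel as the special case of zero staleness. It does not model FSSP itself, whose mechanism the abstract does not describe. The function name `may_proceed` and the use of `<=` as the bound are assumptions.

```python
def may_proceed(clocks, worker, staleness):
    """Return True if `worker` may start its next iteration.

    `clocks[i]` counts the iterations worker i has finished. Under this
    formulation a worker may run ahead of the slowest worker by at most
    `staleness` iterations; staleness 0 gives bulk synchronous behavior.
    """
    return clocks[worker] - min(clocks) <= staleness


clocks = [5, 3, 4]  # worker 1 is the straggler
blocked_under_bsp = [w for w in range(3) if not may_proceed(clocks, w, 0)]
blocked_under_ssp = [w for w in range(3) if not may_proceed(clocks, w, 1)]
```

Under both rules the fast workers end up waiting on the slowest clock, which is the straggler problem the abstract sets out to address.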


Parallel Subdomain Level DGTD Method with Load Balancing

USNC-URSI Radio Science Meeting / IEEE International Symposium on Antennas and Propagation (AP-S)

Authors: Mi, Jiamei; Ren, Qiang

Affiliations: Beihang Univ Sch Elect & Informat Engn Beijing 100191 Peoples R China

ISBN: (print) 9781728106922

In this paper, a parallel subdomain-level discontinuous Galerkin time domain (DGTD) method is implemented with the Message Passing Interface (MPI) library, based on upwind flux and Runge-Kutta explicit time integration. A load balancing scheme is proposed for subdomains with multiple kinds of elements of different orders. Numerical examples demonstrate the effectiveness of the proposed scheme, and good speedup is achieved.

Keywords: Time-domain analysis; Load management; parallel algorithms; Finite element analysis; Finite difference methods; Method of moments; Cavity resonators


Parallel decompression of gzip-compressed files and random access to DNA sequences

33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Kerbiriou, Mael; Chikhi, Rayan

Affiliations: Univ Lille CRIStAL INRIA Lille Nord Europe Lille France; Inst Pasteur C3BI USR 3756 Paris France; CNRS Paris France

ISBN: (print) 9781538655559

Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parallelism to rapidly decompress large gzip-compressed files. This is an unsatisfactory bottleneck, especially for the analysis of large sequencing data experiments. Here we propose a parallel algorithm and an implementation, pugz, that performs fast and exact decompression of any text file. We show that pugz is an order of magnitude faster than gunzip, and 5x faster than a highly-optimized sequential implementation (libdeflate). We also study the related problem of random access to compressed data. We give simple models and experimental results that shed light on the structure of gzip-compressed files containing DNA sequences. Preliminary results show that random access to sequences within a gzip-compressed FASTQ file is almost always feasible at low compression levels, yet is approximate at higher compression levels.

Keywords: compression; bioinformatics; DNA sequences; parallel algorithms


Conditional Hardness Results for Massively Parallel Computation from Distributed Lower Bounds

60th IEEE Annual Symposium on Foundations of Computer Science (FOCS)

Authors: Ghaffari, Mohsen; Kuhn, Fabian; Uitto, Jara

Affiliations: Swiss Fed Inst Technol Dept Comp Sci Zurich Switzerland; Univ Freiburg Dept Comp Sci Freiburg Germany; Aalto Univ Dept Comp Sci Espoo Finland

ISBN: (print) 9781728149523

We present the first conditional hardness results for massively parallel algorithms for some central graph problems including (approximating) maximum matching, vertex cover, maximal independent set, and coloring. In some cases, these hardness results match or get close to the state-of-the-art algorithms. Our hardness results are conditioned on a widely believed conjecture in massively parallel computation about the complexity of the connectivity problem. We also note that an unconditional variant of such hardness results is known to be somewhat out of reach for now, as it would lead to considerably improved circuit complexity lower bounds and would concretely imply that NC1 is a proper subset of P. We obtain our conditional hardness results via a general method that lifts unconditional lower bounds from the well-studied LOCAL model of distributed computing to the massively parallel computation setting.

Keywords: parallel algorithms; Algorithm design and analysis


A novel accelerated implementation of RSA using parallel processing


JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2019, vol. 22, no. 2, pp. 309-322

Authors: Rawat, Abhishek; Sehgal, Kartik; Tiwari, Amartya; Sharma, Abhishek; Joshi, Ashish

Affiliations: Bharati Vidyapeeths Coll Engn Dept Informat Technol New Delhi 110063 India

Past research has shown that public key cryptosystems are usually slower than symmetric key cryptosystems because they use one additional cryptographic key and different methods for encryption and decryption. RSA is one of the most common asymmetric key cryptography algorithms. Recent research has focused on speeding up RSA using various techniques. With the introduction of distributed computing, parallelization of algorithms enables them to run on multiple cores concurrently. RSA consists of two resource-intensive operations, namely modular exponentiation with up to 1024-bit exponents and repeated calculation of the greatest common divisor. Thus, RSA is a natural fit for the Montgomery reduction algorithm, which optimizes the repeated modular multiplication in exponentiation. In this paper we propose a parallel scheme for RSA using a new parallel data structure known as the Concurrent Indexed List of character blocks. The aim of our research was to improve the speed of RSA encryption and decryption using parallelism and also to make it compatible with leading industry cryptography standards. We simulated four approaches: parallel and sequential, each with and without Montgomery reduction. We also integrated our parallel paradigm with a renowned C++ Crypto library and achieved a speed-up of up to four times over the sequential approach. Unlike previous approaches, our implementation integrates easily with any external library and thus can be adopted by any other algorithmic scheme.

Keywords: RSA; parallel algorithms; Montgomery Reduction; Hyperthreading
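The abstract above refers to Montgomery reduction for the repeated modular multiplications inside exponentiation. The following minimal Python sketch shows that general technique. It is not the authors' parallel C++ implementation and does not model their Concurrent Indexed List. The function names are illustrative, the modulus is assumed to be odd (as RSA moduli are), and `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def redc(t, n, bits, n_prime):
    """Montgomery reduction: return t * R^-1 mod n, where R = 2**bits.

    Requires 0 <= t < n * R. Only shifts, masks and multiplications are
    used; there is no division by n.
    """
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> bits             # exact division by R
    return u - n if u >= n else u


def mont_pow(base, exp, n):
    """Compute base**exp mod n for odd n > 1 by square-and-multiply."""
    bits = n.bit_length()               # R = 2**bits is greater than n
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r      # n * n_prime is -1 modulo R
    r2 = (r * r) % n
    x = redc((base % n) * r2, n, bits, n_prime)  # base in Montgomery form
    acc = redc(r2, n, bits, n_prime)             # the value 1 in Montgomery form
    while exp > 0:
        if exp & 1:
            acc = redc(acc * x, n, bits, n_prime)
        x = redc(x * x, n, bits, n_prime)
        exp >>= 1
    return redc(acc, n, bits, n_prime)           # convert back to a normal residue


# Toy RSA round trip with a deliberately tiny, insecure key.
n, e, d = 3233, 17, 2753
message = 65
cipher = mont_pow(message, e, n)
recovered = mont_pow(cipher, d, n)
```

The one-time cost of converting into and out of Montgomery form is amortized over the many multiplications that a single exponentiation performs, which is why the technique suits RSA.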

