Search results - Inner Mongolia University Library

Parallel Algorithms for Generating Random Networks with Given Degree Sequences

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, Vol. 45, No. 1, pp. 109-127

Authors: Alam, Maksudul; Khan, Maleq. Affiliations: Virginia Tech, Virginia Bioinformat Inst, Dept Comp Sci, Blacksburg, VA 24061, USA; Virginia Tech, Virginia Bioinformat Inst, Network Dynam & Simulat Sci Lab, Blacksburg, VA 24061, USA

Random networks are widely used for modeling and analyzing complex processes. Many mathematical models have been proposed to capture diverse real-world networks. One of the most important aspects of these models is degree distribution. The Chung-Lu (CL) model is a random network model that can produce networks with any given degree distribution. The complex systems we deal with nowadays are growing larger and more diverse than ever. Generating random networks with any given degree distribution consisting of billions of nodes and edges or more has become a necessity, which requires efficient and parallel algorithms. We present an MPI-based distributed memory parallel algorithm for generating massive random networks using the CL model, which takes O((m+n)/P + P) time with high probability and O(n) space per processor, where n, m, and P are the number of nodes, edges and processors, respectively. The time efficiency is achieved by using a novel load-balancing algorithm. Our algorithms scale very well to a large number of processors and can generate massive power-law networks with one billion nodes and 250 billion edges in one minute using 1024 processors.

Keywords: Massive Networks; parallel algorithms; Network Generator
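
To make the Chung-Lu model in the abstract above concrete, here is a minimal sequential sketch in Python. It is not the paper's MPI algorithm: it assumes the common simple-graph variant in which nodes i and j are joined with probability min(1, w_i * w_j / S), where S is the sum of all expected degrees, and it checks every pair, so it takes quadratic time. The function name `chung_lu` and its parameters are illustrative only.

```python
import random


def chung_lu(weights, seed=None):
    """Naive O(n^2) Chung-Lu sampler (illustrative, not the paper's method).

    weights[i] is the expected degree of node i. Each pair i < j becomes an
    edge independently with probability min(1, w_i * w_j / S), where S is the
    sum of all weights. Self-loops are skipped in this variant.
    """
    rng = random.Random(seed)
    total = sum(weights)
    edges = []
    if total == 0:
        return edges  # no expected degree anywhere, so no edges
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges
```

A parallel generator has to split these pair checks across processors without leaving some of them idle, which is the load-balancing problem the abstract refers to.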

Parallel algorithms for LU decomposition on a shared memory multiprocessor

APPLIED MATHEMATICS AND COMPUTATION, 2005, Vol. 163, No. 1, pp. 179-191

Authors: Kaya, D; Wright, K. Affiliations: Firat Univ, Dept Math, TR-23119 Elazig, Turkey; Newcastle Univ, Dept Comp Sci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England

In this work, we present the numerical results (using C++) obtained from seven different versions of the LU decomposition algorithms. Four of the algorithms use Crout-like reduction and three of the algorithms use Doo...

Keywords: the LU decomposition method; parallel algorithms; shared memory multiprocessor
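
For readers unfamiliar with the Crout-like reduction named in the abstract above, the following is a minimal sequential sketch in Python. It is not one of the paper's seven versions. It assumes the textbook Crout form, in which U has a unit diagonal and L carries the pivots, and it does no pivoting; the name `crout_lu` is illustrative.

```python
def crout_lu(a):
    """Textbook Crout factorisation A = L * U without pivoting (a sketch).

    L is lower triangular, U is upper triangular with ones on its diagonal.
    Raises ZeroDivisionError if a zero pivot is met.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L uses the already computed columns 0..j-1.
        for i in range(j, n):
            lower[i][j] = a[i][j] - sum(
                lower[i][k] * upper[k][j] for k in range(j)
            )
        if lower[j][j] == 0:
            raise ZeroDivisionError("zero pivot; pivoting is not implemented")
        # Row j of U, to the right of the diagonal.
        for i in range(j + 1, n):
            upper[j][i] = (
                a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(j))
            ) / lower[j][j]
    return lower, upper
```

Within one step j, the entries of column j of L are independent of each other, as are the entries of row j of U, which is the kind of loop a shared memory implementation could divide among processors.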

PARALLEL ALGORITHMS FOR MAXIMUM A-POSTERIORI ESTIMATION OF SPIN-DENSITY AND SPIN-SPIN DECAY IN MAGNETIC-RESONANCE-IMAGING

IEEE TRANSACTIONS ON MEDICAL IMAGING, 1995, Vol. 14, No. 2, pp. 362-373

Authors: SCHAEWE, TJ; MILLER, MI. Affiliations: WASHINGTON UNIV, MALLINCKRODT INST RADIOL, DEPT ELECT ENGN, ST LOUIS, MO 63130; WASHINGTON UNIV, INST BIOMED COMP, ST LOUIS, MO 63130

A maximum a posteriori (MAP) algorithm is presented for the estimation of spin-density and spin-spin decay distributions from frequency and phase-encoded magnetic resonance imaging data. Linear spatial localization gradients are assumed: the y-encode gradient applied during the phase preparation time of duration tau before measurement collection, and the x-encode gradient applied during the full data collection time t ≥ 0. The MRT signal model developed in [22] is used in which a signal resulting from M phase encodes (rows) and N frequency encode dimensions (columns) is modeled as a superposition of MN sine-modulated exponentially decaying sinusoids with unknown spin-density and spin-spin decay parameters. The nonlinear least-squares MAP estimate of the spin density and spin-spin decay distributions solves for the 2MN spin-density and decay parameters minimizing the squared error between the measured data and the sine-modulated exponentially decaying signal model using an iterative expectation-maximization algorithm. A covariance diagonalizing transformation is derived which decouples the joint estimation of MN sinusoids into M separate N sinusoid optimizations, yielding an order of magnitude speed up in convergence. The MAP solutions are demonstrated to deliver a decrease in standard deviation of image parameter estimates on brain phantom data of greater than a factor of two over Fourier-based estimators of the spin density and spin-spin decay distributions. A parallel processor implementation is demonstrated which maps the N sinusoid coupled minimization to separate individual simple minimizations, one for each processor.

Keywords: Parallel algorithms; Maximum a posteriori estimation; Frequency estimation; Magnetic resonance imaging; Phase estimation; Phase measurement; Time measurement; Density measurement; Expectation-maximization algorithms; Yield estimation

Parallel algorithms for the circuit value update problem

THEORY OF COMPUTING SYSTEMS, 1997, Vol. 30, No. 6, pp. 583-597

Authors: Leiserson, CE; Randall, KH. Affiliation: Laboratory for Computer Science, MIT, 545 Technology Square, Cambridge, MA 02139, USA; cel@mit.edu, randall@theory.lcs.mit.edu

The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinational element has bounded fan-in and fan-out and can be evaluated in constant time. This problem is easily solved on an ordinary serial computer in O(W + D) time, where W is the number of elements in the altered subcircuit and D is the subcircuit's embedded depth (its depth measured in the original circuit). In this paper we show how to solve the circuit value update problem efficiently on a P-processor parallel computer. We give a straightforward synchronous, parallel algorithm that runs in O(W/P + D lg P) expected time. Our main contribution, however, is an optimistic, asynchronous, parallel algorithm that runs in O(W/P + D + lg W + lg P) expected time, where W and D are the size and embedded depth, respectively, of the "volatile" subcircuit, the subcircuit of elements that have inputs which either change or glitch as a result of the update. To our knowledge, our analysis provides the first analytical bounds on the running time of an optimistic, asynchronous, parallel algorithm.

Keywords: Computer systems; parallel algorithms
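
The serial baseline mentioned in the abstract above can be sketched as a level-by-level sweep. The Python below is an illustration under stated assumptions, not the paper's code: the data layout (`gates`, `fanout`, `depth`, `values`) and the name `update_circuit` are hypothetical, and `depth` is assumed to hold each element's depth in the original circuit, so every gate sits strictly deeper than its inputs.

```python
from collections import defaultdict


def update_circuit(gates, fanout, depth, values, changed_inputs):
    """Serial circuit value update, sweeping levels in increasing depth.

    gates:   gate name -> (function, list of input names)
    fanout:  signal name -> list of gates that read it
    depth:   gate name -> depth in the original circuit
    values:  signal name -> current value (updated in place)
    changed_inputs: input name -> new value
    Returns the gates that were re-evaluated, in evaluation order.
    """
    pending = defaultdict(set)  # depth -> gates waiting at that depth

    def schedule(source):
        for gate in fanout.get(source, ()):
            pending[depth[gate]].add(gate)

    for name, new_value in changed_inputs.items():
        if values[name] != new_value:
            values[name] = new_value
            schedule(name)

    evaluated = []
    level = min(pending, default=0)
    while pending:
        # sorted() only makes the order deterministic for this sketch.
        for gate in sorted(pending.pop(level, ())):
            func, inputs = gates[gate]
            new_value = func(*(values[i] for i in inputs))
            evaluated.append(gate)
            if new_value != values[gate]:
                values[gate] = new_value
                schedule(gate)  # propagate only real changes
        level += 1
    return evaluated
```

Because a gate is scheduled only when one of its inputs actually changes, the work is tied to the altered subcircuit rather than to the whole circuit, and the sweep visits at most one bucket per level of embedded depth.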

Parallel algorithms for counting and randomly generating integer partitions

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1996, Vol. 34, No. 1, pp. 29-35

Authors: Sanchis, LA; Squire, MB. Affiliation: IBM CORP, NETWORKING HARDWARE DIV, RES TRIANGLE PK, NC 27709

This paper presents parallel algorithms for determining the number of partitions of a given integer N, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We present a series of adaptive algorithms suitable for varying numbers of processors. The fastest of these algorithms computes the number of partitions of n with largest part equal to k, for 1 ≤ k ≤ n ≤ N, in time O(log^2 N) using O(N^2/log N) processors. Parallel logarithmic time algorithms that generate partitions uniformly at random, using these quantities, are also presented. (C) 1996 Academic Press, Inc.

Keywords: Restriction partitions PART Less than or equal to PROCESSOR parallel algorithms Equal
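
The quantities named in the abstract above, the number of partitions of n with largest part equal to k, satisfy the standard recurrence p(n, k) = p(n-1, k-1) + p(n-k, k). The Python sketch below fills that table sequentially and then uses it to draw a partition uniformly at random. It illustrates the counting and sampling idea only, not the paper's parallel algorithms; the function names are illustrative.

```python
import random


def partition_table(N):
    """p[n][k] = number of partitions of n whose largest part is exactly k."""
    p = [[0] * (N + 1) for _ in range(N + 1)]
    p[0][0] = 1  # the empty partition
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            # Either the largest part shrinks by one, or a part k is removed.
            p[n][k] = p[n - 1][k - 1] + p[n - k][k]
    return p


def random_partition(n, p, rng=random):
    """Draw a partition of n uniformly at random using the table p."""
    parts = []
    bound = n  # no later part may exceed the previous one
    while n > 0:
        bound = min(bound, n)
        total = sum(p[n][1:bound + 1])  # partitions of n with parts <= bound
        r = rng.randrange(total)
        for k in range(1, bound + 1):
            r -= p[n][k]
            if r < 0:
                break  # k chosen with probability p[n][k] / total
        parts.append(k)
        n -= k
        bound = k
    return parts
```

For example, `sum(partition_table(10)[10])` gives the total number of partitions of 10, and `random_partition(10, partition_table(10))` returns one of them as a non-increasing list of parts.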

Parallel algorithms development for programmable logic devices

ADVANCES IN ENGINEERING SOFTWARE, 2006, Vol. 37, No. 9, pp. 561-582

Author: Damaj, Issam W. Affiliation: Hariri Canadian Acad Sci & Technol, Dept Elect & Comp Engn, Chouf 2010, Lebanon

Programmable logic devices (PLDs) continue to grow in size and currently contain several millions of gates. At the same time, research effort is going into higher-level hardware synthesis methodologies for reconfigurable computing that can exploit PLD technology. In this paper, we explore the effectiveness of, and extend, one such formal methodology in the design of massively parallel algorithms. We take a step-wise refinement approach to the development of correct reconfigurable hardware circuits from formal specifications. A functional programming notation is used for specifying algorithms and for reasoning about them. The specifications are realised through the use of a combination of function decomposition strategies, data refinement techniques, and off-the-shelf refinements based upon higher-order functions. The off-the-shelf refinements are inspired by the operators of communicating sequential processes (CSP) and map easily to programs in Handel-C (a hardware description language). The Handel-C descriptions are directly compiled into reconfigurable hardware. The practical realisation of this methodology is evidenced by a case study of the matrix multiplication algorithm, as it is relatively simple and well known. In this paper, we obtain several hardware implementations with different performance characteristics by applying different refinements to the algorithm. The developed designs are compiled and tested under Celoxica's RC-1000 reconfigurable computer with its 2-million-gate Virtex-E FPGA. Performance analysis and evaluation of these implementations are included. (C) 2006 Elsevier Ltd. All rights reserved.

Keywords: formal models; gate array; methodologies; parallel algorithms

Parallel algorithms development for programmable devices with application from cryptography

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2007, Vol. 35, No. 6, pp. 529-572

Author: Damaj, Issam W. Affiliation: Dhofar Univ, Salalah, Oman

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), have been witnessing a considerable increase in density. State-of-the-art FPGAs are complex hybrid devices that contain up to several millions of gates. Recently, research effort has been going into higher-level parallelization and hardware synthesis methodologies that can exploit such a programmable technology. In this paper, we explore the effectiveness of one such formal methodology in the design of parallel versions of the Serpent cryptographic algorithm. The suggested methodology adopts a functional programming notation for specifying algorithms and for reasoning about them. The specifications are realized through the use of a combination of function decomposition strategies, data refinement techniques, and off-the-shelf refinements based upon higher-order functions. The refinements are inspired by the operators of Communicating Sequential Processes and map easily to programs in Handel-C (a hardware description language). In the presented research, we obtain several parallel Serpent implementations with different performance characteristics. The developed designs are tested under Celoxica's RC-1000 reconfigurable computer with its two million gates Virtex-E FPGA. Performance analysis and evaluation of these implementations are included.

Keywords: parallel algorithms; methodologies; data encryption; formal models; gate array

Parallel algorithms for the Hamiltonian cycle and Hamiltonian path problems in semicomplete bipartite digraphs

ALGORITHMICA, 1997, Vol. 17, No. 1, pp. 67-87

Authors: BangJensen, J; ElHaddad, M; Manoussakis, Y; Przytycka, TM. Affiliation: UNIV PARIS 11, LRI, F-91405 ORSAY, FRANCE

We give an O(log^4 n)-time, O(n^2)-processor CRCW PRAM algorithm to find a hamiltonian cycle in a strong semicomplete bipartite digraph, B, provided that a factor of B (i.e., a collection of vertex disjoint cycles covering the vertex set of B) is computed in a preprocessing step. The factor is found (if it exists) using a bipartite matching algorithm, hence placing the whole algorithm in the class Random-NC. We show that any parallel algorithm which can check the existence of a hamiltonian cycle in a strong semicomplete bipartite digraph in time O(r(n)) using p(n) processors can be used to check the existence of a perfect matching in a bipartite graph in time O(r(n) + n^2/p(n)) using p(n) processors. Hence, our problem belongs to the class NC if and only if perfect matching in bipartite graphs belongs to NC. We also consider the problem of finding a hamiltonian path in a semicomplete bipartite digraph.

Keywords: graph algorithms; Hamilton cycle; parallel algorithms; semicomplete bipartite graphs; randomized algorithms

Parallel algorithms for the training process of a neural network-based system

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2000, Vol. 14, No. 1, pp. 3-25

Authors: Ammar, HH; Miao, ZH. Affiliation: W Virginia Univ, Dept Comp Sci & Elect Engn, Morgantown, WV 26506, USA

This paper addresses the problem of developing efficient parallel algorithms for the training procedure of a neural network-based Fingerprint Image Comparison (FIC) system. The target architecture is assumed to be a coarse-grain distributed-memory parallel architecture. Two types of parallelism, node parallelism and training set parallelism (TSP), are investigated. Theoretical analysis and experimental results show that node parallelism has low speedup and poor scalability, while TSP proves to have the best speedup performance. TSP, however, is prone to a slow convergence rate. To reduce this effect, a modified training set parallel algorithm using weighted contributions of synaptic connections is proposed. Experimental results show that this algorithm provides a fast convergence rate while keeping the best speedup performance obtained. The combination of TSP with node parallelism is also investigated. Good performance is achieved by this approach, which provides better scalability at the cost of a slight decrease in speedup. The above algorithms are implemented on a 32-node CM-5.

Keywords: Neural networks (Computer science); parallel algorithms
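
As a rough illustration of the training set parallelism described in the abstract above, the Python sketch below splits the training samples into one shard per worker, computes a gradient on each shard, and sums the partial gradients before a single weight update. It rests on several assumptions that are not from the paper: a linear model with squared error stands in for the neural network, the shards are processed in a plain loop rather than on separate processors, and the paper's weighted-contribution modification is not implemented. The names `gradient` and `tsp_step` are hypothetical.

```python
def gradient(weights, samples):
    """Summed squared-error gradient of a linear model over the samples."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return grad


def tsp_step(weights, samples, workers, lr):
    """One update with the training set split across `workers` shards."""
    shards = [samples[r::workers] for r in range(workers)]
    # Each partial gradient would be computed by a different processor.
    partial = [gradient(weights, shard) for shard in shards]
    total = [sum(g[i] for g in partial) for i in range(len(weights))]
    return [w - lr * g / len(samples) for w, g in zip(weights, total)]
```

In this plain form the combined update matches the one a single worker would compute on the whole training set, up to floating-point rounding, so the parallelism changes only where the work is done.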

Parallel algorithms for force directed scheduling of flattened and hierarchical signal flow graphs

IEEE TRANSACTIONS ON COMPUTERS, 1999, Vol. 48, No. 7, pp. 762-768

Authors: Prabhakaran, P; Banerjee, P. Affiliations: Compaq Comp Corp, Shrewsbury, MA 01545, USA; Northwestern Univ, Dept Elect & Comp Engn, Ctr Parallel & Distributed Comp, Evanston, IL 60208, USA

In this paper, we present some novel algorithms for scheduling hierarchical signal flow graphs in the domain of high-level synthesis. With the complex chips that will need to be designed in the future, the runtimes of these scheduling algorithms are expected to be quite large. The key contributions of this paper are as follows: First, we develop a novel extension of the sequential force-directed scheduling algorithm which naturally handles loops and conditionals by coming up with a scheme of scheduling hierarchical signal flow graphs. Second, we develop three new parallel algorithms for the scheduling problem. Our parallel algorithms are portable across a wide range of parallel platforms. We report results on a set of high-level synthesis benchmarks on an 8-processor SGI Origin and a 64-processor IBM SP-2. While some parallel algorithms for VLSI CAD reported by earlier researchers have shown a loss in quality of results, our parallel algorithms produce exactly the same results as the sequential algorithms on which they are based.

Keywords: high-level synthesis; force-directed scheduling; hierarchical graphs; parallel algorithms; multiprocessors
