Search Results - Inner Mongolia University Library

Speeding Localization of Pulsed Signal Transitions Using Multicore Processors

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2011, Vol. 60, No. 5, pp. 1588-1593

Author: Barford, Lee (Agilent Technol, Measurement Res Lab, Reno, NV 89503, USA)

Microprocessor clock rates, which for three decades doubled about every 18 months, have essentially stopped increasing. Instead, the number of processor cores (identical processing units capable of all usual microprocessor functions) in a microprocessor is increasing exponentially with time. For performance to keep increasing as the number of cores grows, measurement analysis software will have to take advantage of this parallelism. The objectives of this paper are to study one example of a measurement analysis having serial dependencies among the input data and to show that there is a practical parallel algorithm despite the data dependencies within the measured time series. The measurement analysis studied is transition localization in digital signals. A parallel scan-type algorithm is presented. The results of applying the parallel algorithm to both synthetic data and actual measured data are presented, and the speedup obtained on a twenty-four-core computer is analyzed. The parallel method produces exactly the same measurement results, bit for bit, as the original serial method. It is argued that what is desired for this and many other measurement processing algorithms is scalability of throughput with the number of cores. The proposed algorithm achieves such scalability, with throughput scaling up to about a dozen cores.

Keywords: parallel algorithms; parallel programming; pulse measurements; signal analysis; timing jitter
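
The abstract does not spell out the algorithm. As a minimal sketch of why a scan can work despite serial dependencies, the Python below assumes that transitions are detected by a hysteresis comparator with thresholds `lo` and `hi` (an assumption; all function names are hypothetical). The comparator state after each sample depends on the previous state, but each chunk can be summarized as a map from its starting state to its ending state. Those maps are combined with an exclusive scan, and each chunk is then re-scanned from its now-known starting state. Because every sample goes through the same comparator logic, the result matches the serial loop exactly.

```python
from concurrent.futures import ThreadPoolExecutor

LOW, HIGH = 0, 1

def step(state, x, lo, hi):
    # Hysteresis comparator: switch high above hi, low below lo, otherwise hold.
    if x > hi:
        return HIGH
    if x < lo:
        return LOW
    return state

def serial_transitions(samples, lo, hi, init=LOW):
    # Reference serial method: indices where the comparator state changes.
    state, out = init, []
    for i, x in enumerate(samples):
        new = step(state, x, lo, hi)
        if new != state:
            out.append(i)
        state = new
    return out

def chunk_summary(chunk, lo, hi):
    # Map each possible starting state to the state at the end of the chunk.
    summary = {}
    for s0 in (LOW, HIGH):
        s = s0
        for x in chunk:
            s = step(s, x, lo, hi)
        summary[s0] = s
    return summary

def parallel_transitions(samples, lo, hi, init=LOW, n_chunks=4):
    n = len(samples)
    bounds = [(k * n // n_chunks, (k + 1) * n // n_chunks) for k in range(n_chunks)]
    with ThreadPoolExecutor() as pool:
        # Pass 1: summarize every chunk independently.
        summaries = list(pool.map(lambda b: chunk_summary(samples[b[0]:b[1]], lo, hi), bounds))
        # Exclusive scan: starting state of each chunk (one cheap step per chunk).
        starts, s = [], init
        for summary in summaries:
            starts.append(s)
            s = summary[s]

        # Pass 2: rerun each chunk from its known starting state, shift to global indices.
        def local(job):
            (a, b), s0 = job
            return [a + i for i in serial_transitions(samples[a:b], lo, hi, s0)]

        parts = list(pool.map(local, zip(bounds, starts)))
    return [i for part in parts for i in part]
```

`ThreadPoolExecutor` is used here only to show the two-pass structure. In CPython a process pool or native code would be needed for a real speedup.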

A Novel Low-Complexity and Parallel Algorithm for DCT IV Transform and Its GPU Implementation

APPLIED SCIENCES-BASEL, 2024, Vol. 14, No. 17, p. 7491

Authors: Chiper, Doru Florin; Dobrea, Dan Marius (Gheorghe Asachi Tech Univ, Fac Elect Telecommun & Informat Technol, Iasi 700506, Romania; Tech Sci Acad Romania ASTR, Iasi 700050, Romania; Acad Romanian Scientists AOSR, Bucharest 030167, Romania)

This study proposes a novel factorization method for the DCT IV algorithm that allows it to be broken into four or eight sections that can be run in parallel. Moreover, the arithmetic complexity has been significantly reduced. Based on the proposed new algorithm for DCT IV, the speed performance has been improved substantially. The performance of this algorithm was verified using two different GPU systems produced by NVIDIA. The experimental results show that the proposed DCT algorithm achieves an impressive reduction in total processing time. The proposed method is very efficient, improving the algorithm speed by more than the factor of 4 that was expected from segmenting the DCT algorithm into four sections running in parallel. Compared with the classical (sequential) implementation of DCT IV, the speed improvement is more than fivefold: at least 5.41 times on Jetson AGX Xavier and 10.11 times on Jetson Orin Nano. Using a parallel formulation with eight sections running in parallel, the improvement in speed is even higher, at least 8.08 times on Jetson AGX Xavier and 11.81 times on Jetson Orin Nano.

Keywords: parallel algorithms; discrete trigonometric transforms; DCT-IV; GPU
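
The factorization itself is not given in the abstract. For orientation only, the sketch below evaluates the textbook (unscaled) DCT-IV definition, X_k = sum over n = 0..N-1 of x_n cos(pi/N (n + 1/2)(k + 1/2)), directly in O(N^2) time and splits the output coefficients into independent sections on a thread pool. It illustrates what "sections running in parallel" can mean, but it is not the authors' low-complexity factorization, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def dct4_section(x, ks):
    # Direct (unscaled) DCT-IV, evaluated only for the output indices in ks.
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5)) for n in range(n_len))
        for k in ks
    ]

def dct4_sections(x, sections=4):
    # Each section computes a disjoint block of coefficients, so sections are independent.
    n_len = len(x)
    blocks = [range(s * n_len // sections, (s + 1) * n_len // sections) for s in range(sections)]
    with ThreadPoolExecutor(max_workers=sections) as pool:
        parts = list(pool.map(lambda ks: dct4_section(x, ks), blocks))
    return [v for part in parts for v in part]
```

Applying the unscaled DCT-IV twice returns the input multiplied by N/2, which gives a quick self-check for any faster formulation.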

Implementation of a finite element and absorbing boundary conditions package on a parallel shared memory computer

IEEE TRANSACTIONS ON MAGNETICS, 1998, Vol. 34, No. 5, pp. 3343-3346

Authors: Vollaire, C; Nicolas, L (Ecole Cent Lyon, CEGELY UPRESA CNRS 5005, F-69131 Ecully, France)

A nodal-based finite element formulation coupled with absorbing boundary conditions has been developed to solve open-boundary microwave problems. Only parallel computation makes it possible to model large devices. We show in this paper how the code has been implemented on a parallel shared memory computer. Each step of the code is analyzed. In particular, two types of matrix storage and two preconditioning methods for the conjugate gradient algorithm are compared.

Keywords: finite element methods; parallel algorithms; shared memory systems; electromagnetic scattering
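
The abstract does not say which storage formats or preconditioners were compared. As a hedged illustration of the moving parts, the sketch below stores a sparse matrix in compressed sparse row (CSR) form and runs conjugate gradient with a diagonal (Jacobi) preconditioner. CSR and Jacobi are assumptions, not necessarily the paper's choices. The sketch also assumes a real symmetric positive definite matrix, whereas finite element systems with absorbing boundary conditions are generally complex-valued and may need a CG variant. The row loop in the hypothetical `csr_matvec` is the part a shared-memory code would typically split across processors, because each row is computed independently.

```python
import math

def csr_matvec(indptr, indices, data, x):
    # y = A x for A in CSR form; rows are independent, so they can be split across threads.
    return [
        sum(data[j] * x[indices[j]] for j in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    # Conjugate gradient preconditioned with the matrix diagonal (assumes SPD, nonzero diagonal).
    n = len(b)
    diag = [0.0] * n
    for i in range(n):
        for j in range(indptr[i], indptr[i + 1]):
            if indices[j] == i:
                diag[i] = data[j]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```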

PLASMA PARTICLE SIMULATIONS ON THE MARK-III HYPERCUBE

MATHEMATICAL AND COMPUTER MODELLING, 1988, Vol. 11, No. C, pp. 53-54

Authors: LIEWER, PC; DECYK, VK; DAWSON, JD; FOX, GC (CALTECH, JET PROP LAB, PASADENA, CA 91109; UNIV CALIF LOS ANGELES, DEPT PHYS, LOS ANGELES, CA 90024; CALTECH, DIV PHYS MATH & ASTRO, PASADENA, CA 91125)

Plasma particle simulations are used extensively for the study of nonlinear phenomena in both space and laboratory plasmas. Here, a well-benchmarked plasma simulation code has been implemented on the 32-node JPL Mark III hypercube to study the applicability of parallel architecture to particle simulation models. In the sequential version of the code, about 90% of the computation time is spent updating the particle positions and velocities. When implemented in parallel on the Mark III Hypercube, this part of the code was sped up by a factor of about 27 (83% efficiency). Computation times on the Mark III have also been compared with times on a variety of other computers.

Keywords: Plasma particle simulation; parallel processing; parallel algorithms; hypercubes
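
The quoted figures can be related with two standard formulas. Parallel efficiency is speedup divided by processor count, and Amdahl's law gives the overall speedup when only a fraction of the run time is accelerated. The sketch below checks that 27/32 is about 0.84, in line with the quoted 83% given that 27 is approximate. It then estimates the whole-code speedup under the assumption, not stated in the abstract, that the remaining 10% of the time stays serial.

```python
def efficiency(speedup, processors):
    # Fraction of ideal linear speedup achieved.
    return speedup / processors

def amdahl_speedup(accelerated_fraction, part_speedup):
    # Overall speedup when only accelerated_fraction of the serial time is sped up.
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / part_speedup)

print(round(efficiency(27, 32), 3))       # 0.844
print(round(amdahl_speedup(0.9, 27), 2))  # 7.5
```

Under that assumption the whole code would run only about 7.5 times faster, because the unaccelerated 10% quickly dominates the run time.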

COARSE MESH PARTITIONING FOR TREE-BASED AMR

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2017, Vol. 39, No. 5, pp. C364-C392

Authors: Burstedde, Carsten; Holke, Johannes (Rhein Friedrich Wilhelms Univ Bonn, INS, D-53115 Bonn, Germany; Rhein Friedrich Wilhelms Univ Bonn, HCM, D-53115 Bonn, Germany)

In tree-based adaptive mesh refinement, elements are partitioned between processes using a space-filling curve. The curve establishes an ordering between all elements that derive from the same root element, the tree. When representing complex geometries by connecting several trees, the roots of these trees form an unstructured coarse mesh. We present an algorithm to partition the elements of the coarse mesh such that (a) the fine mesh can be load-balanced to equal element counts per process regardless of the element-to-tree map, and (b) each process that holds fine mesh elements has access to the meta data of all relevant trees. As an additional feature, the algorithm partitions the meta data of relevant ghost (halo) trees as well. We develop in detail how each process computes the communication pattern for the partition routine without handshaking and with minimal data movement. We demonstrate the scalability of this approach on up to 917e3 MPI ranks and 371e9 coarse mesh elements, measuring run times of one second or less.

Keywords: adaptive mesh refinement; coarse mesh; mesh partitioning; parallel algorithms; forest of octrees; high-performance computing
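
Requirement (b) of the abstract can be illustrated with a simplified, serial sketch. Suppose fine elements are ordered tree by tree along the space-filling curve and split into equal counts per process. Then each process needs the metadata of every tree from the one containing its first element to the one containing its last. The hypothetical helper below computes that tree range from per-tree element counts. It is not the paper's algorithm, which additionally handles ghost trees and computes the communication pattern without handshaking. Trees at range boundaries can be needed by several processes, which is why the coarse partition may overlap.

```python
from bisect import bisect_right
from itertools import accumulate

def tree_ranges(elements_per_tree, num_procs):
    # offsets[t] = global index of the first fine element of tree t (tree-major SFC order).
    offsets = [0] + list(accumulate(elements_per_tree))
    total = offsets[-1]
    ranges = []
    for p in range(num_procs):
        begin = p * total // num_procs      # equal element counts per process
        end = (p + 1) * total // num_procs  # exclusive
        if begin == end:
            ranges.append(None)             # process holds no fine elements
            continue
        first_tree = bisect_right(offsets, begin) - 1
        last_tree = bisect_right(offsets, end - 1) - 1
        ranges.append((first_tree, last_tree))
    return ranges
```

For three trees of four elements each on four processes, the middle two processes each straddle a tree boundary and so need two trees.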

Fast image recovery using dynamic load balancing in parallel architectures, by means of incomplete projections

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2001, Vol. 10, No. 4, pp. 493-499

Authors: González-Castaño, FJ; García-Palomares, UM; Alba-Castro, JL; Pousada-Carballo, JM (ETSI Telecomun, Dept Tecnol Comunicac, Vigo 36200, Spain; Univ Simon Bolivar, Dept Proc & Sistemas, Caracas 1080A, Venezuela)

This paper formulates an incomplete projection algorithm that is applied to the image recovery problem. The algorithm makes dynamic load balancing easy to implement on parallel architectures. Furthermore, the local computation-to-communication load ratio can be adjusted, since each processor performs a finite number of iterations of any projection-type technique, and this number can be provided as a parameter of the algorithm. Numerical results compare favorably with those obtained by the extrapolated method of parallel subgradient projections.

Keywords: load balancing; parallel algorithms; recovery; restoration
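
The abstract states the key structural idea: each processor runs a fixed, tunable number of projection iterations on its own share of the constraints before the results are combined. The sketch below shows that pattern under several assumptions. Constraints are hyperplanes a · x = b, each block is processed with Kaczmarz-style sweeps, and block results are combined by simple averaging (a string-averaging scheme chosen for illustration, not necessarily the paper's combination rule). `sweeps` plays the role of the local iteration count that trades computation against communication. The function names are hypothetical, and dynamic load balancing is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

def project_hyperplane(x, a, b):
    # Orthogonal projection of x onto the hyperplane {y : a . y = b}.
    t = (b - sum(ai * xi for ai, xi in zip(a, x))) / sum(ai * ai for ai in a)
    return [xi + t * ai for xi, ai in zip(x, a)]

def local_sweeps(x, block, sweeps):
    # A fixed number of sequential (Kaczmarz-style) passes over one block of constraints.
    for _ in range(sweeps):
        for a, b in block:
            x = project_hyperplane(x, a, b)
    return x

def incomplete_projection_solve(blocks, x0, sweeps=2, outer=200):
    # Each block is handled by one worker; the workers' results are averaged.
    x = list(x0)
    with ThreadPoolExecutor() as pool:
        for _ in range(outer):
            current = x
            results = list(pool.map(lambda blk: local_sweeps(current, blk, sweeps), blocks))
            x = [sum(col) / len(results) for col in zip(*results)]
    return x
```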

A fine-grained loop-level parallel approach to efficient fuzzy community detection in complex networks

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, Vol. 32, No. 5, p. e5537

Authors: Munoz-Caro, Camelia; Nino, Alfonso; Reyes, Sebastian (Univ Castilla La Mancha, Escuela Super Informat, Paseo Univ 4, Ciudad Real 13004, Spain)

Determining the inner organizational structure of sets of networked elements is of paramount importance for analyzing real-world systems such as social, biological, or economic networks. To this end, it is necessary to identify communities of interrelated nodes within the networks. Recently, a fuzzy community detection approach based on the minimization of a topological error functional has been proposed in the form of a gradient-based algorithm design pattern. However, the intrinsic quadratic algorithmic complexity of the procedure limits the problem size that can be treated efficiently. Here, we extend the ability of this approach to analyze larger networks by resorting to parallelism. To do so, we identify the sources of concurrency in the gradient-based algorithm design pattern. To determine the parallelization limits, we develop a two-dimensional performance model as a function of the number of processors and network size. The model makes it possible to compute the maximum possible speedup. Another model is presented to find the maximum problem size tractable in a given amount of time. Applying these models to a set of benchmark networks shows that parallelization improves the proposed fuzzy community detection approach by more than an order of magnitude. This allows networks with several hundred thousand nodes to be treated within hours.

Keywords: complex networks; fuzzy communities; machine learning; parallel algorithms; performance model
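
The abstract does not give the model's functional form. The sketch below shows how such a two-dimensional model can yield a maximum speedup and a maximum tractable size, using a purely hypothetical run time T(n, p) = c_s n + c_p n^2 / p + c_o p. The three terms are a serial linear part, a parallelized quadratic part, and a per-processor overhead. The coefficients are placeholders that would have to be fitted to measurements. Minimizing T over p gives p* = n sqrt(c_p / c_o). Solving T(n, p) = budget for n gives the largest network that fits in a time budget.

```python
import math

def model_time(n, p, c_s, c_p, c_o):
    # Hypothetical: serial O(n) part + parallel O(n^2)/p part + per-processor overhead.
    return c_s * n + c_p * n * n / p + c_o * p

def model_speedup(n, p, c_s, c_p, c_o):
    return model_time(n, 1, c_s, c_p, c_o) / model_time(n, p, c_s, c_p, c_o)

def best_processor_count(n, c_p, c_o):
    # dT/dp = -c_p n^2 / p^2 + c_o = 0  =>  p* = n sqrt(c_p / c_o)
    return n * math.sqrt(c_p / c_o)

def max_tractable_size(budget, p, c_s, c_p, c_o):
    # Positive root of (c_p / p) n^2 + c_s n + (c_o p - budget) = 0.
    a = c_p / p
    c = c_o * p - budget
    if c >= 0:
        return 0.0  # processor overhead alone already exceeds the budget
    return (-c_s + math.sqrt(c_s * c_s - 4 * a * c)) / (2 * a)
```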

The parallel downhill simplex algorithm for unconstrained optimisation

CONCURRENCY-PRACTICE AND EXPERIENCE, 1998, Vol. 10, No. 2, pp. 121-137

Authors: Coetzee, L; Botha, EC (Univ Pretoria, Dept Elect & Elect Engn, ZA-0002 Pretoria, South Africa)

In this paper we present a parallel implementation of a well-known heuristic optimisation algorithm (the downhill simplex algorithm developed by Nelder and Mead in 1965), which is well suited for unconstrained optimisation. We present the sequential algorithm as well as the parallel algorithm, which we used to generate numerical results. These include results of experiments on neural networks and on a test suite of functions, which demonstrate the parallel algorithm's increased robustness and convergence rate for high-dimensional problems compared with the sequential algorithm. (C) 1998 John Wiley & Sons, Ltd.

Keywords: parallel algorithms
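
For reference, the sequential downhill simplex method that the paper starts from can be sketched as below. The sketch uses common coefficient choices (reflection 1, expansion 2, contraction 0.5, shrink 0.5), which are assumptions rather than values taken from the paper. The abstract does not describe how the parallel version distributes work, so no parallel scheme is shown. Note, however, that the n function evaluations in a shrink step are mutually independent.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=5000):
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        xr = along(-1.0)  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = along(-2.0)  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            # Outside contraction if the reflection beat the worst vertex, else inside.
            xc = along(-0.5) if fr < values[-1] else along(0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (vi - b) for b, vi in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i = min(range(n + 1), key=lambda i: values[i])
    return simplex[i], values[i]
```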

Editorial: Celebration of the 25th Volume

INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2010, Vol. 25, No. 1, pp. 1-2

Author: Stojmenovic, Ivan

International Journal of Parallel, Emergent and Distributed Systems is celebrating its 25th volume. IJPEDS is the continuation of the journal Parallel Algorithms and Applications, which existed from 1993 to 2004. Parallel Algorithms and Applications was founded by the late David J. Evans, who served as Editor-in-Chief until 1996. Graham Megson (his former student) served as Editor-in-Chief of PAA from 1996 to 2004. They deserve credit for founding it and for their excellent stewardship of the journal in their roles. I was pleased to serve as Associate Editor/EIC of PAA from the beginning in 1992 until 2004 and as Regional Editor thereafter. From 2005 (starting with volume 20), the journal was renamed International Journal of Parallel, Emergent and Distributed Systems, expanding its scope (the new scope includes the areas of emergent and distributed systems, algorithms, architectures and applications), and I took over as Editor-in-Chief. It is my honour and pleasure to have served this journal since it was established and to have led it during the last six years.

Keywords: journals; parallel algorithms; Volume; Editors; pancreatic acinar atrophy (PAA); Pride; iodine; chiefs; Distributed Systems; Happiness

PPB-MCTS: A novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, Vol. 193

Authors: Naderzadeh, Yashar; Grosu, Daniel; Chinnam, Ratna Babu (Wayne State Univ, Dept Comp Sci, 5057 Woodward Ave, Detroit, MI 48202, USA; Wayne State Univ, Dept Ind & Syst Engn, 4815 Fourth St, Detroit, MI 48202, USA)

Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as Parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. In addition, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct

Keywords: parallel algorithms; Monte Carlo tree search; Job shop scheduling
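
To make the abstract's terms concrete, the sketch below is a sequential MCTS loop (selection, random rollout, backpropagation) whose statistics live in a transposition table keyed by state. Duplicate states reached by different paths therefore share one entry, which turns the tree into a DAG as the abstract describes. It does not implement the distributed-memory or partial-backpropagation parts of PPB-MCTS. UCB1 selection with exploration constant c = 1.4 is a common default and is assumed here. The toy problem and all names are hypothetical.

```python
import math
import random

class Stats:
    # Visit count and summed reward for one state (shared by all paths reaching it).
    __slots__ = ("visits", "total")

    def __init__(self):
        self.visits = 0
        self.total = 0.0

def ucb1(child, parent_visits, c):
    if child.visits == 0:
        return math.inf  # try unvisited successors first
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root, actions, step, is_terminal, reward, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    table = {}  # transposition table: state -> Stats, so duplicate states merge (DAG)

    def stats(state):
        if state not in table:
            table[state] = Stats()
        return table[state]

    for _ in range(iterations):
        path, s = [root], root
        # Selection: follow UCB1 until reaching an unvisited or terminal state.
        while not is_terminal(s) and stats(s).visits > 0:
            parent_visits = stats(s).visits
            s = max((step(s, a) for a in actions(s)),
                    key=lambda t: ucb1(stats(t), parent_visits, c))
            path.append(s)
        # Simulation: uniformly random rollout to a terminal state.
        t = s
        while not is_terminal(t):
            t = step(t, rng.choice(actions(t)))
        r = reward(t)
        # Backpropagation along the path that was actually taken.
        for state in path:
            st = stats(state)
            st.visits += 1
            st.total += r
    best = max(actions(root), key=lambda a: stats(step(root, a)).visits)
    return best, table
```

In the toy test, a state is (depth, number of ones chosen), so the paths "0 then 1" and "1 then 0" reach the same state (2, 1) and share its statistics.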
