Search results - Inner Mongolia University Library

A parallel FFT-accelerated layered-medium integral-equation solver for electronic packages

International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, 2020, Vol. 33, No. 2

Authors: Liu, Chang; Aygun, Kemal; Yilmaz, Ali E. Affiliations: Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712, USA; Intel Corp, Chandler, AZ 85226, USA

A parallel iterative layered-medium integral-equation solver is presented for fast and scalable network parameter extraction of electronic packages. The solver, which relies on a 2-D fast Fourier transform (FFT)-based algorithm and a sparse preconditioner to reduce computational complexity, is parallelized using three workload decomposition strategies, including a pencil decomposition that increases the scalability of the computationally dominant FFT-based multiplication stage. A set of increasingly difficult benchmark problems, which require network parameter computations for N_trace = 1 to 257 package-scale interconnects, are solved on a petaflop-scale computer to quantify the solver's accuracy, efficiency, and scalability. The total serialized computation time is observed to scale asymptotically as N_trace^2.6 log N_trace. For the largest problem, using ~1.14 million unknowns and 1536 processes, the solver requires a wall-clock time of ~0.05 s per iteration, ~1 min per excitation, ~9 h per frequency, and ~424 h to extract the 514-port network parameters at 40 sample frequencies between 1 and 40 GHz.

Keywords: FFT; electronic packages; layered medium; method of moments (MoM); parallel algorithms

Parallel Algorithms for Graph Optimization Using Tree Decompositions

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Blair D. Sullivan; Dinesh Weerapurage; Chris Groër. Affiliations: Oak Ridge National Laboratory, Oak Ridge, TN; Link Analytics, Atlanta, GA

Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the dynamic programming tables and the excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solving the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.

Keywords: Dynamic programming; Heuristic algorithms; Runtime; Optimization; parallel algorithms; Vegetation; Memory management
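The dynamic programming idea in the abstract above can be illustrated on the simplest bounded tree-width case, a tree (tree-width 1). The following is a minimal sequential sketch, not the authors' parallel OpenMP/MPI algorithm; the function name, the edge-list input, and the assumption of a connected tree with non-negative weights are all choices made for this illustration.

```python
from collections import defaultdict


def max_weight_independent_set_tree(edges, weights, root=0):
    """Return the maximum total weight of an independent set in a tree.

    Assumptions (illustrative): vertices are 0..n-1, `edges` describes a
    connected tree, and `weights[v]` is the non-negative weight of vertex v.
    """
    n = len(weights)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS: record each vertex's parent and a parent-before-child order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    take = list(weights)  # best value in u's subtree when u is in the set
    skip = [0] * n        # best value in u's subtree when u is left out

    # Reversed order visits every child before its parent, so each table
    # entry is final by the time it is folded into the parent's entry.
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            take[p] += skip[u]                # parent taken: child must be skipped
            skip[p] += max(take[u], skip[u])  # parent skipped: child is free
    return max(take[root], skip[root])
```

On a general graph, the two-entry table per vertex becomes a table over all independent subsets of each bag of the tree decomposition, which is where the memory pressure mentioned in the abstract comes from.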

Asynchronous versions of Jacobi, multigrid, and Chebyshev solvers

Author: Wolfson-Pou, Jordi. Affiliation: Georgia Institute of Technology

Degree level: Doctoral

Iterative methods are commonly used for solving large, sparse systems of linear equations on parallel computers. Implementations of parallel iterative solvers contain kernels (e.g., parallel sparse matrix-vector products) in which parallel processes alternate between phases of computation and communication. Standard software packages use synchronous implementations where there are one or more synchronization points per iteration. These synchronization points occur during communication phases where each process sends data to other processes and idles until all data needed for the next iteration is received. Synchronization points scale poorly on massively parallel machines and may become the primary bottleneck for future exascale computers. This calls for research and development of asynchronous iterative methods, which is the subject of this dissertation. In asynchronous iterative methods there are no synchronization points. This means that, after a phase of computation, processes immediately proceed to the next phase of computation using whatever data is currently available. Since the late 1960s, research on asynchronous methods has primarily considered basic fixed-point methods, e.g., Jacobi, where proving asymptotic convergence bounds has been the focus. However, the practical behavior of asynchronous methods is not well understood, and asynchronous versions of certain fast-converging solvers have not been developed. This dissertation focuses on studying the practical behavior of asynchronous Jacobi, developing new communication-avoiding asynchronous iterative solvers, and introducing the first asynchronous versions of multigrid and Chebyshev. To better understand the practical behavior of asynchronous Jacobi, we examine a model of asynchronous Jacobi where communication delays are neglected. We call this model simplified asynchronous Jacobi. Simplified asynchronous Jacobi can be used to model asynchronous Jacobi implemented in shared memory or distributed memo

Keywords: Iterative solvers; Sparse linear systems; Asynchronous methods; parallel algorithms; Jacobi; Gauss-Seidel; Southwell; Multigrid; Chebyshev
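To make the contrast between synchronous and asynchronous relaxation in the abstract above concrete, here is a small sketch. It is only one possible reading of a delay-free asynchronous model: at each step a random subset of rows is relaxed using whatever values are currently available, and with `active_fraction=1.0` it reduces to classical Jacobi. The function names, the `active_fraction` parameter, and the random-subset rule are assumptions for illustration, not the dissertation's definition.

```python
import random


def relax_rows(A, b, x, rows):
    """Apply one Jacobi update to the given rows, all reading the same old x."""
    new_x = list(x)
    for i in rows:
        off_diag = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
        new_x[i] = (b[i] - off_diag) / A[i][i]
    return new_x


def jacobi(A, b, steps, active_fraction=1.0, seed=0):
    """Iterate on A x = b starting from x = 0.

    active_fraction=1.0 relaxes every row each step (synchronous Jacobi).
    Smaller values relax a random subset per step, a hypothetical stand-in
    for processes that update at different rates without waiting.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        rows = [i for i in range(n) if rng.random() < active_fraction]
        x = relax_rows(A, b, x, rows)
    return x


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```

For a strictly diagonally dominant matrix such as `[[4, -1, 0], [-1, 4, -1], [0, -1, 4]]`, both variants converge; the partial-update variant simply needs more steps because each step does less work.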

GPU-aided edge computing for processing the k nearest-neighbor query on SSD-resident data

Internet of Things, 2021, Vol. 15

Authors: Velentzas, Polychronis; Vassilakopoulos, Michael; Corral, Antonio. Affiliations: Univ Thessaly, Dept Elect Comp Eng, Data Structuring Eng Lab, Volos, Greece; Univ Almeria, Dept Informat, Almeria, Spain

Edge computing aims at improving performance by storing and processing data closer to their source. The k Nearest-Neighbor (k-NN) query is a common spatial query in several applications. For example, this query can be used for distance classification of a group of points against a big reference dataset to derive the dominating feature class. Typically, GPU devices have much larger numbers of processing cores than CPUs and faster device memory than main memory accessed by CPUs, thus, providing higher computing power. However, since device and/or main memory may not be able to host an entire reference dataset, the use of secondary storage is inevitable. Solid State Disks (SSDs) could be used for storing such a dataset. In this paper, we propose an architecture of a distributed edge-computing environment where large-scale processing of the k-NN query can be accomplished by executing an efficient algorithm for processing the k-NN query on its (GPU and SSD enabled) edge nodes. We also propose a new algorithm for this purpose, a GPU-based partitioning algorithm for processing the k-NN query on big reference data stored on SSDs. We implement this algorithm in a GPU-enabled edge-computing device, hosting reference data on an SSD. Using synthetic datasets, we present an extensive experimental performance comparison of the new algorithm against two existing ones (working on memory-resident data) proposed by other researchers and two existing ones (working on SSD-resident data) recently proposed by us. The new algorithm excels in all the conducted experiments and outperforms its competitors. (C) 2021 Elsevier B.V. All rights reserved.

Keywords: k Nearest-neighbor query; GPU; SSD; Spatial query; parallel algorithms; Edge computing
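The abstract above describes answering k-NN queries when the reference dataset is too large for memory and must be read from secondary storage in parts. A minimal CPU-only sketch of that partitioned pattern follows; it is not the paper's GPU algorithm, and the function name and the representation of partitions as an iterable of point lists are assumptions for illustration.

```python
import heapq


def knn_chunked(query, chunks, k):
    """Return the k reference points nearest to `query`, closest first.

    `chunks` is any iterable of point lists, standing in for partitions
    read one at a time from storage; only k candidates are kept in memory.
    """
    # Max-heap on distance, emulated by storing negated squared distances:
    # heap[0] is always the current worst (farthest) of the kept candidates.
    heap = []
    for chunk in chunks:
        for point in chunk:
            d2 = sum((q - p) ** 2 for q, p in zip(query, point))
            if len(heap) < k:
                heapq.heappush(heap, (-d2, point))
            elif -d2 > heap[0][0]:
                # Strictly closer than the current worst candidate: swap it in.
                heapq.heapreplace(heap, (-d2, point))
    # Sort the survivors by ascending squared distance.
    return [point for _, point in sorted(heap, key=lambda entry: -entry[0])]
```

Because each partition only updates a size-k candidate list, partitions could in principle be processed independently and their candidate lists merged afterwards, which is the kind of structure a parallel device can exploit.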

Lower bounds for parallel and randomized convex optimization

The Journal of Machine Learning Research, 2020, Vol. 21, No. 1, pp. 153-183

Authors: Jelena Diakonikolas; Cristóbal Guzmán. Affiliations: Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI; Institute for Mathematical and Computational Engineering, Faculty of Mathematics and School of Engineering, Millennium Nucleus Center for the Discovery of Structures in Complex Data, Pontificia Universidad Católica de Chile, Santiago, Chile

We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean (l2) and l∞ spaces.

Keywords: lower bounds; convex optimization; parallel algorithms; randomized algorithms; non-Euclidean optimization

Matheuristics and Column Generation for a Basic Technician Routing Problem

Algorithms, 2021, Vol. 14, No. 11, p. 313

Authors: Dupin, Nicolas; Parize, Remi; Talbi, El-Ghazali. Affiliations: Univ Paris Saclay, Lab Interdisciplinaire Sci Numer, LISN, F-91405 Orsay, France; Univ Lille, CNRS, UMR 9189 CRIStAL, Ctr Rech Informat Signal & Automat Lill, F-59000 Lille, France

This paper considers a variant of the Vehicle Routing Problem with Time Windows, with site dependencies, multiple depots and outsourcing costs. This problem is the basis for many technician routing problems. Having both site-dependency and time window constraints results in difficulties in finding feasible solutions and induces highly constrained instances. Matheuristics based on Mixed Integer Linear Programming compact formulations are first designed. Column Generation matheuristics are then described by using previous matheuristics and machine learning techniques to stabilize and speed up the convergence of the Column Generation algorithm. The computational experiments are analyzed on public instances with graduated difficulties in order to analyze the accuracy of algorithms for ensuring feasibility and the quality of solutions for weakly to highly constrained instances. The results emphasize the interest of the multiple types of hybridization between mathematical programming, machine learning and heuristics inside the Column Generation framework. This work offers perspectives for many extensions of technician routing problems.

Keywords: optimization; operations research; mathematical programming; mixed integer linear programming; Dantzig-Wolfe decomposition; column generation; matheuristics; hybrid heuristics; parallel algorithms; workforce scheduling and routing; vehicle routing problems

Parallelization of Data Buffering and Processing Mechanism in Mesh Wireless Sensor Network for IoT Applications

6th International Conference on Advanced Computing, Networking, and Informatics, ICACNI 2018

Authors: Jain, Monika; Saxena, Rahul; Jaidka, Siddharth; Jhamb, Mayank Kumar. Affiliation: School of Computing and Information Technology, Manipal University Jaipur, Jaipur, India

ISBN: (print) 9789811396793

IoT, a field of great interest and importance for coming generations, presents challenges for application developers and researchers to work on. Wireless sensor mesh networking has emerged as an attractive option for a wide range of low-power IoT applications. This paper shows how data from multiple sensor nodes can be stored, read, and processed in parallel by the parent node in the cluster, thus reducing the response time drastically. The use of a parallelized algorithm for the communication protocol between the sensors and the parent node, optimized using OpenMP standards for multi-core architectures, enables multiple radio technologies to be used for an application, whereas serial processing allows no more than one. The proposed algorithm has been tested for a wireless network application measuring temperature and moisture concentrations using numerous sensors, for which the response time is recorded to be less than 10 ms. The paper also discusses in detail the hardware configurations for the application tested, along with results that shed light on the parallel mechanism for buffering and processing the messages. Finally, the paper concludes by claiming the edge of the parallel algorithm-based routing protocol over the serial one in the light of graphical results and analysis. © 2020, Springer Nature Singapore Pte Ltd.

Keywords: parallel algorithms

Using supercomputer technologies to research the influence of abiotic factors on the biogeochemical cycle variability in the Azov Sea

14th International Scientific Conference on Parallel Computational Technologies, PCT 2020

Authors: Sukhinov, Alexander I.; Belova, Yulia V.; Chistyakov, Alexander E.; Filina, Alena A.; Litvinov, Vladimir N.; Nikitina, Alla V.; Leontyev, Anton L. Affiliations: Don State Technical University, Rostov-on-Don, Russia; Supercomputers and Neurocomputers Research Center, Taganrog, Russia; Azov-Black Sea Engineering Institute of Don State Agrarian University, Zernograd, Russia; Southern Federal University, Rostov-on-Don, Russia; Science and Technology University "Sirius", Sochi, Russia

ISBN: (print) 9783030553258

The paper covers the mathematical modelling of the main biogenic matter transformations in the production-destruction processes of phytoplankton populations in the Azov Sea, taking into account the influence of external factors, including the salinity and temperature. A multi-species mathematical model of the phytoplankton population dynamics taking into account the transport and transformation of nutrients was developed and researched. A numerical algorithm of salinity and temperature field restoration in shallow water for the case of the Azov Sea was offered to simulate biogeochemical cycles of the main pollutants, including nitrogen, phosphorus and silicon. A discrete analogue of the proposed water ecology model problem was developed based on the schemes of the increased accuracy taking into account the partial filling of the computational cells. A parallel algorithm adapted for hybrid computing systems using the NVIDIA CUDA architecture was developed for the numerical implementation of the proposed interrelated mathematical models of biological kinetics. © Springer Nature Switzerland AG 2020.

Keywords: parallel algorithms

Improved Work Span Tradeoff for Single Source Reachability and Approximate Shortest Paths

32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2020

Authors: Cao, Nairen; Fineman, Jeremy T.; Russell, Katina. Affiliation: Georgetown University, Washington, DC, United States

This brief announcement presents parallel algorithms with a tradeoff between work and span for single source reachability and approximate shortest paths on directed graphs. Both algorithms have Õ(mρ^2 + nρ^4) work a...

ISBN: (print) 9781450369350

Keywords: parallel algorithms

Parallelization of the method of simulated annealing when solving multicriteria optimization problems

2nd International Workshop on Control, Optimisation and Analytical Processing of Social Networks, COAPSN 2020

Authors: Mochurad, Lesia; Boyko, Nataliya; Sheketa, Vasyl. Affiliations: Department of Artificial Intelligent Systems, Lviv Polytechnic National University, 12 S. Bandery str., Lviv 79000, Ukraine; Ivano-Frankivsk National Technical University of Oil and Gas, Ukraine

The paper analyzes methods of multicriteria optimization. A detailed implementation of the parallel algorithm of the simulated annealing method is presented using the example of an extension of a large-scale travelling salesman problem. For this purpose, the multithreading and multicore capabilities of modern computer systems are used. An application software system was developed. We conducted a number of experimental studies. Adhering to the results that indicate that more computational process optimization is available that is at the optimal gap of the multicriteria optimization problem, the large rate for probable variations are parallel threads and computer cores. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Keywords: parallel algorithms
