This work considers a parallel algorithm for solving multiextremal problems with non-convex constraints. The distinctive feature of this algorithm, which does not use penalty functions, is the separate consideration o...
详细信息
ISBN:
(纸本)9783319629322;9783319629315
This work considers a parallel algorithm for solving multiextremal problems with non-convex constraints. The distinctive feature of this algorithm, which does not use penalty functions, is the separate consideration of each problem constraint. The search process can be conducted by reducing the original multidimensional problem to a number of related one-dimensional problems and solving this set of problems in parallel. An experimental assessment of parallel algorithm efficiency was conducted by finding the numeric solution to several hundred randomly generated multidimensional multiextremal problems with non-convex constraints.
作者:
Omrani, RezaKattan, LinaCIMA
3027 Harvester RdSuite 400 Burlington ON L7N 3G7 Canada Univ Calgary
Transportat Syst Optimizat 2500 Univ Dr NW Calgary AB T2N 1N4 Canada
This paper is aimed at developing an optimization framework for the concurrent calibration of demand and supply parameters in a dynamic traffic assignment (DTA) model. The proposed approach calibrates route choice, al...
详细信息
This paper is aimed at developing an optimization framework for the concurrent calibration of demand and supply parameters in a dynamic traffic assignment (DTA) model. The proposed approach calibrates route choice, along with drivers' behavioral parameters, and estimates origin-destination (OD) flows in a large-scale network in a Paramics microscopic traffic simulation model. A mathematical formulation is defined to quantify the reliability of the observations. A genetic algorithm (GA) is selected as a suitable solution algorithm for the resulting nonlinear stochastic optimization problem. The application of the proposed methodology is implemented in the large-scale network in the business district core of downtown Toronto, Ontario, Canada. For this network, the emerging traffic surveillance data from in-vehicle navigation system technology provide an enriched source of disaggregated speed data. The empirical results from various experiments support the hypothesis that incorporating in-vehicle navigation system speed data can improve the calibration accuracy and minimize the reliance of the calibration process on a priori OD flows. The quality of the solution and convergence speed of a GA is further enhanced by dividing the GA population into multiple demes and running the GA on a high-performance computing cluster (HPCC) with multiple processors (i.e.,parallel distributed GA, PDGA). In addition, this research takes a further step toward analyzing the temporal variations of the driving behavior of travelers. The case study establishes an example for modelers and practitioners who are interested in calibrating a large-scale traffic simulation model. The developed simulation model for traffic has the potential to serve as a test bed on a HPCC for more efficient computation and integration with other optimization tools such as GAs. (C) 2017 American Society of Civil Engineers.
Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetri...
详细信息
ISBN:
(纸本)9781450345934
Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which performs asymptotically less communication than previously known approaches. We provide analysis in the Bulk Synchronous parallel (BSP) model with additional consideration for communication between a local memory and cache. Given sufficient memory to store c copies of the symmetric matrix, our algorithm requires Theta(root c) less interprocessor communication than previously known algorithms, for any c <= p(1/3) when using p processors. The algorithm first reduces the dense symmetric matrix to a banded matrix with the same eigenvalues. Subsequently, the algorithm employs successive reduction to O(log p) thinner banded matrices. We employ two new parallel algorithms that achieve lower communication costs for the full-to-band and band-to-band reductions. Both of these algorithms leverage a novel QR factorization algorithm for rectangular matrices.
The complexity of the 3D buildings and road networks gives the simulation of urban noise difficulty and significance. To solve the problem of computing complexity, a systematic methodology for computing urban traffic ...
详细信息
The complexity of the 3D buildings and road networks gives the simulation of urban noise difficulty and significance. To solve the problem of computing complexity, a systematic methodology for computing urban traffic noise maps under 3D complex building environments is presented on a supercomputer. A parallel algorithm focused on controlling the compute nodes of the supercomputer is designed. Moreover, a rendering method is provided to visualize the noise map. In addition, a strategy for obtaining a real-time dynamic noise map is elaborated. Two efficiency experiments are implemented. One experiment involves comparing the expansibility of the parallel algorithm with various numbers of compute nodes and various computing scales to determine the expansibility. With an increase in the number of compute nodes, the computing time increases linearly, and an increased computing scale leads to computing efficiency increases. The other experiment is a comparison of the computing speed between a supercomputer and a normal computer;the computing node of Tianhe-2 is found to be six tones faster than that of a normal computer. Finally, the traffic noise suppression effect of buildings is analyzed. It is found that the building groups have obvious shielding effect on traffic noise.
Kernel matrices appear in machine learning and non-parametric statistics. Given N points in d dimensions and a kernel function that requires O(d) work to evaluate, we present an O(dN logN)-work algorithm for the appro...
详细信息
ISBN:
(纸本)9781538639146
Kernel matrices appear in machine learning and non-parametric statistics. Given N points in d dimensions and a kernel function that requires O(d) work to evaluate, we present an O(dN logN)-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with O(N logN) work. Our algorithm only requires kernel evaluations and does not require that the kernel matrix admits an efficient global low rank approximation. Instead, our factorization only assumes low-rank properties for the off-diagonal blocks under an appropriate row and column ordering. We also present a hybrid method that, when the factorization is prohibitively expensive, combines a partial factorization with iterative methods. As a highlight, we are able to approximately factorize a dense 11M x11M kernel matrix in 2 minutes on 3,072 x86 "Haswell" cores and a 4.5Mx4.5M matrix in 1 minute using 4,352 "Knights Landing" cores.
In this paper we introduce efficient parallel algorithms for finding the girth in a graph or digraph, where girth is the length of a shortest cycle. We empirically compare our algorithms by using two common APIs for p...
详细信息
In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common operation that arises in many agent-based simulation studies, and is of c...
详细信息
ISBN:
(纸本)9781538640289
In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common operation that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed;however, many of them are either inefficient or inherently sequential. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. SortBased Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize because of data dependencies. We describe the algorithm and compute its asymptotic running time;we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.
Species distribution modeling (SDM) calculates a species' probabilistic distribution by combining Environmental raster layers with species datasets. Such models can help to answer complex questions in Ecology/Biol...
详细信息
ISBN:
(纸本)9781538615621
Species distribution modeling (SDM) calculates a species' probabilistic distribution by combining Environmental raster layers with species datasets. Such models can help to answer complex questions in Ecology/Biology/Health, e.g., by calculating impacts of climate changes in Biodiversity, or the potential for a disease spread (vectors' modeling). Machine learning is largely applied in SDM, being the Genetic Algorithm for Rule-set Production (GARP) one of the most reliable solutions. However, GARP's convergence needs to speedup under certain conditions (high resolution or number of layers), for which this paper proposes P-GARP, a parallel, scalable implementation of GARP. P-GARP was implemented onto a SGI Altix XE 1300 cluster with 2 quad-core processors/node. Preliminary results show an expressive 3.2/node speedup. Premature convergence is not observed in PGARP and its accuracy is very similar to GARP ' s. Effective solutions to improve this speedup in even larger scale are proposed, along with a discussion about P-GARP correctness and efficiency.
Random graphs (or networks) have gained a significant increase of interest due to its popularity in modeling and simulating many complex real-world systems. Degree sequence is one of the most important aspects of thes...
详细信息
ISBN:
(纸本)9781538627150
Random graphs (or networks) have gained a significant increase of interest due to its popularity in modeling and simulating many complex real-world systems. Degree sequence is one of the most important aspects of these systems. Random graphs with a given degree sequence can capture many characteristics like dependent edges and non-binomial degree distribution that are absent in many classical random graph models such as the Erdos-Renyi graph model. In addition, they have important applications in uniform sampling of random graphs, counting the number of graphs having the same degree sequence, as well as in string theory, random matrix theory, and matching theory. In this paper, we present an OpenMP-based shared-memory parallel algorithm for generating a random graph with a prescribed degree sequence, which achieves a speedup of 20.4 with 32 cores. We also present a comparative study of several structural properties of the random graphs generated by our algorithm with that of the real-world graphs and random graphs generated by other popular methods. One of the steps in our parallel algorithm requires checking the Erdos-Gallai characterization, i.e., whether there exists a graph obeying the given degree sequence, in parallel. This paper presents a non-trivial parallel algorithm for checking the Erdos-Gallai characterization, which achieves a speedup of 23 with 32 cores.
A parallel iterative solver for multiport network parameter extraction of large-scale electronic packages is presented. The proposed solver is based on a frequency-domain integral-equation formulation that accounts fo...
详细信息
ISBN:
(纸本)9781467364836
A parallel iterative solver for multiport network parameter extraction of large-scale electronic packages is presented. The proposed solver is based on a frequency-domain integral-equation formulation that accounts for the substrate using planar layered-medium Green's functions, conductor loss/roughness using an impedance boundary condition, and port truncations using a non-radiating lumped port model. The parallel iterative solution is accelerated by a sparse preconditioner and an FFT-based matrix-vector multiplication algorithm. A scalability study demonstrates the solver's suitability for analyzing high-fidelity and large-scale package models.
暂无评论