The work we present in this paper focuses on understanding the propagation of flu-like infectious outbreaks between geographically distant regions due to the movement of people outside their base location. Our approach incorporates geographic location and a transportation model into our existing region-based, closed-world EpiGraph simulator to model a more realistic movement of the virus between different geographic areas. This paper describes the MPI-based implementation of this simulator, including several optimization techniques such as a novel approach for mapping processes onto available processing elements based on the temporal distribution of process loads. We present an extensive evaluation of EpiGraph in terms of its ability to simulate large-scale scenarios, as well as from a performance perspective. (C) 2014 Elsevier B.V. All rights reserved.
ISBN: 9781509040704 (print)
The aim of this work is to develop a technology that provides an integrated approach to studying geophysical objects with complex subsurface geometry, based on numerical modeling of the seismic field from point sources. An important stage in successfully solving a dynamic problem of elasticity theory is to develop a detailed model of the object under study and to carry out a series of calculations of elastic wave propagation in inhomogeneous media. We present a software package for solving forward geophysical problems using grid methods. Particular attention is paid to the software interface, which supports the preparation of geophysical models for theoretical experiments. The simulation software under development is designed for use on modern high-performance computing systems. The information and analytical set of programs can be used in the interpretation of experimental data and in the design and verification of 2D and 2.5D models when comparing experimental and theoretical results. Studying the structure of the Baikal rift zone is one of the geophysical tasks where 2D modeling is necessary. This work was partially supported by RFBR grants No. 16-07-01052, 15-31-20150, 15-07-06821, 14-05-00867, 14-07-00312, MES RK 1760/GF4.
In this paper, we study the problem of discovering visual patterns and partial-duplicate images, which is fundamental to visual concept representation and image parsing, but very challenging when the database is extremely large, such as billions of images indexed by a commercial search engine. Although extensive research with sophisticated algorithms has been conducted on both partial-duplicate clustering and visual pattern discovery, most of these algorithms cannot be easily extended to this scale, since both are clustering problems in nature and require pairwise comparisons. To tackle this computational challenge, we introduce a novel and highly parallelizable framework to discover partial-duplicate images and visual patterns in a unified way in distributed computing systems. We emphasize the nested property of local features, and propose the generalized nested feature (GNF) as a mid-level representation for regions and local patterns. Initial coarse clusters are then discovered by GNFs, upon which the n-gram GNF is defined to represent co-occurrent visual patterns. After that, efficient merging and refining algorithms are used to obtain the partial-duplicate clusters, and logical combinations of probabilistic GNF models are leveraged to represent the visual patterns of partially duplicate images. Extensive experiments show the parallelizable property and effectiveness of the algorithms on both partial-duplicate clustering and visual pattern discovery. With 2000 machines, processing one million and 40 million images takes about 8 and 400 minutes, respectively, which is quite efficient compared to previous methods.
ISBN: 9781509054121 (print)
We study effective parallelization of approximation algorithms for the one-dimensional bin packing problem on a multicore platform. Bin packing is a classic combinatorial optimization problem that aims to pack a given sequence of items into a minimum number of equal-sized bins. The problem potentially serves as a model for a wide variety of applications. Examples include packing data into chunks in a memory hierarchy in a given system to increase application performance, loading vehicles subject to weight limitations, and packing TV commercials into station breaks. Bin packing has long served as a proving ground for the analysis of approximation algorithms and played a crucial role in the development of much of the theory of approximation algorithms. Its parallelization, however, has received comparatively little attention. In this work, we develop multiple parallel versions of an effective approximation algorithm (First Fit Decreasing) for the problem and investigate the trade-off between solution quality and execution time. We use OpenMP and Cilk Plus as mechanisms for achieving the parallelization. The new parallel algorithms obtain a speedup of more than 10× (on 32 cores) for moderate to large input sequences without sacrificing much of the quality of the solution produced by the sequential algorithm; in particular, we see only about a 3 to 30% increase in the number of bins compared to the sequential version. In turn, the solution obtained by the sequential First Fit Decreasing algorithm is provably almost optimal (the approximation ratio is less than 1.3).
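For readers unfamiliar with the baseline, the following is a minimal sketch of the sequential First Fit Decreasing heuristic named in the abstract above. It is not the authors' code and it does not reproduce their OpenMP or Cilk Plus parallel versions; the function name `first_fit_decreasing` and its parameters are assumptions made for illustration.

```python
def first_fit_decreasing(items, capacity):
    """Pack item sizes into bins of equal capacity (sequential FFD sketch).

    Hypothetical helper for illustration only; returns a list of bins,
    each bin being the list of item sizes placed in it.
    """
    bins = []  # contents of each open bin
    free = []  # remaining capacity of each open bin
    # "Decreasing": consider the largest items first.
    for size in sorted(items, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        # "First Fit": place the item in the first bin with enough room.
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(size)
                free[i] -= size
                break
        else:
            # No open bin can hold the item, so open a new one.
            bins.append([size])
            free.append(capacity - size)
    return bins
```

The inner scan over open bins is the part that makes the heuristic hard to parallelize directly, since each placement depends on all earlier ones. Relaxing that ordering is one way a parallel version could trade a few extra bins for speed, which would be consistent with the quality and time trade-off the abstract reports.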
Due to the increasing complexity of VLSI circuits, power grid simulation has become more and more time-consuming. Hence, there is a need for a fast and accurate power grid simulator. In order to perform power grid simulation in a timely manner, parallel algorithms have been developed to accelerate the simulation. In this dissertation, we present parallel algorithms and software for power grid simulation on CPU-GPU platforms. The power grid is divided into disjoint partitions. The partitions are enlarged using the Breadth First Search (BFS) method. In the partition enlarging process, a portion of the edges is ignored to make the matrix factorization lightweight. Solving the enlarged partitions using a direct solver serves as a preconditioner for the Preconditioned Conjugate Gradient (PCG) method that is used to solve the power grid. This work combines the advantages of direct solvers and iterative solvers to obtain an efficient hybrid parallel solver. Two-tier parallelism is harnessed using MPI for partitions and CUDA within each partition. The experiments conducted on supercomputing clusters demonstrate significant speed improvements over a state-of-the-art direct solver in both static and transient analysis.
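The hybrid idea in the abstract above (direct solves on partitions used as a preconditioner inside PCG) can be illustrated with a small serial sketch. This is a simplification under stated assumptions, not the dissertation's solver: the partitions here are disjoint and not enlarged by BFS, each block is inverted densely with NumPy rather than factorized, and there is no MPI or CUDA. The names `block_jacobi_preconditioner` and `pcg` are hypothetical.

```python
import numpy as np

def block_jacobi_preconditioner(A, blocks):
    """Return a function applying M^{-1}, where M is block-diagonal.

    `blocks` is a list of index lists, one per disjoint partition
    (an assumption; the abstract uses BFS-enlarged partitions).
    """
    # "Direct solve" per partition: invert each diagonal block once.
    inverses = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in blocks]

    def apply(r):
        z = np.zeros_like(r)
        for idx, inv in zip(blocks, inverses):
            z[idx] = inv @ r[idx]
        return z

    return apply

def pcg(A, b, apply_M_inv, tol=1e-10, max_iter=1000):
    """Preconditioned Conjugate Gradient for a symmetric positive definite A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x              # initial residual
    z = apply_M_inv(r)         # preconditioned residual
    p = z.copy()               # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        # Stop once the relative residual is small enough.
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # next conjugate direction
        rz = rz_new
    return x
```

As a usage example, one can build a small symmetric, diagonally dominant tridiagonal matrix as a stand-in for a grid conductance matrix, split its indices into two halves for `blocks`, and check that `A @ pcg(A, b, block_jacobi_preconditioner(A, blocks))` reproduces `b`. Because each block is applied independently, the preconditioner step is the natural place for the per-partition parallelism the abstract describes.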
ISBN: 9783642148217 (print)
Parallelization of Marchuk's method for the solution of inverse problems, based on adjoint equations and a dual representation of the contaminant concentration functional, is considered here. There are N individual adjoint equations, solved independently at each time step. This independence makes the numerical investigation well suited to high-performance computing. For this purpose, the following parallelization strategies are used: geometrical decomposition, functional decomposition, and a combination of geometrical and functional decompositions.
ISBN: 9781509035762 (print)
Bioinformatics is an interdisciplinary field concerned with processing biological information, and DNA sequence splicing is one of its research topics. At present, most parallel algorithms are based on the MapReduce operating environment, which involves complex reading from and writing to hard disk and therefore slows the algorithm down. In this paper, the memory-based Spark computation model is proposed to solve this problem. We also use a new K-2 bit matching method. Experimental results show that the Spark-based running environment and the new method preserve the accuracy of the stitching results and make the algorithm more efficient.
Novel interconnect technologies offer solutions to on-chip communication scalability problems. This article outlines the prospects of wireless on-chip communication technologies pointing toward low-latency and energy-efficient broadcast even in large-scale chip multiprocessors. It also discusses the challenges and potential impact of adopting these technologies as key enablers of unconventional hardware architectures and algorithmic approaches to significantly improve the performance, energy efficiency, scalability, and programmability of many-core chips.
ISBN: 9781509040940 (print)
In this article, an efficient parallel algorithm for a hybrid CPU-GPU platform is proposed to enable large-scale molecular dynamics (MD) simulations of the metal solidification process. The program implementing the parallel algorithm on the hybrid CPU-GPU platform performs better than the program based on previous algorithms running on the CPU cluster platform: the total execution time of the new program is markedly lower. In particular, because of the modified load balancing method, the neighbor list update time is approximately zero. The parallel program based on the CUDA+OpenMP model achieves a speedup of a factor of 6 in 16-core calculations compared to the parallel program based on the MPI+OpenMP model, and the optimal computational efficiency is achieved for a simulation system of 10,000,000 aluminum atoms. Finally, the good agreement between the theoretical and experimental results verifies the correctness of the algorithm.
Network topologies can have a significant effect on the execution costs of parallel algorithms due to inter-processor communication. For particular combinations of computations and network topologies, costly network contention may inevitably become a bottleneck, even if algorithms are optimally designed so that each processor communicates as little as possible. We obtain novel contention lower bounds that are functions of the network and the computation graph parameters. For several combinations of fundamental computations and common network topologies, our new analysis improves upon previous per-processor lower bounds, which only specify the number of words communicated by the busiest individual processor. We consider torus and mesh topologies, universal fat-trees, and hypercubes; algorithms covered include classical matrix multiplication and direct numerical linear algebra, fast matrix multiplication algorithms, programs that reference arrays, N-body computations, and the FFT. For example, we show that fast matrix multiplication algorithms (e.g., Strassen's) running on a 3D torus will suffer from contention bottlenecks. On the other hand, this network is likely sufficient for a classical matrix multiplication algorithm. Our new lower bounds are matched by existing algorithms only in very few cases, leaving many open problems for network and algorithmic design.