The Monte Carlo simulation method is an efficient solution for reliability assessment of complex systems with various failure distributions. However, modern engineering systems have become more complex and larger in scale, which l...
In this paper we propose a new algorithm for solving challenging adaptive time-dependent problems in parallel using Crank-Nicolson-type time integration. The new algorithm allows computations from different time steps to execute in parallel. Time steps are distributed among processors. The number of processors working on consecutive time steps increases with each iteration of the adaptive algorithm. Subsequent time steps use the previous time steps' solutions at the same level of accuracy. We compare our new parallel algorithm with three other methods. First, we compare it with a traditional method that performs all the adaptive iterations in the first time step, then restarts the adaptive iterations in the second time step, and continues in this way, one time step after another. Second, we compare it with an algorithm that performs all the adaptive iterations in the first time step and then starts each following time step with the optimal mesh obtained from the previous iteration. Finally, we compare it with an algorithm that executes the projection-based interpolation of the material data in the first time step, solves the problem over the resulting mesh, and then starts each following time step with the optimal mesh obtained from the previous iteration. All of these algorithms are tested on a challenging computational problem: the solution of the Pennes equation over a human head. The heat source is obtained by approximating the solution of the Maxwell equation computed over the model of the human head. Our numerical results indicate that 10 min (600 s) of exposure to cell phone radiation may cause a temperature increase of up to 2 degrees C in the region of the brain close to the cell phone. (C) 2015 Elsevier B.V. All rights reserved.
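To illustrate the Crank-Nicolson time integration named in the abstract above, here is a minimal sketch of one time step for the 1-D heat equation u_t = alpha * u_xx on a uniform grid with fixed boundary values. This is an assumption-laden toy, not the paper's method: it uses constant coefficients and finite differences instead of the adaptive mesh and the Pennes equation, it is serial, and the function name `crank_nicolson_step` and its parameters are hypothetical.

```python
def crank_nicolson_step(u, alpha, dx, dt):
    """Advance u_t = alpha * u_xx by one step; u[0] and u[-1] stay fixed.

    Illustrative only: uniform grid, constant alpha, at least 3 grid points.
    """
    n = len(u) - 2                      # number of interior unknowns
    r = alpha * dt / (2.0 * dx * dx)

    # Explicit half of the scheme, plus the known boundary contributions.
    rhs = [r * u[i - 1] + (1.0 - 2.0 * r) * u[i] + r * u[i + 1]
           for i in range(1, n + 1)]
    rhs[0] += r * u[0]
    rhs[-1] += r * u[-1]

    # Thomas algorithm for the tridiagonal system (-r, 1 + 2r, -r).
    diag = 1.0 + 2.0 * r
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -r / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + r * c_prime[i - 1]
        c_prime[i] = -r / denom
        d_prime[i] = (rhs[i] + r * d_prime[i - 1]) / denom

    # Back substitution into a copy that keeps the boundary values.
    new_u = list(u)
    new_u[n] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        new_u[i + 1] = d_prime[i] - c_prime[i] * new_u[i + 2]
    return new_u
```

Each step requires the solution of the previous step, which is the sequential dependency that the abstract's algorithm works around by letting later time steps start from earlier, less refined solutions.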
The paper addresses the problem of local-best-match time series subsequence similarity search: given a query sequence and a longer time series, the task is to find all subsequences whose distance from the query is minimal among their neighboring subsequences whose distance from the query is under a specified threshold. Dynamic Time Warping (DTW), which is currently recognized as the best similarity measure for most time series applications, is used as the distance measure. However, computing DTW is expensive, even with the existing sophisticated software approaches. Existing hardware approaches to DTW computation involve GPU and FPGA architectures and ignore the potential of the Intel Many Integrated Core architecture. The paper proposes a parallel algorithm for solving this problem using both the CPU and the Intel Xeon Phi many-core coprocessor. The implementation is based on the OpenMP parallel programming technology and the offload execution mode, in which part of the code and data is transferred to the coprocessor. The algorithm maintains a queue of subsequences on the processor side, which are uploaded to the coprocessor for the DTW computations. The experimental results confirm the effectiveness of the algorithm.
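The search problem described above can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the paper's parallel implementation: it uses the textbook dynamic-programming DTW with a squared point cost, compares only subsequences of the same length as the query, and interprets "neighboring subsequences" as a run of consecutive start offsets whose distance is under the threshold. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with squared point-wise cost."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def local_best_matches(series, query, threshold):
    """Return (offset, distance) of the best subsequence in each run of
    consecutive offsets whose DTW distance is under the threshold."""
    m = len(query)
    matches = []
    best = None
    for start in range(len(series) - m + 1):
        d = dtw_distance(series[start:start + m], query)
        if d < threshold:
            if best is None or d < best[1]:
                best = (start, d)
        elif best is not None:
            matches.append(best)    # the run ended, keep its minimum
            best = None
    if best is not None:
        matches.append(best)
    return matches
```

The DTW calls for different start offsets are independent of one another, which is what makes it possible to queue subsequences and hand them to a coprocessor as the abstract describes.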
Molecular dynamics (MD) simulations are performed to model an electrospray thruster for the ionic liquid (IL) EMIM-BF4 using an effective-force coarse-grained potential. The MD simulations provide insight into the atomistic modeling of a capillary-tip-extractor system, the basic elements of an electrospray thruster. A 1-D electric field showed an improvement in the model when compared with the use of a constant electric field. Then, the MD software was coupled to a Poisson solver derived from a particle-in-cell code. A transient 3-D electric field was used at each timestep, taking into account the induced electric field due to space charge repulsion. It was found that the inhomogeneous electric field as well as that of the IL space-charge improved agreement between modeling and experiment. The influence of numerical parameters, such as extraction potential and applied mass flow, was studied. Particular emphasis was put on the importance of parameters relative to the grid used to solve Poisson's equation, such as the grid cell size and the boundary conditions (BCs) in the vicinity of the capillary tip. The BCs were found to have a substantial impact on the potential and electric field.
A new partitioning method, called Wedging Insertion, is proposed for solving the large-scale symmetric Traveling Salesman Problem (TSP). The idea of the proposed algorithm is to cut a TSP tour into four segments by the nodes' coordinates (not by rectangles, as in Strip, FRP, and Karp). Each node, apart from four particular nodes, is located in exactly one segment, and no segment twists with another. After partitioning, the algorithm applies a traditional construction method, the insertion method, to each segment to improve the quality of the tour, and then connects the starting and ending nodes of the segments to obtain the complete tour. To test the performance of the proposed algorithm, we conduct experiments on various TSPLIB instances. The experimental results show that the proposed algorithm is more efficient for solving large-scale TSPs. Specifically, our approach markedly reduces the running time of the algorithm while losing only about 10% of the algorithm's performance.
The Galois system can automatically parallelize irregular algorithms written in a serial programming model and execute them efficiently on nonuniform memory access (NUMA) machines. Experimental results for five complex irregular algorithms show that the system scales up to 420x on large NUMA systems at 512 threads.
We present a highly parallelizable and flexible computational method to solve high-dimensional stochastic dynamic economic models. Solving such models often requires the use of iterative methods, like time iteration or dynamic programming. By exploiting the generic iterative structure of this broad class of economic problems, we propose a parallelization scheme that favors hybrid massively parallel computer architectures. Within a parallel nonlinear time iteration framework, we interpolate policy functions partially on GPUs using an adaptive sparse grid algorithm with piecewise linear hierarchical basis functions. GPUs accelerate this part of the computation by one order of magnitude, thus reducing overall computation time by 50%. The developments in this paper include the use of a fully adaptive sparse grid algorithm and the use of a mixed MPI-Intel TBB-CUDA/Thrust implementation to improve the interprocess communication strategy on massively parallel architectures. Numerical experiments on "Piz Daint" (Cray XC30) at the Swiss National Supercomputing Centre show that high-dimensional international real business cycle models can be efficiently solved in parallel. To the best of our knowledge, this performance on a massively parallel petascale architecture for such nonlinear high-dimensional economic models has not been possible prior to the present work. (C) 2015 Elsevier B.V. All rights reserved.
In this paper, we propose a parallel guided ejection search algorithm to minimize the fleet size in the NP-hard pickup and delivery problem with time windows. The parallel processes co-operate periodically to enhance th...
The problem of time-domain BEM for the wave equation in acoustics and electromagnetism can be expressed as a sparse linear system composed of multiple interaction/convolution matrices. It can be solved by using sparse matrix-vector products, which cannot achieve a high Flop-rate on either CPUs or GPUs. In this paper we extend the approach proposed in a previous work [1], in which we re-order the computation to get a special matrix structure with one dense vector per row. This new structure is called a slice matrix and is computed with a custom matrix/vector product operator. In this study, we present an optimized implementation of this operator on Nvidia GPUs based on two blocking strategies. We explain how we can obtain multiple block-values from a slice and how these can be computed efficiently on CPUs since we target heterogeneous nodes composed of CPUs and GPUs. To deal with the different efficiencies of the processing units, we use a greedy heuristic that dynamically balances work among the workers. We demonstrate the performance of our system by studying the quality of the balancing heuristic and the sequential Flop-rate of the blocked implementations. Finally, we validate our implementation with an industrial test case on 8 heterogeneous nodes, each composed of 12 CPUs and 3 GPUs. (C) 2015 Elsevier B.V. All rights reserved.
ISBN: 9781509004461 (print)
The article describes how an algorithm decomposed into functional blocks is mapped onto a distributed execution environment. In addition, it describes the architecture and implementation of a service that runs data mining algorithms in that environment. As an example, it describes the implementation of, and experiments with, the 1R classification algorithm.
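For readers unfamiliar with 1R, the following is a minimal serial sketch of the classifier named above, assuming categorical attributes and the usual formulation: for each attribute, map every observed value to its most frequent class, then keep the single attribute whose rule makes the fewest training errors. It does not reproduce the article's functional-block decomposition or its distributed service, and the name `train_one_r` is hypothetical.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Return (attribute index, value -> class rule, training errors)
    for the single attribute that classifies the training set best."""
    best = None
    for attr in range(len(rows[0])):
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # One rule per value: predict the most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label
                     for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
attr, rule, errors = train_one_r(rows, labels)
```

The per-attribute loop bodies do not depend on one another, so in principle each could run as a separate block before a final step selects the attribute with the fewest errors.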