ISBN: 9783642552243 (Print)
Multigrid methods are among the fastest numerical algorithms for solving large sparse linear systems. The Conjugate Gradient method with Multigrid as a preconditioner (MGCG) features good convergence even when the Multigrid solver itself is not efficient. The parallel FEM package NuscaS allows us to solve adaptive FEM problems with 3D unstructured meshes on parallel computers such as PC clusters. The parallel version of the library is based on the geometric decomposition applied to the computing nodes of a parallel system; a distributed-memory architecture and the message-passing model of parallel programming are assumed. In our previous works, we extended the NuscaS functionality by introducing parallel adaptation of tetrahedral FEM meshes and dynamic load-balancing capabilities. In this work we focus on an efficient implementation of Geometric Multigrid as a parallel preconditioner for the Conjugate Gradient iterative solver used in the NuscaS package. Based on the geometric decomposition, the meshes at each level of Multigrid are partitioned and assigned to processors of a parallel architecture. Fine-grid levels are constructed by subdivision of mesh elements using the parallel 8-tetrahedra longest-edge refinement algorithm, where every process keeps its assigned part of the mesh on each level of Multigrid. The efficiency of the proposed implementation is investigated experimentally.
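To make the MGCG idea concrete, here is a minimal serial Python/NumPy sketch, not the NuscaS implementation (which is parallel, 3D and tetrahedral): Conjugate Gradient preconditioned by a single geometric two-grid V-cycle on a 1D Poisson model problem. All function names and parameters below are illustrative.

```python
import numpy as np

def poisson_1d(n):
    """Standard 1D Poisson matrix (Dirichlet BCs) on n interior nodes."""
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

def two_grid_vcycle(A, r, nu=3, omega=0.8):
    """One geometric two-grid V-cycle applied to the residual r (acts as M^-1 r)."""
    n = A.shape[0]
    z = np.zeros(n)
    D = np.diag(A)
    for _ in range(nu):                          # pre-smoothing: damped Jacobi
        z += omega * (r - A @ z) / D
    nc = (n - 1) // 2
    P = np.zeros((n, nc))                        # linear interpolation (prolongation)
    for j in range(nc):
        P[2*j, j], P[2*j+1, j], P[2*j+2, j] = 0.5, 1.0, 0.5
    R = 0.5 * P.T                                # full-weighting restriction
    Ac = R @ A @ P                               # Galerkin coarse-grid operator
    ec = np.linalg.solve(Ac, R @ (r - A @ z))    # exact coarse-grid correction
    z += P @ ec
    for _ in range(nu):                          # post-smoothing
        z += omega * (r - A @ z) / D
    return z

def mgcg(A, b, tol=1e-10, maxit=200):
    """Conjugate Gradient with the V-cycle above as preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = two_grid_vcycle(A, r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = two_grid_vcycle(A, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = poisson_1d(63)
b = np.ones(63)
x = mgcg(A, b)
print("residual:", np.linalg.norm(b - A @ x))
```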
ISBN: 9781479965694 (Print)
The quality evaluation of random number sequences is a rather complex and resource-expensive process with a peculiar property: no finite amount of testing can ensure perfect randomness. Yet applying multiple randomness tests, each evaluating the sequence from one specific and significant point of view, is vital for obtaining relevant results that can guide the tester in accepting or rejecting the considered random number sequence. Therefore, to satisfy the increasing demand for large volumes of high-quality random data, there is a stringent need for high-performance and flexible statistical tests that provide a more comprehensive assessment. Our work follows this direction: this paper introduces an improved, extended and parallelized Matrix Rank Test (the 5th test of the NIST Statistical Test Suite) and describes several enhancement methods. Experimental results demonstrate a significant performance improvement over the original version and show the comparative efficiencies of the proposed parallel implementations.
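For readers unfamiliar with the underlying test, the following Python sketch shows the core of the (unoptimized) Matrix Rank Test: GF(2) rank of 32x32 bit matrices, a chi-square statistic over the observed rank counts, and a trivially parallel rank computation via a process pool. It follows NIST SP 800-22 in outline only; it is not the authors' optimized or extended implementation.

```python
import numpy as np
from multiprocessing import Pool
from math import exp

M = Q = 32  # NIST default block dimensions

def gf2_rank(mat):
    """Rank of a binary matrix over GF(2) by Gaussian elimination."""
    m = mat.copy()
    rank = 0
    rows, cols = m.shape
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, col]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]      # swap pivot row into place
        for r in range(rows):
            if r != rank and m[r, col]:
                m[r] ^= m[rank]                  # eliminate over GF(2)
        rank += 1
    return rank

def matrix_rank_test(bits, processes=4):
    """Simplified NIST matrix rank test: returns the p-value."""
    N = len(bits) // (M * Q)
    mats = [np.array(bits[i*M*Q:(i+1)*M*Q], dtype=np.uint8).reshape(M, Q)
            for i in range(N)]
    with Pool(processes) as pool:                # ranks are independent -> trivially parallel
        ranks = pool.map(gf2_rank, mats)
    f_full = sum(r == M for r in ranks)
    f_m1 = sum(r == M - 1 for r in ranks)
    f_rest = N - f_full - f_m1
    # asymptotic rank probabilities for 32x32 binary matrices (NIST SP 800-22)
    p_full, p_m1, p_rest = 0.2888, 0.5776, 0.1336
    chi2 = ((f_full - p_full*N)**2 / (p_full*N)
            + (f_m1 - p_m1*N)**2 / (p_m1*N)
            + (f_rest - p_rest*N)**2 / (p_rest*N))
    return exp(-chi2 / 2.0)                      # p-value for 2 degrees of freedom

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, size=100 * M * Q).tolist()
    print("p-value:", matrix_rank_test(bits))
```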
ISBN: 9781479922802 (Print)
This paper presents a methodological framework for designing testing and measurement systems fully integrated with the enterprise information system. In comparison with the most common solutions for designing embedded testing platforms, the proposed framework sets itself at a higher level of abstraction. It allows building different, programmable test benches that can run in parallel, and it does not restrict the choice of hardware, sensors and actuators, as happens with commercial development systems for the same kind of machines. The framework is conceived to be used on embedded boards equipped with the GNU/Linux operating system and with at least one network interface. By using open data formats, the framework provides an easy way to exchange data with enterprise information systems, thus ensuring interoperability with different IT solutions. The paper includes the description of a cooker hood testing system designed and implemented with this framework, which highlights the advantages of the proposed development method.
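As a loose illustration of the open-data-format idea (not the authors' framework), the sketch below emits each measurement of a hypothetical test bench as one JSON line; in a real deployment the output stream would be a socket on the board's network interface rather than stdout. All identifiers and sensor values are made up.

```python
import json
import random
import sys
import time

def read_sensor():
    """Stand-in for a real sensor driver on the embedded board (hypothetical values)."""
    return {"airflow_m3h": 300 + random.uniform(-5, 5),
            "noise_db": 54 + random.uniform(-1, 1),
            "power_w": 118 + random.uniform(-2, 2)}

def run_test_bench(out, bench_id, samples=5, period_s=1.0):
    """Sample the device under test periodically and emit each record as one
    JSON line, an open format any enterprise backend can ingest.  `out` can be
    sys.stdout, a file, or socket.makefile('w') on the board's network interface."""
    for _ in range(samples):
        record = {"timestamp": time.time(),
                  "bench_id": bench_id,
                  "measurements": read_sensor()}
        out.write(json.dumps(record) + "\n")
        out.flush()
        time.sleep(period_s)

if __name__ == "__main__":
    run_test_bench(sys.stdout, "hood-bench-01", samples=3, period_s=0.5)
```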
High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts to optimise HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves between 10% and 50% of peak performance. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a "High Performance" implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or, with a much larger effort, at track level. Apart from the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single-threaded version, together with sub-optimal handling of event processing tails. Besides this, the low-level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine-grained parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit best from t...
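The following Python sketch illustrates, in a highly simplified form, the basket-scheduling idea described above (it is not the Geant-V scheduler): tracks are grouped by geometry volume into fixed-size baskets, and each full basket is dispatched to a worker and processed with a single vectorized operation. The names, basket size and toy "physics" step are all illustrative assumptions.

```python
import numpy as np
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

BASKET_SIZE = 16   # illustrative vector length

def process_basket(volume, energies):
    """Vectorized 'physics' step on one basket of tracks (toy energy loss)."""
    e = np.asarray(energies)
    return volume, e * 0.9            # whole basket advanced with one array operation

def schedule_tracks(tracks, workers=4):
    """Group tracks by geometry volume into fixed-size baskets and dispatch each
    full basket to a worker, instead of running one thread per event."""
    baskets = defaultdict(list)
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for volume, energy in tracks:
            baskets[volume].append(energy)
            if len(baskets[volume]) == BASKET_SIZE:
                futures.append(pool.submit(process_basket, volume,
                                           baskets.pop(volume)))
        for volume, leftover in baskets.items():   # flush partial baskets (processing tails)
            futures.append(pool.submit(process_basket, volume, leftover))
        return [f.result() for f in futures]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    tracks = [(f"vol{rng.integers(4)}", float(rng.uniform(1, 10)))
              for _ in range(200)]
    results = schedule_tracks(tracks)
    print(len(results), "baskets processed")
```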
The Barnes-Hut algorithm is a widely used approximation method for the N-body simulation problem. The irregular nature of this tree-walking code presents interesting challenges for its computation on parallel systems...
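As background for this entry, a compact (sequential, 2D) Python sketch of the Barnes-Hut idea follows: bodies are inserted into a quadtree, and the force walk replaces a whole subtree by its centre of mass whenever the opening criterion size/distance < theta holds. The parallelization challenges mentioned in the abstract are not addressed here; all parameters are illustrative.

```python
import numpy as np

THETA = 0.5   # opening angle: smaller = more accurate, more node visits

class Node:
    """Quadtree node: a leaf holding one body, or an internal node holding
    total mass, centre of mass and four children."""
    def __init__(self, centre, size):
        self.centre, self.size = np.asarray(centre, float), size
        self.mass, self.com = 0.0, np.zeros(2)
        self.body, self.children = None, None

    def insert(self, pos, m):
        pos = np.asarray(pos, float)
        if self.mass == 0.0 and self.children is None:        # empty leaf
            self.body, self.mass, self.com = pos, m, pos
            return
        if self.children is None:                              # split an occupied leaf
            self.children = [Node(self.centre + 0.25*self.size*np.array(o), 0.5*self.size)
                             for o in [(-1, -1), (-1, 1), (1, -1), (1, 1)]]
            old_body, old_m = self.body, self.mass
            self.body, self.mass, self.com = None, 0.0, np.zeros(2)
            self._insert_child(old_body, old_m)
        self._insert_child(pos, m)

    def _insert_child(self, pos, m):
        self.com = (self.com*self.mass + pos*m) / (self.mass + m)
        self.mass += m
        quad = (pos[0] > self.centre[0]) * 2 + (pos[1] > self.centre[1])
        self.children[quad].insert(pos, m)

    def force(self, pos):
        """Walk the tree, using a node's aggregate whenever it is 'far enough'."""
        d = self.com - pos
        r = np.linalg.norm(d)
        if r == 0.0 or self.mass == 0.0:
            return np.zeros(2)
        if self.children is None or self.size / r < THETA:
            return self.mass * d / r**3                        # subtree as a point mass
        return sum((c.force(pos) for c in self.children), np.zeros(2))

rng = np.random.default_rng(0)
bodies = rng.uniform(-1, 1, size=(100, 2))
root = Node((0.0, 0.0), 2.0)
for p in bodies:
    root.insert(p, 1.0)
print("force on body 0:", root.force(bodies[0]))
```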
ISBN: 9781479974344 (Print)
Data cleansing is an important process in data mining and a key technology for ensuring data quality. Classical data pre-processing techniques are limited when processing massive data with missing information and sometimes cannot obtain precise and reasonable results, which leads to low-quality data. To this end, through a deep analysis of classical pre-processing and by combining it with the MapReduce programming model, a parallel algorithm for data cleansing in incomplete information systems using MapReduce is put forward to process massive data with missing information. Finally, the new algorithm is applied to an incomplete decision information system, and the analysis results show that the new algorithm is effective.
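A toy Python sketch of the general map/reduce cleansing pattern is given below; it is not the paper's algorithm. Records are keyed in the map phase (here, by their decision attribute), and each reduce group independently fills missing condition-attribute values with the most frequent value in that group. The attribute names, the keying choice and the filling rule are illustrative assumptions.

```python
from collections import Counter, defaultdict

MISSING = None   # marker for a missing attribute value

def map_phase(record):
    """Key each record by its decision attribute so records that should agree
    end up in the same reduce group (illustrative keying choice)."""
    return record["decision"], record

def reduce_phase(key, records):
    """Fill missing condition-attribute values with the most frequent value
    observed in the same group."""
    for attr in ("a1", "a2", "a3"):
        votes = Counter(r[attr] for r in records if r[attr] is not MISSING)
        default = votes.most_common(1)[0][0] if votes else MISSING
        for r in records:
            if r[attr] is MISSING:
                r[attr] = default
    return records

def run_job(dataset):
    groups = defaultdict(list)
    for rec in dataset:                        # "map" + shuffle
        k, v = map_phase(rec)
        groups[k].append(v)
    cleaned = []
    for k, recs in groups.items():             # "reduce": independent per key
        cleaned.extend(reduce_phase(k, recs))
    return cleaned

data = [
    {"a1": 1, "a2": MISSING, "a3": "x",     "decision": "yes"},
    {"a1": 1, "a2": 0,       "a3": "x",     "decision": "yes"},
    {"a1": 0, "a2": 0,       "a3": MISSING, "decision": "no"},
    {"a1": 0, "a2": 1,       "a3": "y",     "decision": "no"},
]
print(run_job(data))
```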
We present massively parallel high-energy electromagnetic particle transportation through a finely segmented detector on a Graphics Processing Unit (GPU). Simulating events of energetic particle decay in a general-purpose high energy physics (HEP) detector requires intensive computing resources, due to the complexity of the geometry as well as the physics processes applied to particles copiously produced by primary collisions and secondary interactions. The recent advent of many-core and accelerated processor architectures provides a variety of concurrent programming models applicable not only to high-performance parallel computing, but also to conventional computing-intensive applications such as HEP detector simulation. The components of our prototype are a transportation process under a non-uniform magnetic field, geometry navigation with a set of solid shapes and materials, electromagnetic physics processes for electrons and photons, and an interface to a framework that dispatches bundles of tracks in a highly vectorized manner, optimizing for spatial locality and throughput. Core algorithms and methods are excerpted from the Geant4 toolkit and are modified and optimized for the GPU application. Program kernels written in C/C++ are designed to be compatible with CUDA and OpenCL, with the aim of being generic enough for easy porting to future programming models and hardware architectures. To improve throughput by overlapping data transfers with kernel execution, multiple CUDA streams are used. Issues with floating-point accuracy, random number generation, data structures, kernel divergence and register spills are also considered. A performance evaluation of the relative speedup compared to the corresponding sequential execution on a CPU is presented as well.
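The sketch below illustrates only the data-parallel flavour of such a transport kernel, using NumPy on the CPU instead of CUDA/OpenCL: a whole bundle of tracks, stored in structure-of-arrays form, is advanced through a toy non-uniform magnetic field with one set of array operations. The field, the stepper and all constants are illustrative and unrelated to the Geant4-derived code.

```python
import numpy as np

def field(pos):
    """Toy non-uniform magnetic field B(x, y, z) in tesla (illustrative only)."""
    bz = 1.0 + 0.1 * pos[:, 2]                 # solenoid-like field growing along z
    zeros = np.zeros(len(pos))
    return np.column_stack([zeros, zeros, bz])

def step_bundle(pos, vel, charge, dt):
    """Advance a whole bundle of tracks with one set of array operations,
    the same data-parallel pattern a GPU kernel would apply per thread."""
    b = field(pos)
    acc = charge[:, None] * np.cross(vel, b)   # Lorentz force, unit mass
    vel_new = vel + acc * dt
    pos_new = pos + vel_new * dt
    return pos_new, vel_new

# structure-of-arrays layout: one array per attribute improves memory coalescing
n = 10_000
rng = np.random.default_rng(0)
pos = rng.normal(size=(n, 3))
vel = rng.normal(size=(n, 3))
charge = np.where(rng.random(n) < 0.5, -1.0, 1.0)

for _ in range(100):
    pos, vel = step_bundle(pos, vel, charge, dt=1e-3)
print("mean |v|:", np.linalg.norm(vel, axis=1).mean())
```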
ISBN: 9781479950119 (Print)
This paper describes an implementation strategy for a nonlinear model predictive controller on FPGA systems. A high-level synthesis of a real-time MPC algorithm by means of the MATLAB HDL Coder as well as the Vivado HLS tool is discussed. In order to exploit the parallel processing capabilities of FPGAs, the included integration schemes are parallelized using a fixed-point iteration approach. The synthesis results are demonstrated for two different example systems.
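As a rough illustration of the fixed-point-iteration idea (not the paper's MATLAB/HDL implementation), the Python sketch below solves an implicit Euler step by a fixed number of fixed-point iterations inside a toy prediction horizon; the fixed iteration count and the absence of matrix factorizations are what make such schemes attractive for FPGA pipelining. The plant model and all parameters are made up.

```python
import numpy as np

def f(x, u):
    """Toy nonlinear plant dynamics (illustrative, not the paper's example)."""
    return np.array([x[1], -np.sin(x[0]) - 0.1 * x[1] + u])

def implicit_euler_fixed_point(x, u, dt, iters=5):
    """Implicit Euler step x+ = x + dt*f(x+, u), solved by a fixed number of
    fixed-point iterations: cheap per iteration and easy to pipeline in hardware."""
    x_next = x.copy()                    # initial guess: previous state
    for _ in range(iters):
        x_next = x + dt * f(x_next, u)
    return x_next

def predict_horizon(x0, u_seq, dt):
    """Roll the model over the prediction horizon for one candidate input sequence."""
    x = x0.copy()
    traj = [x]
    for u in u_seq:
        x = implicit_euler_fixed_point(x, u, dt)
        traj.append(x)
    return np.array(traj)

x0 = np.array([0.5, 0.0])
u_seq = np.zeros(20)                     # candidate control sequence over the horizon
print(predict_horizon(x0, u_seq, dt=0.05)[-1])
```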
ISBN: 9783642551949 (Print)
The proceedings contain 146 papers. The topics discussed include: exploiting data sparsity in parallel matrix powers computations; performance of dense eigensolvers on BlueGene/Q; experiences with a Lanczos eigensolver in high-precision arithmetic; adaptive load balancing for massively parallel multi-level Monte Carlo; parallel one-sided Jacobi SVD algorithm with variable blocking factor; an identity parareal method for temporal parallel computations; improving perfect parallelism; methods for high-throughput computation of elementary functions; engineering nonlinear pseudorandom number generators; extending the generalized Fermat prime number search beyond one million digits using GPUs; iterative solution of singular systems with applications; scheduling bag-of-tasks applications to optimize computation time and cost; and scheduling moldable tasks with precedence constraints and arbitrary speedup functions on multiprocessors.
ISBN: 9783642552236 (Print)
The proceedings contain 146 papers. The topics discussed include: exploiting data sparsity in parallel matrix powers computations; performance of dense eigensolvers on BlueGene/Q; experiences with a Lanczos eigensolver in high-precision arithmetic; adaptive load balancing for massively parallel multi-level Monte Carlo; parallel one-sided Jacobi SVD algorithm with variable blocking factor; an identity parareal method for temporal parallel computations; improving perfect parallelism; methods for high-throughput computation of elementary functions; engineering nonlinear pseudorandom number generators; extending the generalized Fermat prime number search beyond one million digits using GPUs; iterative solution of singular systems with applications; scheduling bag-of-tasks applications to optimize computation time and cost; and scheduling moldable tasks with precedence constraints and arbitrary speedup functions on multiprocessors.