This work presents the main steps towards a parallel version of the PIC (Particle-In-Cell) code XPDP1 (X Plasma Device Planar 1-Dimensional), which uses a Monte Carlo procedure to treat collisions among the particles of different species of neutral and ionized pure gases such as argon, oxygen and others. The graphical interface of XPDP1 was removed, and the code was parallelized with a hybrid approach: message passing for distributed memory (using MPI) and shared memory (using OpenMP). Efficiency and speedup tests were carried out on a homogeneous hybrid cluster, and the results show speedups of approximately ten for 32 cores on 4 servers, which makes the code usable on problems that are infeasible with the serial version.
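The Monte Carlo collision (MCC) step at the heart of such a code can be illustrated with a minimal sketch. This is not XPDP1's actual model: the gas density, cross section, time step, and the simplified 1D elastic-scattering rule below are all illustrative assumptions; only the collision-probability formula P = 1 - exp(-n·σ·|v|·Δt) is the standard MCC ingredient.

```python
import math
import random

def mcc_step(velocities, n_gas, sigma, dt, rng):
    """Monte Carlo collision step: each particle collides with the
    background gas with probability P = 1 - exp(-n * sigma * |v| * dt)."""
    out = []
    for v in velocities:
        p_coll = 1.0 - math.exp(-n_gas * sigma * abs(v) * dt)
        if rng.random() < p_coll:
            # Toy elastic collision with a cold, heavy neutral: in this
            # simplified 1D model the particle loses a fixed fraction of
            # its speed and scatters with a random sign.
            v = abs(v) * rng.choice([-1.0, 1.0]) * 0.9
        out.append(v)
    return out

rng = random.Random(42)
vels = [1.0e5 * (rng.random() - 0.5) for _ in range(1000)]
new_vels = mcc_step(vels, n_gas=3.0e21, sigma=1.0e-19, dt=1.0e-9, rng=rng)
```

In a hybrid parallelization of this loop, the particle list would be partitioned across MPI ranks by spatial domain, and the per-particle loop threaded with OpenMP, since each particle's collision test is independent.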
The abundance of semantically related information has resulted in semantic heterogeneity. Ontology matching is among the techniques used to resolve semantic heterogeneity; however, being computationally intensive, ontology matching can be a time-consuming process. Medium- to large-scale ontologies can take from hours up to days of computation time, depending on the computational resources used and the complexity of the matching algorithms. This delay in producing results makes ontology matching unsuitable for semantic-web-based interactive and semi-real-time systems. This paper presents SPHeRe, a performance-oriented initiative that improves ontology matching performance by exploiting parallelism over a multicore cloud platform. Parallelism has largely been overlooked by ontology matching systems. SPHeRe seizes this opportunity and provides a solution by: (i) creating and caching serialized subsets of candidate ontologies with single-step parallel loading; (ii) producing lightweight, matcher-based, redundancy-free subsets that yield smaller memory footprints and faster load times; and (iii) implementing data-parallel distribution over subsets of candidate ontologies, exploiting the multicore distributed hardware of the cloud platform for parallel ontology matching and execution. Performance evaluation of SPHeRe on a tri-node (12-core) private cloud infrastructure has shown up to 3 times faster ontology load time with up to 8 times smaller memory footprint than the Web Ontology Language (OWL) frameworks Jena and OWLAPI. Furthermore, by utilizing computation resources efficiently, SPHeRe provides the best scalability in contrast with other ontology matching systems, i.e., GOMMA, LogMap, AROMA, and AgrMaker. On a private cloud instance with 8 cores, SPHeRe outperforms the most performance-efficient ontology matching system, GOMMA, by 40% in scalability and 4 times in performance.
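The data-parallel idea in point (iii) can be sketched as follows. This is not SPHeRe's implementation: the string-similarity matcher, the 0.8 threshold, and the chunking scheme are illustrative assumptions; the point is only that the candidate-pair space partitions cleanly across workers.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

def match_subset(pairs, threshold=0.8):
    # One matcher instance working on its own subset of candidate
    # concept pairs; a stand-in for SPHeRe's real matchers.
    return [(a, b) for a, b in pairs
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold]

def parallel_match(src_concepts, tgt_concepts, workers=4):
    # Data parallelism: split the cross product of concepts into
    # subsets and match each subset independently.
    pairs = [(a, b) for a in src_concepts for b in tgt_concepts]
    chunk = max(1, len(pairs) // workers)
    subsets = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_subset, subsets)
    return [m for sub in results for m in sub]

src = ["Author", "Paper", "Conference"]
tgt = ["author", "Article", "ConferenceEvent"]
matches = parallel_match(src, tgt)
```

On a real multicore cloud node, each subset would go to a separate process or VM core rather than a thread, but the partitioning logic is the same.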
This paper investigates Th-232/U-233 fuel cycles in a VVER-1000 reactor through computer simulation. The 3D core geometry of the VVER-1000 system was modeled using the Serpent Monte Carlo code, version 1.1.19. The Serpent code, parallelized via the Message Passing Interface (MPI), was run on a 12-core workstation with 48 GB of RAM. A Th-232/U-235/U-238 oxide mixture was considered as fuel in the core; as the mass fraction of Th-232 was increased through 0.05, 0.1, 0.2, 0.3, and 0.4, the mass fraction of U-238 was decreased by the same amount. The calculations were made for a thermal power of 3000 MW. For the burnup analyses, the core was assumed to deplete from the initial fresh core up to a burnup of 16 MWd/kgU without refuelling. In the burnup calculations, a burnup interval of 360 effective full power days (EFPDs) was defined. As functions of burnup, the mass changes of Th-232, U-233, U-238, Np-237, Pu-239, Am-241, and Cm-244 were evaluated, and the flux and criticality of the system were also calculated.
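The breeding mechanism that such burnup calculations track, Th-232 capturing a neutron and (via Pa-233 beta decay) becoming U-233, can be illustrated with a minimal one-group depletion sketch. This is a simplification of what Serpent solves (the real code tracks hundreds of nuclides with spectrum-dependent cross sections); the flux value and the 7.4 b thermal capture cross section are rough illustrative assumptions, and intermediate Th-233 is lumped into Pa-233.

```python
import math

PHI = 3.0e14          # assumed one-group flux [n/cm^2/s]
SIGMA_TH = 7.4e-24    # assumed Th-232 capture cross section [cm^2] (~7.4 b)
LAMBDA_PA = math.log(2) / (26.97 * 86400.0)  # Pa-233 half-life ~27 days

def deplete(n_th, n_pa, n_u3, days, dt=3600.0):
    """Explicit Euler integration of the Th-232 -> Pa-233 -> U-233 chain."""
    t = 0.0
    while t < days * 86400.0:
        r_cap = SIGMA_TH * PHI * n_th   # Th-232 neutron captures per second
        r_dec = LAMBDA_PA * n_pa        # Pa-233 beta decays per second
        n_th -= r_cap * dt
        n_pa += (r_cap - r_dec) * dt
        n_u3 += r_dec * dt
        t += dt
    return n_th, n_pa, n_u3

# One 360-EFPD interval starting from pure Th-232 (atoms, arbitrary scale).
th, pa, u3 = deplete(1.0e24, 0.0, 0.0, days=360)
```

The same qualitative behavior appears in the paper's results: Th-232 mass falls monotonically while U-233 builds up toward an equilibrium set by the flux level.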
Devices that form a wireless sensor network (WSN) are usually deployed remotely, in large numbers, in a sensing field. WSNs have enabled numerous applications in which location awareness is usually required; therefore, numerous localization systems have been proposed to assign geographic coordinates to each node in a network. In this paper, we describe and evaluate WSNLS (Wireless Sensor Network Localization System), an integrated software framework that provides tools for localizing network nodes and an environment for tuning and testing various localization schemes. Simulation experiments can be performed on parallel and multi-core computers or computer clusters. The main component of the WSNLS framework is its library of solvers for calculating the geographic coordinates of the nodes in a network. Our original solution implemented in WSNLS is a localization system that combines simple triangle geometry with stochastic optimization to determine the position of nodes with unknown locations in the sensing field. We describe and discuss the performance of our system in terms of location-estimation accuracy and computation time. Numerical results presented in the paper confirm that our hybrid scheme gives accurate location estimates of network nodes in reasonable computing time, and that the WSNLS framework can be successfully used for efficient tuning and verification of different localization techniques.
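The combination of range geometry with stochastic optimization can be sketched as below. This is not the WSNLS solver itself: the shrinking-step random search, the anchor layout, and all parameter values are illustrative assumptions; the shared idea is minimizing the squared mismatch between measured and implied anchor distances.

```python
import math
import random

def residual(p, anchors, dists):
    # Sum of squared differences between measured ranges and the ranges
    # implied by candidate position p.
    return sum((math.dist(p, a) - d) ** 2 for a, d in zip(anchors, dists))

def localize(anchors, dists, rng, iters=2000, step=5.0):
    # Stochastic local search: start at the anchor centroid, perturb
    # the estimate, keep improvements, shrink the step as we converge.
    best = [sum(a[0] for a in anchors) / len(anchors),
            sum(a[1] for a in anchors) / len(anchors)]
    best_r = residual(best, anchors, dists)
    for i in range(iters):
        s = step * (1.0 - i / iters) + 1e-3
        cand = [best[0] + rng.uniform(-s, s), best[1] + rng.uniform(-s, s)]
        r = residual(cand, anchors, dists)
        if r < best_r:
            best, best_r = cand, r
    return best

rng = random.Random(7)
anchors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
true_pos = (37.0, 62.0)
dists = [math.dist(true_pos, a) for a in anchors]  # noiseless ranges
est = localize(anchors, dists, rng)
```

With noisy range measurements, the residual no longer reaches zero and the stochastic search serves as a robust least-squares solver, which is exactly the situation where pure triangle geometry alone becomes unreliable.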
The locally one-dimensional finite-difference time-domain (LOD-FDTD) method is a promising implicit technique for solving Maxwell's equations in numerical electromagnetics. This paper describes an efficient Message Passing Interface (MPI) parallel implementation of the LOD-FDTD method for Debye-dispersive media. Its computational efficiency is demonstrated to be superior to that of the parallel ADI-FDTD method. We demonstrate the effectiveness of the proposed parallel algorithm by simulating a bio-electromagnetic problem: deep brain stimulation (DBS) in the human body.
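The computational kernel that makes LOD-FDTD implicit is a tridiagonal solve along each grid line in each sub-step, and parallelizing these solves is the crux of the MPI implementation. A minimal sketch of that kernel, the Thomas algorithm, is below; the test system and its coefficients are illustrative assumptions unrelated to the paper's Debye-dispersive update.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system Ax = d, where a is the sub-diagonal
    (a[0] unused), b the main diagonal, and c the super-diagonal
    (c[-1] unused). Each LOD-FDTD half-step reduces to one such solve
    per grid line."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Illustrative (-1, 2, -1) stencil system with a known solution.
n = 5
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
x_true = [1.0, 2.0, 3.0, 2.0, 1.0]
d = [(a[i] * x_true[i - 1] if i > 0 else 0.0) + b[i] * x_true[i] +
     (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
x = thomas(a, b, c, d)
```

Because each grid line's system is independent of its neighbors, lines can be distributed across MPI ranks, with halo exchanges needed only between the two directional sub-steps.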
Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge, because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea for improving the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this paper, we detail the algorithm and techniques proposed to achieve this result: first, we gather both the communication-pattern information and the hardware details. Then, we compute a relevant reordering of the application's process ranks. Finally, those new ranks are used to reduce the communication costs of the application.
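The reordering step can be made concrete with a toy sketch. Real tools use hierarchical heuristics because the problem is NP-hard; the brute-force search, the traffic matrix, and the intra/inter-socket distances of 1 and 10 below are all illustrative assumptions.

```python
from itertools import permutations

def comm_cost(perm, traffic, dist):
    # Total cost: traffic between ranks i and j, weighted by the
    # hardware distance between the cores they are placed on.
    n = len(perm)
    return sum(traffic[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def best_reorder(traffic, dist):
    # Brute force over all placements; O(n!), so only for tiny n.
    n = len(traffic)
    return min(permutations(range(n)),
               key=lambda p: comm_cost(p, traffic, dist))

# 4 ranks: heavy traffic within pairs (0,1) and (2,3). Assumed hardware:
# cores 0 and 2 share a socket (distance 1), as do cores 1 and 3;
# cross-socket distance is 10.
traffic = [[0, 8, 0, 1], [8, 0, 1, 0], [0, 1, 0, 8], [1, 0, 8, 0]]
dist    = [[0, 10, 1, 10], [10, 0, 10, 1], [1, 10, 0, 10], [10, 1, 10, 0]]
perm = best_reorder(traffic, dist)
```

The optimal permutation places each heavily communicating rank pair on same-socket cores, so most traffic travels over the cheap intra-socket links, which is exactly the effect rank reordering aims for.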
To let applications with dynamic data sharing among threads benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges are ensuring good scalability with respect to the massive multithreading of GPUs and preventing livelocks caused by the SIMT execution paradigm of GPUs. To this end, we propose (1) a hierarchical validation technique and (2) an encounter-time lock-sorting mechanism to deal with these two challenges, respectively. Evaluation shows that GPU-STM outperforms coarse-grained locks on GPUs by up to 20x.
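The principle behind lock sorting, acquiring locks in a globally consistent order so that no cyclic wait can form, is independent of the GPU setting and can be sketched on CPU threads. This is not GPU-STM's mechanism itself (which sorts locks encountered during a transaction on SIMT hardware); the bank-account scenario and the `id`-based ordering are illustrative assumptions.

```python
import threading

class Account:
    _next_id = 0
    def __init__(self, balance):
        self.id = Account._next_id   # global order used for lock sorting
        Account._next_id += 1
        self.lock = threading.Lock()
        self.balance = balance

def transfer(src, dst, amount):
    # Sort the locks before acquiring them: every thread takes them in
    # the same global order, so no cyclic wait (deadlock) can arise
    # even when two threads transfer in opposite directions.
    first, second = sorted((src, dst), key=lambda acc: acc.id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

x, y = Account(100), Account(100)
threads = [threading.Thread(target=transfer, args=(x, y, 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=(y, x, 1)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
```

Without the `sorted` call, a thread doing x-to-y and another doing y-to-x could each grab its first lock and wait forever for the other's; in a SIMT warp the analogous situation manifests as livelock, which is why GPU-STM sorts locks at encounter time.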
The electromagnetic transient (EMT) simulation of a large-scale power system consumes so much computational power that parallel programming techniques are urgently needed in this area; realistic-sized power systems include thousands of buses, generators, and transmission lines. Massive-thread computing is one of the key developments that can increase EMT computational capabilities substantially when the processing unit has enough hardware cores. Compared to the traditional CPU, the graphics processing unit (GPU) has many more cores with distributed memory, which can offer higher data throughput. This paper proposes a massive-thread EMT program (MT-EMTP) and develops massive-thread parallel modules for linear passive elements, the universal line model, and the universal machine model for offline EMT simulation. An efficient node-mapping structure is proposed to transform the original power-system admittance matrix into a block-node diagonal sparse format that exploits the massive-thread parallel GPU architecture. The developed MT-EMTP program has been tested on large-scale power systems of up to 2458 three-phase buses with detailed component modeling. The simulation results and execution times are compared with the mainstream commercial software EMTP-RV to show the improvement in performance at equivalent accuracy.
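Why a block-diagonal layout suits a GPU can be seen from the matrix-vector product it enables: every block is independent, so each can be handled by its own thread block with no synchronization between blocks. The sketch below is an illustrative serial version, not the paper's node-mapping structure or its CUDA kernels.

```python
def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix (a list of dense square blocks)
    by vector x. Each block touches only its own slice of x, so on a
    GPU every block maps naturally to an independent thread block."""
    y = []
    offset = 0
    for blk in blocks:
        n = len(blk)
        seg = x[offset:offset + n]
        y.extend(sum(blk[r][c] * seg[c] for c in range(n)) for r in range(n))
        offset += n
    return y

# Illustrative system: a 2x2 block and a 3x3 block on the diagonal,
# standing in for decoupled subnetworks of an admittance matrix.
blocks = [[[2.0, 1.0],
           [0.0, 3.0]],
          [[1.0, 0.0, 1.0],
           [0.0, 2.0, 0.0],
           [1.0, 0.0, 1.0]]]
x = [1.0, 2.0, 1.0, 1.0, 1.0]
y = block_diag_matvec(blocks, x)
```

In EMT simulation, transmission lines with nonzero travel time naturally decouple the network into such blocks, which is what the node-mapping transformation exploits.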
Transactional contention management policies show considerable variation in relative performance as workload characteristics change. Consequently, incorporating fixed-policy Transactional Memory (TM) in general-purpose computing systems is suboptimal by design and renders such systems susceptible to pathologies. Of particular concern are Hardware TM (HTM) systems, where traditional designs have policies hardwired in silicon. Adaptive HTMs hold promise, but pose major challenges in terms of design and verification costs. In this paper, we present the ZEBRA HTM design, which lays down a simple yet high-performance approach to implementing adaptive contention management in hardware. Prior work in this area has associated contention with transactional code blocks. However, we discover that by associating contention with the data (cache blocks) accessed by transactional code, rather than with the code block itself, we achieve a neat match in granularity with the cache coherence protocol. This leads to a design that is very simple and yet able to track closely, or exceed, the performance of the best-performing policy for a given workload. ZEBRA therefore brings together the inherent benefits of traditional eager HTMs (parallel commits) and lazy HTMs (good optimistic concurrency without deadlock-avoidance mechanisms), combining them into a low-complexity design.
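The core idea, keying contention state on cache blocks rather than code, can be sketched in a few lines. This is a software caricature of a hardware mechanism: the counter table, the threshold of 3, and the eager/lazy labels are illustrative assumptions, while in ZEBRA the per-line state would live alongside the coherence metadata.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes

class ContentionTable:
    """Per-cache-line conflict counters drive the commit policy:
    lines that keep conflicting are handled lazily (optimistic,
    late conflict detection), while conflict-free lines commit
    eagerly (in place, enabling parallel commits)."""
    def __init__(self, threshold=3):
        self.counts = {}
        self.threshold = threshold

    def record_conflict(self, addr):
        line = addr // CACHE_LINE
        self.counts[line] = self.counts.get(line, 0) + 1

    def policy(self, addr):
        line = addr // CACHE_LINE
        return "lazy" if self.counts.get(line, 0) >= self.threshold else "eager"

tbl = ContentionTable()
for _ in range(3):
    tbl.record_conflict(0x1004)   # repeated conflicts on one hot line
```

Because the key is the cache-line index, the granularity matches the coherence protocol exactly: the same transaction can treat its hot shared lines lazily and its private lines eagerly, which per-code-block policies cannot express.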