With recent technological advances, shared-memory parallel machines have become more scalable and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared-memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared-memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of popular data mining algorithms. In addition, we propose a reduction-object-based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed, starting from a common specification of the algorithm. We have carried out a detailed evaluation of the parallelization techniques and the programming interface. We have experimented with apriori and fp-tree-based association mining, k-means clustering, k-nearest neighbor classifier, and decision tree construction. The main results from our experiments are as follows: 1) Among full replication, optimized full locking, and cache-sensitive locking, there is no clear winner. Each of these three techniques can outperform the others depending upon machine and dataset parameters. These three techniques perform significantly better than the other two techniques. 2) Good parallel efficiency is achieved for each of the four algorithms we experimented with, using our techniques and runtime system. 3) The overhead of the interface is within 10 percent in almost all cases. 4) In the case of decision tree construction, combining different techniques turned out to be crucial for achieving high performance.
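The contrast between full replication and per-element locking can be illustrated with a small counting reduction. The following is a minimal sketch, not the authors' runtime system or interface: the function names, the bin-counting workload, and the use of `#pragma omp atomic` as a stand-in for one lock per reduction element are all assumptions made for illustration. Items are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full replication (illustrative): every thread updates its own private copy
// of the reduction object, and the copies are merged once at the end.
std::vector<long> count_full_replication(const std::vector<int>& items, int num_bins) {
    assert(num_bins > 0);
    std::vector<long> global(num_bins, 0);
    #pragma omp parallel
    {
        std::vector<long> local(num_bins, 0);  // private replica
        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(items.size()); ++i) {
            local[items[i] % num_bins] += 1;   // no synchronization needed
        }
        // Merge phase: one thread at a time folds its replica into the result.
        #pragma omp critical
        for (int b = 0; b < num_bins; ++b) {
            global[b] += local[b];
        }
    }
    return global;
}

// Per-element synchronization (illustrative): a single shared reduction
// object, with each element update protected individually. An atomic update
// stands in here for a dedicated lock per element.
std::vector<long> count_per_element_sync(const std::vector<int>& items, int num_bins) {
    assert(num_bins > 0);
    std::vector<long> shared(num_bins, 0);
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(items.size()); ++i) {
        long& slot = shared[items[i] % num_bins];
        #pragma omp atomic
        slot += 1;
    }
    return shared;
}
```

Replication trades memory and a merge step for update loops free of synchronization, while the shared version keeps one copy but pays a synchronization cost on every update. This is consistent with the abstract's observation that the best technique depends on machine and dataset parameters.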
In recent years, in-silico experimentation within the field of oncological medicine has been intensively investigated with the aim of better understanding tumor dynamics and dose-response relationships in cancer treatments. In a series of previous works (Lujan et al., 2018, 2017, 2016), we described the micro-environmental influence on micro-tumor infiltration patterns through in-silico/in-vitro experimentation. Here we present the latest version of the software utilized for, but not limited to, those studies: LibreGrowth, a libre tumor growth code able to simulate core growth and peripheral tumor cell infiltration, considering benign and malignant stages. We implemented a reaction-diffusion based model, with a spatially variable diffusion coefficient, in a three-dimensional domain, using C++ and OpenMP on a GNU/Linux system. LibreGrowth aims to provide a flexible implementation for depicting heterogeneous tissues and infiltration processes, and to shed light on current therapy optimization strategies. (C) 2019 Elsevier B.V. All rights reserved.
OpenMP is widely accepted as a de facto standard for shared-memory parallel programming in Fortran, C and C++. Nested parallelization was included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.
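To make the idea of nested parallelization concrete, the sketch below places one parallel loop over the blocks of a multi-block dataset and a second parallel loop over the elements within each block. It is a hypothetical illustration, not code from the production applications mentioned above; `sum_blocks` and the block layout are assumed names. Whether the inner region actually runs in parallel depends on the OpenMP runtime settings (for example the `OMP_MAX_ACTIVE_LEVELS` environment variable); if nesting is inactive, or OpenMP is disabled at compile time, the result is the same and only the degree of parallelism changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum all values of a multi-block dataset using two nested parallel loops.
double sum_blocks(const std::vector<std::vector<double>>& blocks) {
    double total = 0.0;
    // Outer level: distribute blocks across threads.
    #pragma omp parallel for reduction(+:total)
    for (long b = 0; b < static_cast<long>(blocks.size()); ++b) {
        const std::vector<double>& block = blocks[b];
        double block_sum = 0.0;
        // Inner level: distribute the elements of one block across a nested team.
        #pragma omp parallel for reduction(+:block_sum)
        for (long i = 0; i < static_cast<long>(block.size()); ++i) {
            block_sum += block[i];
        }
        total += block_sum;
    }
    return total;
}
```

Nesting is attractive when the number of blocks is small or the blocks differ widely in size, since the outer loop alone cannot keep all threads busy in that case.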
ISBN (print): 3540628983
We describe a programming interface for parallel computing on NUMA (Non-Uniform Memory Access) shared-memory machines. Although interest in this architecture is growing rapidly and more and more hardware manufacturers offer products of this type, there is still a lack of parallelization support. We developed SMI, the Shared Memory Interface, and implemented it as a library on an SCI-coupled cluster of workstations. It aims at providing sophisticated support to account for the NUMA performance characteristics and to allow a step-by-step parallelization. We show its application to the parallelization of a sparse matrix computation.
ISBN (print): 9781665404761
This paper compares the serial, shared-memory parallel, and distributed-memory parallel versions of the dynamic programming algorithm for the Knapsack Problem. The Knapsack Problem is one of the most popular optimization problems. It is a decision-making problem that arises in real-world situations such as business projects, the airline cargo business, cryptography, and industrial decision-making processes. The algorithm under consideration is the table-based dynamic programming algorithm based on Bellman's optimality principle. We used the C++ programming language. To solve this problem on shared-memory systems, we used OpenMP. For the distributed-memory parallelization, we employed MPI. The structure of the algorithm, the data distribution, synchronization, and communication schemes are explained in detail. Extensive experiments with the developed algorithms were carried out. The obtained results supported a comparative analysis of the developed algorithms.
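The abstract does not give the implementation, so the following is only a sketch of how a table-based dynamic programming solution of the 0/1 Knapsack Problem might be parallelized with OpenMP. The function name `knapsack`, the two-row storage scheme, and the static schedule are assumptions. The key observation is that, by Bellman's recurrence, every cell of row i depends only on row i-1, so the cells of one row can be computed independently while the rows themselves remain sequential.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 0/1 knapsack, table-based DP with two rolling rows.
// best(i, c) = max(best(i-1, c), best(i-1, c - w_i) + v_i)  if c >= w_i
// Weights are assumed to be non-negative.
long long knapsack(const std::vector<int>& weights,
                   const std::vector<long long>& values,
                   int capacity) {
    assert(weights.size() == values.size());
    assert(capacity >= 0);
    std::vector<long long> prev(capacity + 1, 0);  // row i-1
    std::vector<long long> curr(capacity + 1, 0);  // row i
    for (std::size_t i = 0; i < weights.size(); ++i) {  // rows stay sequential
        const int w = weights[i];
        const long long v = values[i];
        // Cells of the current row only read the previous row, so they are
        // independent and can be split across threads.
        #pragma omp parallel for schedule(static)
        for (int c = 0; c <= capacity; ++c) {
            long long best = prev[c];                    // skip item i
            if (c >= w) {
                best = std::max(best, prev[c - w] + v);  // take item i
            }
            curr[c] = best;
        }
        std::swap(prev, curr);  // implicit barrier of the loop precedes this
    }
    return prev[capacity];
}
```

Using two separate rows avoids the read-after-write hazard that the common single-array, right-to-left formulation would introduce once the inner loop is executed by several threads.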
With the installation of the NEC SX-4/16 in 1996 at the Swiss Center for Scientific Computing (CSCS/SCSC), CSCS/SCSC and NEC embarked on a joint program for the porting and development of applications of strategic importance to the Swiss user community, also known as the 'SX-4 Task Force.' The primary objective of this collaborative program was to contribute to the progress of the users' R&D programs by ensuring optimum use of the installed SX-4 supercomputer. The results presented demonstrate the great benefit to the user community from the Swiss Federal Institutes of Technology and the Swiss universities. Significant contributions to computational science in Switzerland could be made. Examples are given where the outstanding performance obtained for key application codes opened the door, in the sense of true feasibility breakthroughs, to novel types of simulations and modeling. Notable examples are the simulation of molecules of unprecedented size and the direct simulation of turbulence at resolutions unattainable thus far.
This paper deals with the numerical determination of the stress and displacement distribution in a solid body subjected to an applied external force. The tackled solid mechanics problem is governed by the Navier-Cauc...