In this paper, two different acceleration techniques for a deterministic DIRECT (DIviding RECTangles)-type global optimization algorithm, DIRECT-GLce, are considered. We adopt dynamic data structures for better memory usage in the MATLAB implementation. We also study shared and distributed parallel implementations of the original DIRECT-GLce algorithm, and a distributed parallel version for the aggressive counterpart. The efficiency of the DIRECT-type parallel versions is evaluated by solving box- and generally constrained global optimization problems of varying complexity, including a practical NASA speed reducer design problem. Numerical results show good efficiency, especially for the distributed parallel version of the original DIRECT-GLce on a multi-core PC. (C) 2020 Elsevier Inc. All rights reserved.
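At its core, a parallel DIRECT-type iteration amounts to evaluating the objective at many hyper-rectangle centers concurrently. A minimal hedged sketch of that pattern in MATLAB, using a parfor loop from the Parallel Computing Toolbox, is given below; the objective g and the candidate-center matrix C are illustrative placeholders, not the authors' DIRECT-GLce code.

    % A minimal sketch, not the authors' implementation: evaluate the
    % objective at many candidate hyper-rectangle centers in parallel.
    g = @(x) sum(x.^2);          % placeholder objective function
    C = rand(1000, 5);           % 1000 candidate centers in [0,1]^5 (illustrative)
    f = zeros(size(C, 1), 1);
    parfor i = 1:size(C, 1)      % workers evaluate disjoint subsets of centers
        f(i) = g(C(i, :));
    end
    [fmin, imin] = min(f);       % best candidate found in this iteration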
ISBN:
(Print) 9781479969180
Today, multiprocessors, multicores, clusters, and heterogeneous computing are becoming the most popular architectures for achieving high-performance computing. System designers take different approaches to enhancing system performance, such as increasing the clock frequency of CPUs from MHz to GHz and adding more CPU cores, i.e., moving from single-core processors to dual-core, quad-core, hexa-core, octa-core, ten-core, and larger processors. Still, multicore processing creates some challenges of its own: the extra cores result in increased processor size and higher power consumption. Meanwhile, General-Purpose Graphics Processing Units (GPGPUs) have been designed and implemented that contain hundreds of cores with a larger number of arithmetic logic units and control units. These GPGPUs can be used alongside the CPU in heterogeneous computing to enhance system performance for selected applications through data parallelism. A heterogeneous programming environment that includes other processors, such as a GPGPU, in addition to the CPU can be used to improve the execution performance of computationally intensive programs. It is therefore necessary for the programmer to run and analyze the selected computationally intensive programs on both homogeneous and heterogeneous programming platforms. The homogeneous programming environment makes use of a multi-core CPU, whereas the heterogeneous programming environment makes use of different processors, such as General-Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs), in addition to the CPU. Hence, the programmer needs to write code that uses both the CPU and other processors via a heterogeneous software environment, such as parallel MATLAB with GPU-enabled functions, MATLAB-supported CUDA kernels, and CUDA C, to execute parallel code and achieve high performance in the heterogeneous programming environment in comparison with the homogeneous (sequential) one.
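The MATLAB GPU path mentioned above (GPU-enabled functions operating on gpuArray data) can be illustrated with a short, hedged sketch; the matrix size and the choice of fft2 are illustrative assumptions, not taken from the paper.

    % A minimal sketch of MATLAB's GPU-enabled-function workflow:
    % move data to the GPU, call a built-in that supports gpuArray
    % input, and gather the result back to host memory.
    A = rand(4096, 'single');    % data created on the CPU (size illustrative)
    G = gpuArray(A);             % transfer the matrix to GPU memory
    H = fft2(G);                 % fft2 executes on the GPU for gpuArray input
    R = gather(abs(H));          % copy the result back to the CPU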
ISBN:
(Print) 9781467315760; 9781467315777
Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed-array interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use them, typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.
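The "few lines of code" claim can be made concrete with a hedged sketch in D4M's associative-array style, modeled on the public D4M examples; the row, column, and value strings are illustrative, and the table binding is an assumption about a separately configured Accumulo connection.

    % A minimal sketch, after the public D4M examples: build an
    % associative array from (row, column, value) triples and query it.
    r = 'alice,bob,alice,';          % rows (comma-terminated triple strings)
    c = 'color,color,size,';         % columns
    v = 'blue,red,large,';           % values
    A = Assoc(r, c, v);              % D4M associative array
    Arow = A('alice,', :);           % all columns of row 'alice'
    % With a bound Accumulo table T (connection setup omitted here),
    % put(T, A) would insert these triples into the database.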
MATLAB® has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstraction (e.g., first-class multi-dimensional arrays and thousands of built-in functions), interpretive, interactive programming, and powerful mathematical graphics. The compute-intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator-overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user, and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions of up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI, usually at one tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
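A hedged sketch of the "map" construct described above, modeled on the published pMatlab examples; the process count, array size, and 1-D block distribution are illustrative assumptions.

    % A minimal pMatlab-style sketch: a map assigns a block distribution,
    % and arrays constructed with that map are distributed transparently.
    Np = 4;                           % number of MATLAB processes (assumed)
    m  = map([Np 1], {}, 0:Np-1);     % block-distribute rows across Np processes
    A  = zeros(1000, 1000, m);        % distributed array; each process owns a slab
    Aloc = local(A);                  % extract the locally owned block
    Aloc = Aloc.^2;                   % purely local computation
    A = put_local(A, Aloc);           % write the block back
    B = agg(A);                       % aggregate the full array on the leader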
ISBN:
(Print) 9783642152900
Several important laser-based medical treatments rest on the crucial knowledge of the response of tissues to laser penetration. Optical properties are often localised and are measured using optically active fluorescent microspheres injected into the tissue. However, the measurement process combines the tissue properties with the optical characteristics of the measuring device which in turn requires numerically intensive mathematical simulations for extracting the tissue properties from the data. In this paper, we focus on exploiting the algorithmic parallelism in the bio-computational simulation, in order to achieve significant runtime reductions. The entire simulation accounts for over 30,000 spatial points and is too computationally demanding to run in a serial fashion. We discuss our strategies of parallelisation at different levels of granularity and we present our results on two different parallel platforms. We also emphasise the importance of retaining a high level of code abstraction in the application to benefit both agile coding and interdisciplinary collaboration between research groups.
Parallel computing with the MATLAB® language and environment has received interest from various quarters. The Parallel Computing Toolbox™ and MATLAB® Distributed Computing Server™ from The MathWorks are among several available tools that offer this capability. We explore some of the key features of the parallel MATLAB language that these tools offer. We describe the underlying mechanics as well as the salient design decisions and rationale for certain features in the toolset. The paper concludes by identifying some issues that we must address as the language features evolve.
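Two of the toolbox's headline language features, parfor loops and spmd blocks, can be sketched briefly; the pool size is an illustrative assumption, and labindex/numlabs are the worker-identification functions of that era's toolbox.

    % A minimal sketch of the Parallel Computing Toolbox language features:
    % parfor distributes loop iterations, spmd runs one program on all workers.
    parpool(4);                       % open a pool of 4 workers (size assumed)
    s = zeros(1, 8);
    parfor i = 1:8                    % iterations spread across the pool
        s(i) = i^2;
    end
    spmd                              % single program, multiple data
        fprintf('worker %d of %d\n', labindex, numlabs);
    end
    delete(gcp('nocreate'));          % shut the pool down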
A distributed approach is described for solving lineality (or linearity) space (LS) problems with large cardinalities and a large number of dimensions. The LS solution has applications in engineering, science, and business, and includes a subset of solutions of the more general extended linear complementarity problem (ELCP). A parallel MATLAB framework is employed, and results are computed on an 8-node Rocks-based cluster computer using Remote Procedure Calls (RPCs) and the MPICH2 Message Passing Interface (MPI). Results show that both approaches perform comparably when solving distributed LS problems. This indicates that when deciding which parallel approach to use, the implementation details particular to the method are the decisive factors, which in this investigation give MPICH2 MPI the advantage.
It has been documented in the literature that the pseudospectrum of a matrix is a powerful concept that broadens our understanding of phenomena based on matrix computations. When the matrix A is non-normal, however, the computation of the pseudospectrum becomes a very expensive computational task. Thus, the use of high-performance computing resources becomes key to obtaining useful answers in acceptable amounts of time. In this work we describe the design and implementation of an environment that integrates a suite of state-of-the-art algorithms running on a cluster of workstations to enable the matrix pseudospectrum to become a practical tool for scientists and engineers. The user interacts with the environment via the graphical user interface PPsGUI. The environment is constructed on top of CMTM, an existing environment that enables distributed computation via an MPI API for MATLAB. (c) 2005 Elsevier B.V. All rights reserved.
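The expense the abstract refers to comes from the standard grid algorithm: the eps-pseudospectrum of A is the region where sigma_min(zI - A) <= eps, so a singular value decomposition is needed at every grid point. A hedged serial sketch follows (not the CMTM/PPsGUI code); the test matrix and grid are illustrative, and since every grid point is independent, the computation parallelizes naturally across a cluster.

    % A minimal serial sketch of grid-based pseudospectrum computation:
    % evaluate sigma_min(z*I - A) over a grid in the complex plane.
    n = 50;
    A = triu(randn(n), 0) + diag(10*rand(n, 1));  % illustrative non-normal matrix
    x = linspace(-5, 15, 80);
    y = linspace(-10, 10, 80);
    smin = zeros(numel(y), numel(x));
    for j = 1:numel(x)                % grid points are independent: trivially parallel
        for k = 1:numel(y)
            z = x(j) + 1i*y(k);
            smin(k, j) = min(svd(z*eye(n) - A));
        end
    end
    contour(x, y, log10(smin));       % contours approximate pseudospectra boundaries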
In many projects the true costs of high-performance computing are currently dominated by software. Addressing these costs may require shifting to higher-level languages such as MATLAB. MatlabMPI is a MATLAB implementation of the Message Passing Interface (MPI) standard and allows any MATLAB program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard MATLAB file I/O, resulting in an extremely compact (~350 lines of code) and "pure" implementation which runs anywhere MATLAB runs, and on any heterogeneous combination of computers. The performance has been tested on both shared- and distributed-memory parallel computers (e.g., Sun, SGI, HP, IBM, Linux, MacOS X, and Windows). MatlabMPI can match the bandwidth of C-based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software lines of code, illustrating the high productivity of this approach. (C) 2004 Published by Elsevier Inc.
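A hedged sketch of the six-function core named above, modeled on the examples distributed with MatlabMPI; the tag value and message contents are illustrative.

    % A minimal MatlabMPI-style sketch: rank 0 sends a variable to rank 1
    % using the six core point-to-point functions.
    MPI_Init;                          % initialize MatlabMPI
    comm = MPI_COMM_WORLD;             % the world communicator
    nprocs  = MPI_Comm_size(comm);     % number of MATLAB processes
    my_rank = MPI_Comm_rank(comm);     % this process's rank
    tag = 1;                           % illustrative message tag
    if my_rank == 0
        MPI_Send(1, tag, comm, 'hello');   % send a MATLAB variable to rank 1
    elseif my_rank == 1
        msg = MPI_Recv(0, tag, comm);      % receive the variable from rank 0
        disp(msg);
    end
    MPI_Finalize;                      % clean up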