ISBN (digital): 9783642551956
ISBN (print): 9783642551956
We present FooPar, an extension for highly efficient parallel computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar is designed to be modular and supports easy access to different communication backends for distributed memory architectures as well as high-performance math libraries. In this article we use it to parallelize matrix-matrix multiplication and show its scalability by an isoefficiency analysis. In addition, results based on an empirical analysis on two supercomputers are given. We achieve close-to-optimal performance with respect to theoretical peak performance. Based on this result we conclude that FooPar allows programmers to fully access Scala's design features without suffering from performance drops when compared to implementations purely based on C and MPI.
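As a rough illustration of the row-block decomposition that parallel matrix-matrix multiplication commonly builds on, here is a minimal Python sketch. It is not FooPar's API and not the algorithm analysed in the paper: the names `multiply_rows` and `parallel_matmul` and the `workers` parameter are hypothetical, and a thread pool stands in for a distributed-memory backend.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a horizontal slice of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B, computed once per slice
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def parallel_matmul(a, b, workers=2):
    """Compute A x B by giving each worker a contiguous block of A's rows.

    Hypothetical helper for illustration only; a real framework would
    distribute the blocks across processes or nodes.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    workers = max(1, min(workers, len(a)))
    step = -(-len(a) // workers)  # ceiling division
    blocks = [a[i:i + step] for i in range(0, len(a), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results come back in row order
        parts = list(pool.map(multiply_rows, blocks, [b] * len(blocks)))
    return [row for part in parts for row in part]
```

Because CPython threads share one interpreter lock, this sketch shows only how the work is partitioned; it should not be expected to show the speedups reported for compiled, MPI-backed implementations.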
ISBN (print): 9781450328890
This paper formulates the problem of predicting program runtime from the parameters of an algorithm and the characteristics of the computational system on which it runs. It proposes building a model that represents runtime as a function of algorithm parameters and computational system characteristics, and then determining the features to be used for recovering this functional dependence. A two-step solution method using linear and non-linear machine learning algorithms is proposed. The paper examines the peculiarities of software algorithms and suggests a method for processing experimental data provided by computational systems. It also presents a comparative analysis of runtime prediction results for several linear algebra problems on 84 personal computers and servers using a number of machine learning algorithms. A random forest combined with the linear least squares method shows an error of less than 15% for most computational systems of similar architecture. Copyright 2014 ACM.
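The linear least squares step mentioned above can be sketched for a single feature as follows. This is only an assumed illustration: the feature (for example an operation count scaled by a machine's speed), the function names `fit_line` and `predict`, and the one-feature form are hypothetical, and the random forest step is omitted.

```python
def fit_line(xs, ys):
    """Ordinary least squares for runtime = slope * feature + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict runtime for feature value x from a (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept
```

A non-linear learner could then be trained on what the linear fit leaves unexplained, but how the paper actually combines the two steps is not stated in the abstract.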
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
In this paper, embeddings of a family of 3D meshes in locally twisted cubes are studied. Let LTQ_n(V, E) denote the n-dimensional locally twisted cube. We obtain two major results: (1) For any integer n >= 4, two node-disjoint 3D meshes of size 2 x 2 x 2^(n-3) can be embedded into LTQ_n with dilation 1 and expansion 2. (2) For any integer n >= 6, four node-disjoint 4 x 2 x 2^(n-5) meshes can be embedded into LTQ_n with dilation 1 and expansion 4. Further, an embedding algorithm can be constructed based on our embedding method. The obtained results are optimal in the sense that the dilations of the embeddings are 1.
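The two expansion values can be checked with a short calculation, assuming the usual definition of expansion as the ratio of host nodes to guest nodes, that LTQ_n has 2^n nodes, and reading the mesh sizes as 2 x 2 x 2^(n-3) and 4 x 2 x 2^(n-5):

```latex
\[
\frac{|V(LTQ_n)|}{|V(M_1)|}
  = \frac{2^{n}}{2 \cdot 2 \cdot 2^{\,n-3}}
  = \frac{2^{n}}{2^{\,n-1}} = 2,
\qquad
\frac{|V(LTQ_n)|}{|V(M_2)|}
  = \frac{2^{n}}{4 \cdot 2 \cdot 2^{\,n-5}}
  = \frac{2^{n}}{2^{\,n-2}} = 4 .
\]
```

Here M_1 and M_2 are labels introduced only for this check. Since two copies of M_1 contain 2 x 2^(n-1) = 2^n nodes and four copies of M_2 contain 4 x 2^(n-2) = 2^n nodes, the node-disjoint meshes together use every node of LTQ_n.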
ISBN (print): 9783642552243
This paper presents a study of co-operation schemes for the parallel memetic algorithm applied to the vehicle routing problem with time windows. In parallel co-operative search algorithms, the processes communicate to exchange up-to-date solutions, which may guide the search and improve the results. The interactions between processes are defined by the content of the exchanged data, timing, connectivity and mode. We show how co-operation schemes influence search convergence and solution quality. The quality of a solution is defined as its proximity to the best currently known one. We present an experimental study on the well-known Gehring and Homberger benchmark. The new world's best solutions obtained in the study confirm that the co-operation scheme has a strong impact on the quality of final solutions.
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
Storing and accessing massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop distributed file system (HDFS) is primarily designed for reliable storage and fast access of very big files, and it suffers a performance penalty as the number of small files increases. A middleware called Hmfs is proposed in this paper to improve the efficiency of storing and accessing small files on HDFS. It is made up of three layers: file operation interfaces that make it easier for software developers to submit different file requests, file management tasks that merge small files into big ones or extract small files from big ones in the background, and file buffers that improve I/O performance. Hmfs boosts the file upload speed by using an asynchronous write mechanism and the file download speed by adopting a prefetching and caching strategy. The experimental results show that Hmfs helps achieve high storage and access speeds for massive numbers of small files on HDFS.
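The idea of merging small files into a big one and extracting them again can be sketched as an append-only blob plus an offset index. This is a hedged illustration, not Hmfs's interface: the class `MergedStore` and its `put` and `get` methods are hypothetical, and an in-memory buffer stands in for a large file on HDFS.

```python
import io


class MergedStore:
    """Append small files to one blob and index them by name.

    Hypothetical stand-in for a merged container file; a real middleware
    would persist both the blob and the index.
    """

    def __init__(self):
        self._blob = io.BytesIO()
        self._index = {}  # name -> (offset, length)

    def put(self, name, data):
        """Append data to the end of the blob and record where it lives."""
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(data)
        self._index[name] = (offset, len(data))

    def get(self, name):
        """Extract one small file by seeking to its recorded offset."""
        offset, length = self._index[name]
        self._blob.seek(offset)
        return self._blob.read(length)
```

Keeping one index entry per small file, rather than one file-system object per small file, is the general motivation for this kind of merging; the asynchronous write, prefetching and caching layers described above are not modelled here.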
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
Agent-based modeling (ABM) has been widely used in stock market simulation. However, traditional ABM simulations of stock markets on single computers are limited by computing capability, as breakthroughs in financial research need a much larger number of agents. This paper introduces a platform for stock market simulation with ABM, focusing on large-scale parallel agents in a distributed computing environment such as a cluster or an MPP system. With customized trade strategies inside the agents, the runtime system of the platform can automatically distribute the massive number of agents to multiple computing nodes during the execution of the simulation. Agents exchange information with each other and with the market through a uniform communication system. With this platform, financial researchers can design their own financial models without having to deal with the complexity of parallelization and related problems. The sample simulation on the platform is verified to be compatible with the data from Euronext-NYSE, and the platform shows fair scalability and performance under different parallelism configurations.
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
The multi-swarm particle swarm optimization (MPSO) algorithm incorporates multiple independent PSO swarms that cooperate by periodically exchanging information. In spite of its embarrassingly parallel nature, MPSO is memory bound, which limits its performance on data-parallel GPUs. Recently, heterogeneous multi-core architectures such as the AMD Accelerated Processing Unit (APU) have fused the CPU and GPU together on a single die, eliminating the traditional PCIe bottleneck between them. In this paper, we report our experiences developing an OpenCL-based MPSO algorithm for the task scheduling problem on the APU architecture. We use the AMD A8-3530MX APU, which packs four x86 computing cores and 80 four-way processing elements. We make effective use of hardware features such as the hierarchical memory structure on the APU, the four-way very long instruction word (VLIW) feature for vectorization, and global-to-local memory DMA transfers. We observe a 29% decrease in overall execution time over our baseline implementation.
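The cooperation pattern described above, independent swarms that periodically exchange information, can be sketched in plain Python. This is an assumed toy version, not the paper's OpenCL implementation: it minimises a continuous sphere function rather than solving task scheduling, and the exchange rule (every swarm adopts the best position found so far), the coefficients, and all function names are hypothetical.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def make_swarm(rng, size, dim):
    """Create one swarm with random positions and zero velocities."""
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
    return {"pos": pos,
            "vel": [[0.0] * dim for _ in range(size)],
            "pbest": [p[:] for p in pos],
            "gbest": min(pos, key=sphere)[:]}


def step(swarm, rng, w=0.7, c1=1.5, c2=1.5):
    """Advance one swarm by one standard PSO velocity/position update."""
    for i, p in enumerate(swarm["pos"]):
        v = swarm["vel"][i]
        for d in range(len(p)):
            v[d] = (w * v[d]
                    + c1 * rng.random() * (swarm["pbest"][i][d] - p[d])
                    + c2 * rng.random() * (swarm["gbest"][d] - p[d]))
            p[d] += v[d]
        if sphere(p) < sphere(swarm["pbest"][i]):
            swarm["pbest"][i] = p[:]
            if sphere(p) < sphere(swarm["gbest"]):
                swarm["gbest"] = p[:]


def mpso(swarms=3, size=8, dim=2, iters=60, exchange_every=10, seed=0):
    """Run several swarms; return (initial best, final best) objective values."""
    rng = random.Random(seed)
    pop = [make_swarm(rng, size, dim) for _ in range(swarms)]
    start = min(sphere(s["gbest"]) for s in pop)
    for it in range(1, iters + 1):
        for s in pop:  # swarms are independent between exchanges
            step(s, rng)
        if it % exchange_every == 0:  # periodic information exchange
            best = min((s["gbest"] for s in pop), key=sphere)
            for s in pop:
                s["gbest"] = best[:]
    return start, min(sphere(s["gbest"]) for s in pop)
```

The inner loop over `pop` is where a parallel implementation would assign swarms to separate compute units; only the exchange step requires communication between them.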
ISBN (print): 9781479955480
Throughout the years, biological processing demands have been addressed by designing algorithmic approaches for parallel architectures. By taking advantage of multicore processor systems, we can deal with the main sources of complexity that explain the NP-hard nature of many problems in computational biology. In this work, we address the inference of phylogenetic topologies by using two multiobjective metaheuristics: the Fast Non-Dominated Sorting Genetic Algorithm and the Strength Pareto Evolutionary Algorithm 2. The additional complexity introduced by the multiobjective formulation of the problem motivates parallel designs of these algorithms. For this purpose, OpenMP-based implementations of the two metaheuristics are applied. To evaluate the performance of these approaches, a comparative study has been conducted on four nucleotide data sets. Our experiments suggest the relevance of these parallel algorithmic designs, which improve on the phylogenetic results reported by other multiobjective tools while reducing execution times.
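Both metaheuristics named above rest on Pareto dominance. The following Python sketch shows dominance and a simple front-by-front sort, assuming every objective is minimised. It is a naive peeling procedure written for clarity, not the bookkeeping used by the fast non-dominated sorting algorithm itself, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_fronts(points):
    """Group point indices into successive Pareto fronts (minimisation)."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it
        front = sorted(i for i in remaining
                       if not any(dominates(points[j], points[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

The dominance checks for different points are independent of one another, which is one reason population-based multiobjective methods lend themselves to shared-memory parallelization.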
In order to harness abundant hardware resources, parallel programming has become a necessity in the multicore era. However, parallel programs are prone to concurrency bugs, especially data races. Even worse, current softw...
General-purpose computing on graphics processing unit (GPGPU) architectures rely on data locality and regular computation to leverage parallel resources to achieve performance benefits over multi-core systems. Current...