We implemented five conversions of the simulated annealing (SA) algorithm from sequential to parallel form on high-performance computers and applied them to a set of standard function optimization problems in order to test their performance. The experimental results showed that the traditional approach to parallelizing simulated annealing, namely parallelizing the moves of sequential SA, had difficulty with very hard problem instances. A divide-and-conquer decomposition of the search space sometimes found the global optimum function value, but it frequently incurred a large time cost when the random search space was considerably expanded. The most effective way we found to identify the global optimum solution was to introduce a genetic algorithm (GA) and build a tightly hybridized GA+SA algorithm, in which the GA is applied at each cooling temperature stage. Additionally, the performance of the best of the five implemented algorithms was analyzed on an IBM Beowulf PC cluster and compared with several recent global optimization algorithms in terms of the number of function evaluations needed to reach the global minimum, the success rate, and the solution quality.
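As a rough, hypothetical sketch of the hybrid scheme described above (not the paper's code), the fragment below runs one GA generation of selection, crossover, and mutation at every SA temperature stage and then filters the perturbed offspring through the usual SA acceptance rule; the population size, cooling schedule, operators, and test function are all illustrative assumptions.

```python
import math
import random

def hybrid_ga_sa(objective, dim, pop_size=20, t0=100.0, t_min=1e-3, alpha=0.95,
                 bounds=(-5.0, 5.0), seed=0):
    """Illustrative hybrid GA+SA: one GA generation per SA temperature stage."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=objective)
    t = t0
    while t > t_min:
        # GA step: tournament selection + uniform crossover + Gaussian mutation.
        children = []
        for _ in range(pop_size):
            p1 = min(rng.sample(pop, 3), key=objective)
            p2 = min(rng.sample(pop, 3), key=objective)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, x + rng.gauss(0.0, 0.1 * (t / t0) * (hi - lo))))
                     for x in child]
            children.append(child)
        # SA step: accept a child if better, or with Boltzmann probability exp(-delta/t).
        new_pop = []
        for cur, cand in zip(pop, children):
            delta = objective(cand) - objective(cur)
            new_pop.append(cand if delta < 0 or rng.random() < math.exp(-delta / t) else cur)
        pop = new_pop
        best = min(pop + [best], key=objective)
        t *= alpha  # geometric cooling schedule (an assumption)
    return best

# Example usage: minimize the sphere function in 5 dimensions.
if __name__ == "__main__":
    print(hybrid_ga_sa(lambda x: sum(v * v for v in x), dim=5))
```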
Artificial neural networks simulate biological processes in an intriguing manner. Ideas gleaned from the study of neurophysiology and animal behavior have become realizable in recent years. The advent of computers capable of rapidly executing massively parallel and distributed processes has allowed ideas from diverse fields to be merged and tested. The resulting neural networks, simulated in software and/or hardware, provide an adaptable, robust modeling tool useful to simulationists in all disciplines.
Efficient task scheduling on heterogeneous distributed computing systems (HeDCSs) requires the consideration of the heterogeneity of processors and the inter-processor communication. This paper presents a two-phase algorithm, called H2GS, for task scheduling on HeDCSs. The first phase implements a heuristic list-based algorithm, called LDCP, to generate a high quality schedule. In the second phase, the LDCP-generated schedule is injected into the initial population of a customized genetic algorithm, called GAS, which proceeds to evolve shorter schedules. GAS employs a simple genome composed of a two-dimensional chromosome. A mapping procedure is developed which maps every possible genome to a valid schedule. Moreover, GAS uses customized operators that are designed for the scheduling problem to enable an efficient stochastic search. The performance of each phase of H2GS is compared to two leading scheduling algorithms, and H2GS outperforms both algorithms. The improvement in performance obtained by H2GS increases as the inter-task communication cost increases. (C) 2011 Elsevier Inc. All rights reserved.
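As a minimal sketch of the seeding idea described above, the fragment below injects a heuristic (LDCP-style) schedule into an otherwise random initial GA population, using a toy {processor: ordered task list} representation that ignores precedence constraints and communication costs; the cost matrix, helper names, and example schedule are hypothetical stand-ins, not the paper's implementation.

```python
import random

def makespan(schedule, exec_cost):
    """Finish time of the busiest processor in the toy model (no precedence,
    no communication costs). schedule: {processor: [task ids]}."""
    return max(sum(exec_cost[t][p] for t in tasks) for p, tasks in schedule.items())

def seeded_population(heuristic_schedule, exec_cost, num_procs, size, rng):
    """H2GS-style seeding: the heuristic schedule becomes one individual of the
    initial population; the rest are random assignments."""
    num_tasks = len(exec_cost)
    pop = [heuristic_schedule]
    while len(pop) < size:
        sched = {p: [] for p in range(num_procs)}
        for t in rng.sample(range(num_tasks), num_tasks):
            sched[rng.randrange(num_procs)].append(t)
        pop.append(sched)
    return pop

# Example: 4 tasks on 2 heterogeneous processors (rows: tasks, columns: processors).
exec_cost = [[3, 5], [2, 2], [4, 1], [6, 3]]
heuristic = {0: [0, 1], 1: [2, 3]}  # stand-in for an LDCP-generated schedule
pop = seeded_population(heuristic, exec_cost, num_procs=2, size=5, rng=random.Random(1))
print([makespan(s, exec_cost) for s in pop])
```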
This paper presents a very efficient method for establishing nonlinear combinations of variables from small to big data for use in later processing (e.g., regression, classification, etc.). Variables are first partitioned into subsets, each of which has a linguistic term (called a causal condition) associated with it. Our Causal Combination Method uses fuzzy sets to model the terms and focuses on interconnections (causal combinations) of either a causal condition or its complement, where the connecting word is AND, which is modeled using the minimum operation. Our Fast Causal Combination Method is based on a novel theoretical result, leads to an exponential speedup in computation and lends itself to parallel and distributed processing; hence, it may be used on data from small to big. (C) 2014 Elsevier Inc. All rights reserved.
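The brute-force version of the causal combination computation is easy to state in code: each of the 2^k combinations ANDs either a condition's membership or its fuzzy complement (one minus the membership) using the minimum operation. The sketch below shows only this baseline, not the paper's fast method; the function name and example values are assumptions.

```python
from itertools import product

def causal_combination_strengths(memberships):
    """Firing strength of every causal combination of conditions or their complements.

    memberships: {condition_name: membership in [0, 1]} from the fuzzy term models.
    AND is the minimum operation; the complement is 1 - membership (standard fuzzy
    negation). This is the exhaustive form, not the fast variant from the paper.
    """
    names = list(memberships)
    strengths = {}
    for keep_flags in product((True, False), repeat=len(names)):
        label = " AND ".join(n if keep else f"NOT {n}"
                             for n, keep in zip(names, keep_flags))
        strengths[label] = min(memberships[n] if keep else 1.0 - memberships[n]
                               for n, keep in zip(names, keep_flags))
    return strengths

# Example with three causal conditions.
print(causal_combination_strengths({"A": 0.8, "B": 0.3, "C": 0.6}))
```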
This paper describes a parallel algorithm for Molecular Dynamics simulation of a lipid membrane using the isothermal-isobaric ensemble. A message-passing paradigm is adopted for interprocessor communications using PVM3 (Parallel Virtual Machine). A data decomposition technique is employed for the parallelization of the calculation of intermolecular forces. The algorithm has been tested both on a distributed memory architecture (DEC Alpha 500 workstation clusters) and a shared memory architecture (SGI Powerchallenge with 20 R10000 processors) for a dipalmitoylphosphatidylcholine (DPPC) lipid bilayer consisting of 32 DPPC molecules and 928 water molecules. For each architecture, we measure the execution time under an average workload and determine the optimal number of processors for the current simulation. Some dynamical quantities are presented for a 2 ns simulation obtained with 5 processors on DEC Alpha 500 workstations. Our results show that the code is extremely efficient on 5-8 processors and a useful addition to other major computational resources. (C) 1999 Published by Elsevier Science B.V.
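To illustrate the data-decomposition idea (each worker receives a slice of the pair list rather than a region of the simulation box), here is a toy pairwise Lennard-Jones energy sum that uses Python multiprocessing as a local stand-in for PVM3 message passing; the potential, coordinates, and chunking are illustrative assumptions and bear no relation to the DPPC system simulated in the paper.

```python
import math
from itertools import combinations
from multiprocessing import Pool

def partial_energy(args):
    """One worker's share of the pairwise energy: data decomposition assigns a
    slice of the pair list, not a spatial region, to each worker."""
    pairs, coords = args
    energy = 0.0
    for i, j in pairs:
        r = math.dist(coords[i], coords[j])
        energy += 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)  # reduced LJ units
    return energy

if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0), (1.0, 1.0, 1.0)]
    pairs = list(combinations(range(len(coords)), 2))
    nworkers = 2
    chunks = [(pairs[k::nworkers], coords) for k in range(nworkers)]
    with Pool(nworkers) as pool:  # pool workers stand in for PVM processes
        print(sum(pool.map(partial_energy, chunks)))
```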
This paper presents a study of graph partitioning schemes for parallel graph community detection on distributed memory machines. We investigate the relationship between graph structure and parallel clustering effectiveness, and develop a heuristic partitioning algorithm suitable for modularity-based algorithms. We demonstrate the accuracy and scalability of our approach using several real-world large graph datasets compared with state-of-the-art parallel algorithms on the Cray XK7 supercomputer at Oak Ridge National Laboratory. Given the ubiquitous graph model, we expect this high-performance solution will help lead to new insights in numerous fields. (C) 2016 Elsevier B.V. All rights reserved.
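For reference, the modularity score that such community detection algorithms maximize can be computed directly from an edge list and a node-to-community map; the sketch below implements the standard Newman modularity for an undirected graph and is not the paper's partitioning heuristic.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q = sum_c (e_c / m - (d_c / 2m)^2), where e_c is the
    number of intra-community edges and d_c the total degree of community c."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    deg_sum = defaultdict(int)
    for node, d in degree.items():
        deg_sum[community[node]] += d
    return sum(internal[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)

# Example: two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```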
Parallel and distributed processing has been broadly applied to scientific and engineering computing, including various aspects of power system analysis. This paper first presents a distributed-processing approach to reliability index assessment (RIA) for distribution systems. Then, this paper proposes a balanced task partition approach to achieve better efficiency. Next, the distributed processing of RIA is applied to reliability-based network reconfiguration (NR), which employs an algorithm combining local search and simulated annealing to optimize system reliability. Testing results are presented to demonstrate the accelerated execution of RIA and NR with distributed processing.
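One plausible way to realize a balanced task partition is a longest-processing-time greedy that always gives the next-largest task to the least-loaded worker; the sketch below illustrates that generic heuristic with hypothetical task costs and is not taken from the paper.

```python
import heapq

def balanced_partition(task_costs, num_workers):
    """Greedy LPT-style load balancing: sort tasks by decreasing cost and assign
    each to the currently least-loaded worker (tracked with a min-heap)."""
    heap = [(0.0, w, []) for w in range(num_workers)]
    heapq.heapify(heap)
    for tid, cost in sorted(enumerate(task_costs), key=lambda x: -x[1]):
        load, w, tasks = heapq.heappop(heap)
        tasks.append(tid)
        heapq.heappush(heap, (load + cost, w, tasks))
    return {w: (load, tasks) for load, w, tasks in heap}

# Example: 10 assessment tasks with estimated costs, distributed over 3 workers.
print(balanced_partition([5, 3, 8, 2, 7, 4, 6, 1, 9, 2], 3))
```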
In this paper, we introduce a runtime, nontrace-based algorithm to compute the critical path profile of the execution of message passing and shared-memory parallel programs. Our algorithm permits starting or stopping the critical path computation during program execution and reporting intermediate values. We also present an online algorithm to compute a variant of critical path, called critical path zeroing, that measures the reduction in application execution time that improving a selected procedure will have. Finally, we present a brief case study to quantify the runtime overhead of our algorithm and to show that online critical path profiling can be used to find program bottlenecks.
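Critical path zeroing can be illustrated on a small activity graph: compute the longest path once more with the selected procedure's cost forced to zero and compare it with the original critical path length. The sketch below is an offline toy version (the paper's algorithm works online, without traces); the graph, costs, and function names are assumptions.

```python
from functools import lru_cache

def critical_path_length(dag, cost, zeroed=None):
    """Longest-path length of an activity DAG. Passing zeroed=<procedure> treats
    that procedure's cost as zero, which is the essence of critical path zeroing.
    dag: {node: [successors]}, cost: {node: time}."""
    @lru_cache(maxsize=None)
    def longest_from(node):
        c = 0.0 if node == zeroed else cost[node]
        succs = dag.get(node, [])
        return c + (max(longest_from(s) for s in succs) if succs else 0.0)
    return max(longest_from(n) for n in dag)

# Toy activity graph for one process of a message-passing program.
dag = {"main": ["send", "compute"], "send": ["recv"], "compute": ["recv"], "recv": []}
cost = {"main": 1.0, "send": 2.0, "compute": 5.0, "recv": 1.0}
saving = critical_path_length(dag, cost) - critical_path_length(dag, cost, zeroed="compute")
print(saving)  # upper bound on the benefit of optimizing "compute"
```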
Among the so-called "4Vs" (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on the issue of "Volume" in order to ensure good performance for Extracting-Transforming-Loading (ETL) processes. In this study, we propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches that distribute the ETL only at a coarse-grained level of processing, our approach provides different ways of parallelization/distribution at the process, functionality, and elementary-function levels. In our approach, an ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. The novel approach thereby allows the distribution of the ETL process at three levels: the "process" level for coarse-grained distribution and the "functionality" and "elementary functions" levels for fine-grained distribution. Our performance analysis reveals that employing 25 to 38 parallel tasks enables the novel approach to speed up the ETL process by up to 33%, with the improvement rate being linear.
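To make the fine-grained idea concrete, the toy sketch below splits one ETL functionality into elementary map and reduce functions and runs the map step on a worker pool standing in for cluster nodes; the record format, the functions, and the use of Python multiprocessing instead of an actual MR cluster are illustrative assumptions.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_clean(record):
    """Elementary function (map side): normalize a raw sale record into (store, amount)."""
    store, amount = record.split(",")
    return store.strip().upper(), float(amount)

def reduce_sum(pairs):
    """Elementary function (reduce side): aggregate amounts per store before loading."""
    totals = defaultdict(float)
    for store, amount in pairs:
        totals[store] += amount
    return dict(totals)

if __name__ == "__main__":
    raw = ["paris, 10.0", "lyon, 4.5", "Paris, 2.5", "lyon, 1.0"]
    with Pool(4) as pool:        # pool workers stand in for cluster nodes
        pairs = pool.map(map_clean, raw)
    print(reduce_sum(pairs))     # {'PARIS': 12.5, 'LYON': 5.5}
```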
In this paper, we describe a software-based MPEG-4 video encoder which is implemented using parallel processing on a cluster of workstations collectively working as a virtual machine. The contributions of our work are as follows. First, a hierarchical Petri-nets-based modeling methodology is proposed to capture the spatiotemporal relationships among multiple objects at different levels of an MPEG-4 video sequence. Second, a scheduling algorithm is proposed to assign video objects to workstations for encoding in parallel. The algorithm determines the execution order of video objects and ensures that the synchronization requirements among them are enforced and that presentation deadlines are met. Third, a dynamic partitioning scheme is proposed which divides an object among multiple workstations to extract additional parallelism. The scheme achieves load balancing among the workstations with a low overhead. The striking feature of our encoder is that it adjusts the allocation and partitioning of objects automatically according to the dynamic variations in video object behavior. We have made various additional software optimizations to further speed up the computation. The performance of the encoder scales with the number of workstations used. With 20 workstations, the encoder yields an encoding rate higher than real time, allowing the encoding of multiple sequences simultaneously.
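A simple stand-in for the dynamic partitioning step is to split one video object's macroblocks among workstations in proportion to their observed encoding speeds; the sketch below shows such a proportional split with a remainder correction, using hypothetical speed values rather than the paper's actual scheme.

```python
def partition_macroblocks(num_mbs, worker_speeds):
    """Divide one object's macroblocks into contiguous ranges sized in proportion
    to each workstation's measured speed (a toy load-balancing rule)."""
    total = sum(worker_speeds)
    shares = [int(num_mbs * s / total) for s in worker_speeds]
    # Hand the remaining macroblocks to the fastest workstations.
    remainder = num_mbs - sum(shares)
    for idx in sorted(range(len(worker_speeds)),
                      key=lambda i: -worker_speeds[i])[:remainder]:
        shares[idx] += 1
    ranges, start = [], 0
    for share in shares:
        ranges.append((start, start + share))
        start += share
    return ranges

# Example: 99 macroblocks split across three workstations with speeds 1.0, 2.0, 1.5.
print(partition_macroblocks(99, [1.0, 2.0, 1.5]))
```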