Recent studies have demonstrated that the performance of a simulated annealing algorithm can be improved by following multiple search paths and by using parallel computation. In this paper, we use these strategies to solve a comprehensive mathematical model of a flexible flowshop lot streaming problem. In the flexible flowshop environment, a number of jobs are processed in several consecutive production stages, and each stage may contain a number of parallel machines that need not be identical. Following the concept of lot streaming, each job has to be split into several unequal sublots. The sublots are processed in stage order, and sublots of certain products may skip some stages. This complex problem also incorporates sequence-dependent setup times, anticipatory or nonanticipatory setups, machine release dates, and machine eligibility. Numerical examples demonstrate the effectiveness of lot streaming in hybrid flowshops, the performance of the proposed simulated annealing algorithm, and the improvements achieved through parallel computation.
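A minimal sketch of the multiple-search-path idea, under heavy simplifying assumptions: a plain permutation flowshop with one machine per stage (no lot streaming, parallel machines, setups, release dates, or eligibility), a swap neighbourhood, and geometric cooling. The names `makespan`, `anneal`, and `multi_path_anneal` and all parameter values are hypothetical and are not the authors' model or algorithm. Each path starts from its own seed and shares no state with the others, which is what makes the paths easy to run in parallel.

```python
import math
import random
from functools import partial


def makespan(seq, proc):
    """Makespan of a permutation flowshop; proc[j][m] is job j's time at stage m."""
    finish = [0.0] * len(proc[0])  # completion time of the latest job on each machine
    for j in seq:
        for m in range(len(finish)):
            ready = finish[m - 1] if m > 0 else 0.0  # job j has left stage m-1
            finish[m] = max(finish[m], ready) + proc[j][m]
    return finish[-1]


def anneal(proc, seed, t0=10.0, cooling=0.995, steps=5000):
    """One search path: swap-move simulated annealing from a random start."""
    rng = random.Random(seed)
    seq = list(range(len(proc)))
    rng.shuffle(seq)
    cost = makespan(seq, proc)
    best_cost, best_seq = cost, seq[:]
    temp = t0
    for _ in range(steps):
        i, k = rng.sample(range(len(seq)), 2)
        seq[i], seq[k] = seq[k], seq[i]
        new_cost = makespan(seq, proc)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best_cost, best_seq = cost, seq[:]
        else:
            seq[i], seq[k] = seq[k], seq[i]  # reject: undo the swap
        temp *= cooling  # geometric cooling schedule
    return best_cost, best_seq


def multi_path_anneal(proc, n_paths=4, mapper=map):
    """Run independent paths (seeds 0..n_paths-1); return the best (cost, sequence)."""
    return min(mapper(partial(anneal, proc), range(n_paths)))
```

Because the paths are independent, the `map` method of a `concurrent.futures.ProcessPoolExecutor` can be passed as `mapper` without other changes; the worker is a `functools.partial` of a module-level function, so it can be pickled (on spawn-based platforms the call must sit under an `if __name__ == "__main__":` guard).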
Four algorithms are analyzed in the shared and nonshared (distributed) memory models of parallel computation. The analysis shows that the shared memory model predicts optimality for algorithms and programming styles that cannot be realized on any physical parallel computer. Programs based on these techniques are inferior to programs written in the nonshared memory model. The "unit" cost charged for a reference to shared memory is argued to be the source of the shared memory model's inaccuracy. The implications of these observations are discussed.
This paper shows that a fat-pyramid of area Θ(A) requires only O(log A) slowdown to simulate any competing network of area A under very general conditions. The result holds regardless of the processor size (amount of attached memory) and the number of processors in the competing networks, as long as the limitation on total area is met. Furthermore, the result is valid regardless of the relationship between wire length and wire delay. We especially focus on eliminating the common simplifying assumption that unit time suffices to traverse a wire regardless of its length, since the assumption becomes increasingly untenable as the size of parallel systems increases. This paper concentrates on simulation using transmission lines (wires along which bits can be pipelined) with the message routing schedule set up offline, but it also discusses the extension to online simulation. This paper also examines the capabilities of a fat-pyramid when matched against a substantially larger network and points out the surprising difficulty of making such a comparison without the unit wire delay assumption.
Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines one must carefully distribute the work and the data so as to keep the workload balanced while optimizing the access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture processing example written in BLAZE, which is transformed by the compiler into a task system maximizing locality of memory reference.
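BLAZE and its compiler are not reproduced here. The following hand-written Python sketch shows the kind of task system such partitioning produces, assuming a picture stored as a list of rows and a hypothetical vertical three-point averaging kernel. Each task owns a contiguous block of rows and reads only one neighbouring row above and below its block, so nonlocal accesses stay small. The helper names `row_blocks`, `smooth_rows`, and `parallel_smooth` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows, n_tasks):
    """Split rows 0..n_rows-1 into n_tasks contiguous, nearly equal blocks."""
    base, extra = divmod(n_rows, n_tasks)
    blocks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)  # first `extra` blocks get one more row
        blocks.append((start, start + size))
        start += size
    return blocks


def smooth_rows(img, lo, hi):
    """Vertical 3-point mean for rows lo..hi-1 (edge rows clamped).

    Besides its own rows, a task reads at most rows lo-1 and hi (the halo)."""
    n = len(img)
    out = []
    for r in range(lo, hi):
        up, down = img[max(r - 1, 0)], img[min(r + 1, n - 1)]
        out.append([(a + b + c) / 3 for a, b, c in zip(up, img[r], down)])
    return out


def parallel_smooth(img, n_tasks=4):
    """One task per row block; results are concatenated in block order."""
    blocks = row_blocks(len(img), n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        parts = pool.map(lambda b: smooth_rows(img, *b), blocks)
        return [row for part in parts for row in part]
```

Python threads only illustrate the task structure here (the interpreter lock limits real speedup); the point is that the partitioned result is identical to a serial pass over all rows.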
The creation of a routing overlay network on the Internet requires identifying detour paths between end hosts that are shorter than the default path. These detour paths are typically the edges forming a Triangle Inequality Violation (TIV), an artifact of the Internet delay space in which the sum of latencies across an intermediate hop is less than the direct latency between the pair of end hosts. These violations are caused mainly by interdomain routing policies between Autonomous Systems (ASes) and by AS peering through Internet eXchange Points (IXPs). Identifying detours for a global overlay network requires large amounts of computation because of the sheer number of possible paths linking source and destination ASes. In this work, we use parallel programming paradigms to exploit massively parallel hardware for analyzing the large network measurement datasets that CAIDA makes available to the network research community. We study Internet routes traversing IXPs and measure the potential TIVs created by these paths. Large-scale analysis of the dataset is carried out by implementing an efficient parallel solution on the CPU and then on a general-purpose graphics processing unit (GPGPU). Both the multicore CPU and GPGPU implementations can be built easily in desktop environments with readily available software. Both parallel solutions achieve substantial speedups (2-35x) over the serial methods, opening up the possibility of harnessing parallel programming on readily available hardware. The large amount of data analyzed supports several inferences that can help the networking research community build scalable future Internet routing overlays with greater routing efficiency.
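A brute-force sketch of TIV detection, assuming a complete matrix `latency[a][b]` of measured delays (for example RTTs in milliseconds) between hosts or ASes. Each source row is checked independently, which is the natural unit of work for a multicore or GPGPU version to distribute. The names `tivs_from` and `find_tivs` and the `mapper` hook are hypothetical; this is not the authors' CAIDA pipeline.

```python
from functools import partial


def tivs_from(a, latency):
    """All TIVs with source a, as (a, b, relay c, latency saved by the detour)."""
    n = len(latency)
    found = []
    for b in range(n):
        if b == a:
            continue
        for c in range(n):
            if c in (a, b):
                continue
            detour = latency[a][c] + latency[c][b]
            if detour < latency[a][b]:  # relaying via c beats the direct path
                found.append((a, b, c, latency[a][b] - detour))
    return found


def find_tivs(latency, mapper=map):
    """Check every source independently; `mapper` may be a process pool's map."""
    per_source = mapper(partial(tivs_from, latency=latency), range(len(latency)))
    return [tiv for part in per_source for tiv in part]
```

The work is O(n^3) in the number of endpoints, which is why the per-source (or per-pair) independence matters at Internet scale.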
The intuition that a long history is required for the emergence of complexity in natural systems is formalized using the notion of depth. The depth of a system is defined in terms of the number of parallel computational steps needed to simulate it. Depth provides an objective, irreducible measure of history that is applicable to systems of the kind studied in statistical physics. It is argued that physical complexity cannot occur in the absence of substantial depth and that depth is a useful proxy for physical complexity. The ideas are illustrated for a variety of systems in statistical physics. (c) 2006 Wiley Periodicals, Inc.
In this paper, we propose two approximation methods based on Taylor series expansion that solve the nonlinear equation in the entropic lattice Boltzmann model (ELBM) without iterative methods such as the Newton-Raphson method. The advantage of our methods is that they avoid the load imbalance in parallel computation that arises because the number of iterations differs from one calculation grid point to another. In this study, ELBM simulations using our methods were compared with those using the Newton-Raphson method for channel flow past a square cylinder at Re = 1000, and the validity of the results and the computational effort were investigated. The solutions obtained with our methods were found to be qualitatively and quantitatively reasonable, and their CPU time was shorter than that of the Newton-Raphson method. (C) 2011 Elsevier Ltd. All rights reserved.
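To illustrate why a non-iterative estimate helps load balance, the sketch below compares a Newton-Raphson solve of the ELBM entropy condition H(f + alpha*(f_eq - f)) = H(f) with a closed-form estimate from a second-order Taylor expansion of H about f. That quadratic formula is a generic expansion chosen for illustration and is not claimed to be either of the paper's two methods; the D1Q3 lattice (weights 2/3, 1/6, 1/6), the H-function H(f) = sum_i f_i ln(f_i / w_i), and the function names are assumptions. Newton's iteration count can vary from cell to cell, whereas the Taylor estimate always costs one pass over the populations.

```python
import math


def entropy(f, w):
    """Discrete H-function H(f) = sum_i f_i * ln(f_i / w_i)."""
    return sum(fi * math.log(fi / wi) for fi, wi in zip(f, w))


def alpha_newton(f, feq, w, alpha=2.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for the nontrivial root of H(f + alpha*(feq - f)) = H(f).

    Returns (alpha, iterations); the iteration count differs between cells."""
    d = [e - x for e, x in zip(feq, f)]
    h0 = entropy(f, w)
    for it in range(1, max_iter + 1):
        g = [x + alpha * di for x, di in zip(f, d)]
        residual = entropy(g, w) - h0
        # dH/dalpha = sum_i d_i * (ln(g_i / w_i) + 1)
        slope = sum(di * (math.log(gi / wi) + 1.0) for di, gi, wi in zip(d, g, w))
        step = residual / slope
        alpha -= step
        if abs(step) < tol:
            return alpha, it
    return alpha, max_iter


def alpha_taylor2(f, feq, w):
    """One-shot estimate from a second-order Taylor expansion of H about f.

    H(f + a d) - H(f) ~ a (grad H . d) + a^2/2 (d . Hess H . d), with
    grad H_i = ln(f_i / w_i) + 1 and Hess H = diag(1 / f_i); the nonzero
    root of the quadratic is a = -2 (grad H . d) / (d . Hess H . d)."""
    d = [e - x for e, x in zip(feq, f)]
    grad_d = sum(di * (math.log(x / wi) + 1.0) for di, x, wi in zip(d, f, w))
    curv = sum(di * di / x for di, x in zip(d, f))
    return -2.0 * grad_d / curv
```

For a small perturbation of the rest equilibrium both estimates land close to alpha = 2, the value for an exactly quadratic H; the Taylor estimate trades a small truncation error for a fixed, branch-free cost per cell.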
For parallel computation based on domain partitioning, an efficient method for communication between partitioned subdomains was investigated. In this method, the communication table was generated by dividing the communication process into multiple stages, allowing the core assigned to each subdomain to communicate only once per stage, and giving cores with more neighboring subdomains priority over others in choosing communication partners until no more cores were available in each stage. A parallel fluid-flow computation based on the finite volume method was performed, and the proposed communication table was found to reduce communication time compared with the conventional table as the number of cores increased.
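A sketch of a staged communication table built greedily, assuming cores are integers and `neighbors` is a symmetric adjacency map of subdomains. Here "more neighboring subdomains" is read as the number of exchanges a core still has outstanding, recomputed at the start of each stage; that reading, the tie-breaking, and the name `communication_table` are assumptions rather than the authors' exact procedure.

```python
def communication_table(neighbors):
    """Greedy multi-stage schedule of pairwise subdomain exchanges.

    neighbors: dict core -> set of adjacent cores (must be symmetric).
    Returns a list of stages; each stage is a list of (core, core) pairs in
    which every core appears at most once."""
    pending = {c: set(ns) for c, ns in neighbors.items()}
    stages = []
    while any(pending.values()):
        busy, stage = set(), []
        # Cores with more outstanding exchanges choose partners first.
        for c in sorted(pending, key=lambda c: (-len(pending[c]), c)):
            if c in busy or not pending[c]:
                continue
            free = [n for n in pending[c] if n not in busy]
            if not free:
                continue  # all remaining partners are already paired this stage
            partner = max(free, key=lambda n: (len(pending[n]), -n))
            stage.append((c, partner))
            busy.update((c, partner))
        for a, b in stage:  # mark this stage's exchanges as done
            pending[a].discard(b)
            pending[b].discard(a)
        stages.append(stage)
    return stages
```

For a 2x3 grid of subdomains (cores 0-2 on the top row, 3-5 below), this produces three stages, matching the largest number of neighbours of any core.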
The calculation of pairings plays a key role in pairing-based cryptography and is usually based on Miller's algorithm. However, most optimisations of Miller's algorithm have a serial structure. In this paper, we propose a method to compute the Tate pairing efficiently in parallel. We split the divisor in Miller's algorithm into three parts, and then use an efficiently computable endomorphism and precomputation to reduce the computational cost. Compared with the general serial version of Miller's algorithm, our method achieves a gain of around 50.0%.
ISBN: 9781510835429 (print)
The performance of the Hadoop platform is closely related to its task scheduler. Based on an analysis of the existing Hadoop job scheduling algorithm, I propose a multi-queue job scheduling optimization algorithm in this paper. Actual tests and analysis on the Hadoop platform show that the proposed algorithm effectively allocates node resources according to the degree of demand for them and achieves resource sharing among multiple queues, while also avoiding the ping-pong effect caused by resource competition. The optimization algorithm achieves greater improvements in execution efficiency than the Hadoop default algorithm.
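The abstract does not specify the allocation rule, so the following is only a hypothetical illustration of demand-proportional sharing among queues: when total demand fits within the available slots, every queue gets what it asks for; otherwise slots are split in proportion to demand using the largest-remainder method, so no queue receives more than it requested. It uses no Hadoop APIs, `allocate_slots` is an illustrative name, and it does not model the ping-pong effect.

```python
def allocate_slots(capacity, demands):
    """Split `capacity` integer slots among queues in proportion to demand.

    demands: dict queue -> requested slots (non-negative ints)."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)  # everyone is satisfied; spare slots stay shareable
    alloc, rem = {}, {}
    for q, d in demands.items():
        # Exact share is capacity * d / total; keep floor and remainder as integers.
        alloc[q], rem[q] = divmod(capacity * d, total)
    leftover = capacity - sum(alloc.values())
    # Hand the remaining slots to the queues with the largest remainders.
    for q in sorted(demands, key=lambda q: (-rem[q], q))[:leftover]:
        alloc[q] += 1
    return alloc
```

Integer arithmetic avoids floating-point rounding in the shares, and because capacity is below total demand in the proportional branch, rounding a share up never exceeds that queue's demand.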