Reducing work-in-process (WIP) inventory continues to be an important business need for several reasons, including the need to reduce working capital. Numerous techniques have been suggested for WIP reduction, and CONWIP is a competitive algorithm for WIP reduction. Prior CONWIP algorithms have been primarily sequential and can potentially incur significant computing time, especially when dealing with inventories for multiple products. This paper proposes a card-setting algorithm for multiple product types subject to routing and throughput requirements. The proposed algorithm searches the WIP space iteratively, and the step size is adaptively selected based on the known properties of multi-chain, multi-class, closed queuing networks. Furthermore, parallelization of this search algorithm across multiple processors is proposed, where each processor searches a different segment of the WIP space while adaptively adjusting its step size for all product types to ensure fast convergence. The proposed parallel algorithm can take advantage of distributed computing architectures to speed up the overall computation. An experimental implementation of the parallel algorithm using the Message Passing Interface (MPI) over a high-speed network is described. Computational results demonstrate that the proposed parallel algorithm can be parallelized over eight to ten processors to obtain a speed-up of three to five.
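As a rough illustration of the parallel search described above (a sketch only, assuming a single product type, a hypothetical evaluate_throughput() model, and a made-up TARGET_THROUGHPUT requirement rather than the paper's queuing-network evaluation), each MPI rank scans a different segment of the card-count space and refines its step once it reaches the feasibility boundary:

/* Sketch of a parallel WIP (card-count) search. Each MPI rank scans its own
 * segment of the card-count space; the smallest feasible count is reduced at
 * the end. evaluate_throughput() is a toy placeholder, NOT the paper's
 * closed queuing-network model. */
#include <mpi.h>
#include <stdio.h>
#include <limits.h>

#define WIP_MAX 1000
#define TARGET_THROUGHPUT 0.95          /* hypothetical throughput requirement */

/* Toy monotone throughput model standing in for the real evaluation. */
static double evaluate_throughput(int cards) {
    return 1.0 - 1.0 / (1.0 + 0.05 * cards);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Partition the card-count space [1, WIP_MAX] across ranks. */
    int chunk = (WIP_MAX + size - 1) / size;
    int lo = rank * chunk + 1;
    int hi = (rank + 1) * chunk < WIP_MAX ? (rank + 1) * chunk : WIP_MAX;

    int local_best = INT_MAX;
    int step = 8;                       /* coarse step for the initial scan */
    for (int cards = lo; cards <= hi && local_best == INT_MAX; cards += step) {
        if (evaluate_throughput(cards) >= TARGET_THROUGHPUT) {
            /* Refine: back up and rescan the last interval with step 1. */
            int start = cards - step + 1 > lo ? cards - step + 1 : lo;
            for (int c = start; c <= cards; c++)
                if (evaluate_throughput(c) >= TARGET_THROUGHPUT) { local_best = c; break; }
        }
    }

    int global_best;
    MPI_Reduce(&local_best, &global_best, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("smallest feasible card count: %d\n", global_best);

    MPI_Finalize();
    return 0;
}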
ISBN (print): 9780889866386
This paper studies how to select a subtree with exactly k leaves and a diameter of at most l that minimizes the distance from the farthest vertex to the subtree. We call such a subtree a (k, l)-center of a tree network. In this paper, an efficient parallel algorithm is proposed for finding a (k, l)-center of a tree network. The algorithm runs on the EREW PRAM in O(log n) time using O(n) work.
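Restating the problem in symbols (directly from the statement above, with $d(v, T)$ denoting the distance from vertex $v$ to the nearest vertex of $T$): over all subtrees $T$ of the tree network $G = (V, E)$,

$$
T^{*} \;=\; \operatorname*{arg\,min}_{\substack{T:\ |\mathrm{leaves}(T)| = k,\\ \mathrm{diam}(T) \le l}} \;\max_{v \in V}\; d(v, T).
$$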
Data-flow directed acyclic graphs (digraphs) can accurately describe the data dependencies of a wide range of grid-based scientific computing applications, from numerical algebra to realistic applications of radiation or neutron transport. Parallel computing for these applications is equivalent to the parallel execution of their digraphs. This paper presents a framework of scalable heuristic algorithms for the parallel execution of digraphs. The framework consists of three components: a heuristic partitioning method for a digraph, a parallel sweeping algorithm for the partitioned digraph, and a heuristic strategy for vertex scheduling and vertex packing. Evaluation rules for the heuristic algorithms are presented for better theoretical understanding and performance optimization. Parallel benchmarks for multigroup neutron or radiation S_n transport using 100 to 2048 processors on two massively parallel machines show that these heuristic algorithms scale well.
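To make the sweeping idea concrete, here is a minimal sequential level-by-level (wavefront) traversal of a DAG; in a parallel sweep each level's vertices are independent and could be executed on different processors (the indegree decrements would then need to be atomic). The data layout and process_vertex hook are illustrative assumptions, not the paper's partitioning or packing scheme.

/* Sketch: wavefront (level-by-level) sweep of a DAG in adjacency-list form.
 * Vertices of indegree 0 form level 0; removing them exposes level 1, etc.
 * process_vertex() is a placeholder for the real per-vertex computation. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int n;        /* number of vertices */
    int **adj;    /* adj[v] = successors of v */
    int *deg;     /* deg[v] = number of successors */
    int *indeg;   /* indeg[v] = number of predecessors (consumed by the sweep) */
} Dag;

static void process_vertex(int v) { printf("execute vertex %d\n", v); }

void sweep(Dag *g) {
    int *frontier = malloc(g->n * sizeof(int));
    int *next = malloc(g->n * sizeof(int));
    int nf = 0;
    for (int v = 0; v < g->n; v++)
        if (g->indeg[v] == 0) frontier[nf++] = v;

    while (nf > 0) {
        int nn = 0;
        /* In a parallel sweep, this loop is the region distributed over
         * processors; indegree updates would then need to be atomic. */
        for (int i = 0; i < nf; i++) {
            int v = frontier[i];
            process_vertex(v);
            for (int j = 0; j < g->deg[v]; j++) {
                int w = g->adj[v][j];
                if (--g->indeg[w] == 0) next[nn++] = w;
            }
        }
        int *tmp = frontier; frontier = next; next = tmp;
        nf = nn;
    }
    free(frontier);
    free(next);
}

int main(void) {
    /* Diamond DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. */
    int s0[] = {1, 2}, s1[] = {3}, s2[] = {3};
    int *adj[] = {s0, s1, s2, NULL};
    int deg[] = {2, 1, 1, 0};
    int indeg[] = {0, 1, 1, 2};
    Dag g = {4, adj, deg, indeg};
    sweep(&g);
    return 0;
}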
In this paper, to obtain an efficient parallel algorithm for solving sparse block-tridiagonal linear systems, stair matrices are used to construct parallel polynomial approximate inverse preconditioners. These preconditioners are suitable when the desired goal is to maximize parallelism. Moreover, some theoretical results concerning these preconditioners are presented, and it is also described how to construct the preconditioners effectively for any nonsingular block-tridiagonal H-matrix. In addition, the validity of these preconditioners is illustrated with numerical experiments arising from second-order elliptic partial differential equations and oil reservoir simulations. (C) 2008 IMACS. Published by Elsevier B.V. All rights reserved.
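As background, a polynomial approximate inverse preconditioner is typically a truncated Neumann series built from a splitting $A = N - R$ with an easily inverted factor $N$ (the role the stair matrix would play here); this is the generic form, not necessarily the paper's exact construction:

$$
M_m^{-1} \;=\; \sum_{k=0}^{m}\bigl(I - N^{-1}A\bigr)^{k} N^{-1}
\;\approx\; A^{-1} \;=\; \sum_{k=0}^{\infty}\bigl(I - N^{-1}A\bigr)^{k} N^{-1},
$$

valid whenever $\rho(I - N^{-1}A) < 1$; applying $M_m^{-1}$ requires only matrix-vector products and solves with $N$, which is what makes it attractive when the goal is to maximize parallelism.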
This paper discusses the use of scheduling algorithms in parallel computing environments. Methods are proposed for parallelizing the criterion-function calculations for a single solution and for a group of concentrated solutions (a local neighborhood), intended for use within metaheuristic approaches. A parallel scatter-search metaheuristic is also proposed as a multiple-thread approach. Computational experiments are carried out for the flow shop problem, a classic NP-hard problem of combinatorial optimization. (C) 2009 Elsevier Inc. All rights reserved.
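As a minimal sketch of parallelizing the criterion-function calculations over a local neighborhood (assuming, purely for illustration, a permutation flow-shop makespan criterion and an adjacent-swap neighborhood rather than the paper's exact setup), each neighbor can be scored by a different OpenMP thread:

/* Sketch: evaluate the makespan of every adjacent-swap neighbor of a job
 * permutation in parallel with OpenMP. p[m][j] is the processing time of
 * job j on machine m; the makespan uses the standard flow-shop recurrence
 * C[m][k] = max(C[m-1][k], C[m][k-1]) + p[m][perm[k]]. */
#include <stdio.h>
#include <string.h>
#include <omp.h>

#define M 5     /* machines */
#define N 10    /* jobs */

static int makespan(const int perm[N], int p[M][N]) {
    int C[M][N];
    for (int m = 0; m < M; m++)
        for (int k = 0; k < N; k++) {
            int up   = (m > 0) ? C[m - 1][k] : 0;
            int left = (k > 0) ? C[m][k - 1] : 0;
            C[m][k] = (up > left ? up : left) + p[m][perm[k]];
        }
    return C[M - 1][N - 1];
}

/* Best makespan over all adjacent-swap neighbors, one neighbor per iteration. */
static int best_neighbor(const int perm[N], int p[M][N]) {
    int best = makespan(perm, p);
    #pragma omp parallel for reduction(min : best)
    for (int i = 0; i < N - 1; i++) {
        int cand[N];
        memcpy(cand, perm, sizeof(cand));
        int t = cand[i]; cand[i] = cand[i + 1]; cand[i + 1] = t;
        int c = makespan(cand, p);
        if (c < best) best = c;
    }
    return best;
}

int main(void) {
    int p[M][N], perm[N];
    for (int m = 0; m < M; m++)
        for (int j = 0; j < N; j++)
            p[m][j] = (m * 7 + j * 3) % 9 + 1;   /* arbitrary test data */
    for (int j = 0; j < N; j++) perm[j] = j;
    printf("best neighbor makespan: %d\n", best_neighbor(perm, p));
    return 0;
}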
A new parallel algorithm has been developed for second-order Moller-Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interactions. Tests on a moderate number of processors (2-16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C47H51NO14) and luciferin (C11H8N2O3S2), the former with the 6-31G* and 6-311G** basis sets (1032 and 1484 basis functions, 164 correlated orbitals), and the latter with the aug-cc-pVDZ and aug-cc-pVTZ basis sets (530 and 1198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C130H10 (1970 basis functions, 265 correlated orbitals) completed in less than 2 h on 128 processors. (c) 2006 Wiley Periodicals, Inc.
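For context, the quantity such a program evaluates is the standard closed-shell MP2 correlation energy (textbook background, not anything specific to this paper's parallel scheme), with $i, j$ running over occupied and $a, b$ over virtual spatial orbitals:

$$
E^{(2)} \;=\; \sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{virt}}
\frac{(ia|jb)\,\bigl[\,2\,(ia|jb) - (ib|ja)\,\bigr]}
     {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}.
$$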
ISBN (print): 9781424452910
The main contribution of this paper is to present an efficient parallel sort, "psort", compatible with the standard qsort. Our "psort" is implemented such that its interface is compatible with "qsort" in the C Standard Library. Therefore, any application program that uses the standard "qsort" can be accelerated by simply replacing the "qsort" call with our "psort". Also, "psort" uses the standard "qsort" as a subroutine for local sequential sorting, so if the performance of "qsort" is improved by anyone in the community, that of our "psort" improves automatically. To evaluate the performance of our "psort", we have implemented our parallel sorting on a Linux server with two Intel quad-core processors (i.e., eight processor cores). The experimental results show that our "psort" is approximately 6 times faster than the standard "qsort" using 8 processor cores. Since the speed-up factor cannot exceed 8 with 8 cores, our algorithm is close to optimal. Also, as far as we know, no previously published parallel implementation achieves a speed-up factor of more than 4 using 8 cores.
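A rough sketch of the drop-in idea described above, not the paper's actual psort: the function keeps qsort's signature, sorts the two halves concurrently with the standard qsort, and merges them; the two-way split and pthread usage are simplifications chosen here for brevity.

/* Sketch of a qsort-compatible parallel sort: split the array in two, sort
 * each half with the standard qsort in its own thread, then merge. This is a
 * simplified illustration, not the psort implementation from the paper. */
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

typedef int (*cmp_fn)(const void *, const void *);

struct job { void *base; size_t n, size; cmp_fn cmp; };

static void *sort_half(void *arg) {
    struct job *j = arg;
    qsort(j->base, j->n, j->size, j->cmp);
    return NULL;
}

void psort(void *base, size_t nmemb, size_t size, cmp_fn cmp) {
    if (nmemb < 2) return;
    size_t half = nmemb / 2;
    char *lo = base, *hi = lo + half * size;
    struct job a = { lo, half, size, cmp }, b = { hi, nmemb - half, size, cmp };

    pthread_t t;
    pthread_create(&t, NULL, sort_half, &a);   /* sort first half concurrently */
    sort_half(&b);                             /* sort second half in the caller */
    pthread_join(t, NULL);

    /* Merge the two sorted halves into a temporary buffer, then copy back. */
    char *tmp = malloc(nmemb * size);
    size_t i = 0, j = 0, k = 0;
    while (i < half && j < nmemb - half) {
        char *src = (cmp(lo + i * size, hi + j * size) <= 0)
                        ? lo + i++ * size : hi + j++ * size;
        memcpy(tmp + k++ * size, src, size);
    }
    memcpy(tmp + k * size, lo + i * size, (half - i) * size);
    k += half - i;
    memcpy(tmp + k * size, hi + j * size, (nmemb - half - j) * size);
    memcpy(base, tmp, nmemb * size);
    free(tmp);
}

A caller then invokes it exactly as it would invoke qsort, e.g. psort(a, n, sizeof a[0], cmp).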
ISBN (print): 9783642021718
We propose a parallel Mean Shift (MS) tracking algorithm on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). The traditional MS algorithm uses a color histogram with a large number of bins, typically 16x16x16, which makes a parallel implementation infeasible. We therefore employ K-Means clustering to partition the object color space, which enables us to represent the color distribution with a quite small number of bins. Based on this compact histogram, all key components of the MS algorithm are mapped onto the GPU. The resulting parallel algorithm consists of six kernel functions, which primarily involve the parallel computation of the candidate histogram and the calculation of the Mean Shift vector. Experiments on publicly available CAVIAR videos show that the proposed parallel tracking algorithm achieves a large speedup and has comparable tracking performance compared with the traditional serial MS tracking algorithm.
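For reference, the per-iteration core being parallelized is essentially the standard histogram-based mean-shift tracking update (stated here as textbook background under an Epanechnikov kernel, not necessarily the paper's exact formulation): with target histogram $q_u$ and candidate histogram $p_u(\hat{y}_0)$ over $m$ bins,

$$
w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(\hat{y}_0)}}\;\delta\bigl[b(x_i) - u\bigr],
\qquad
\hat{y}_1 = \frac{\sum_i x_i\, w_i}{\sum_i w_i},
$$

where $b(x_i)$ maps pixel $x_i$ to its histogram bin and $\hat{y}_1$ is the new candidate location.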
ISBN (print): 9780769539010
Modern graphics processing units (GPUs) are programmable, offer a high price/performance ratio and high speed, and are well suited to parallel computation. Based on this, the article studies general methods of GPU computing and uses the Compute Unified Device Architecture (CUDA) to design new parallel algorithms to accelerate matrix inversion and image binarization. The results show that as the matrix dimension increases, the GPU outperforms the CPU by an increasing factor.
ISBN (print): 9781605584959
Graphs or networks can be used to model complex systems. Detecting community structures from large network data is a classic and challenging task. In this paper, we propose a novel community detection algorithm, which utilizes a dynamic process by contradicting the network topology and the topology-based propinquity, where the propinquity is a measure of the probability for a pair of nodes to be involved in a coherent community structure. Through several rounds of mutual reinforcement between topology and propinquity, the community structures are expected to naturally emerge. The overlapping vertices shared between communities can also be easily identified by a simple additional post-processing step. To achieve better efficiency, the propinquity is calculated incrementally. We implement the algorithm on a vertex-oriented bulk synchronous parallel (BSP) model so that the mining load can be distributed over thousands of machines. We obtained interesting experimental results on several real network datasets.