ISBN: 9783642032745 (print)
During the last twenty years, the lattice Boltzmann method (LBM) has been developed as an alternative approach for modeling fluid dynamics. A parallel implementation of the LBM for 3D fluid dynamics simulations using the Fortran-DVM language is presented. The LBM is parallelized by spatial decomposition and implemented on the distributed-memory cluster MVS-100K. The test problem has been solved for different numbers of processors (from 1 to 1024). Pictures of the flows are compared visually with similar pictures published in the literature.
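As a rough illustration of spatial decomposition, the sketch below splits the lattice planes along one axis into contiguous slabs, one per processor. This is only an assumed 1D slab layout written in Python; the abstract does not say which decomposition the Fortran-DVM code uses, and the function name `slab_ranges` is hypothetical.

```python
def slab_ranges(n_planes, n_procs):
    """Split n_planes lattice planes into n_procs contiguous slabs.

    Returns a list of (start, stop) index pairs, one per processor rank.
    Slab sizes differ by at most one plane, which keeps the load balanced.
    Assumption: a 1D decomposition along a single axis.
    """
    base, extra = divmod(n_planes, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional plane each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

In such a layout each rank would update only its own slab and exchange the boundary planes with its two neighbours after every streaming step.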
ISBN: 9783642012037 (digital)
ISBN: 9783642012020 (print)
The 10th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, held in Daegu, Korea on May 27-29, 2009, is aimed at bringing together researchers and scientists, businessmen and entrepreneurs, teachers and students to discuss the numerous fields of computer science, and to share ideas and information in a meaningful way. This publication captures 20 of the conference's most promising papers, and we impatiently await the important contributions that we know these authors will bring to the field.
ISBN: 9780769536422 (print)
Graphics processing units (GPUs) are powerful computational devices tailored towards the needs of the 3-D gaming industry for high-performance, real-time graphics engines. Nvidia Corporation released a new generation of GPUs designed for general-purpose computing in 2006, and it released a GPU programming language called CUDA in 2007. DNA microarray technology is a high-throughput tool for assaying mRNA abundance in cell samples. In data analysis, scientists often apply hierarchical clustering of the genes, where a fundamental operation is to calculate all pairwise distances. If there are n genes, this takes O(n^2) time. In this work, GPUs and the CUDA language are used to calculate pairwise distances. For Manhattan distance, GPU/CUDA achieves a 40 to 90 times speed-up compared to the central processing unit implementation; for the Pearson correlation coefficient, the speed-up is 28 to 38 times.
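To make the O(n^2) pairwise computation concrete, here is a minimal sequential sketch in Python of the two measures named in the abstract. It is not the paper's CUDA code; the function names and the list-of-lists data layout are assumptions made for illustration.

```python
import math

def manhattan(a, b):
    """Manhattan (L1) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation coefficient of two profiles.

    Assumption: neither profile is constant (otherwise the
    denominator is zero and the coefficient is undefined).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def pairwise(genes, metric):
    """Fill a symmetric n-by-n matrix with metric(genes[i], genes[j]).

    The double loop visits n * (n + 1) / 2 pairs, hence O(n^2) work.
    """
    n = len(genes)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = metric(genes[i], genes[j])
            result[i][j] = value
            result[j][i] = value
    return result
```

Every entry of the matrix is independent of the others, which is why this computation maps naturally onto many GPU threads.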
ISBN: 9783642032745 (print)
The main goal of this paper is to develop an efficient method of triangular mesh generation for physical objects which have similar geometrical structure. The method is based on deforming a high-quality mesh generated over some "ideal" object into another object of the same structure while preserving mesh quality. The approach uses the Self Organizing Maps algorithm and has been applied to constructing meshes on human femur bones using the GeomBox and GeomRandom packages. A parallel deformation algorithm is implemented using MPI. The efficiency of the parallelization is about 90%.
Since 2005 the Taiwanese government has invested over $1.2 billion into the M-Taiwan program to bolster Taiwan's broadband mobile communications industry and infrastructure. In addition to building a fiber backbon...
ISBN: 9783642032745 (print)
The augmented Lagrangian and generalized Newton methods are used to simultaneously solve the primal and dual linear programming (LP) problems. We propose a parallel implementation of the method to solve the primal linear programming problem with a very large number (approximately 2 × 10^6) of nonnegative variables and a large number (approximately 2 × 10^5) of equality-type constraints.
ISBN: 9783642032745 (print)
The paper describes the experimental library SSCC_PIPL for image processing on multicomputers. Basic principles of library building, some architectural solutions, and test results are given.
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e., the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e., 3.2 Tbps for minimum-size (40-byte) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
ISBN: 9780769536422 (print)
Fuzzy-clustering-based vector quantization has been widely used in the field of data compression, since the use of fuzzy clustering analysis in the early stages of a vector quantization process can make this process less sensitive to initialization. However, the process of fuzzy clustering is computationally very intensive because of its complex framework for the quantitative formulation of the uncertainty involved in the training vector space. To overcome the computational burden of the process, we introduce a parallel implementation of Fuzzy Vector Quantization (FVQ) using a representative data-parallel architecture which consists of 4,096 processing elements (PEs). Our parallel approach provides a computationally efficient solution with the 4,096 PEs by employing an effective vector assignment strategy for the transition from soft to crisp decisions during the clustering process. Experimental results show that our parallel approach provides 1000x greater performance and 100x higher energy efficiency than other implementations using commercial processors such as ARM families.
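The difference between soft and crisp decisions can be sketched as follows. The abstract does not give the FVQ update rules, so this Python example assumes the standard fuzzy c-means membership formula with fuzzifier `m`; the function names are hypothetical and the code is not the authors' parallel implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between a training vector and a codeword."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fuzzy_memberships(vector, codebook, m=2.0):
    """Soft decision: membership of `vector` in every codeword's cluster.

    Assumption: standard fuzzy c-means memberships,
    u_i proportional to D_i ** (-1 / (m - 1)), where D_i is the squared
    distance to codeword i and m > 1 is the fuzzifier.
    """
    dists = [squared_distance(vector, c) for c in codebook]
    # If the vector coincides with one or more codewords, share the
    # membership equally among them to avoid dividing by zero.
    zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
    if any(zeros):
        total = sum(zeros)
        return [z / total for z in zeros]
    exponent = 1.0 / (m - 1.0)
    weights = [(1.0 / d) ** exponent for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def crisp_assignment(vector, codebook):
    """Crisp decision: index of the codeword with the largest membership."""
    memberships = fuzzy_memberships(vector, codebook)
    return max(range(len(codebook)), key=lambda i: memberships[i])
```

The memberships of each training vector depend only on that vector and the shared codebook, so the vectors can be distributed across processing elements and handled independently.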
ISBN: 9780769539089 (print)
The proceedings contain 140 papers. The topics discussed include: stochastically robust resource management in heterogeneous parallel computing systems; data structures and algorithms for packet forwarding and classification; high-performance cloud computing: a view of scientific applications; constructing independent spanning trees for hypercubes and locally twisted cubes; a node-to-set disjoint-path routing algorithm in metacube; a new network topology for P2P overlay based on a contracted star graph; a single-trace cycle collection for reference counting systems; a heuristic routing scheme for wireless sensor networks based on a local search method; identifying useless states in non-FIFO distributed computations by using pseudo timestamps; job scheduling techniques for distributed systems with heterogeneous processor cardinality; pipelined computation of very large word-length LNS addition/subtraction computation with exponential convergence rate; and conflict-avoidance in multicore caching for data-similar executions.