SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum-size (40-byte) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
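The throughput figure quoted above can be checked with simple arithmetic. The following is only a sanity check of the unit conversion, using the two numbers given in the abstract; the function and constant names are illustrative and do not come from the paper.

```python
# Sanity check of the packets-per-second to bits-per-second conversion.
# Both inputs are taken from the abstract; the names are illustrative.
PACKETS_PER_SECOND = 10e9   # "up to 10 billion packets per second"
MIN_PACKET_BYTES = 40       # minimum-size packet
BITS_PER_BYTE = 8


def throughput_tbps(packets_per_second, packet_bytes):
    """Convert a packet rate and packet size into terabits per second."""
    bits_per_second = packets_per_second * packet_bytes * BITS_PER_BYTE
    return bits_per_second / 1e12


result = throughput_tbps(PACKETS_PER_SECOND, MIN_PACKET_BYTES)
print(f"{result:.1f} Tbps")
```

The check covers the aggregate figure only; it says nothing about how the load is split across the 8 pipelines.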
ISBN: 9783642032745 (print)
A modification of the second order Incomplete Cholesky (IC) factorization with a controllable amount of fill-in is described and analyzed. This algorithm is applied to the construction of well-balanced coarse-grain parallel preconditioning for the Conjugate Gradient (CG) iterative solution of linear systems with symmetric positive definite matrices. The efficiency of the resulting parallel algorithm is illustrated by a series of numerical experiments using large-scale ill-conditioned test matrices taken from the University of Florida collection.
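To make the idea of IC-preconditioned CG concrete, here is a minimal serial sketch. It is not the method of the paper: it uses the simpler zero fill-in variant IC(0) instead of the second order factorization with controllable fill-in, it ignores the coarse-grain parallel construction, and it runs on a small 2D Laplacian chosen here purely for illustration. All function names are hypothetical.

```python
import math


def ic0(A):
    """IC(0): Cholesky-like factor L restricted to the nonzero pattern of A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # position outside the pattern: no fill-in allowed
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_llt(L, r):
    """Apply the preconditioner: solve (L L^T) z = r by two triangular solves."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, L, tol=1e-10, max_iter=200):
    """Preconditioned CG; returns the solution and the iteration count."""
    x = [0.0] * len(b)
    r = b[:]
    z = solve_llt(L, r)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = solve_llt(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_2d(m):
    """Dense 5-point Laplacian on an m x m grid (illustrative SPD test matrix)."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            k = i * m + j
            A[k][k] = 4.0
            if i > 0:
                A[k][k - m] = -1.0
            if i < m - 1:
                A[k][k + m] = -1.0
            if j > 0:
                A[k][k - 1] = -1.0
            if j < m - 1:
                A[k][k + 1] = -1.0
    return A


A = laplacian_2d(4)
b = [1.0] * len(A)
L = ic0(A)
x, iterations = pcg(A, b, L)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```

Allowing more fill-in in the factor generally trades memory and setup cost for fewer CG iterations, which is the knob the abstract refers to as a controllable amount of fill-in.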
ISBN: 9783642043932 (print)
Since the amount of information is rapidly growing, there is an overwhelming interest in efficient network computing systems including Grids, public-resource computing systems, P2P systems and Cloud computing. In this paper we take a detailed look at the problem of modeling and optimization of network computing systems for parallel decision tree induction methods. First, we present a comprehensive discussion of these induction methods with a special focus on their parallel versions. Next, we propose a generic optimization model of a network computing system that can be used for distributed implementation of parallel decision trees. To illustrate our work we provide results of numerical experiments showing that the distributed approach enables significant improvement of the system throughput.
ISBN: 9783642032745 (print)
A parallel 3D code for simulation of galaxies and protoplanetary discs is developed. The model includes dust, gas, gravitation and friction between dust and gas. The kinetic equation for dust particles is solved by the PIC method. Gas dynamics equations are solved by the FLIC method. In the parallel implementation a domain decomposition technique is used in which each subdomain is processed by a group of processors. Parallelization efficiency results are presented.
ISBN: 9783642024801 (print)
The theoretical and algorithmic description of the parallel batch pattern back propagation (BP) training algorithm of a multilayer perceptron is presented in this paper. The efficiency of the developed parallel algorithm is studied as the dimension of the parallelized problem is progressively increased on the general-purpose parallel computer NEC TX-7.
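The property that makes batch pattern training easy to parallelize is that the batch gradient is a sum over training patterns, so each worker can accumulate a partial sum over its own share of the patterns. The sketch below illustrates only this decomposition. It is an assumption-laden toy: a single linear unit with squared error stands in for the multilayer perceptron, the workers are simulated one after another in a single process, and all names are hypothetical.

```python
def partial_gradient(weights, patterns):
    """Sum of squared-error gradients of one linear unit over some patterns."""
    grad = [0.0] * len(weights)
    for inputs, target in patterns:
        output = sum(w * x for w, x in zip(weights, inputs))
        error = output - target
        for i, x in enumerate(inputs):
            grad[i] += error * x
    return grad


def batch_gradient_split(weights, patterns, workers):
    """Split the patterns among workers and add up their partial gradients.

    Each chunk would be handled by a separate processor in a real parallel
    run; here the chunks are processed sequentially for illustration.
    """
    chunks = [patterns[w::workers] for w in range(workers)]
    partials = [partial_gradient(weights, chunk) for chunk in chunks]
    # Reduction step: element-wise sum of the per-worker partial gradients.
    return [sum(column) for column in zip(*partials)]


# Toy data: inputs are (bias, k), targets follow 2*k + 1.
patterns = [([1.0, float(k)], 2.0 * k + 1.0) for k in range(8)]
weights = [0.0, 0.0]

serial = partial_gradient(weights, patterns)
split = batch_gradient_split(weights, patterns, workers=4)
```

Because only the summed gradient has to be exchanged once per epoch, the communication volume depends on the number of weights and not on the number of training patterns.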
ISBN: 9781605588506 (print)
Today, the development of Bag-of-Tasks, i.e. embarrassingly parallel, applications for execution on multiprocessors or clusters requires the use of APIs not designed for this kind of problem. For instance, MPI allows the parallel execution of tasks, but was developed for much more complex parallel applications, with high data communication between tasks. The use of such APIs requires programmers to learn them and adds complexity to the final parallel solution. Mercury provides a platform for the transformation of serial applications into parallel Bag-of-Tasks. Mercury reads a configuration file stating which methods and classes should be parallelized, loads the application, and at run time transforms it so that the specified methods are executed concurrently. This transformation is performed without user intervention. Its modular design allows the integration of Mercury with different parallel environments. Initial experiments show that the overhead is minimal, and that it is possible to take advantage of parallel processing environments (multiprocessors/multicores, clusters, ...) without the use of complex APIs. Copyright 2009 ACM.
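For readers unfamiliar with the term, a Bag-of-Tasks workload is a set of independent calls to the same function with different inputs. The sketch below shows the pattern with Python's standard `concurrent.futures` module. It does not reproduce Mercury, which transforms an existing application at run time from a configuration file; here the parallel call is written by hand, and `count_primes` and `run_bag_of_tasks` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Count primes strictly below limit by trial division (one task)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_bag_of_tasks(task, inputs, workers=4):
    """Run independent tasks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))


inputs = [10, 100, 1000]
results = run_bag_of_tasks(count_primes, inputs)
print(results)
```

The tasks share no data and never communicate, which is why a simple pool is sufficient. For CPU-bound work in CPython a process pool would normally replace the thread pool; threads are used here only to keep the example short.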
ISBN: 9783642032745 (print)
Fragmentation of frequently used numerical algorithms for inclusion in a library of parallel numerical subroutines is considered. Fragmentation of algorithms and programs makes it possible to create parallel programs that can be executed on parallel computers of different types (multiprocessors and/or multicomputers) and can be dynamically tuned to all the available resources. Program fragmentation is a way of automatically providing dynamic properties of parallel programs, such as dynamic load balancing. Algorithm fragmentation is a technological method of parallelizing numerical algorithms that provides their effective parallel implementation.
ISBN: 9783642032745 (print)
A fragmented approach to parallel programming of numerical methods and its implementation in the asynchronous programming system Aspect are considered. It provides several important advantages, such as automatic implementation of dynamic properties (tuning to available resources, dynamic load balancing, dynamic resource distribution, etc.) of an application program. The asynchronous parallel programming system Aspect, which implements the concept of fragmented programming on supercomputers with shared memory architecture, is considered.
ISBN: 9783642032745 (print)
The parallel hybrid inverse neural network coordinate approximations algorithm (PHINNCA) for the solution of large-scale global optimization problems is proposed in this work. The algorithm maps a trial value of an objective function into values of the objective function arguments. It decreases the trial value step by step to find a global minimum. Dual generalized regression neural networks are used to perform the mapping. The algorithm is intended for cluster systems. The search is carried out concurrently. When there are multiple processes, they share information about their progress and apply a simulated annealing procedure to it.
ISBN: 9783642032745 (print)
We introduce the GCA-w model (Global Cellular Automata with write access), which is an extension of the GCA (Global Cellular Automata) model, which is in turn an extension of the cellular automata (CA) model. All three models are called "massively parallel" because they are based on cells that are updated synchronously in parallel. In the CA model, the cells have static links to their local neighbors, whereas in the GCA model the links are dynamic and can point to any global neighbor. In both models, the access is "read-only". Thereby no write conflict can occur, which reduces the complexity of the model and its implementation. The GCA model can be used for many parallel problems that can be described with a changing global (or locally restricted) neighborhood. The main restriction of the GCA model is the forbidden write access to neighboring cells. Although the write access can be emulated in O(log n) time, this slowdown is not desired in practical applications. Therefore, the GCA-w model was developed. The GCA-w model allows a cell to change its own state as well as the states of its neighboring cells. Thereby parallel algorithms can be executed faster, and the activity of the cells can be controlled in order, e.g., to reduce power consumption or to use inactive cells for other purposes. The application of the GCA-w model is demonstrated for some parallel algorithms: pointer inversion, sorting with pointers, synchronization and Pascal's triangle. In addition, a hardware architecture is outlined which can execute this model.
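The benefit of write access can be illustrated with pointer inversion, one of the applications named above. The sketch below is one plausible reading of that problem, not the cell rule from the paper: every cell holds a pointer to a successor cell and, in a single synchronous generation, writes its own index into the successor's inverse-pointer field. The sequential loop only simulates the parallel update, and the function name and the conflict check are assumptions made for this example.

```python
def gca_w_invert(pointers):
    """Simulate one GCA-w generation that inverts a pointer structure.

    pointers[i] is the index of the cell that cell i points to, or None.
    Every cell writes its own index into the 'inverse' field of its target.
    """
    n = len(pointers)
    inverse = [None] * n
    writes = [0] * n
    for cell, target in enumerate(pointers):
        if target is None:
            continue  # inactive cell: performs no write in this generation
        writes[target] += 1
        if writes[target] > 1:
            # Two cells writing to the same neighbor would be a write
            # conflict; this sketch rejects such inputs instead of resolving it.
            raise ValueError(f"write conflict at cell {target}")
        inverse[target] = cell
    return inverse


ring = [1, 2, 3, 0]          # cell i points to cell (i + 1) mod 4
inverse = gca_w_invert(ring)
print(inverse)
```

With read-only access, cell j could not receive this value directly and would instead have to search for the cell that points to it, which is the kind of emulation overhead the abstract mentions.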