SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be ut...
详细信息
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient availabl...
详细信息
ISBN:
(纸本)9781605587523
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require non-standard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, is of great current interest. This paper evaluates the interfaces and implementations for user-defined aggregation in several state of the art distributed computing systems: Hadoop, databases such as Oracle parallel Server, and DryadLINQ. We show that: the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity;the choice of programming interface has a material effect on the performance of computations;some execution plans perform better than others on average;and that in order to get good performance on a variety of workloads a system must be able to select between execution plans depending on the computation. The interface and execution plan described in the Map Reduce paper, and implemented by Hadoop, are found to be among the worst-performing choices.
Data base clusters based on share-nothing replication techniques are currently widely accepted as a practical solution to scalability and availability of the data tier. A key issue when planning such systems is the ab...
详细信息
ISBN:
(纸本)9780769538266
Data base clusters based on share-nothing replication techniques are currently widely accepted as a practical solution to scalability and availability of the data tier. A key issue when planning such systems is the ability to meet service level agreements when load spikes occur or cluster nodes fail. This translates into the ability to provision and deploy additional nodes. Many current research efforts focus on designing autonomic controllers to perform such reconfiguration, tuned to quickly react to system changes and spawn new replicas based on resource usage and performance measurements. In contrast, we are concerned about the inherent impact of deploying an additional node to an online cluster, considering both the time required to finish such an action as well as the impact on resource usage and performance of the cluster as a whole. If noticeable, such impact hinders the practicability of self-management techniques, since it adds an additional dimension that has to he accounted for. Our approach is to systematically benchmark a number of different reconfiguration scenarios to assess the cost of bringing a new replica online. We consider factors such as: workload characteristics, incremental and parallel recovery, flow control and outdatedness of the recovering replica. As a result, we show that research should be refocused from optimizing the capture and transmition of changes to applying them, which in a realistic setting dominates the cost of the recovery operation.
This paper deals with the problem of causal model-based diagnosis of distributedsystems. The setting we consider is a collection of interacting behavioral Petri nets (BPNs). Each BPN model represents the causal behav...
详细信息
ISBN:
(纸本)9780769536804
This paper deals with the problem of causal model-based diagnosis of distributedsystems. The setting we consider is a collection of interacting behavioral Petri nets (BPNs). Each BPN model represents the causal behavioral model of one subsystem and its interactions with neighboring subsystems. Interactions among subsystems are modeled by tokens that pass from one model to another via common places. Diagnosis reasoning scheme exploits, in a first step a backward reachability analysis on each net model to obtain local diagnoses;and in a second step, it exploits a forward reachability analysis for ensuring that local diagnoses are consistent and form global ones.
The proceedings contain 54 papers. The topics discussed include: an experimental study of diversity with off-the-shelf antivirus engines;simulating fixed virtual nodes for adapting wireline protocols to MANET;seed sch...
ISBN:
(纸本)9780769536989
The proceedings contain 54 papers. The topics discussed include: an experimental study of diversity with off-the-shelf antivirus engines;simulating fixed virtual nodes for adapting wireline protocols to MANET;seed scheduling for peer-to-peer networks;proximity-aware distributed mutual exclusion for effective peer-to-peer replica management;towards improved overlay simulation using realistic topologies;analysis of round-robin implementations of processor sharing, including overhead;comparison of price-based static and dynamic job allocation schemes for grid computing systems;a rule based co-operative approach for cell selection in high speed cellular networks;sharing private information across distributeddatabases;energy-aware prefetching for parallel disk systems;attribute-based prevention of phishing attacks;TTM based security enhancement for inter-domain routing protocol;and introducing probability in RFID reader-to-reader anti-collision.
We introduce a low-level performance modelling formalism, Shared Transaction Markov Chains (STMCs), specifically designed for the capture and analysis of massively parallel stochastic systems through fluid techniques....
详细信息
ISBN:
(纸本)9781424449262
We introduce a low-level performance modelling formalism, Shared Transaction Markov Chains (STMCs), specifically designed for the capture and analysis of massively parallel stochastic systems through fluid techniques. We introduce the notion of a shared transaction between concurrently running Markov chains which allows a multi-phase synchronisation to accurately represent complex cooperation between modelling components in a compositional manner. We demonstrate the new modelling formalism on four distinct models and show how fluid analysis may be performed, with results, where appropriate. Our contribution is that this is the first such system tailored to the fluid performance analysis of transaction-based systems as found in computing applications such as peer-to-peer networks, web architectures and Publish-Subscribe networks. The second contribution is that STMCs permit composed phase-type distributed synchronisation which is more useful from a transaction modelling perspective.
In any Active Geographical Information System the reliability of any results of queries, analysis or reasoning depends on data quality (up-to-date data with properties of positional accuracy, consistency and so on). A...
详细信息
ISBN:
(纸本)9780769534718
In any Active Geographical Information System the reliability of any results of queries, analysis or reasoning depends on data quality (up-to-date data with properties of positional accuracy, consistency and so on). A system is built for violation detection under Topo-Semantic consistency with specific checking and correcting processes using a SSRO-Tree structure for spatial data access. Results indicate that the developed Constraint Violation Detection (CVD) system is powerful when compared with popular conventional systems. Three kinds of errors are identified which lead to three kinds of consistency, namely structural consisteny, geometric consistency and Topo-Semantic consistency.
In this paper, we study the impact of multi processor memory systems in particular, the distributed memory (I)M) and virtual shared memory (VSM), on the implementation of parallel backpropagation neural network algori...
详细信息
ISBN:
(纸本)9781424442379
In this paper, we study the impact of multi processor memory systems in particular, the distributed memory (I)M) and virtual shared memory (VSM), on the implementation of parallel backpropagation neural network algorithms. In the first instance, neural network is partitioned into sub neural networks by applying a hybrid partitioning scheme. In the second, each partitioned network is evaluated with matrix multiplication. Three different sizes of neural networks are used and exchange rate prediction used as a reference problem. parallel implementations for each of the distributed memory and virtual shared memory scenarios is obtained. These algorithms are implemented on a high performance cluster, "Monolith" consisting of over 396 nodes. Programming is realized using Message Passing Interface (MPI) library and C-Linda. The partitioned, matrix multiplication has the fastest execution time, and DM/MPI implementation is always faster than the VSM/Linda equivalent. However in VSM/Linda it is possible to allow the parallel neural network to choose the optimum number of processors dynamically.
In this paper we present and compare two algorithms to transform and optimize the Finite Difference Time Domain (FDTD) method computations in irregular computational areas for multiprocessor systems. Both algorithms a...
详细信息
ISBN:
(纸本)9780769534725
In this paper we present and compare two algorithms to transform and optimize the Finite Difference Time Domain (FDTD) method computations in irregular computational areas for multiprocessor systems. Both algorithms art, based oil a genetic approach with different design of chromosomes, In the first case, a chromosome represents a complete merging algorithm as a sequence of merging functions that are used to reduce a number of macro nodes. In the second case, a chromosome corresponds to an assignment of each macro node to specified processor. We have tested a mixed fitness function, which can switch between two sub functions: cut min value and difference between the maximally and minimally loaded macro nodes in an individual. As our experiments have shown, it allows to adjust the behavior of a genetic algorithm to the requirements of the computational system. We have tested both algorithms for multiprocessor systems with shared and distributed memory, with MPI and RDMA communication.
The complexity and the variety of the deployed time-dependent systems, as well as the high degree of reliability required for their global functioning, justify the care provided to the design of the best possible test...
详细信息
ISBN:
(纸本)9780769534251
The complexity and the variety of the deployed time-dependent systems, as well as the high degree of reliability required for their global functioning, justify the care provided to the design of the best possible tests. Moreover;it is significant to automate these steps with an aim of reducing the time and the development cost and especially of increasing the reliability of the offered products. In this paper we present two different tools to test systems with time constraints. The first one allows to automatically generate test cases based on model-based active testing techniques. Whereas the second tool is based on passive testing approach to check that the collected system traces respect a silt of formal properties called Invariants.
暂无评论