Parallel processing is recognized as a practical way to achieve high performance in logic simulation. Instead of using high-cost parallel computers or special-purpose hardware simulation engines, we explore the implementation of parallel logic simulation on an existing network of workstations using Parallel Virtual Machine (PVM). We carry out a novel parallel implementation of an output event-driven logic simulation algorithm in which no global control processor or workstation is needed to synchronize the advancement of simulation time to the next time step. Further advantages of our new approach include random partitioning of the circuit onto the available workstations and pipelined execution of the different phases of the simulation algorithm. To achieve better load balance, we employ a semi-optimistic scheme for gate evaluations that requires no rollback. The performance of our implementation has been evaluated in real time using the ISCAS combinational and sequential benchmark circuits. The speedups obtained improve with the size of the circuit and the activity level in the circuit. Analysis of the communication overhead shows that the techniques developed here will yield even higher gains as newer networking technologies such as Fast Ethernet, switched Ethernet, or ATM are employed to connect workstations.
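For readers unfamiliar with output event-driven simulation, the following Python sketch shows the sequential core of the idea. A gate is re-evaluated only when one of its input nets changes, and its new output value is scheduled as an event one delay unit later. It is a minimal, single-process sketch that assumes unit gate delays and an all-zero initial state. The gate library, the `simulate` function, and the data layout are hypothetical. The sketch does not model the paper's PVM distribution, random partitioning, pipelining, or semi-optimistic evaluation.

```python
import heapq
from collections import defaultdict

# Hypothetical gate library: gate type -> Boolean function of its input values.
GATE_FUNCS = {
    "AND": lambda ins: int(all(ins)),
    "OR": lambda ins: int(any(ins)),
    "NOT": lambda ins: int(not ins[0]),
    "XOR": lambda ins: sum(ins) % 2,
}


def simulate(gates, stimuli, delay=1):
    """Sequential event-driven simulation of a combinational circuit (sketch).

    gates:   {output_net: (gate_type, [input_nets])}
    stimuli: [(time, net, value)] changes applied to primary inputs
    delay:   gate delay, must be > 0
    Returns (final net values, list of (time, net, value) changes).
    """
    fanout = defaultdict(set)  # net -> gates that read it
    for out, (_, ins) in gates.items():
        for net in ins:
            fanout[net].add(out)

    values = defaultdict(int)    # every net starts at logic 0 (assumption)
    pending = defaultdict(dict)  # time -> {net: scheduled value}
    time_heap = []               # times that still have pending events

    def schedule(t, net, v):
        if t not in pending:
            heapq.heappush(time_heap, t)
        pending[t][net] = v      # a later evaluation for the same time wins

    for t, net, v in stimuli:
        schedule(t, net, v)

    t, dirty, trace = 0, set(gates), []  # evaluate all gates once to settle outputs
    while True:
        # Evaluation phase: only gates whose inputs changed are re-evaluated.
        for out in dirty:
            gate_type, ins = gates[out]
            schedule(t + delay, out, GATE_FUNCS[gate_type]([values[n] for n in ins]))
        if not time_heap:
            break
        # Update phase: apply all events of the next time step, keep real changes only.
        t = heapq.heappop(time_heap)
        dirty = set()
        for net, v in pending.pop(t).items():
            if values[net] != v:
                values[net] = v
                trace.append((t, net, v))
                dirty |= fanout[net]  # an output change activates its fanout gates
    return dict(values), trace
```

For a half adder, `simulate({"s": ("XOR", ["a", "b"]), "c": ("AND", ["a", "b"])}, [(0, "a", 1), (0, "b", 1)])` ends with `s = 0` and `c = 1`. Because unchanged values are filtered out in the update phase, a gate whose inputs never change is never revisited, which is the source of the efficiency of event-driven simulation.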
ISBN: 9781424416936 (print)
Energy consumption is a critical issue in parallel and distributed embedded systems. We present a novel algorithm for energy-efficient scheduling of Directed Acyclic Graph (DAG) based applications on Dynamic Voltage Scaling (DVS) enabled systems. Experimental results show that our algorithm provides near-optimal solutions for energy minimization with considerably less computation time and memory than an existing algorithm that also provides near-optimal solutions.
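To make the DVS energy trade-off concrete, here is a baseline sketch in Python. It is not the algorithm from the abstract. It uses the common textbook model in which supply voltage scales linearly with clock frequency, so a task's dynamic energy is proportional to cycles × V², that is, cycles × f² in normalized units. It also assumes enough processors that the schedule length equals the longest path through the DAG. Under those assumptions, it picks the lowest available frequency level that still meets the deadline. The function names, the frequency levels, and the uniform-slowdown policy are all hypothetical.

```python
from collections import defaultdict


def critical_path_length(cycles, edges):
    """Longest path through the task DAG, in time units at maximum frequency.

    cycles: {task: execution cycles}  (1 cycle = 1 time unit at f = 1.0)
    edges:  [(u, v)] meaning task u must finish before task v starts
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in cycles}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    start = {t: 0 for t in cycles}
    finish = {}
    ready = [t for t in cycles if indeg[t] == 0]
    while ready:  # process tasks in topological order (Kahn's algorithm)
        u = ready.pop()
        finish[u] = start[u] + cycles[u]
        for v in succ[u]:
            start[v] = max(start[v], finish[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(finish) != len(cycles):
        raise ValueError("task graph contains a cycle")
    return max(finish.values())


def uniform_dvs(cycles, edges, deadline, levels=(0.25, 0.5, 0.75, 1.0)):
    """Pick the lowest normalized frequency that still meets the deadline.

    Assumes V scales linearly with f, so energy per task ~ cycles * f**2.
    Returns (chosen frequency, total energy in normalized units).
    """
    cp = critical_path_length(cycles, edges)
    for f in sorted(levels):
        if cp / f <= deadline:  # every task is stretched by the same factor 1/f
            return f, sum(c * f ** 2 for c in cycles.values())
    raise ValueError("deadline cannot be met even at maximum frequency")
```

With tasks `{"a": 2, "b": 3, "c": 1}`, edges `a -> b` and `a -> c`, and a deadline of 10, the critical path is 5 time units, so running at half speed meets the deadline and quarters the energy. More refined DAG schedulers typically improve on this uniform baseline by slowing down individual non-critical tasks according to their own slack.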
ISBN: 0818656026 (print)
Synchronization techniques are proposed for algorithms that spawn processes remotely on loosely coupled processors based on run-time characteristics. The performance of the proposed synchronization schemes is measured on the iPSC/2 and SNAP-1 multiprocessors, and their implementation cost is discussed. Results show that processes created dynamically throughout a distributed system can be synchronized with overhead and cost comparable to those required for fixed-location process creation.
ISBN: 076950728X (print)
Testing distributed applications over the Internet is fraught with problems: because a wide area network cannot be controlled, consistent, reproducible performance experiments are not possible. Here a system is described that uses a parallel discrete event simulator that can act as a real-time network emulator. Real Internet Protocol (IP) traffic generated by application programs running on user workstations can interact with modelled traffic in the emulator, thus providing a controlled test environment for distributed applications. Parallel execution enables the emulator to simulate large virtual networks and to model traffic interactions that could not be handled in real time sequentially. This paper gives an overview of the emulator and explores the various external data routing methods that the emulator supports. These routing methods allow the emulator to be operated in shared environments, with certain constraints, as well as in dedicated test environments. Preliminary performance results are included.
ISBN: 0769511538 (print)
In this paper we propose an efficient parallel implementation of Edmonds' algorithm for finding optimum branchings on a model of the SIMD type with vertical data processing (the STAR-machine). To this end, for a directed graph given as a list of triples (the two vertices of an edge and its weight), we construct a new associative version of Edmonds' algorithm. This version is represented as a corresponding STAR procedure whose correctness is proved. We show that on vertical processing systems Edmonds' algorithm takes O(n log n) time, where n is the number of graph vertices.
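The input described above, a directed graph given as a list of (u, v, weight) triples, maps directly onto the classic sequential form of Edmonds' algorithm, also known as the Chu-Liu/Edmonds algorithm. The Python sketch below computes the weight of a minimum-weight spanning arborescence from a fixed root in O(nm) time. It is not the associative STAR-machine version from the paper, and the function name and `root` parameter are our own. A maximum-weight branching without a fixed root would need the usual reduction, for example negating the weights and adding a virtual root joined to every vertex by zero-weight edges.

```python
def min_arborescence(n, root, edges):
    """Chu-Liu/Edmonds: weight of a minimum spanning arborescence rooted at `root`.

    n:     number of vertices, labeled 0..n-1
    edges: list of (u, v, weight) triples for directed edges u -> v
    Returns the total weight, or None if some vertex is unreachable from root.
    """
    INF = float("inf")
    total = 0
    while True:
        # 1. Pick the cheapest incoming edge of every vertex.
        in_w, pre = [INF] * n, [-1] * n
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(in_w[v] == INF for v in range(n) if v != root):
            return None
        in_w[root] = 0
        # 2. Add the chosen weights and look for cycles among the chosen edges.
        comp, seen, count = [-1] * n, [-1] * n, 0
        for v in range(n):
            total += in_w[v]
            x = v
            while seen[x] != v and comp[x] == -1 and x != root:
                seen[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:  # the walk from v closed a new cycle at x
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total  # the chosen edges already form an arborescence
        # 3. Contract each cycle to one vertex; every other vertex gets its own id.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        # An edge entering v now costs w - in_w[v], the price of swapping it in.
        edges = [(comp[u], comp[v], w - in_w[v]) for u, v, w in edges if comp[u] != comp[v]]
        n, root = count, comp[root]
```

On the graph with root edges `(0, 1, 10)`, `(0, 2, 10)`, `(0, 3, 10)` and the cheap cycle `1 -> 2 -> 3 -> 1` of weight-1 edges, the algorithm contracts the cycle, enters it once from the root, and returns 12.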
ISBN: 9781467395878 (print)
Whereas the traditional GPU mainly supports parallel computing in SIMD (Single Instruction Multiple Data) and SIMT (Single Instruction Multiple Thread) modes, the Firefly2 GPU (Graphics Processing Unit) has a special hardware configuration mechanism and can be used for parallel computing at the data level, thread level, and operation level. This paper presents a parallel implementation of OpenVX kernels on the Firefly2 GPU that combines operation-level parallelism with data-level parallelism. Experimental results indicate a satisfactory speedup for the parallel implementation and show that the Firefly2 is suitable for graphics and image processing.
ISBN: 9781467364003; 9781467363976 (print)
This paper presents a method for the pH control of the residual water from a metallurgical factory. The blunting process is decomposed into two sub-processes connected in parallel, both viewed as distributed-parameter processes and modeled through partial differential equations. The mathematical model of the two sub-processes accounts for the delays introduced by the circulation of liquids in the plant. The modeling-simulation procedure is an application of the matrix of partial derivatives of the state vector (M-pdx) method associated with Taylor series.
ISBN: 9781479936168 (print)
Trace-oriented runtime monitoring is a very effective method for improving the reliability of distributed systems. However, for medium-scale distributed systems, existing trace-oriented monitoring frameworks are either not powerful or efficient enough, or too complex and expensive to deploy and maintain. In this paper, we present MTracer, a lightweight trace-oriented monitoring system for medium-scale distributed systems. We have proposed and implemented several optimizations to improve the efficiency of the monitor server in MTracer. A web-based frontend is also provided to visualize a monitored system from different perspectives. We have validated MTracer in a real medium-scale environment. The results indicate that MTracer has very low overhead and can handle more than 4000 events per second.
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e., the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e., 3.2 Tbps for minimum-size (40-byte) packets, while preserving intra-flow packet order. (c) 2009 Elsevier Inc. All rights reserved.
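In SRAM-based pipelined lookup, a common layout maps each level of a binary routing trie onto its own pipeline stage. A lookup therefore reads exactly one memory per stage, and a new packet can enter every cycle. The Python sketch below illustrates that generic layout, not the paper's two-level mapping or early caching. It builds a trie from prefixes written as bit strings (used here instead of 32-bit integers for readability), performs a longest-prefix match, and counts nodes per level. The per-level count is the memory each stage would need, which is the quantity that the memory-balancing schemes described above try to equalize. All names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set if a prefix ends at this node


def build_trie(routes):
    """routes: list of (prefix as a bit string, next hop), e.g. ("101", "C")."""
    root = TrieNode()
    for bits, hop in routes:
        node = root
        for b in bits:
            i = int(b)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.next_hop = hop
    return root


def lookup(root, addr_bits):
    """Longest-prefix match; iteration k touches only trie level k (one pipeline stage)."""
    node, best = root, root.next_hop
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest match seen so far
    return best


def nodes_per_level(root):
    """Node count per trie level, i.e. the memory each stage holds if one level maps to one stage."""
    counts, level = [], [root]
    while level:
        counts.append(len(level))
        level = [c for n in level for c in n.children if c is not None]
    return counts
```

For the toy table `[("", "default"), ("0", "D"), ("1", "A"), ("10", "B"), ("101", "C")]`, the address `1011` matches `C` and `100` matches `B`. The per-level node counts are `[1, 2, 1, 1]`, so even this tiny table loads the stages unevenly.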
ISBN: 0769511538 (print)
Techniques are presented for scheduling parallel I/O both in uniprogrammed systems, which run single jobs in isolation, and in multiprogrammed environments, which execute multiple parallel jobs simultaneously. The performance of the scheduling algorithms is evaluated on a network of workstations. A new scheduling algorithm proposed in this paper is observed to perform very well for systems running single jobs in isolation. The algorithms that use knowledge of job characteristics are observed to deliver superior performance in multiprogrammed parallel environments.