ISBN: 9781424416936 (print)
This paper presents a network-on-chip (NoC) with a flexible infrastructure based on dynamic wormhole packet identity mapping management. The NoCs are developed using a modular VHDL approach to support design flexibility. The on-chip router switches packets using the wormhole packet switching method with a synchronous parallel pipeline technique. Contention-free routing algorithms and dynamic packet identity management are proposed to support a wire through-share methodology and an identity-slot division multiple access technique. The on-chip routers are also equipped with packet flow control and an automatic injection rate control mechanism to avoid packet loss when the NoC is congested. Several static and adaptive routing algorithms are implemented in the NoC to observe its performance over selected network traffic patterns and its logic consumption (using a CMOS standard-cell library). The area overhead of implementing the adaptive routing algorithms, relative to the static routing algorithm, is less than 9%. Our NoC guarantees in-order and lossless delivery of message flits.
ISBN: 9781424416936 (print)
We describe and evaluate a thin-client solution for desktop grid computing based on virtual machine appliances whose images are fetched on demand, on a per-block basis, over wide-area networks. The approach uses a distributed file system redirection mechanism that enables the use of unmodified NFS clients and servers and local buffering of file system modifications during the appliance's lifetime. The file system redirection is achieved through user-level proxies and can be integrated with virtual private network overlays to provide transparent access to image servers even if they are behind firewalls. We have implemented and evaluated a prototype system that allows diskless thin-client appliances to boot over a proxy VM, fetching on demand only a small fraction of the appliance image (16 MB out of 900 MB) and showing low run-time overhead for CPU-intensive applications. The paper also presents decentralized mechanisms to support seamless image version upgrades.
This paper presents the design and implementation of a new file system independent collective I/O optimization based on file views: view-based collective I/O. View-based collective I/O has been implemented and evaluat...
ISBN: 9781424416936 (print)
We present an overview of the current status of input/output (I/O) on the Cray XT line of supercomputers and provide guidance to application developers and users for achieving efficient I/O. Many I/O benchmark results are presented, based on projected I/O requirements for some widely used scientific applications in the Department of Energy. Finally, we interpret and summarize these benchmark results to provide forward-looking guidance for I/O in large-scale application runs on a Cray XT3/XT4.
ISBN: 9781424416936 (print)
With increasing demand for low-power, high-performance computing, reducing the power consumed not only by CPUs but also by memory is becoming important. In typical general-purpose HPC environments, DRAM is over-provisioned to avoid swapping, although in most cases not all of that memory is used, leading to unnecessary and excessive power consumption, even in a standby state. We propose a next-generation low-power memory system that reduces the required DRAM capacity while minimizing application performance degradation. In this system, both DRAM and MRAM, a fast non-volatile memory, are used as main memory, while flash memory is used as a swap device. Our profile-based paging algorithm optimizes memory accesses by using the faster memory as much as possible, reducing accesses to the slower memory. Simulation results for our architecture show that the overall energy consumption of the memory system can be reduced to 25% in the best case by reducing DRAM capacity, with only a 17% performance loss in application benchmarks.
ISBN: 9781424416936 (print)
Pipelined SRAM-based algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be used in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across the stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order should be preserved. In this paper we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in a pipeline. To balance the traffic, we propose a flow pre-caching scheme that exploits the inherent caching in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a payload exchange scheme exploiting the pipeline delay is used to maintain the intra-flow packet order. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second (GPPS) while preserving intra-flow packet order.
ISBN: 9781424442379 (print)
Real-time monitoring is becoming increasingly important in many aspects of large-scale, multi-site distributed and parallel computing, e.g., understanding the behavior of systems, scheduling resources, and debugging applications. Dedicated networks for inter-site communication are rarely available for monitoring purposes. Therefore, for real-time monitoring systems, reducing communication cost is important in order to handle a large number of nodes with limited network resources. We implemented a real-time Grid monitoring system called VGXP with techniques for low-cost data gathering. It tries to send only diffs against recent data, and adapts to the requested data freshness and tolerable error to minimize the required communication. We evaluate the monitoring overhead of the proposed method on a distributed environment consisting of 8 sites with 500 nodes. In a realistic setting where the sampling interval is set to 0.5 seconds and the tolerable error to 2%, the CPU usage of the server gathering data from all nodes was 0.2% and the transfer rate was less than 5 kbps. The transfer rate did not exceed 50 kbps even when gathering detailed per-process statistics.
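The idea of sending only changes that exceed a tolerable error can be illustrated with a minimal sketch. This is not VGXP's actual protocol: the function names, the use of relative error against the last transmitted value, and the sample data are all assumptions made for illustration.

```python
def should_send(last_sent, current, tolerance=0.02):
    """Decide whether a new sample must be transmitted.

    Assumption: "tolerable error" is modeled as relative deviation from the
    last value actually sent; the abstract does not define it precisely.
    """
    if last_sent is None:
        return True  # nothing sent yet, so the first sample always goes out
    scale = max(abs(last_sent), 1e-12)  # avoid division by zero
    return abs(current - last_sent) / scale > tolerance


def filter_updates(samples, tolerance=0.02):
    """Yield (index, value) for the samples a sender would transmit."""
    last_sent = None
    for i, value in enumerate(samples):
        if should_send(last_sent, value, tolerance):
            last_sent = value
            yield i, value


if __name__ == "__main__":
    cpu_load = [100.0, 101.0, 103.0, 103.5, 110.0]  # hypothetical samples
    print(list(filter_updates(cpu_load)))
```

With a 2% tolerance, samples that stay close to the last transmitted value are suppressed, which is one way the required communication could shrink as the tolerable error grows.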
ISBN: 9781424416936 (print)
One of the key challenges in Wireless Sensor Networks (WSNs) is extending the lifetime of the network while meeting coverage requirements. In this paper we present a distributed algorithmic framework that enables sensors to determine their sleep-sense cycles based on specific coverage goals. The framework is based on our earlier work on the target coverage problem. We give a general version of the framework that can be used to solve network/graph problems for which melding compatible neighboring local solutions directly yields globally feasible solutions. We also apply this framework to several variations of the coverage problem, namely the target coverage, area coverage and k-coverage problems, to demonstrate its general applicability. Each sensor constructs minimal cover sets for its local coverage objective. The framework entails each sensor prioritizing these local cover sets and then negotiating with its neighbors to satisfy mutual constraints. We introduce a dependency graph model that captures the interdependencies among the cover sets. Detailed simulations are carried out to further demonstrate the resulting performance improvements and the effectiveness of the framework.
ISBN: 9781424416936 (print)
When an adaptive software component is employed to select the best-performing implementation of a communication operation at runtime, the correctness of the decision strongly depends on detecting and removing outliers in the data used for the comparison. This automatic decision is greatly complicated by the fact that the types and quantities of outliers depend on the network interconnect and the nodes assigned to the job by the batch scheduler. This paper evaluates four different statistical methods for handling outliers, namely a standard interquartile range method, a heuristic derived from the trimmed mean value, cluster analysis, and a method using robust statistics. Using performance data from the Abstract Data and Communication Library (ADCL), we evaluate the correctness of the decisions made with each statistical approach over three fundamentally different network interconnects, namely a highly reliable InfiniBand network, a Gigabit Ethernet network with a larger variance in performance, and a hierarchical Gigabit Ethernet network.
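Of the four methods named above, the standard interquartile range (IQR) method is the simplest to sketch. The following is an illustrative example only, not ADCL's code: the function names, the conventional 1.5 multiplier, the choice of the mean as the comparison metric, and the timing data are all assumptions.

```python
import statistics


def remove_outliers_iqr(samples, k=1.5):
    """Drop samples outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional choice; the abstract does not state the value
    used in the paper. Requires at least two samples.
    """
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in samples if low <= x <= high]


def pick_fastest(timings):
    """Return the implementation name with the lowest mean filtered time."""
    return min(
        timings,
        key=lambda name: statistics.mean(remove_outliers_iqr(timings[name])),
    )


if __name__ == "__main__":
    # Hypothetical measurements: "a" is faster but has one large outlier.
    timings = {
        "a": [10, 11, 12, 13, 14, 15, 16, 100],
        "b": [20, 20, 21, 21, 22, 22, 23, 23],
    }
    print(pick_fastest(timings))
```

In this made-up data set, the raw mean of "a" is inflated by a single slow measurement, so a comparison on unfiltered means would choose "b"; after filtering, "a" is selected. This is the kind of decision error that outlier handling is meant to prevent.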
ISBN: 9780769533827 (print)
The knapsack problem is a typical NP-complete problem: easy to describe but difficult to solve. Studying it is important in both theory and practice, and there is a wide variety of research on algorithms for solving it. As parallel processing technologies develop, research on effective parallel algorithms for this problem has attracted much attention. Running those algorithms, however, requires high-end hardware and high-performance parallel computers. Using mobile agent technologies, a more effective model for solving complex distributed problems can be established. By combining mobile agents with the traditional parallel algorithm, the process that runs on a parallel computer can be turned into one performed by several ordinary computers. This avoids the limitations of experimental conditions and is convenient in practice. In this paper, a distributed algorithm based on mobile agents is proposed for the 0-1 knapsack problem; theoretical analysis shows that it is feasible and effective.
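For reference, the 0-1 knapsack problem itself can be stated compactly in code. The sketch below is the standard sequential dynamic-programming solution, not the mobile-agent distributed algorithm proposed in the paper, whose details the abstract does not give; the function name and the sample data are assumptions.

```python
def knapsack_01(weights, values, capacity):
    """Return the maximum total value achievable within the capacity.

    Standard O(n * capacity) dynamic program; weights and capacity are
    assumed to be non-negative integers.
    """
    best = [0] * (capacity + 1)  # best[c] = max value using capacity c
    for w, v in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical instance: four items, knapsack capacity 7.
    print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

The running time grows with the capacity value rather than only with the number of items, which is one reason parallel and distributed variants of this problem are studied.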