ISBN (print): 1595936734
Grid applications often need to distribute large amounts of data efficiently from one cluster to multiple others (multicast). Existing methods usually arrange nodes in optimized tree structures, based on external network monitoring data. This dependence on monitoring data, however, severely impacts both ease of deployment and adaptivity to dynamically changing network conditions. In this paper, we present Multicast Optimizing Bandwidth (MOB), a high-throughput multicast approach inspired by the BitTorrent protocol. With MOB, data transfers are initiated by the receivers, which try to steal data from peer clusters. Instead of using potentially outdated monitoring data, MOB automatically adapts to the currently achievable bandwidth ratios. Our experimental evaluation compares MOB to both the BitTorrent protocol and our previous approach, Balanced Multicasting, the latter optimizing multicast trees based on external monitoring data. We show that MOB outperforms the BitTorrent protocol. MOB is competitive with Balanced Multicasting as long as the network bandwidth remains stable. With dynamically changing bandwidth, MOB outperforms Balanced Multicasting by wide margins. Copyright 2007 ACM.
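As a rough illustration of the receiver-initiated transfer idea in the abstract above, the following Python sketch lets receiver clusters repeatedly "steal" missing data pieces from any peer that already holds them. The Cluster class, the random donor choice, and the piece set are invented for this example and are not taken from the MOB paper; in MOB the choice of donor is driven by the transfer rates actually being achieved.

import random

class Cluster:
    def __init__(self, name, pieces=None):
        self.name = name
        self.pieces = set(pieces or [])   # piece indices already held locally

    def missing(self, total_pieces):
        return [i for i in range(total_pieces) if i not in self.pieces]

def mob_round(receivers, peers, total_pieces):
    # One "steal" round: every receiver asks some peer for one missing piece.
    # A real implementation would keep stealing from whichever peers are
    # currently delivering data fastest; here the donor is picked at random.
    for r in receivers:
        wanted = r.missing(total_pieces)
        if not wanted:
            continue
        piece = random.choice(wanted)
        owners = [p for p in peers + receivers
                  if p is not r and piece in p.pieces]
        if owners:
            donor = random.choice(owners)  # bandwidth-aware choice in real MOB
            r.pieces.add(piece)            # piece copied from the donor cluster

if __name__ == "__main__":
    source = Cluster("source", pieces=range(8))      # sender has all 8 pieces
    receivers = [Cluster("cluster%d" % i) for i in range(3)]
    while any(r.missing(8) for r in receivers):
        mob_round(receivers, [source], 8)
    print({r.name: sorted(r.pieces) for r in receivers})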
ISBN (print): 9780769533070
The proceedings contain 46 papers. The topics discussed include: performance analysis with high-level languages for high-performance reconfigurable computing; an SRAM-based architecture for trie-based IP lookup using FPGA; a scalable high-throughput firewall in FPGA; the effectiveness of configuration merging in point-to-point networks for module-based FPGA reconfiguration; autonomous system on a chip adaptation through partial runtime reconfiguration; FPGA-based co-processor for singular value array reconciliation tomography; real-time optical flow calculations on FPGA and GPU architectures: a comparison study; multiobjective optimization of FPGA-based medical image registration; sparse matrix-vector multiplication on a reconfigurable supercomputer; fast multivariate signature generation in hardware: the case of rainbow; and runtime filesystem support for reconfigurable FPGA hardware processes in BORPH.
Simulation is the most important tool for computer architects to evaluate the performance of new computer designs. However, detailed simulation is extremely time-consuming. Sampling is one of the techniques that effec...
This paper presents a new model for evaluating the positive and negative impacts of process migration in environments composed of computers with heterogeneous capacities. In this model, a busy computer analyze...
ISBN (print): 1595936734
Utility functions can be used to represent the value users attach to job completion as a function of turnaround time. Most previous scheduling research used simple synthetic representations of utility, with the simplicity being due to the fact that real user preferences are difficult to obtain, and perhaps to concern that arbitrarily complex utility functions could in turn make the scheduling problem intractable. In this work, we advocate a flexible representation of utility functions that can indeed be arbitrarily complex. We show that a genetic algorithm heuristic can improve global utility by analyzing these functions, and does so tractably. Since our previous work showed that users indeed have and can articulate complicated utility functions, the result here is relevant. We then provide a means to augment existing workload traces with realistic utility functions for the purpose of enabling realistic scheduling simulations. Copyright 2007 ACM.
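A genetic algorithm operating directly on arbitrary utility functions can be pictured with the small Python sketch below; the three jobs, their utility curves, and the swap-mutation search over job orderings on a single machine are invented for illustration and do not reproduce the paper's heuristic or its workload-trace format.

import random

jobs = [  # (runtime, utility as a function of turnaround time); invented data
    (3, lambda t: max(0, 10 - t)),
    (5, lambda t: 8 if t <= 6 else 1),           # step-shaped utility
    (2, lambda t: 6 / (1 + t)),
]

def total_utility(order):
    clock, value = 0, 0.0
    for idx in order:
        runtime, utility = jobs[idx]
        clock += runtime                         # job finishes at 'clock'
        value += utility(clock)                  # utility of its turnaround time
    return value

def evolve(generations=200, pop_size=20):
    pop = [random.sample(range(len(jobs)), len(jobs)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=total_utility, reverse=True)    # rank orderings by utility
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=total_utility)

best = evolve()
print("best order:", best, "total utility:", round(total_utility(best), 2))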
ISBN (print): 0769517722
The gap between memory and processor speeds is responsible for a substantial amount of idle time in current processors. To reduce the impact of the so-called "memory gap problem," many software techniques (e.g., code layout reorganization) together with hardware mechanisms (cache memory, translation look-aside buffer, branch prediction, speculative execution, trace cache, instruction reuse, and so on) have been successfully implemented. In this paper we present some experiments that explain why these mechanisms and techniques are so efficient. We found that only a small fraction of the object code is actually executed: our experiments disclosed that more than 50% of the instructions remain untouched during the whole execution, and the percentages of basic blocks that remain unused are slightly greater. In addition to the usage of instructions and blocks, the paper provides further insights regarding the behavior of application programs, and gives some suggestions for extra performance gains.
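The kind of measurement behind the "more than 50% untouched" figure can be sketched in a few lines of Python: compare the static basic blocks of a binary against the blocks that appear in an execution trace. The block names and instruction counts below are invented.

static_blocks = {            # block id -> number of instructions in the block
    "entry": 5, "loop": 12, "error_path": 30, "cleanup": 7, "cold_init": 46,
}
trace = ["entry", "loop", "loop", "loop", "cleanup"]   # blocks actually executed

executed = set(trace)
unused_blocks = [b for b in static_blocks if b not in executed]
total_insns = sum(static_blocks.values())
unused_insns = sum(static_blocks[b] for b in unused_blocks)

print("untouched blocks: %d/%d (%.0f%%)" % (
    len(unused_blocks), len(static_blocks),
    100.0 * len(unused_blocks) / len(static_blocks)))
print("untouched instructions: %d/%d (%.0f%%)" % (
    unused_insns, total_insns, 100.0 * unused_insns / total_insns))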
ISBN (print): 0769516866
Information services are an integral part of the grid architecture. They are the foundation of how resources are defined and how their state is known. More importantly, it is from information services that the user of the Grid gets a perspective of what a grid looks like, how it performs, and what capabilities it has. The Accelerated Strategic Computing Initiative (ASCI) has designed and deployed a set of grid services within the context of the ASCI program. We deploy information services by augmenting the Globus toolkit in order to meet the unique aspects of the ASCI grid. In this paper we describe the decisions made and the processes developed to run a grid information service in the ASCI grid.
ISBN (print): 0769517722
Skewed-associativity is a technique that reduces the miss ratios of CPU caches by applying different indexing functions to each way of an associative cache. Even though it showed impressive hit/miss statistics, the scheme has not been welcomed by the industry, presumably because implementation of the original version is complex and might involve access-time penalties among other costs. This work presents a simplified, easy-to-implement variant that we call minimally-skewed-associativity (MSkA). We show that MSkA caches, in many cases, should not have penalties in either access time or power consumption when compared to set-associative caches of the same associativity. Hit/miss statistics were obtained by means of trace-driven simulations. Miss ratios are not as good as those for full skewing, but they are still advantageous. Minimal skewing is thus proposed as a way to improve the hit/miss performance of caches, often without producing access-time delays or increases in power consumption as other techniques do (for example, using higher associativities).
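The core idea, indexing each way with a different function so that blocks conflicting in one way rarely conflict in the other, can be modelled with the toy Python sketch below; the XOR-folded index, the two-way size, and the fill policy are illustrative only and are not the paper's MSkA design.

SETS = 64   # sets per way

def index_way0(block_addr):
    return block_addr % SETS                        # conventional modulo index

def index_way1(block_addr):
    return (block_addr ^ (block_addr >> 6)) % SETS  # skewed (XOR-folded) index

way0 = [None] * SETS
way1 = [None] * SETS

def access(block_addr):
    # Return True on a hit; on a miss, fill an empty slot if one exists,
    # otherwise overwrite way 0 (real skewed caches use a smarter policy).
    i0, i1 = index_way0(block_addr), index_way1(block_addr)
    if way0[i0] == block_addr or way1[i1] == block_addr:
        return True
    if way0[i0] is None:
        way0[i0] = block_addr
    else:
        way1[i1] = block_addr
    return False

# Two addresses that collide under the modulo index (both map to set 0)
# but land in different sets under the skewed index:
a, b = 0x100, 0x140
print(access(a), access(b), access(a))   # prints: False False True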
ISBN (print): 9781728161495
Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements, and throughput-oriented background jobs, which need to achieve high performance. Previously proposed methods cannot co-locate multiple latency-critical jobs with multiple background jobs while (1) meeting the QoS requirements of all latency-critical jobs and (2) maximizing the performance of the background jobs. This paper proposes CLITE, a Bayesian Optimization-based, multi-resource partitioning technique that achieves these goals.
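The shape of the partitioning problem can be conveyed with the Python sketch below: split cores and cache ways between one latency-critical job and one background job, reject splits that violate the latency-critical job's QoS, and keep the split with the best background throughput. The performance models are fake and the search is plain random sampling, used here only to keep the sketch self-contained; CLITE itself drives the search with Bayesian optimization over measurements of real co-located jobs and handles multiple jobs and resources.

import random

CORES, WAYS = 16, 20
QOS_LATENCY = 5.0                       # ms, invented service-level objective

def lc_latency(cores, ways):            # fake latency model for the LC job
    return 40.0 / (cores + 1) + 30.0 / (ways + 1)

def bg_throughput(cores, ways):         # fake throughput model for the BG job
    return cores * 1.5 + ways * 0.8

best = None
for _ in range(500):
    lc_cores = random.randint(1, CORES - 1)
    lc_ways = random.randint(1, WAYS - 1)
    bg_cores, bg_ways = CORES - lc_cores, WAYS - lc_ways
    if lc_latency(lc_cores, lc_ways) > QOS_LATENCY:
        continue                        # QoS violated: partition is infeasible
    score = bg_throughput(bg_cores, bg_ways)
    if best is None or score > best[0]:
        best = (score, lc_cores, lc_ways)

print("best feasible partition (score, LC cores, LC ways):", best)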
ISBN (print): 076951894X
The next-generation zSeries floating-point unit is unveiled; it is the first IBM mainframe with a fused multiply-add dataflow. It supports both the S/390 hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture, the latter first implemented in S/390 on the 1998 S/390 G5 floating-point unit. The new floating-point unit supports a total of six formats, including single, double, and quadword formats, implemented in hardware. The floating-point pipeline is 5 cycles deep with a throughput of one multiply-add per cycle. Both hexadecimal and binary floating-point instructions are capable of this performance due to a novel way of handling both formats. Other key developments include new methods for handling denormalized numbers and the dataflow of the quad-precision divide engine. This divider uses a radix-4 SRT algorithm and is able to handle quad-precision divides in multiple floating-point and fixed-point formats. The number of iterations for fixed-point divisions depends on the effective number of quotient bits. It uses a reduced carry-save form for the partial remainder, with only 1 carry bit for every 4 sum bits, to save area and power.
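The iteration structure of a radix-4 divider, retiring two quotient bits per step so that the iteration count tracks the number of quotient bits needed, can be sketched in Python as below. This sketch uses the non-redundant digit set {0, 1, 2, 3} and a full-width compare for digit selection; the unit described above instead selects digits from the redundant SRT set {-2, ..., 2} using a short carry-save estimate of the partial remainder, which avoids a full-width carry-propagate compare in each iteration.

def radix4_divide(x, d, fraction_bits=16):
    # Fractional division x/d (requires 0 <= x < d), two quotient bits per step.
    assert 0 <= x < d
    w, q = x, 0
    for _ in range(fraction_bits // 2):
        digit = (4 * w) // d            # quotient-digit selection, in {0,1,2,3}
        w = 4 * w - digit * d           # partial-remainder recurrence
        q = (q << 2) | digit            # append two quotient bits
    return q / (1 << fraction_bits)     # value of the accumulated fraction

print(radix4_divide(1, 3))              # ~0.333328, converging 2 bits per iteration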