ISBN: 9780769546759 (print)
In this paper we consider load balancing in a static and discrete setting where a fixed number of indivisible tasks have to be allocated to processors. We assume uniform tasks, but the processors may have different speeds. The load of a processor is the number of tasks assigned to it divided by its speed. We consider diffusion load balancing, which works in rounds. In every round the processors are allowed to compare their own load with the load of their neighbors and to balance the load with the neighbors, using their local information only. The question is how many rounds it takes until the whole processor network is balanced, meaning the load discrepancy (difference between maximum load and m/n) is minimized. Our balancing algorithm is deterministic and extends the algorithm studied in [1] from the case of uniform speeds to non-uniform speeds. We use a potential function argument to show that a better load balance can be obtained when the algorithm is allowed to run longer compared to the algorithm of [1].
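The round-based scheme described in this abstract can be illustrated with a minimal sketch. This is not the algorithm of the paper or of [1]: the transfer rule (rounding down, with a diffusion factor of 1/(max degree + 1) scaled by the slower endpoint's speed) is an assumption chosen for illustration, and the names `diffusion_round` and `balance` are hypothetical. Speeds are assumed to be positive integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction
from math import floor


def diffusion_round(loads, speeds, neighbors):
    """One synchronous round: every edge moves whole tasks from the endpoint
    with the higher normalized load (tasks / speed) to the lower one.
    All flows are computed from the loads at the start of the round."""
    new_loads = list(loads)
    for i in range(len(loads)):
        for j in neighbors[i]:
            if i >= j:
                continue  # handle each undirected edge once
            li = Fraction(loads[i], speeds[i])
            lj = Fraction(loads[j], speeds[j])
            # Assumed diffusion factor; it keeps every load non-negative.
            denom = max(len(neighbors[i]), len(neighbors[j])) + 1
            flow = floor(min(speeds[i], speeds[j]) * abs(li - lj) / denom)
            src, dst = (i, j) if li > lj else (j, i)
            new_loads[src] -= flow
            new_loads[dst] += flow
    return new_loads


def balance(loads, speeds, neighbors, max_rounds=1000):
    """Repeat rounds until no task moves; return (final loads, rounds used)."""
    for rounds in range(max_rounds):
        nxt = diffusion_round(loads, speeds, neighbors)
        if nxt == loads:
            return loads, rounds
        loads = nxt
    return loads, max_rounds


# Ring of four processors with speeds 1, 2, 1, 2; all 12 tasks start on node 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final, rounds = balance([12, 0, 0, 0], [1, 2, 1, 2], ring)
print(final, rounds)
```

In this sketch, rounding flows down means the process can stop before the normalized loads are equal. That residual imbalance is the kind of discrepancy the abstract is concerned with.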
DCABES is a community working in the area of Distributed Computing and Applications in Business, Engineering, and Sciences, and is responsible for organizing meetings and symposia related to the field. DCABES inte...
ISBN: 9780769546759 (print)
Simultaneous multithreading (SMT) increases CPU utilization and application performance in many circumstances, but it can be detrimental when performance is limited by application scalability or when there is significant contention for CPU resources. This paper describes an SMT-selection metric that predicts the change in application performance when the SMT level and number of application threads are varied. This metric is obtained online through hardware performance counters with little overhead, and allows the application or operating system to dynamically choose the best SMT level. We have validated the SMT-selection metric using a variety of benchmarks that capture various application characteristics on two different processor architectures. Our results show that the SMT-selection metric is capable of predicting the best SMT level for a given workload in 90% of the cases. The paper also shows that such a metric can be used with a scheduler or application optimizer to help guide its optimization decisions.
ISBN: 9780769546766 (print)
Today's high-end computing systems are facing a crisis of high failure rates due to increased numbers of components. Recent studies have shown that traditional fault-tolerant techniques incur overheads that more than double execution times on these highly parallel machines. Thus, future high-end computing must be able to provide adequate fault tolerance at an acceptable cost, or the burdens of fault management will severely affect the viability of such systems. Cluster virtualization offers a potentially unique solution for fault management, but brings significant overhead, especially for I/O. In this paper, we propose a novel diskless checkpointing technique on clusters of virtual machines. Our technique splits virtual machines into sets of orthogonal RAID systems and distributes parity evenly across the cluster, similar to a RAID-5 configuration, but using VM images as data elements. Our theoretical analysis shows that our technique significantly reduces the overhead associated with checkpointing by removing the disk I/O bottleneck.
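The RAID-5 analogy in this abstract rests on XOR parity, which can be sketched as follows. This is a generic illustration, not the paper's implementation: VM checkpoint images are modeled as equal-length byte strings, and the helper names `xor_blocks`, `parity_node` and `recover`, as well as the round-robin parity placement, are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_node(group_index, num_nodes):
    """Assumed round-robin placement: successive parity groups keep their
    parity on successive nodes, so no single node holds all of it."""
    return group_index % num_nodes


def recover(surviving_images, parity):
    """Rebuild the single missing image of a group from the rest plus parity."""
    return xor_blocks(surviving_images + [parity])


# Three in-memory VM images (stand-ins for real checkpoint data).
images = [b"vm0-state", b"vm1-state", b"vm2-state"]
parity = xor_blocks(images)

# Simulate losing the node that hosted images[1] and rebuild it.
rebuilt = recover([images[0], images[2]], parity)
print(rebuilt == images[1])
```

Because the parity is held in the memory of another node rather than written to disk, recovery after a single failure in a group needs only network transfers and XOR operations. This is the sense in which such a scheme is diskless.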
This article studies azimuth information reconstruction for distributed surface wave Over-the-Horizon radar. We derive a new azimuth information reconstruction approach for distributed surface wave Over-the...
ISBN: 9780769546766 (print)
In this paper, we present a technology mapping and clustering tool for leakage power reduction in FPGAs with programmable, dual-V-T logic blocks. The use of Reverse Back Bias (RBB) circuit techniques is recognized as one of the more promising strategies in mitigating leakage power, a critical problem in circuits deploying deep submicron process technologies. FPGAs with the ability to adjust fabric V-T through RBB offer the potential of reducing leakage power with minimal or no sacrifice to circuit speed. Today, Altera's Stratix line of FPGAs deploys a similar strategy, but with optimizations limited to the post-P&R stage. We present a novel two-stage technology mapping (RBBMap) and logic block packing (RBBPack) tool that is free from the clustering constraints limiting the post-P&R method, and moves RBB optimizations upwards to the technology mapping level. Using the baseline technology mapping tool Emap, our tools generate an average of 70.95% savings in logic block leakage power and 28.30% savings in total energy consumption.
Modern multi-core and many-core systems offer a very impressive cost/performance ratio. In this paper a set of new parallel implementations for the solution of linear systems with block-tridiagonal coefficient matrix ...
Real-valued black-box optimization of badly behaved and not well understood functions is a wide topic in many scientific areas. Possible applications range from maximizing portfolio profits in financial mathematics ov...
Parallel programming is widely considered very demanding for an average programmer due to the inherent asynchrony of underlying parallel architectures. In this paper we describe the main design principles and core feature...
ISBN: 9780769543857 (print)
The proceedings contain 112 papers. The topics discussed include: power-aware replica placement and update strategies in tree networks; minimum cost resource allocation for meeting job requirements; power and performance management in priority-type cluster computing systems; communication-avoiding QR decomposition for GPU; overlapping computation and communication for advection on hybrid parallel computers; VisIO: enabling interactive visualization of ultra-scale, time series data via high-bandwidth distributed I/O systems; a novel power management for CMP systems in data-intensive environment; characterization of system services and their performance impact in multi-core nodes; automatic recognition of performance idioms in scientific applications; exploiting data similarity to reduce memory footprints; the evaluation of an effective out-of-core run-time system in the context of parallel mesh generation; and a lightweight method for automated design of convergence.