ISBN (print): 9781538677698
The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary filesystems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present results measured with NVM emulation and different FS backends with DAX/FUSE on a local node, showing the benefits of our proposal and of such coordination.
ISBN (print): 1595936734
We claim that network services can be transparently added to existing unmodified applications running inside virtual machine environments. Examples of these network services include protocol transformations (e.g. TCP to UDT), network connection persistence during long duration unavailability (e.g. wide area VM migration), and network flow modification (e.g. local acknowledgments and Split-TCP). To demonstrate the utility of this concept, and to enable the practical implementations of these examples and others, we have developed VTL. VTL is a framework for packet modification and creation whose purpose is to modify network traffic to and from a VM, doing so transparently to the VM and its applications. We explain how to use VTL to implement the examples mentioned above and others, such as providing anonymized connectivity for a virtual machine through the Tor anonymizing network, and creating cooperative selective wormholing services for network intrusion detection systems. Copyright 2007 ACM.
ISBN (print): 9781509012336
Cloud computing allows for elasticity, as users can dynamically benefit from new virtual resources when their workload increases. Such a feature requires highly reactive resource provisioning mechanisms. In this paper, we propose two new workload prediction models, based on constraint programming and neural networks, that can be used for dynamic resource provisioning in Cloud environments. We also present two workload trace generators that can help extend an experimental dataset in order to test resource optimization heuristics more widely. Our models are validated using real traces from a small Cloud provider. Both approaches are shown to be complementary: neural networks give better prediction results, while constraint programming is more suitable for trace generation.
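The abstract above does not detail its constraint-programming or neural-network models, but the general shape of sliding-window workload prediction can be sketched as follows. The window size and the least-squares trend fit are illustrative assumptions, not the paper's actual models.

```python
# Minimal sketch of sliding-window workload prediction: fit a linear
# trend to the last few observations and extrapolate one step ahead.
# The window size (4) and the least-squares fit are assumptions for
# illustration only.

def predict_next(history, window=4):
    """Predict the next workload value from the last `window` observations."""
    recent = history[-window:]
    n = len(recent)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(recent) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = 0.0 if denom == 0 else sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, recent)) / denom
    # Extrapolate the fitted line to position n (one step past the window).
    return mean_y + slope * (n - mean_x)
```

For a steadily growing load such as `[10, 12, 14, 16]`, the fit extrapolates the trend to 18.0; a real predictor would of course need to handle noise and regime changes.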
ISBN (print): 0769520464
A radiative transfer solver that implements the LTSn method was optimized and parallelized using the MPI message-passing communication library. Timing and profiling information was obtained for the sequential code in order to identify performance bottlenecks. Performance tests were executed on a distributed-memory parallel machine, a multi-computer based on the IA-32 architecture. The radiative transfer equation was solved for a cloud test case to evaluate the parallel performance of the LTSn method. The LTSn code includes spatial discretization of the domain and Fourier decomposition of the radiances, leading to independent azimuthal modes. This yields an independent radiative transfer equation for each mode that can be executed by a different processor in a parallel implementation. Speed-up results show that the parallel implementation is suitable for the architecture used.
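Because each azimuthal mode yields an independent equation, the parallelization described above amounts to partitioning mode indices across MPI ranks. A rank-strided assignment (the common `for m in range(rank, n_modes, size)` loop) can be sketched as below; `modes_for_rank` is an illustrative helper, not code from the paper.

```python
# Sketch of rank-strided distribution of independent azimuthal modes
# across MPI ranks. In real MPI code, `rank` and `size` would come from
# MPI_Comm_rank / MPI_Comm_size; here the assignment logic is shown in
# isolation so it can run without an MPI environment.

def modes_for_rank(rank, size, n_modes):
    """Return the azimuthal mode indices a given rank would solve."""
    return list(range(rank, n_modes, size))
```

With 8 modes on 3 ranks, rank 0 solves modes [0, 3, 6], rank 1 solves [1, 4, 7], and rank 2 solves [2, 5]; since the modes are independent, no communication is needed until the per-mode results are gathered.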
ISBN (print): 0769520464
We consider the implementation of a parallel Monte Carlo code for high-performance simulations on PC clusters with MPI. We carry out tests of speedup and efficiency. The code is used for numerical simulations of pure SU(2) lattice gauge theory at very large lattice volumes, in order to study the infrared behavior of gluon and ghost propagators. This problem is directly related to the confinement of quarks and gluons in the physics of strong interactions.
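The speedup and efficiency tests mentioned above follow the standard definitions S(p) = T(1)/T(p) and E(p) = S(p)/p; a minimal helper makes the relationship explicit. The timing values in the usage note are placeholders, not the paper's measurements.

```python
# Standard parallel-performance metrics used in speedup/efficiency tests:
#   speedup    S(p) = T(1) / T(p)
#   efficiency E(p) = S(p) / p
# t1: sequential wall time; tp: wall time on p processors.

def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p
```

For example, if a run takes 100 s sequentially and 25 s on 8 processors, the speedup is 4.0 and the efficiency 0.5, i.e. half the ideal linear scaling.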
ISBN (print): 0769520464
Researchers are constantly looking for ways to improve the execution time of parallel applications on distributed systems. Although compile-time static scheduling heuristics employ complex mechanisms, the quality of their schedules is handicapped by estimated run-time costs. On the other hand, while dynamic schedulers use actual run-time costs, they have to be of low complexity in order to reduce the scheduling overhead. This paper investigates the viability of integrating these two approaches into a hybrid scheduling framework. The relationship between static schedulers, dynamic heuristics, and scheduling events is examined. The results show that a hybrid scheduler can indeed improve the schedules produced by good traditional static list scheduling algorithms.
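The static/dynamic split described above can be sketched in miniature: a static phase places tasks using estimated costs, and at a scheduling event a dynamic phase re-maps the not-yet-started tasks using measured loads. The greedy longest-task-first policy and the independent-task model are illustrative assumptions, far simpler than the paper's list-scheduling heuristics.

```python
# Toy sketch of hybrid scheduling. Static phase: place tasks with
# *estimated* costs. Dynamic phase (at a scheduling event): re-map the
# remaining tasks using *measured* per-processor loads. Both phases use
# a simple greedy longest-first / least-loaded policy for illustration.

def static_list_schedule(tasks, n_procs):
    """tasks: list of (name, estimated_cost). Returns (placement, loads)."""
    loads = [0.0] * n_procs
    placement = {}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        p = loads.index(min(loads))   # least-loaded processor
        placement[name] = p
        loads[p] += cost
    return placement, loads

def rebalance(remaining, measured_loads):
    """Dynamic phase: re-map not-yet-started tasks onto measured loads.
    Returns (placement, resulting makespan)."""
    loads = list(measured_loads)
    placement = {}
    for name, cost in sorted(remaining, key=lambda t: -t[1]):
        p = loads.index(min(loads))
        placement[name] = p
        loads[p] += cost
    return placement, max(loads)
```

The point of the hybrid is visible even in the toy: the static placement fixes the initial mapping cheaply at compile time, while `rebalance` corrects for the gap between estimated and actual costs without re-running a heavy heuristic.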
ISBN (print): 9781479984480
The advent of huge data in high-performance computing (HPC) applications such as fluid flow simulations usually hinders the interactive processing and exploration of simulation results. Such interactive data exploration not only allows scientists to 'play' with their data but also to visualise huge (distributed) data sets in an efficient and easy way. Therefore, we propose an HPC data exploration service based on a sliding-window concept that enables researchers to access remote data (available on a supercomputer or cluster) during simulation runtime without exceeding any bandwidth limitations between the HPC back-end and the user front-end.
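The sliding-window idea above can be sketched as a front-end view that only ever holds a fixed-size slice of the remote result array, fetching a new slice as the user moves through the data. Here `fetch_slice` is a hypothetical stand-in for the back-end transfer, and the fixed window size stands in for the bandwidth cap; neither is the paper's actual interface.

```python
# Sketch of a sliding window over remote simulation data: the front-end
# holds at most `window_len` elements and re-fetches only the requested
# slice on each move. `fetch_slice(start, stop)` is a placeholder for
# the HPC back-end transfer.

class SlidingWindow:
    def __init__(self, fetch_slice, total_len, window_len):
        self.fetch = fetch_slice
        self.total = total_len
        self.size = window_len
        self.start = 0
        self.data = self.fetch(0, min(window_len, total_len))

    def slide_to(self, start):
        """Move the window to begin at `start`, clamped to valid bounds."""
        start = max(0, min(start, self.total - self.size))
        self.start = start
        self.data = self.fetch(start, start + self.size)
        return self.data
```

Because only one window-sized slice crosses the network per move, the transfer volume stays bounded regardless of how large the remote data set is.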
ISBN (print): 9781728114446
Cloud multi-tenancy is typically constrained to a single interactive service colocated with one or more batch, low-priority services, whose performance can be sacrificed when deemed necessary. Approximate computing applications offer the opportunity to enable tighter colocation among multiple applications whose performance is important. We present Pliant, a lightweight cloud runtime that leverages the ability of approximate computing applications to tolerate some loss in their output quality to boost the utilization of shared servers. During periods of high resource contention, Pliant employs incremental and interference-aware approximation to reduce contention in shared resources, and prevent QoS violations for co-scheduled interactive, latency-critical services. We evaluate Pliant across different interactive and approximate computing applications, and show that it preserves QoS for all co-scheduled workloads, while incurring a 2.1% loss in output quality, on average.
ISBN (print): 9781479921133
Real-time systems need time-predictable architectures to support static worst-case execution time (WCET) analysis. One architectural feature, the data cache, is hard to analyze when different data areas (e.g., heap-allocated and stack-allocated data) share the same cache. This sharing leads to less precise results in the cache-analysis part of the WCET analysis. Splitting the data cache between different data areas enables composable data cache analysis: the WCET analysis tool can analyze the accesses to these different data areas independently. In this paper we present the design and implementation of a cache for stack-allocated data. Our port of the LLVM C++ compiler supports the management of the stack cache. The combination of stack cache instructions and the hardware implementation of the stack cache is a further step towards time-predictable architectures.
ISBN (print): 9781728199245
Silicon-photonics architectures have enabled high-speed hardware implementations of reservoir computing (RC). With a delayed feedback reservoir (DFR) model, a single non-linear node suffices to perform RC. However, the delay is often provided by off-chip fiber optics, which is not only space-consuming but also becomes an architectural bottleneck and hinders scalability. In this paper, we propose a completely on-chip photonic RC architecture for high-performance computing, employing multiple electronically tunable delay lines and a micro-ring resonator (MRR) switch for multi-tasking. The proposed architecture yields 84% lower error than the state-of-the-art standalone architecture in [8] when executing the NARMA task. For multi-tasking, the proposed architecture shows 80% better performance than [8], and it outperforms all other proposed architectures as well. The on-chip area and power overheads of the proposed architecture due to the delay lines and MRR switch are 0.0184 mm² and 26 mW, respectively.