ISBN: (Print) 9798350393132; 9798350393149
IoT devices commonly use flash memory for both data and code storage. Flash memory consumes a significant portion of the overall energy of such devices. This is problematic because IoT devices are energy constrained due to their reliance on batteries or energy harvesting. To save energy, we leverage a unique property of flash memory: write operations take unequal amounts of energy depending on whether we are flipping a bit from 1 to 0 versus from 0 to 1. We exploit this asymmetry to reduce energy consumption with FLIPBIT, a hardware-software approximation approach that limits costly 0→1 transitions in flash. Instead of performing an exact write, we write an approximated value that avoids any costly 0→1 bit flips. Using FLIPBIT, we reduce the mean energy used by flash by 68% on video streaming applications while maintaining 42 dB PSNR. On machine learning models, we reduce energy by an average of 39% and up to 71% with only a 1% accuracy loss. Additionally, by reducing the number of program-erase cycles, we increase the flash lifetime by 68%.
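The core idea — writing the nearest value that needs no 0→1 flips — can be sketched in a few lines. This is a minimal byte-level illustration of the asymmetry, not FLIPBIT's actual algorithm; the brute-force search and the `approx_write` name are our own.

```python
def approx_write(old: int, target: int) -> int:
    """Pick the 8-bit value closest to `target` that needs no 0->1 flips.

    Flash programming can only clear bits (1->0); setting a bit back
    (0->1) requires a costly block erase. So we only consider candidate
    values whose set bits are a subset of the bits already set in `old`.
    """
    best = None
    for v in range(256):
        if v & ~old & 0xFF:  # v would need a 0->1 flip somewhere
            continue
        if best is None or abs(v - target) < abs(best - target):
            best = v
    return best

# The exact value is kept whenever it needs no costly flips:
assert approx_write(0b11111111, 0b10100101) == 0b10100101
# Otherwise the closest erase-free value is written instead:
assert approx_write(0b00001111, 0b00010000) == 0b00001111
```

The approximation error is bounded by how far the nearest erase-free value lies from the target, which is what lets applications like video streaming trade a little PSNR for large energy savings.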
ISBN: (Print) 9781728141947
Major cloud providers, including IBM Cloud, Amazon Web Services, Microsoft Azure, and Google Cloud, offer services to train, debug, store, and deploy machine learning models at scale. For an enhanced user experience in SLA-driven control, cost-effective budgeting, elastic scaling, and efficient operations, estimating the runtime of training a machine learning model is important. We present AI Gauge, a cloud service to estimate the runtime and cost of training deep learning models under different configuration options on the cloud. AI Gauge is designed using a micro-service architecture and performs estimations based on machine learning models calibrated by an extensive and continuously populated job-trace data set. We show that AI Gauge can accurately predict the remaining time of running jobs based on their runtime progress (<10% relative error) and can accurately predict the total runtime for a job before it starts, with 7-8% relative error on average.
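The simplest baseline for progress-based remaining-time prediction is linear extrapolation. The sketch below illustrates only that baseline; AI Gauge itself uses calibrated ML models over job traces, and the function name here is our own.

```python
def remaining_time(elapsed_s: float, progress: float) -> float:
    """Naive remaining-time estimate from fractional progress.

    A linear extrapolation used only to illustrate progress-based
    prediction; AI Gauge replaces this with models calibrated on a
    continuously populated job-trace data set.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    total = elapsed_s / progress
    return total - elapsed_s

# A job 25% done after 300 s is projected to need 900 s more.
assert remaining_time(300, 0.25) == 900
```

Deep learning training is close enough to linear per epoch that such a baseline is a useful sanity check against which learned predictors can be compared.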
ISBN: (Print) 9781538677698
Cloud datacenters are exploiting their idle resources by offering virtual machines as transient servers without availability guarantees. Spot instances are transient servers offered by Amazon AWS, with rules that define prices according to supply and demand. These instances run for as long as the current price is lower than the maximum bid price given by the user. Spot instances have been increasingly used for executing computation- and memory-intensive applications. By using dynamic fault-tolerance mechanisms and appropriate strategies, users can effectively use spot instances to run applications at a cheaper price. This paper presents a resilient multi-strategy agent-based cloud computing architecture. The architecture combines machine learning and a statistical model to predict instance survival times, refine fault-tolerance parameters, and reduce total execution time. We evaluate our strategies, and the experiments demonstrate high levels of accuracy, reaching a 94% survival-prediction success rate, which indicates that the model can effectively be used to define execution strategies that prevent failures at revocation events under realistic working conditions.
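The quantity the paper's models predict — survival time — follows directly from the bid rule stated above: the instance lives until the spot price first exceeds the bid. A toy version over an hourly price trace (our own illustration, not the paper's predictor):

```python
def survival_time(prices, bid):
    """Hours until the spot price first exceeds the user's bid.

    Ground-truth computation over a known price trace, for illustration
    only; the paper's contribution is *predicting* this value ahead of
    time with ML and a statistical model.
    """
    for hour, price in enumerate(prices):
        if price > bid:
            return hour  # revoked at the start of this hour
    return len(prices)   # survived the whole trace

# With a $0.20 bid, the instance is revoked when the price hits $0.31.
assert survival_time([0.10, 0.12, 0.31, 0.09], bid=0.20) == 2
```

An accurate prediction of this value lets the agent checkpoint just before likely revocations instead of on a fixed schedule, which is where the fault-tolerance savings come from.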
ISBN: (Print) 9780769538570
This study presents a new parallel Finite Element Method (FEM) strategy designed for coarse-grain distributed memory systems. The adopted communication protocol is the Message Passing Interface (MPI), and tests are carried out on a cluster of PCs. A compressed data structure is used to store the Hessian matrix in order to optimize memory usage and to use the parallel direct solver MUMPS. The new partitioning paradigm is based on structural finite element nodes, not elements (as is usually done in the literature), resulting in an overlapping algorithm in which a reduced amount of information needs to be allocated and manipulated to integrate finite elements. The main advantages of the nodal partitioning are the performance improvement of the Hessian matrix assembly and the natural ordering that improves the system solution. Numerical examples are shown to demonstrate the efficiency and scalability of the proposed algorithm.
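Nodal partitioning with overlap can be pictured with a tiny example: each rank owns a set of nodes and must integrate every element touching one of its nodes, so boundary elements appear on more than one rank. This sketch is our own illustration of that idea, not the paper's MPI implementation.

```python
def overlap_partition(elements, node_owner, rank):
    """Elements a rank must integrate under nodal partitioning:
    every element incident on at least one node owned by this rank.
    Boundary elements are duplicated across ranks (the "overlap").
    Illustrative sketch only, not the paper's implementation.
    """
    return [e for e in elements if any(node_owner[n] == rank for n in e)]

elements = [(0, 1), (1, 2), (2, 3)]      # 1-D two-node elements
node_owner = {0: 0, 1: 0, 2: 1, 3: 1}    # node -> owning MPI rank

# Rank 0 owns nodes 0 and 1, so it also integrates boundary element (1, 2):
assert overlap_partition(elements, node_owner, 0) == [(0, 1), (1, 2)]
assert overlap_partition(elements, node_owner, 1) == [(1, 2), (2, 3)]
```

Because each rank assembles all rows of the Hessian belonging to its own nodes locally, no off-rank contributions need to be exchanged during assembly, which is the performance advantage the abstract claims.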
ISBN: (Print) 9781728199245
In this paper, we introduce XPySom, a new open-source Python implementation of the well-known Self-Organizing Maps (SOM) technique. It is designed to achieve high performance on a single node, exploiting widely available Python libraries for vector processing on multi-core CPUs and GP-GPUs. We present results from an extensive experimental evaluation of XPySom in comparison to widely used open-source SOM implementations, showing that it outperforms the other available alternatives. Indeed, our experiments on the Extended MNIST open data set show a speed-up of about 7x over the best open-source multi-core implementation we could find, and about 100x with GP-GPU acceleration, while achieving the same accuracy levels in terms of quantization error.
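The accuracy metric used in the comparison, quantization error, is the mean distance from each sample to its best-matching unit. A pure-Python reference version (our own sketch; XPySom computes this vectorized on CPU/GPU):

```python
def quantization_error(samples, codebook):
    """Mean Euclidean distance from each sample to its best-matching
    unit (closest codebook vector). Pure-Python reference version of
    the metric used to compare SOM implementations; real SOM libraries
    compute it vectorized.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(min(dist(s, w) for w in codebook) for s in samples) / len(samples)

codebook = [(0.0, 0.0), (1.0, 1.0)]
# Samples that coincide with codebook vectors have zero error:
assert quantization_error([(0.0, 0.0), (1.0, 1.0)], codebook) == 0.0
```

Matching this metric across implementations is what makes the 7x/100x speed-up comparison apples-to-apples: the faster implementation is not trading map quality for speed.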
ISBN: (Print) 9781728161495
Large-scale data centers run latency-critical jobs with quality-of-service (QoS) requirements alongside throughput-oriented background jobs, which need to achieve high performance. Previously proposed methods cannot co-locate multiple latency-critical jobs with multiple background jobs while (1) meeting the QoS requirements of all latency-critical jobs and (2) maximizing the performance of the background jobs. This paper proposes CLITE, a Bayesian-optimization-based, multi-resource partitioning technique which achieves these goals.
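The optimization problem CLITE solves has the shape "maximize background performance subject to every latency-critical job meeting QoS." A brute-force stand-in over candidate partitions makes the structure concrete (CLITE uses Bayesian optimization to avoid exactly this exhaustive search; the toy functions below are our own):

```python
def best_partition(candidates, qos_ok, bg_perf):
    """Pick the resource partition maximizing background-job performance
    among those meeting every latency-critical job's QoS target.

    Brute-force stand-in for illustration; CLITE's contribution is
    finding this point with few samples via Bayesian optimization.
    """
    feasible = [c for c in candidates if qos_ok(c)]
    return max(feasible, key=bg_perf) if feasible else None

# Toy model: the partition is the share of cache ways given to the
# latency-critical (LC) job; the background (BG) job gets the rest.
candidates = [0.25, 0.50, 0.75]
result = best_partition(
    candidates,
    qos_ok=lambda c: c >= 0.50,   # LC job needs at least half to meet QoS
    bg_perf=lambda c: 1.0 - c,    # BG performance grows with its share
)
assert result == 0.50  # smallest feasible LC share maximizes BG perf
```

With several jobs and several resource types (cache ways, memory bandwidth, cores), the candidate space explodes combinatorially, which is why a sample-efficient search is needed rather than this grid sweep.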
ISBN: (Print) 9781467380119
This paper introduces an approach to the design of discrete event simulation experiments aimed at transient performance analysis. Especially in complex, multi-tier applications, the net effect of small delays introduced by buffers, I/O operations, communication latency, and averaged measurements may result in significant inertia along the input-output path. In order to bring out these dynamic properties, the simulation experiment should excite the system with a non-stationary workload under controlled conditions. The work discusses the dynamic properties of large-scale distributed computer systems and how they may impact delivered performance. These rationales are explored to motivate a concern-based architecture which captures the elicited requirements. The design approach is systematically formulated and illustrated by a case study on extending a well-known cloud computing simulation framework to meet the aimed-for features. Experimental results of ongoing work are also presented.
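The canonical controlled excitation for exposing inertia is a step input: hold the arrival rate low, then jump it and observe how long the measured output takes to settle. A minimal workload generator in that spirit (our own sketch, not code from the discussed framework):

```python
def step_workload(t, t_step, low, high):
    """Step excitation for transient analysis: the arrival rate jumps
    from `low` to `high` at time `t_step`. The lag between this input
    step and the settling of measured output reveals the system's
    inertia. Illustrative sketch, not the paper's framework code.
    """
    return high if t >= t_step else low

# Rate trace over 5 ticks with the step at t = 3:
assert [step_workload(t, 3, 10, 50) for t in range(5)] == [10, 10, 10, 50, 50]
```

A stationary workload would average these transients away, which is precisely why the paper argues for non-stationary excitation under controlled conditions.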
ISBN: (Print) 9798350381603
The availability of computational resources has changed significantly due to cloud computing. In addition, we have witnessed efforts to execute high-performance computing (HPC) applications in the cloud, attracted by the advantages of cost savings and scalable/elastic resource allocation. Allocating more powerful hardware and exclusively allocating resources such as memory, storage, and CPU can improve performance in the cloud. For network interconnection, however, significant noise and other interference are generated by several simultaneous instances (multi-tenants) communicating over the same network. Since increasing the network bandwidth may be an alternative, we designed an evaluation model and performance analysis of NIC aggregation approaches in containerized private clouds. Experiments using the NAS Parallel Benchmarks revealed that the NIC aggregation approach outperforms the baseline in up to ~98% of executions for applications characterized by intensive network use. Also, the Balance Round-Robin aggregation mode performed better than the 802.3ad aggregation mode in most assessments.
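The Balance Round-Robin mode mentioned above stripes traffic across the bonded NICs packet by packet. A minimal model of that dispatch policy (our own illustration of the Linux `balance-rr` bonding behavior, not the paper's evaluation code):

```python
def round_robin(packets, n_nics):
    """Balance Round-Robin bonding: the i-th packet leaves on NIC
    i mod n_nics, so load spreads evenly across the aggregated links.
    Simplified model of the Linux `balance-rr` mode; real bonding
    also has to cope with per-flow reordering this striping causes.
    """
    lanes = [[] for _ in range(n_nics)]
    for i, p in enumerate(packets):
        lanes[i % n_nics].append(p)
    return lanes

# Five packets over a two-NIC bond alternate between the links:
assert round_robin([0, 1, 2, 3, 4], 2) == [[0, 2, 4], [1, 3]]
```

By contrast, 802.3ad (LACP) hashes each flow to a single link, so one bulk MPI flow cannot exceed a single NIC's bandwidth — one plausible reason Balance Round-Robin did better on network-intensive NAS benchmarks.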
ISBN: (Print) 9781509012336
This paper presents a high-level model to describe bag-of-tasks (BoT) applications and a framework to evaluate user-level approaches to scheduling BoTs as coarser work units. The scheduler consolidates the load of the tasks on a given number of virtual machines (VMs), providing an estimated makespan. The framework allows changing the task-selection policy in order to compare the lengths of the schedules produced given a limited number of VMs. The framework takes a BoT description as input and produces, for each VM, its trace of processing load. This paper validates the BoT model and the proposed framework with a performance assessment. In our case studies, the output of the framework is submitted to a real OpenStack-based IaaS infrastructure. The results show that the makespan can be reduced by grouping tasks into coarse units of load.
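One concrete task-selection policy of the kind the framework lets users plug in is longest-processing-time-first consolidation. The sketch below shows that policy and the resulting makespan estimate; it is our own illustrative example, not the paper's scheduler.

```python
def consolidate(task_costs, n_vms):
    """Greedy longest-processing-time-first grouping of BoT tasks onto
    VMs; returns the estimated makespan (the load of the busiest VM).
    One illustrative policy among the many the framework can compare.
    """
    loads = [0.0] * n_vms
    for c in sorted(task_costs, reverse=True):
        i = loads.index(min(loads))  # place on the least-loaded VM
        loads[i] += c
    return max(loads)

# Four tasks consolidated onto two VMs: {4, 2} and {3, 3} -> makespan 6.
assert consolidate([4, 3, 3, 2], n_vms=2) == 6.0
```

Swapping the `sorted(..., reverse=True)` order for another selection rule is exactly the kind of policy change whose schedule lengths the framework is built to compare.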
ISBN: (Print) 9798350383461; 9798350383454
The SPEC Power benchmark offers valuable insights into the energy efficiency of server systems, allowing comparisons across various hardware and software configurations. Benchmark results are publicly available for hundreds of systems from different vendors, published since 2007. We leverage this data to perform an analysis of trends in x86 server systems, focusing on power consumption, energy efficiency, energy proportionality and idle power consumption. Through this analysis, we aim to provide a clearer understanding of how server energy efficiency has evolved and the factors influencing these changes.
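A simple indicator behind two of the analyzed trends — energy proportionality and idle power — is the fraction of peak power that is actually load-dependent. The metric below is our own simplification for illustration; it is not SPEC's official overall ssj_ops/watt score.

```python
def dynamic_range(idle_w, peak_w):
    """Fraction of peak power that varies with load: 1 - idle/peak.

    A crude proportionality indicator (our simplification): 1.0 would
    mean the server draws nothing at idle, 0.0 that idle equals peak.
    SPECpower_ssj2008 instead reports overall ssj_ops per watt.
    """
    return 1 - idle_w / peak_w

# A server idling at 50 W with a 200 W peak spends 75% of peak on load:
assert dynamic_range(50, 200) == 0.75
```

Tracking such a ratio across publication years for hundreds of x86 results is one way the data set can show whether servers have become more proportional or merely more efficient at peak.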