ISBN (Print): 9781450384698
Over the last decade, several studies have considered green software design as a key development concern for improving the energy efficiency of software. Yet, few techniques address this concern for Software Product Lines (SPL). In this paper, we therefore introduce two approaches to measure and reduce the energy consumption of an SPL by analyzing a limited set of products sampled from this SPL. While the first approach relies on the analysis of individual feature consumptions, the second takes feature interactions into account to better reduce the energy consumption of the resulting products. Our experimental results on a real-world SPL indicate that both approaches succeed in producing significant energy improvements on a large number of products, even though consumption data was modeled from a small set of sampled products. Furthermore, we show that taking feature interactions into account improves more products and yields higher energy savings per product.
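The difference between the two approaches described above can be sketched as a toy energy model. Everything below (feature names, energy values) is invented for illustration; the abstract does not give the paper's actual model.

```python
# Hypothetical sketch: predicting product energy from sampled configurations.
# Feature names and consumption values are invented for illustration.
from itertools import combinations

# Per-feature energy estimates (Joules), as the first approach would learn them
feature_energy = {"base": 5.0, "compress": 2.0, "encrypt": 3.0, "log": 1.0}

# Pairwise interaction corrections, as the second approach would add
interaction_energy = {("compress", "encrypt"): 1.5}  # costly when combined

def predict_energy(product, use_interactions=False):
    """Predict the energy of a product (a set of selected features)."""
    total = sum(feature_energy[f] for f in product)
    if use_interactions:
        for pair in combinations(sorted(product), 2):
            total += interaction_energy.get(pair, 0.0)
    return total

p = {"base", "compress", "encrypt"}
print(predict_energy(p))                         # feature-only model: 10.0
print(predict_energy(p, use_interactions=True))  # interaction-aware: 11.5
```

The interaction-aware model assigns a higher cost to products that combine interacting features, which is why it can target optimizations the feature-only model misses.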
ISBN (Print): 9781450383431
Evolving graphs in the real world are large-scale and constantly changing, as hundreds of thousands of updates may come every second. Monotonic algorithms such as Reachability and Shortest Path are widely used in real-time analytics to gain both static and temporal insights and can be accelerated by incremental computing. Existing streaming systems adopt the incremental computing model and achieve either low latency or high throughput, but not both. However, both high throughput and low latency are required in real scenarios such as financial fraud detection. This paper presents RisGraph, a real-time streaming system that provides low-latency analysis for each update with high throughput. RisGraph addresses the challenge with localized data access and inter-update parallelism. We propose a data structure named Indexed Adjacency Lists and use sparse arrays and Hybrid Parallel Mode to enable localized data access. To achieve inter-update parallelism, we propose a domain-specific concurrency control mechanism based on the classification of safe and unsafe updates. Experiments show that RisGraph can ingest millions of updates per second for graphs with several hundred million vertices and billions of edges, and the P999 processing time latency is within 20 milliseconds. RisGraph achieves orders-of-magnitude improvement on throughput when analyses are executed for each update without batching, and performs better than existing systems with batches of up to 20 million updates.
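The safe/unsafe classification described above can be illustrated with a toy incremental single-source reachability structure. This is a minimal sketch of the idea only; the data layout and class names are invented and do not reflect RisGraph's actual Indexed Adjacency Lists or concurrency control.

```python
# Toy sketch of safe/unsafe update classification for incremental
# single-source reachability (illustrative; not RisGraph's implementation).
from collections import defaultdict, deque

class IncReach:
    def __init__(self, source):
        self.adj = defaultdict(list)   # adjacency lists
        self.reach = {source}          # vertices reachable from source
        self.source = source

    def insert_edge(self, u, v):
        self.adj[u].append(v)
        # Safe update: reachability cannot change, so the update could be
        # applied concurrently with others without conflict.
        if u not in self.reach or v in self.reach:
            return "safe"
        # Unsafe update: v becomes reachable; propagate incrementally.
        frontier = deque([v])
        while frontier:
            x = frontier.popleft()
            if x in self.reach:
                continue
            self.reach.add(x)
            frontier.extend(self.adj[x])
        return "unsafe"

g = IncReach(source=0)
print(g.insert_edge(1, 2))  # safe: 1 is unreachable, nothing changes
print(g.insert_edge(0, 1))  # unsafe: makes 1 (and 2) reachable
print(g.insert_edge(0, 2))  # safe: 2 is already reachable
```

Safe updates need no recomputation and commute with each other, which is what makes inter-update parallelism possible; only unsafe updates require serialized propagation.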
The main aim of an Intelligent Transportation System (ITS) is to provide advanced services in both the transportation and traffic fields. A wide variety of algorithms and different types of models are being used for the e...
We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics {A_t, B_t}. The sequence of dynamics matrices can be arbitrary, but with a total variation V_T assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of Õ(V_T^{2/5} T^{3/5}). With piecewise constant dynamics, our algorithm achieves the optimal regret of Õ(√(ST)), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
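The restart-on-detection idea behind adaptive non-stationarity detection can be sketched on a toy scalar system x[t+1] = a_t · x[t]. This is only an illustration of the motif; the threshold, the least-squares estimator, and the detection rule below are invented simplifications, not the paper's algorithm.

```python
# Toy sketch: restart estimation when the one-step prediction error reveals
# a change in the (scalar) dynamics parameter. Illustrative only.

def detect_and_restart(observations, threshold=0.5):
    """Estimate a_t from data since the last restart; restart on large
    prediction error, which signals that the dynamics have drifted."""
    estimate, restarts, window = None, [], []
    for t in range(len(observations) - 1):
        x, x_next = observations[t], observations[t + 1]
        if estimate is not None and abs(x_next - estimate * x) > threshold:
            restarts.append(t)     # drift detected: forget stale data
            window, estimate = [], None
        window.append((x, x_next))
        # least-squares estimate of a from pairs since the last restart
        num = sum(u * v for u, v in window)
        den = sum(u * u for u, v in window)
        estimate = num / den if den else None
    return restarts

# Dynamics switch from a = 1.0 to a = 2.0 halfway through:
xs = [1.0] * 5 + [2.0 ** k for k in range(5)]
print(detect_and_restart(xs))  # detects one switch: [5]
```

Unlike a static sliding window, the window here grows until evidence of change appears, which is the adaptivity the abstract argues is needed for optimal dynamic regret.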
ISBN (Print): 9781665458771
The proceedings contain 6 papers. The topics discussed include: multi-accelerator neural network inference in diversely heterogeneous embedded systems;comparing LLC-memory traffic between CPU and GPU architectures;platform agnostic streaming data application performance models;distributed training for high resolution images: a domain and spatial decomposition approach;ELIΧR: eliminating computation redundancy in CNN-based video processing;and energy efficient task graph execution using compute unit masking in GPUs.
ISBN (Print): 9781450392662
Applications implemented for GPUs are important in various fields. A GPU has many parallel computing cores and high arithmetic throughput, enabling GPU applications to work efficiently. However, the throughput of GPU memory is low, and global memory is the costliest to access. The inputs and outputs of GPU kernels must be stored in global memory, which can become a bottleneck for some applications. Kernel fusion-based methods can effectively suppress access to global memory when kernels share data. These methods combine two or more kernels into one, improving performance by reducing expensive data communication with global memory. However, traditional kernel fusion-based methods miss many fusion opportunities because they focus only on data dependencies between candidate kernels and do not consider their control flows. This paper proposes a novel kernel fusion-based method, called kernel fusion based on code motion (KFCM). KFCM exposes the fusibility of kernels by moving kernels along control flows and then combines them. In exposing fusibility, KFCM may duplicate some kernels; however, it does not increase the number of kernels executed on any execution path. Thus, KFCM increases the opportunities for combining kernels without performance penalty. The experimental results show that KFCM achieves 1.60x and 1.35x speedups compared with the O3 option of Clang and a traditional method, respectively, in the best case.
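The benefit of kernel fusion that the abstract describes can be shown with a small stand-in: two element-wise kernels whose intermediate result would otherwise round-trip through global memory. This is a conceptual illustration in plain Python, not KFCM itself and not GPU code.

```python
# Illustrative sketch: fusing two kernels that share data avoids writing the
# intermediate result to "global memory" and reading it back.

def kernel_scale(xs, c):
    """Kernel 1: writes its output back to global memory."""
    return [x * c for x in xs]

def kernel_add(xs, ys):
    """Kernel 2: reads kernel 1's output from global memory."""
    return [x + y for x, y in zip(xs, ys)]

def fused_scale_add(xs, ys, c):
    """Fused kernel: the intermediate x * c never leaves the chip."""
    return [x * c + y for x, y in zip(xs, ys)]

xs, ys = [1, 2, 3], [10, 20, 30]
assert kernel_add(kernel_scale(xs, 2), ys) == fused_scale_add(xs, ys, 2)
print(fused_scale_add(xs, ys, 2))  # [12, 24, 36]
```

KFCM's contribution, per the abstract, is enabling this kind of fusion even when the two kernels start out on different control-flow paths, by moving (and if necessary duplicating) kernels until they become adjacent and fusible.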
ISBN (Print): 9798331534202
We explore a hybrid approach to designing a biomanufacturing system with low-volume, high-variability, and individualized products. Simulating a large number of possible configurations to determine those that meet target production goals is computationally impractical. We create an explainable surrogate model, specifically a queueing network model, that is calibrated to the output of a few computationally expensive simulations. The queueing network model enables a quick exploration of large numbers of mixed integer-continuous configurations, which would be challenging for traditional surrogate-based approaches. The queueing network model is used to quickly identify promising regions, where a few configurations can then be evaluated with the simulation. The difference in evaluations at these configurations is used to decide whether the queueing model requires partitioning and/or re-calibration. The use of this hybrid approach with an explainable surrogate enables analysis, such as identifying bottlenecks, and gives insight into robust designs of the biomanufacturing system.
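The calibration idea can be sketched with a single M/M/1 station. The paper uses a queueing network; the one-station formula below (mean time in system 1/(μ − λ)) is standard queueing theory and only illustrates how a surrogate can be fitted to one expensive simulation run and then queried cheaply. The rates below are invented.

```python
# Hedged sketch: an M/M/1 queue as an explainable surrogate, calibrated to
# one expensive simulation run (single station, not the paper's network).

def mm1_cycle_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue (requires utilization < 1)."""
    assert arrival_rate < service_rate
    return 1.0 / (service_rate - arrival_rate)

def calibrate_service_rate(arrival_rate, simulated_cycle_time):
    """Invert the M/M/1 formula so the surrogate reproduces the simulation."""
    return arrival_rate + 1.0 / simulated_cycle_time

lam = 4.0       # jobs per hour at the simulated configuration (illustrative)
sim_ct = 0.5    # cycle time observed in one expensive simulation run
mu = calibrate_service_rate(lam, sim_ct)
print(mu)       # calibrated service rate: 6.0
# The calibrated surrogate now screens other configurations cheaply:
print(mm1_cycle_time(5.0, mu))  # predicted cycle time at higher load: 1.0
```

Because the surrogate has interpretable parameters (arrival and service rates, utilizations), it directly supports the bottleneck analysis the abstract mentions, unlike a black-box surrogate.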
ISBN (Print): 9781665421355
Temperature gradients due to Joule heating have huge impacts on electromigration (EM)-induced failure effects. However, Joule heating and related thermomigration (TM) effects have been less investigated in past physics-based EM analysis for VLSI chip design. In this work, we propose a new spatial-temperature-aware transient EM-induced stress analysis method. The new method makes two new contributions. First, we propose, for the first time, a TM-aware void saturation volume estimation method for fast immortality checks in the post-voiding phase. We derive an analytic formula to estimate the void saturation in the presence of spatial temperature gradients due to Joule heating. Second, we develop a fast numerical solution for EM-induced stress analysis of multi-segment interconnect trees considering the TM effect. The new method first transforms the coupled EM-TM partial differential equations into linear time-invariant ordinary differential equations (ODEs). Then, an extended Krylov subspace-based reduction technique is employed to reduce the size of the original system matrices so that they can be efficiently simulated in the time domain. The proposed method can perform the simulation process for both the void nucleation and void growth phases under time-varying input currents and position-dependent temperatures. The numerical results show that, compared to the recently proposed semi-analytic EM-TM method, the proposed method leads to about a 28x speedup on average for interconnects with up to 1000 branches, for both void nucleation and growth phases, with negligible errors.
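The reduction step the abstract describes follows the standard projection pattern of Krylov-based model order reduction; the generic notation below is a hedged sketch of that pattern, not formulas taken from the paper.

```latex
% Coupled EM-TM PDEs, discretized in space, yield a linear time-invariant ODE:
%   n = number of stress unknowns over the interconnect segments
\frac{d\mathbf{x}(t)}{dt} = A\,\mathbf{x}(t) + B\,\mathbf{u}(t),
  \qquad \mathbf{x}(t) \in \mathbb{R}^{n}.
% Build an orthonormal basis V \in R^{n x m}, m << n, of an extended Krylov
% subspace mixing powers of A and A^{-1} applied to B:
\mathcal{K}_m(A, B) = \mathrm{span}\{B,\, AB,\, \dots,\; A^{-1}B,\, A^{-2}B,\, \dots\}.
% Project to obtain a small system that is cheap to integrate in time:
\frac{d\mathbf{z}(t)}{dt} = (V^{\top} A V)\,\mathbf{z}(t) + (V^{\top} B)\,\mathbf{u}(t),
  \qquad \mathbf{x}(t) \approx V\,\mathbf{z}(t).
```

Simulating the m-dimensional reduced system instead of the n-dimensional original is what makes the reported speedups on 1000-branch interconnects plausible.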
ISBN (Print): 9781450383912
Intermittently powered energy-harvesting devices enable new applications in inaccessible environments. Program executions must be robust to unpredictable power failures, introducing new challenges in programmability and correctness. One hard problem is that input operations have implicit constraints, embedded in the behavior of continuously powered executions, on when input values can be collected and used. This paper aims to develop a formal framework for enforcing these constraints. We identify two key properties: freshness (i.e., uses of inputs must satisfy the same time constraints as in continuous executions) and temporal consistency (i.e., the collection of a set of inputs must satisfy the same time constraints as in continuous executions). We formalize these properties and show that they can be enforced using atomic regions. We develop Ocelot, an LLVM-based analysis and transformation tool targeting Rust, to enforce these properties automatically. Ocelot provides the programmer with annotations to express these constraints and infers atomic region placement in a program to satisfy them. We then formalize Ocelot's design and show that Ocelot generates correct programs with little performance cost or code changes.
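The role of atomic regions in preserving temporal consistency can be sketched with a small simulation. This is a conceptual toy, not Ocelot's API or Rust annotations: the region, fault injection, and names below are all invented for illustration.

```python
# Conceptual sketch: an atomic region re-executes from its beginning after a
# power failure, so a *set* of input reads is always collected in one attempt,
# matching the timing behavior of a continuously powered execution.

class PowerFailure(Exception):
    pass

def run_atomic(region, faults):
    """Execute region(); on a simulated power failure, restart from the top."""
    while True:
        try:
            return region(faults)
        except PowerFailure:
            pass  # inputs read before the failure are discarded, not reused

def sample_pair(faults):
    clock = faults["clock"]
    a = clock()                     # first input read
    if faults["fail_between"]:
        faults["fail_between"] = False
        raise PowerFailure()        # power lost between the two reads
    b = clock()                     # second input read
    return (a, b)                   # both reads come from the same attempt

ticks = iter(range(100))
faults = {"clock": lambda: next(ticks), "fail_between": True}
print(run_atomic(sample_pair, faults))  # adjacent ticks from one attempt
```

Without the atomic region, a checkpointing runtime could pair a value read before the failure with one read long after reboot, which is exactly the temporal-consistency violation the paper rules out.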
ISBN (Digital): 9781665486118
ISBN (Print): 9781665486125
Connected and autonomous vehicles (CAVs) are facing increasing amounts of data and more complex data analysis, which makes it challenging for them to make reliable decisions in real time. To enable time-sensitive CAV applications, we design and implement a vehicle-edge-cloud framework that integrates compressed imaging (CI) and edge computing into CAV systems. Specifically, a lightweight model is used on the vehicle to perform real-time detection based on optical-domain compressed data (called measurements). The edge is responsible for receiving the measurements and, when triggered, performing video reconstruction to support more accurate analysis based on the reconstructed video. At the same time, the measurements, reconstructed videos, and analysis results are sent to the cloud to continuously update the vehicle model. In addition, we apply reinforcement learning to adapt the compression rate to different driving scenarios. The proposed framework is fully evaluated using our designed roadside platform and outdoor delivery vehicles.