ISBN: 1595936734 (print)
Utility functions can be used to represent the value users attach to job completion as a function of turnaround time. Most previous scheduling research used simple synthetic representations of utility, partly because real user preferences are difficult to obtain, and perhaps out of concern that arbitrarily complex utility functions could make the scheduling problem intractable. In this work, we advocate a flexible representation of utility functions that can indeed be arbitrarily complex. We show that a genetic algorithm heuristic can improve global utility by analyzing these functions, and does so tractably. Since our previous work showed that users indeed have and can articulate complicated utility functions, the result here is relevant. We then provide a means to augment existing workload traces with realistic utility functions for the purpose of enabling realistic scheduling simulations. Copyright 2007 ACM.
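As a rough illustration of the idea in the abstract above, the sketch below models a utility function as a piecewise-linear curve over turnaround time and sums it over a job ordering. The breakpoint representation, the single-machine model with all jobs submitted at time 0, and the names `utility` and `global_utility` are assumptions made for this example; they are not taken from the paper, and the genetic algorithm itself is not shown.

```python
from bisect import bisect_right


def utility(points, turnaround):
    """Interpolate a piecewise-linear utility curve.

    `points` is a list of (turnaround_time, value) breakpoints with strictly
    increasing times (a hypothetical representation). Outside the breakpoint
    range the curve is held constant at its first or last value.
    """
    times = [t for t, _ in points]
    if turnaround <= times[0]:
        return points[0][1]
    if turnaround >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, turnaround)
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (turnaround - t0) / (t1 - t0)


def global_utility(order, jobs):
    """Sum the utility of every job when run back to back in `order`.

    `jobs` maps a job name to (runtime, breakpoints). All jobs are assumed
    to be submitted at time 0 on one machine, so turnaround equals the
    completion time.
    """
    clock = 0.0
    total = 0.0
    for name in order:
        runtime, points = jobs[name]
        clock += runtime
        total += utility(points, clock)
    return total


jobs = {
    "a": (2, [(0, 10), (4, 10), (8, 0)]),  # value decays after 4 time units
    "b": (4, [(0, 5), (10, 5)]),           # value is flat
}
print(global_utility(["a", "b"], jobs))
print(global_utility(["b", "a"], jobs))
```

A search heuristic such as a genetic algorithm would treat `global_utility` as its fitness function and explore different orderings; here the ordering that runs the time-sensitive job first scores higher.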
ISBN: 9780769533803 (print)
Optical interconnects are useful for high-performance electronic computing systems when the number of channels, the bit rate per channel, the channel density, and the communication distance of electrical links are simultaneously stressed. We review the near-term and longer-term opportunities for optical communication at the chassis, chip-package, and silicon micro-system levels.
ISBN: 9781467355872 (print)
The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequencies. However, such approaches significantly increase the power consumption of GPUs, and the power constraint limits further performance gains. Facing this challenge, we propose three techniques to improve the power efficiency and performance of GPUs. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an enhanced fused multiply-add unit. Second, we observe that computations for many instructions are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar unit. Finally, we observe that 16 or fewer bits are sufficient for accurate representation of the operands and results of many instructions. Thus, we split the 32-bit datapath into two 16-bit datapath slices that can concurrently issue and execute up to two such instructions per cycle. All three proposed techniques can considerably increase utilization of compute resources, improving power efficiency and performance by 20% and 15%, respectively.
ISBN: 9781479921133 (print)
Software instrumentation is an important aspect of software-intensive distributed real-time and embedded (DRE) systems because it enables real-time feedback of system properties, such as resource usage and component state, for performance analysis. Although it is critical not to collect too much instrumentation data to ensure minimal impact on the DRE system's existing performance properties, the design and implementation of software instrumentation middleware can impact how much instrumentation data can be collected. This can indirectly impact the DRE system's existing properties and performance analysis, and is more of a concern when using general-purpose software instrumentation middleware for DRE systems. This paper provides two contributions to instrumenting software-intensive DRE systems. First, it presents two techniques, named the Standard Flat-rate Envelope and Pay-per-use, for improving the performance of software instrumentation middleware for DRE systems. Second, it quantitatively evaluates the performance gains realized by the two techniques in the context of the Open-source Architecture for Software Instrumentation of Systems (OASIS), an open-source dynamic instrumentation middleware for DRE systems. Our results show that the Standard Flat-rate Envelope improves performance by up to 57% and Pay-per-use improves performance by up to 49%.
ISBN: 1595936734 (print)
Traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive. This is especially true in the fields of astronomy and high energy physics. Furthermore, the lowered cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across machines in a cluster. This space is not being exploited by traditional distributed computing tools. In this paper we evaluate ways to improve the data management capabilities of Condor, a popular distributed computing system. We have augmented the Condor system by providing the capability to store data used and produced by workflows on the disks of machines in the cluster. We have also replaced the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment. Copyright 2007 ACM.
ISBN: 9781479921133 (print)
To fill the gap between the modeling of real-time systems and scheduling analysis, we propose a framework that seamlessly supports both aspects: (1) modeling a system using a design methodology, which in our case study is the Architecture Analysis and Design Language (AADL), and (2) helping to easily check temporal requirements (schedulability analysis, worst-case response time, sensitivity analysis, etc.). We introduce an intermediate framework called MoSaRT, which supports rich semantics for temporal analysis. We show with a case study how the input model is transformed into a MoSaRT model, and how our framework is able to generate the proper models as inputs to several classic temporal analysis tools.
ISBN: 9780769548463 (print)
One of the main challenges in data analytics is that discovering structures and patterns in complex datasets is a computationally intensive task. Recent advances in high-performance computing provide part of the solution. Multicore systems are now more affordable and more accessible. In this paper, we investigate how this can be used to develop more advanced methods for data analytics. We focus on two specific areas: model-driven analysis and data mining using optimisation techniques.
ISBN: 9781424456598 (print)
Power density grows in new technology nodes, thus requiring Vcc to scale, especially in mobile platforms where energy is critical. This paper presents a novel approach to decrease Vcc while keeping operating frequency high. Our mechanism is referred to as immediate read after write (IRAW) avoidance. We propose an implementation of the mechanism for an Intel(R) Silverthorne(TM) in-order core. Furthermore, we show that our mechanism can be adapted dynamically to provide the highest performance and lowest energy-delay product (EDP) at each Vcc level. Results show that IRAW avoidance increases operating frequency by 57% at 500mV and 99% at 400mV with negligible area and power overhead (below 1%), which translates into large speedups (48% at 500mV and 90% at 400mV) and EDP reductions (0.61 EDP at 500mV and 0.33 at 400mV).
ISBN: 076951894X (print)
A high-radix composite algorithm for the computation of the powering function (X^Y) is presented in this paper. The algorithm consists of a sequence of overlapped operations: (i) digit-recurrence logarithm, (ii) left-to-right carry-free (LRCF) multiplications, and (iii) on-line exponential. A redundant number system is used, and the selection in (i) and (iii) is done by rounding, except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm is proposed, and the execution times and hardware requirements are estimated for single- and double-precision floating-point computations with radix r = 128, showing that powering can be computed with performance similar to that of high-radix CORDIC algorithms.
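The three stages listed in the abstract above follow the identity X^Y = exp(Y · ln X). The sketch below shows only that decomposition, using ordinary floating-point library calls. It does not model the digit-recurrence, carry-free, or on-line arithmetic of the paper, and the function name `power_via_log` is an assumption made for this example.

```python
import math


def power_via_log(x: float, y: float) -> float:
    """Compute x**y as exp(y * ln(x)) for x > 0.

    Mirrors the logarithm -> multiplication -> exponential pipeline at the
    level of the mathematical identity only.
    """
    if x <= 0:
        raise ValueError("the logarithm-based decomposition needs x > 0")
    log_x = math.log(x)        # stage (i): logarithm
    product = y * log_x        # stage (ii): multiplication
    return math.exp(product)   # stage (iii): exponential


print(power_via_log(2.0, 10.0))
print(power_via_log(9.0, 0.5))
```

Because the result is an exponential of the product, any error in the product appears as a relative error in the result, which is one reason hardware designs of this kind tend to carry extra precision through the intermediate stages.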
ISBN: 0769520464 (print)
Modular exponentiation is the cornerstone computation performed in public-key cryptography systems such as the RSA cryptosystem. The operation is time-consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first FPGA prototype has a sequential architecture, the second has a parallel architecture, and the third has a systolic array-based architecture. The paper compares the three prototypes using the classic time × area factor. All three prototypes implement the modular multiplication using the popular Montgomery algorithm.
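For readers unfamiliar with the two building blocks named in the abstract above, the following is a minimal software sketch of binary (square-and-multiply) exponentiation built on Montgomery multiplication. It assumes a left-to-right bit scan, an odd modulus, and Python 3.8 or later for the modular inverse via `pow(n, -1, r)`. The helper names `montgomery_setup`, `mont_mul`, and `mod_exp` are made up for this example, and nothing here reflects the FPGA architectures the paper compares.

```python
def montgomery_setup(n: int):
    """Precompute Montgomery constants for an odd positive modulus n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd positive modulus")
    k = n.bit_length()
    r = 1 << k                       # R = 2^k > n, so gcd(R, n) = 1
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    return k, r, n_prime


def mont_mul(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod n for a, b in [0, n), where R = 2^k."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # u < 2n, one subtraction suffices


def mod_exp(base: int, exponent: int, n: int) -> int:
    """Compute base**exponent mod n by left-to-right square-and-multiply."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    k, r, n_prime = montgomery_setup(n)
    r2 = (r * r) % n
    x = mont_mul(base % n, r2, n, k, n_prime)  # base in Montgomery form
    acc = r % n                                 # 1 in Montgomery form
    for bit in bin(exponent)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)      # square every step
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)    # multiply on 1 bits
    return mont_mul(acc, 1, n, k, n_prime)      # leave Montgomery form


print(mod_exp(4, 13, 497))
```

Montgomery multiplication replaces division by the modulus with shifts and masks by a power of two, which is why it is a common choice for hardware implementations; the cost is the conversion into and out of Montgomery form at the start and end of the exponentiation.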