Cloud computing is a new computing model, and its resource monitoring tools are immature compared with those of traditional distributed computing and grid computing. To better monitor virtual resources in cloud computing, a periodic and event-driven push (PEP) monitoring model is proposed. By combining periodic push with an event-driven mechanism, the model provides comparatively adequate information about the usage and status of resources. It simplifies communication between the Master and Work Nodes without missing important events that occur during the push interval. In addition, we develop "mon" to compensate for Libvirt's deficiencies in monitoring virtual CPU and memory.
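The idea of combining a periodic push with an event-driven push can be sketched as follows. This is a minimal illustration and not the PEP implementation from the paper: the class `PepAgent`, the helper `simulate`, the 10-second interval, and the 0.9 utilisation threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PepAgent:
    """Hypothetical work-node agent that pushes on a timer and on threshold events."""

    push: Callable[[str, float, float], None]  # receives (reason, time, value)
    interval: float = 10.0   # assumed periodic push interval, in seconds
    threshold: float = 0.9   # assumed utilisation level that counts as an event
    _last_push: float = field(default=float("-inf"), init=False)
    _in_alarm: bool = field(default=False, init=False)

    def sample(self, now: float, value: float) -> None:
        """Process one local measurement taken at time `now`."""
        alarm = value >= self.threshold
        if alarm != self._in_alarm:
            # Event-driven push: the value crossed the threshold in either
            # direction, so report immediately instead of waiting for the timer.
            self._in_alarm = alarm
            self._send("event", now, value)
        elif now - self._last_push >= self.interval:
            # Periodic push: nothing notable happened, but the interval elapsed.
            self._send("periodic", now, value)

    def _send(self, reason: str, now: float, value: float) -> None:
        self._last_push = now
        self.push(reason, now, value)


def simulate(samples: List[Tuple[float, float]]) -> List[Tuple[str, float, float]]:
    """Feed (time, value) samples to an agent and collect what the master receives."""
    received: List[Tuple[str, float, float]] = []
    agent = PepAgent(push=lambda r, t, v: received.append((r, t, v)))
    for t, v in samples:
        agent.sample(t, v)
    return received
```

With samples taken at times 0, 5, 7, 8, 12, and 22 and values 0.2, 0.3, 0.95, 0.96, 0.4, and 0.5, the master in this sketch receives four messages: a periodic push at time 0, event pushes at times 7 and 12 (the threshold is crossed upward and then downward), and a periodic push at time 22. The spike at time 7 is reported even though it falls inside a push interval.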
Moore's law continues to grant computer architects ever more transistors in the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achiev...
ISBN (print): 9781450307710
In this paper we present a thorough account of tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance model based on micro-architecture benchmarks. Our optimizations include software pipelining, use of vector memory operations, and instruction scheduling. Our best CUDA algorithm achieves performance comparable to that of the latest CUBLAS library. We further improve upon this with an implementation in the native machine language, leading to a 20% increase in performance. That is, the achieved peak performance (efficiency) is improved from 302 Gflop/s (58%) to 362 Gflop/s (70%). Copyright 2011 ACM.
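The blocking idea mentioned in the abstract can be illustrated on the CPU with a tiled matrix multiply. This is only a sketch of the general technique, not the authors' CUDA kernel: the function name `blocked_matmul` and the tile size are hypothetical, and a single level of tiling stands in for the two levels (shared memory and registers) used on the GPU.

```python
def blocked_matmul(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows.

    The loops walk over tile x tile blocks so that each block of a and b is
    reused while it is "hot", which is the point of blocking for a memory
    hierarchy. The tile size here is an arbitrary illustrative value.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of c
        for j0 in range(0, m, tile):      # block column of c
            for k0 in range(0, k, tile):  # block along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]     # accumulate in a local variable
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The result is the same as an untiled triple loop for any tile size; only the order in which partial products are accumulated changes.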
It is challenging to schedule time-constrained cluster tools subject to activity time variation. With the help of a Petri net model of such tools, a real-time control policy is used to offset the activity time variation. Based on this policy, schedulability conditions and scheduling algorithms are presented for single-arm cluster tools. The schedulability conditions can be checked analytically. The algorithms are built on analytical expressions, so they are also computationally efficient. The schedule obtained by the scheduling algorithms, together with the real-time control policy, forms the real-time schedule, which is optimal in terms of cycle time.
In this paper, the concept of the left conjugate product is first presented. Some interesting properties of this product are then derived. Using the left conjugate product as a tool, we investigate dual Sylvester-conjugate matrix equations, which include Lyapunov matrix equations and generalized Sylvester-observer matrix equations as special cases. An explicit solution of these matrix equations is presented with a free parameter matrix.
Wafer revisit complicates the scheduling of cluster tools in semiconductor fabrication. In some wafer fabrication processes, such as atomic layer deposition (ALD), wafers need to visit certain process modules a number of times. The existing swap-based strategy can be used to operate a dual-arm cluster tool for such a process. It results in a 3-wafer cyclic schedule, which is not optimal in terms of cycle time. To search for a better schedule, a Petri net model is developed for a dual-arm cluster tool with wafer revisit. With this model, the properties of the 3-wafer schedule are analyzed. It is found that improving performance requires reducing the number of wafers completed in a cycle. Thus, a 1-wafer schedule is developed by using a new swap-based strategy.
For Petri net models whose legal reachability spaces are non-convex, one cannot optimally control them by the conjunctions of linear constraints. This work proposes a method to find a set of linear constraints such th...
With the rapid development of service-oriented computing (SOC) and service-oriented architecture (SOA), the number of services is rapidly increasing. Organizing and managing services effectively in repositories is important for improving the efficiency of service discovery and composition. This paper proposes three categorization rules that classify the services in a large-scale repository to form a relational taxonomy. This taxonomy can drastically narrow the scope of service retrieval, and therefore greatly improve the efficiency of service discovery and service composition. We evaluate the proposed method and compare its performance with that of related methods on a publicly available test set, ICEBE05. The experimental results validate the effectiveness and high efficiency of the proposed method.
Traditional temporal logics such as LTL (Linear Temporal Logic) and CTL (Computation Tree Logic) have shown tremendous success in specifying and verifying hardware and software systems. However, this kind of logic can...
The limited write endurance of phase change random access memory (PRAM) is one of the major obstacles for PRAM-based main memory. Wear leveling techniques were proposed to extend its lifetime by balancing writes traff...