The wide application of general-purpose graphics processing units (GPGPUs) requires considerable manual effort to port and optimize algorithms for them. However, most existing automatic ways of generating GPGPU code fa...
Sensor nodes in wireless networks often use batteries as their source of energy, but replacing or recharging exhausted batteries in a deployed network can be difficult and costly. Therefore, prolonging battery life becomes a principal objective in the design of wireless sensor networks (WSNs). Few published data quantitatively analyze a sensor node's lifetime under different operating conditions. This paper presents several experiments that quantify the impact of key WSN design and environmental parameters on battery performance. Our testbed consists of MicaZ motes, commercial alkaline batteries, and a suite of techniques for measuring battery performance. We evaluate known parameters, such as communication distance, working channel, and operating power, that play key roles in battery performance. Because they are based on extensive measurements of real battery discharge, we expect our results to serve as a quantitative basis for future research in designing and implementing battery-efficient sensing applications and protocols.
ISBN: 9781467344975 (print)
Instruction-level redundancy is an effective scheme to reduce the susceptibility of microprocessors to soft errors, offering high error detection and recovery capability; however, it usually incurs significant performance degradation due to resource contention. Motivated by the fact that narrow-width operands are common in applications, we exploit data-level parallelism to accelerate instruction-level redundancy. For instructions within the sphere of replication (SoR) of data-level redundancy, the normal and redundant versions of an instruction's narrow-width operand are folded into one register and share the same functional unit during execution, which alleviates resource contention. All other instructions are protected by instruction-level redundancy. We run the SPECint2000 benchmarks on a modified version of the SimpleScalar simulator and synthesize the extra hardware to evaluate the area overhead of the proposed pipeline. Experimental results show that our acceleration scheme outperforms conventional instruction-level redundancy by 13% in IPC, and the extra area overhead is negligible.
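The folding idea in the abstract above can be illustrated with a small software sketch. It is not the authors' hardware design: the 16-bit lane width, the helper names (`is_narrow`, `fold`, `folded_add`), and the `fault_mask` parameter used to simulate a soft error are all assumptions made for illustration.

```python
# Hypothetical illustration of folding a narrow operand and its redundant copy
# into one 32-bit register. Lane width and all names are assumptions.
HALF_BITS = 16
HALF_MASK = (1 << HALF_BITS) - 1


def is_narrow(value: int) -> bool:
    """True if the unsigned value fits in one 16-bit lane."""
    return 0 <= value <= HALF_MASK


def fold(value: int) -> int:
    """Put the normal copy in the low lane and the redundant copy in the high lane."""
    return (value << HALF_BITS) | value


def folded_add(a: int, b: int, fault_mask: int = 0):
    """Add two narrow operands once on folded registers, then compare the lanes.

    Each lane wraps modulo 2**16, so no carry crosses from the low lane into
    the high lane. fault_mask is XORed into the packed result to simulate a
    soft error. Returns (normal_result, lanes_agree).
    """
    fa, fb = fold(a), fold(b)
    low = ((fa & HALF_MASK) + (fb & HALF_MASK)) & HALF_MASK
    high = ((fa >> HALF_BITS) + (fb >> HALF_BITS)) & HALF_MASK
    packed = ((high << HALF_BITS) | low) ^ fault_mask
    normal = packed & HALF_MASK
    redundant = (packed >> HALF_BITS) & HALF_MASK
    return normal, normal == redundant
```

Calling `folded_add(1200, 34)` yields a result whose two lanes agree, while passing a non-zero `fault_mask` that touches only one lane makes the comparison fail, which is the detection step. Operands for which `is_narrow` is false would fall back to ordinary instruction-level redundancy.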
We propose an approach to recognizing group activities involving several persons, based on modeling the interactions between human bodies. Building on recent progress in pose estimation [1], we model the activities as the interactions between parts that belong to the same person (intra-person) and those between parts of different persons (inter-person). We then develop a unified, discriminative model that integrates both types of interactions. Experiments on the UT-Interaction Dataset [2] show promising results and demonstrate the power of the interaction models.
Cloud computing is a new computing model, and its resource monitoring tools are immature compared with those of traditional distributed computing and grid computing. To better monitor virtual resources in cloud computing, a periodic and event-driven push (PEP) monitoring model is proposed. By combining the push and event-driven mechanisms, the model can provide comparatively adequate information about the usage and status of the resources. It simplifies the communication between the Master and Work Nodes without missing important events that happen during the push interval. In addition, we develop "mon" to make up for the deficiency of Libvirt in monitoring virtual CPU and memory.
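A minimal sketch of how a periodic push can be combined with an event-driven push is given below. The abstract does not describe the PEP model's internals, so the class name `PepAgent`, the threshold-based definition of an "event", and the choice to restart the periodic timer after every push are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PepAgent:
    """Hypothetical Work Node agent: pushes on a fixed period and on large changes."""
    period: float      # seconds between periodic pushes (assumed parameter)
    threshold: float   # metric change that counts as an event (assumed parameter)
    last_push_time: float = float("-inf")
    last_pushed_value: float = 0.0
    pushed: list = field(default_factory=list)  # stands in for messages sent to the Master

    def sample(self, now: float, value: float):
        """Feed one sample; return the push reason, or None if nothing was sent."""
        if now - self.last_push_time >= self.period:
            reason = "periodic"
        elif abs(value - self.last_pushed_value) >= self.threshold:
            reason = "event"
        else:
            return None
        # Record the push and restart the periodic timer from this moment.
        self.pushed.append((now, value, reason))
        self.last_push_time = now
        self.last_pushed_value = value
        return reason
```

With `PepAgent(period=10, threshold=20)`, a CPU-usage spike that arrives between two periodic pushes is sent immediately as an "event" push instead of waiting for the next interval, which is the behaviour the abstract attributes to the model.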
Moore's law will continue to grant computer architects ever more transistors for the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achiev...
ISBN: 9781450307710 (print)
In this paper we present a detailed account of tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance model based on micro-architecture benchmarks. Our optimizations include software pipelining, use of vector memory operations, and instruction scheduling. Our best CUDA algorithm achieves performance comparable to that of the latest CUBLAS library. We further improve upon this with an implementation in the native machine language, leading to a 20% increase in performance. That is, the achieved peak performance (efficiency) is improved from 302 Gflop/s (58%) to 362 Gflop/s (70%). Copyright 2011 ACM.
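The two-level blocking mentioned in the abstract above can be pictured with the loop structure below. This is a plain Python sketch of the general technique, not the authors' CUDA kernel: the function name `blocked_matmul`, the tile sizes, and the use of list slices as stand-ins for shared memory and registers are assumptions, and software pipelining, vector memory operations, and instruction scheduling are not modelled.

```python
def blocked_matmul(a, b, tile=4, micro=2):
    """Compute C = A * B for square n x n lists of lists with two levels of blocking.

    `tile` stands in for a shared-memory tile and `micro` for a register-resident
    sub-block. Both sizes are illustrative assumptions; n must be a multiple of
    `tile`, and `tile` a multiple of `micro`.
    """
    n = len(a)
    assert n % tile == 0 and tile % micro == 0
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Stage one tile of A and one tile of B (stand-in for shared memory).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                for i1 in range(0, tile, micro):
                    for j1 in range(0, tile, micro):
                        # micro x micro accumulators (stand-in for registers).
                        acc = [[0.0] * micro for _ in range(micro)]
                        for k in range(tile):
                            for di in range(micro):
                                a_val = a_tile[i1 + di][k]
                                for dj in range(micro):
                                    acc[di][dj] += a_val * b_tile[k][j1 + dj]
                        # Write the accumulated sub-block back to C.
                        for di in range(micro):
                            for dj in range(micro):
                                c[i0 + i1 + di][j0 + j1 + dj] += acc[di][dj]
    return c
```

Each staged tile is reused `tile` times and each value held in an accumulator is updated without touching C, which is the data-reuse argument behind blocking at both levels of the memory hierarchy.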
It is challenging to schedule time-constrained cluster tools subject to activity time variation. With the help of a Petri net model of the tools, a real-time control policy is used to offset the activity time variation. Based on this policy, schedulability conditions and scheduling algorithms are presented for single-arm cluster tools. The schedulability conditions can be checked analytically. The algorithms are based on analytical expressions, so they are also computationally efficient. The schedule obtained by the scheduling algorithms, together with the real-time control policy, forms the real-time schedule, which is optimal in terms of cycle time.
Wafer revisiting complicates the scheduling of cluster tools in semiconductor fabrication. In wafer fabrication processes such as atomic layer deposition (ALD), the wafers need to visit some process modules a number of times. The existing swap-based strategy can be used to operate a dual-arm cluster tool for such a process, and it results in a 3-wafer cyclic schedule. However, this schedule is not optimal in terms of cycle time. To search for a better schedule, a Petri net model is developed for a dual-arm cluster tool with wafer revisiting, and the properties of the 3-wafer schedule are analyzed with it. It is found that, to improve performance, it is necessary to reduce the number of wafers completed in a cycle. Thus, a 1-wafer schedule is developed by using a new swap-based strategy.
In this paper, the concept of the left conjugate product is first presented, and some of its properties are then derived. Using the left conjugate product as a tool, we investigate dual Sylvester-conjugate matrix equations, which include Lyapunov matrix equations and generalized Sylvester-observer matrix equations as special cases. An explicit solution of these matrix equations is presented in terms of a free parameter matrix.