Electron tomography (ET) is a powerful technology allowing the three-dimensional (3D) imaging of cellular ultrastructure. These structures are reconstructed from a set of micrographs taken at different sample orientat...
详细信息
This paper discusses energy-performance trade-off of networks-on-chip with real parallel applications. First, we develop an accurate energy-performance analytical model that, and conduct and analyze the impacts of bot...
详细信息
This paper discusses energy-performance trade-off of networks-on-chip with real parallel applications. First, we develop an accurate energy-performance analytical model that, and conduct and analyze the impacts of both frequency-independent and frequency-dependent factors in energy and performance model. Secondly, we put together the communication overhead, memory access overhead, frequency scaling, core count scaling to quantify the performance and energy consumed by NoCs in the energy-performance model. We propose a new energy-performance optimization method, by choosing a pair of frequency and core count to get optimal energy or performance. Finally, we implement eight PARSEC parallel applications to evaluate our model and optimization method. The experiment results confirm that our model predicts NoCs energy and performance behavior well, and we exactly select correct frequency level and core count for minimum energy or maximum performance with a given constraint for most parallel applications. We also present how to establish an NoCs' energy-performance model of a real parallel application in this paper.
In this paper, we present a new solution, BaitA-larm, to detect phishing attack using features that are hard to evade. The intuition of our approach is that phishing pages need to preserve the visual appearance the ta...
详细信息
MPI Alltoall is an important collective operation. In multicore clusters, many processes run in a node. On the one hand, shared memory can be adopted to optimize Alltoall communications of small messages by leader-bas...
详细信息
MPI Alltoall is an important collective operation. In multicore clusters, many processes run in a node. On the one hand, shared memory can be adopted to optimize Alltoall communications of small messages by leader-based schemes. However, as these schemes adopt a fixed number of leader processes, the optimal performance can't be obtained for all small messages. On the other hand, processes within a node contend for the same network resource. In Alltoall communications of large messages, many synchronization messages are used. Nevertheless, the contention makes their latency increase many times and the synchronization overhead can't be ingored. To solve these problems, two optimizations are presented. For small messages, the PLP method adopts changeable numbers of leader processes. For large messages, the LSS method reduces the number of synchronization messages from 3N to 2√N. The evaluations prove two methods. For small messages, the PLP method always obtains optimal performance. For large messages, the LSS method brings almost constant improvement percentage. The performance is improved by 25% for 32 KB and 64 KB messages.
Power has been a big issue in processor design for several years. Conventional popular approaches for addressing this issue like DVFS (Dynamic Voltage Frequency Scaling) now hit the law of diminishing returns. As mult...
详细信息
Power has been a big issue in processor design for several years. Conventional popular approaches for addressing this issue like DVFS (Dynamic Voltage Frequency Scaling) now hit the law of diminishing returns. As multi/many-core processors becoming the main stream processors, caches account for more and more CPU die area and power, this paper presents using filtering unnecessary way accesses to reduce dynamic power consumption of caches shared by instruction and data. The methods include using Invalid Filter, which could eliminate accesses to cache ways contained invalid blocks, and I/D Filter, which could eliminate accesses to cache ways contained instruction/data access type mismatch blocks, and Tag-2 Filter, which could eliminate accesses to cache ways contained tag lowest 2 bits mismatch blocks. Since the methods reducing the activities happened in cache architecture, dynamical CPU power could be significantly decreased. In the paper, we also propose combining the above methods together, which is called Invalid+I/D+Tag-2 Filter, in an attempt to achieve better power saving results. We have verified the effectiveness and complementariness of the three proposed methods through analysis and experiments. Also, our evaluations show that, we could obtain 19.6%~47.8% (which is on average 34.3%) improvement on a 64KB-4way set-associative cache and 19.6%~55.2% (which is on average 39.2%) improvement on a 128KB-8way set-associative cache comparing to Invalid+I/D Filter, and 16.1%~27.7% (which is on average 16.6%) improvement on a 64KB-4way set-associative cache and 6.9%~44.4% (which is on average 25.0%) improvement on a 128KB-8way set-associative cache comparing to Invalid+Tag-2 Filter, respectively.
In many practical data mining applications, such as web categorization, key gene selection, etc., generally, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefor...
详细信息
Leakage-resilient public key encryption (PKE) schemes are designed to resist "memory attacks", i.e., the adversary recovers the cryptographic key in the memory adaptively, but subject to constraint that the ...
详细信息
A measurement of the Higgs boson mass is presented based on the combined data samples of the ATLAS and CMS experiments at the CERN LHC in the H→γγ and H→ZZ→4ℓ decay channels. The results are obtained from a simul...
详细信息
A measurement of the Higgs boson mass is presented based on the combined data samples of the ATLAS and CMS experiments at the CERN LHC in the H→γγ and H→ZZ→4ℓ decay channels. The results are obtained from a simultaneous fit to the reconstructed invariant mass peaks in the two channels and for the two experiments. The measured masses from the individual channels and the two experiments are found to be consistent among themselves. The combined measured mass of the Higgs boson is mH=125.09±0.21 (stat)±0.11 (syst) GeV.
Without appropriate stitching of scan chains, even with good diagnosis algorithm and diagnostic pattern generation, it may still result in bad scan chain diagnostic resolution. To improve the diagnostic resolution, we...
详细信息
ISBN:
(纸本)9781479908608
Without appropriate stitching of scan chains, even with good diagnosis algorithm and diagnostic pattern generation, it may still result in bad scan chain diagnostic resolution. To improve the diagnostic resolution, we propose a novel Diagnosis and Layout Aware (DLA) scan chain stitching method, which is pattern independent and supports embedded scan compaction. It is based on three ideas: (1) increasing the number of sensitive scan cells, which can capture useful diagnostic information; (2) properly distributing the sensitive scan cells along the scan chains to enhance the overall resolution; (3) stitching scan cells based on their placement at layout to preserve the chip performance. Experiments on ISCAS'89/ITC'99 benchmark circuits and a real industry circuit based on 20nm technology with silicon results show that, the proposed DLA scan chain stitching method effectively improves the resolution, with negligible impact on chip performance, embedded scan compaction, transition fault coverage, and test power dissipation. The silicon results even show 7X average resolution improvement comparing to without using the proposed method.
暂无评论