Conventional random access scan (RAS) designs, although economical in test power dissipation, test application time and test data volume, are expensive in area and routing overhead. In this paper, we present a localized...
ISBN: 9781605581583 (print)
Today's microprocessors often provide several different types of registers, e.g. general purpose registers and floating point registers. A given program may use one type of register much more frequently than the others. This creates an opportunity to employ the infrequently used registers as spill destinations for the more frequently used register types. In this paper, we present a code optimization method named idle register exploitation (IRE) to exploit such opportunities. We developed a model, called the IRE model, or IREM, to determine the static performance gains of IRE versus spilling to the stack. On a microprocessor with fast data paths between different types of registers, we find that the IRE method speeds up the execution of the SPECint benchmark suite by 1.7% to 10%. In contrast, on microprocessors with less efficient data transfer paths, the performance gain is limited, and in some cases performance may even degrade. This result argues strongly for the adoption of fast data paths between different types of registers for the purpose of reducing register spills, which is important in view of the increased significance of memory bottlenecks on future microprocessors. Copyright 2008 ACM.
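The trade-off described in this abstract can be illustrated with a toy static cost comparison. This is only a sketch and not the paper's IREM: the `SpillCosts` fields, the function names, and every latency value are hypothetical, chosen to show why the speed of the inter-register data path decides whether an idle register beats a stack slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpillCosts:
    """Hypothetical per-operation latencies in cycles (not taken from the paper)."""
    stack_store: int   # store a value to a stack slot
    stack_load: int    # reload a value from a stack slot
    reg_move_out: int  # move a value into an idle register of another type
    reg_move_in: int   # move it back into the original register type


def stack_spill_cost(costs, stores, reloads):
    """Static cost of spilling one value to the stack."""
    return stores * costs.stack_store + reloads * costs.stack_load


def idle_register_cost(costs, stores, reloads):
    """Static cost of parking the same value in an idle register."""
    return stores * costs.reg_move_out + reloads * costs.reg_move_in


def choose_spill_destination(costs, stores, reloads):
    """Pick the idle register only when it is strictly cheaper than the stack."""
    if idle_register_cost(costs, stores, reloads) < stack_spill_cost(costs, stores, reloads):
        return "idle_register"
    return "stack"


# Assumed machines: one with a fast inter-register path, one with a slow path.
fast_path = SpillCosts(stack_store=3, stack_load=3, reg_move_out=1, reg_move_in=1)
slow_path = SpillCosts(stack_store=3, stack_load=3, reg_move_out=6, reg_move_in=6)

print(choose_spill_destination(fast_path, stores=2, reloads=5))
print(choose_spill_destination(slow_path, stores=2, reloads=5))
```

With these assumed numbers the fast-path machine prefers the idle register and the slow-path machine falls back to the stack, mirroring the qualitative result reported above.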
Huffman codes are widely used as a very efficient technique for compressing data. To achieve a high compression ratio, some properties of encoding and decoding with canonical Huffman tables are discussed. A study an...
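For readers unfamiliar with canonical Huffman tables, the standard canonical code assignment can be sketched as follows. This shows the generic textbook construction, not the specific method of this paper (whose abstract is truncated here); the function name `canonical_codes` and the example code lengths are assumptions made for illustration.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords from a {symbol: bit_length} map.

    Symbols are ordered by (length, symbol); each codeword is the previous
    one plus one, shifted left whenever the code length grows.
    """
    codes = {}
    code = 0
    prev_len = 0
    # Symbols with length 0 are unused and receive no codeword.
    used = [(sym, n) for sym, n in lengths.items() if n > 0]
    for symbol, length in sorted(used, key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Hypothetical code lengths, e.g. as produced by an ordinary Huffman tree.
example = canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3})
print(example)
```

Because the codewords follow from the lengths alone, only the length table needs to be stored or transmitted, which is the property that makes canonical tables attractive for compact encoding and table-driven decoding.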
With the development of semiconductor technology, microprocessors become more and more susceptible to transient faults. Some proposed schemes support redundant execution of a program in a superscalar processor for fau...
ISBN: 9781605580852 (print)
In this paper, we highlight the use of multimedia technology in generating intrinsic summaries of tourism-related information. The system utilizes an automated process to gather, filter and classify information on various tourist spots on the Web. The end result presented to the user is a personalized multimedia summary, generated in response to the user's queries, that contains text, images, video and real-time news and is retrievable on mobile devices. Preliminary experiments demonstrate the superiority of our presentation scheme to traditional methods.
This paper proposes a new pose estimation method based on the appearance of 2D head image. First, the 1D Gabor filters are used to extract the features on the raw images. Compared with the traditional 2D Gabor represe...
CAM is widely used in microprocessor and SoC TLB modules. It offers a great advantage for software development, and TLB operations have become a bottleneck of microprocessor performance. The test cost of normal BIST approa...
This paper presents an equivalence verification algorithm for sequential circuits [1] based on the state transfer graph (STG), which obtains useful information by verifying the corresponding state ...
The continued shrinking of technology enables more and more processor cores to reside on a single chip. However, the power consumption and delay of global wires present a great challenge in designing future chip multiprocessors (CMPs). With these wire overheads properly accounted for, researchers have explored some efficient on-chip network designs in the domain of larger scale caches. In this paper, we attempt to reduce interconnect power consumption with a novel cache coherence protocol. Conventional coherence protocols are kept independent of the underlying networks for flexibility. In CMPs, however, processor cores and the on-chip network are tightly integrated, and exposing features of the interconnect network to the protocol unveils optimization opportunities for power reduction. Specifically, by utilizing the location information of cores on a chip, the coherence protocol we propose responds to the requester with the data copy held by the closest sharer of the desired cache line, rather than fetching it from distant L2 cache banks. This mechanism reduces the number of hops cache lines must travel and eliminates the power that would have been consumed on the links no longer traversed. To get accurate and detailed power information for the interconnects, we extract wire power parameters by physical level simulation (HSPICE) and obtain router power by synthesizing RTL with actual ASIC libraries. We conduct experiments on a 16-core CMP simulator with a group of SPLASH2 benchmarks. The results demonstrate that an average of 16.3% of L2 cache accesses could be optimized, resulting in an average power reduction of 9.3% on data links, with a maximum of 19.2%. This mechanism also yields a performance speedup of 1.4%.
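The closest-sharer idea can be sketched in a few lines. The abstract does not state the network topology, so this sketch assumes a 4x4 mesh with tiles numbered row-major and uses Manhattan distance as the hop count; `hops`, `choose_data_source`, and the fallback rule (use the home L2 bank when no sharer is strictly closer) are illustrative assumptions, not the paper's protocol.

```python
MESH_WIDTH = 4  # assumption: 16 tiles arranged as a 4x4 mesh, numbered row-major


def hops(a, b):
    """Manhattan distance between tiles a and b on the assumed mesh."""
    ax, ay = a % MESH_WIDTH, a // MESH_WIDTH
    bx, by = b % MESH_WIDTH, b // MESH_WIDTH
    return abs(ax - bx) + abs(ay - by)


def choose_data_source(requester, sharers, home_bank):
    """Return the tile that should supply the cache line to the requester.

    Picks the sharer with the fewest hops to the requester, but only if it
    is strictly closer than the home L2 bank; otherwise uses the home bank.
    """
    candidates = [s for s in sharers if s != requester]
    if not candidates:
        return home_bank
    # Ties are broken by the lower tile id so the choice is deterministic.
    closest = min(candidates, key=lambda s: (hops(requester, s), s))
    if hops(requester, closest) < hops(requester, home_bank):
        return closest
    return home_bank


# Tile 0 requests a line that tiles 1 and 10 share; its home bank is tile 15.
print(choose_data_source(requester=0, sharers={1, 10}, home_bank=15))
```

In this example the line travels one hop from tile 1 instead of six hops from tile 15, which is the kind of link traversal the protocol aims to avoid.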
Location consistency (LC) is a weak memory consistency model which is defined entirely on partial order execution semantics of parallel programs. Compared with sequential consistency (SC), LC is scalable and provides ample theoretical parallelism. This makes LC an interesting memory model in the upcoming many-core parallel processing era. Previous work has pointed out that LC does not guarantee SC execution behavior for all data race free programs. In this paper, we compare the semantics of LC with PRAM consistency and memory coherence, and prove that LC is strictly weaker than PRAM consistency. For data race free programs, we prove that the semantics of LC is equivalent to memory coherence. In addition, by introducing memory ordering semantics into LC judiciously, we prove that the enhanced model is equivalent to SC for data race free programs. Finally, we discuss possible solutions for adding reasoning rules for LC-like weak memory models.