This paper describes a low-cost, high-quality at-speed testing strategy implemented on a gigahertz microprocessor with multiple clock domains. The presented DFT method not only uses the internal phase-locked loops (PLLs) to provide complex test clock sequences, but also applies a hybrid scan compression structure to reduce test data volume. Generating at-speed tests for a design with embedded memories and multiple clock domains is difficult and time-consuming. The proposed test pattern generation scheme achieves transition fault coverage of approximately 83% for this high-performance microprocessor, while keeping test power consumption well controlled.
Rather than relying entirely on commodity components, this paper presents an approach to building a personal parallel computer on top of a non-coherent HyperTransport (HT) fabric. The advantage is both lower cost and higher performance compared with the existing method. An HT switch is designed and implemented to interconnect a set of AMD Opteron processors into an in-a-box cluster. Evaluation experiments on our prototype system show that this approach gives better performance.
In small and medium-sized cluster systems, a centralized file server such as NFS is the main way to provide storage service at low cost and with easy management. However, when multiple parallel applications access the shared storage at the same time, I/O performance drops considerably because the I/O requests coming from different clients interfere with one another. In this paper, a hint-based I/O mechanism is proposed and implemented in United-FS. By analyzing the hint information carried by the I/O requests, our hint-based I/O scheduler groups, sorts and schedules related requests. Experiments show that the hint-based I/O mechanism nearly doubles read performance compared with NFS and scales better.
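The group, sort and schedule steps can be illustrated with a minimal Python sketch. The abstract does not say what a hint contains, so the `hint` field (here a file identifier), the `Request` type and the `schedule` function are all hypothetical names chosen for illustration, not the United-FS interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client: str   # issuing client node
    hint: str     # hypothetical hint: ID of the file the request targets
    offset: int   # byte offset within that file
    size: int     # request length in bytes


def schedule(requests):
    """Group requests by hint, sort each group by offset, emit group by group."""
    groups = defaultdict(list)
    for req in requests:
        groups[req.hint].append(req)

    ordered = []
    # Groups are served in order of first arrival (dicts keep insertion order).
    for group in groups.values():
        ordered.extend(sorted(group, key=lambda r: r.offset))
    return ordered


if __name__ == "__main__":
    incoming = [
        Request("c1", "fileA", 8192, 4096),
        Request("c2", "fileB", 0, 4096),
        Request("c1", "fileA", 0, 4096),
        Request("c2", "fileB", 4096, 4096),
        Request("c3", "fileA", 4096, 4096),
    ]
    for req in schedule(incoming):
        print(req.hint, req.offset, req.client)
```

Interleaved requests from different clients are turned into sequential runs per file, which is one plausible way such a scheduler could reduce interference on a shared server.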
Scan is a widely used Design-for-Testability technique for improving test and diagnosis quality. Many defects may cause scan chains to fail. In this paper, an observation-point-oriented Deterministic Diagnostic Pattern Generation (DDPG) method is proposed for compound defects, which tolerates system defects during scan chain diagnosis. Instead of sensitizing multiple paths as proposed in our prior work, the new DDPG method directly targets as many observation points as possible to observe the loading error that occurs on the targeted scan cell. Experimental results on ISCAS'89 benchmark circuits show that the proposed DDPG method improves the effectiveness and efficiency of diagnosing compound defects compared with our prior research.
ISBN: 9783540898931 (print)
In order to provide high resource utilization and QoS assurance in utility computing that concurrently hosts various services, this paper proposes RAINBOW, a service computing framework for VM (Virtual Machine)-based utility computing. In RAINBOW, we present a priority-based resource scheduling scheme that includes resource flowing algorithms (RFaVM) to optimize resource allocation among services. The principle of RFaVM is to preferentially ensure the performance of critical services by degrading others to some extent when resource competition arises. Based on our prototype, we evaluate RAINBOW and RFaVM. The experimental results show that RAINBOW without RFaVM provides 28%-324% improvements in service performance and 26% higher average CPU utilization than a traditional service computing framework (TSF) in a typical enterprise environment. RAINBOW with RFaVM further improves performance by 25%-42% for the critical services while introducing at most 7% performance degradation to the others, with 2%-8% more improvement in resource utilization than RAINBOW without RFaVM.
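The principle of favoring critical services under contention can be sketched as follows. This is not the RFaVM algorithm itself, whose details the abstract does not give. It is a simple priority-ordered allocator under assumed inputs: the `Service` fields, the per-service `floor` (a guaranteed minimum that bounds how far a service can be degraded) and the function name `flow_resources` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    priority: int   # larger value = more critical (assumed convention)
    demand: float   # resource units the service currently wants
    floor: float    # assumed guaranteed minimum, limits degradation


def flow_resources(capacity, services):
    """Give every service its floor, then let the rest flow by priority."""
    alloc = {s.name: min(s.floor, s.demand) for s in services}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("capacity cannot cover the guaranteed floors")

    # Under contention, critical services are topped up first;
    # lower-priority services receive only what is left.
    for s in sorted(services, key=lambda s: s.priority, reverse=True):
        extra = min(s.demand - alloc[s.name], remaining)
        alloc[s.name] += extra
        remaining -= extra
    return alloc


if __name__ == "__main__":
    services = [
        Service("web", priority=2, demand=70, floor=10),
        Service("batch", priority=1, demand=60, floor=20),
    ]
    print(flow_resources(100, services))
```

With 100 units of capacity and 130 units of demand, the critical service is fully served while the other is degraded but never below its floor. With enough capacity, both simply receive their demand.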
Allocation order is the best order for locality, and the slide mark-compact algorithm is based on it. However, the traditional design makes the algorithm's overhead too large. We propose a fast slide mark-compact algorithm that reduces the overhead by using a mark bit table, a live block pool and an offset table. The results show that it achieves up to 8.9% speedup on the industry-standard SPEC JVM98 benchmark on the Pentium 4, an 11% improvement in DTLB misses and a 13.6% reduction in L2 cache misses.
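How a mark bit table and an offset table can work together in sliding compaction is sketched below. This is a generic illustration under simplifying assumptions, not the paper's design: the heap is a Python list of words, one mark bit covers one word, `BLOCK_WORDS` is an arbitrary block size, and both the live block pool and pointer fix-up are omitted.

```python
BLOCK_WORDS = 4  # hypothetical block size in heap words


def build_offset_table(mark_bits):
    """offsets[b] = number of live words located before block b."""
    offsets = []
    live_so_far = 0
    for start in range(0, len(mark_bits), BLOCK_WORDS):
        offsets.append(live_so_far)
        live_so_far += sum(mark_bits[start:start + BLOCK_WORDS])
    return offsets


def forward_address(addr, mark_bits, offsets):
    """New address of the live word at addr after sliding compaction."""
    block = addr // BLOCK_WORDS
    block_start = block * BLOCK_WORDS
    # Live words before the block, plus live words before addr inside it.
    return offsets[block] + sum(mark_bits[block_start:addr])


def compact(heap, mark_bits):
    """Slide live words toward address 0, preserving allocation order."""
    offsets = build_offset_table(mark_bits)
    new_heap = [None] * len(heap)
    for addr, live in enumerate(mark_bits):
        if live:
            new_heap[forward_address(addr, mark_bits, offsets)] = heap[addr]
    return new_heap


if __name__ == "__main__":
    heap = ["a", None, "b", "c", None, None, "d", None, "e"]
    marks = [1, 0, 1, 1, 0, 0, 1, 0, 1]
    print(compact(heap, marks))
```

Because the offset table stores a running count per block, a forwarding address needs only one table lookup plus a scan of at most one block of mark bits, and surviving data keeps its original allocation order.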
The characteristics of advanced integrated circuit technologies require architects to look for new ways to utilize large numbers of gates and mitigate the effects of high interconnect delays. Chip multiprocessors (CMPs) exploit increasing transistor counts by placing multiple processors on a single die. As CMPs have become the trend in high-performance microprocessors, the target workloads have become more and more diversified. Due to the wire delay problem and the diversity of applications, neither private nor shared caches can provide both large capacity and fast access in CMPs. A novel CMP cache design, the heterogeneous CMP cache (HCC), is presented, in which chips are constructed from tiles of two different categories. The L2 caches of private tiles provide the lowest hit latency, and the L2 cache of shared tiles increases the effective cache capacity for shared data. By incorporating indirect-index cache technology to share capacity between different hierarchies, HCC provides an on-chip memory subsystem that is both capacity-effective and fast to access. Detailed full-system simulations are used to analyze HCC performance for various programs, including SPEC CPU2000, SPLASH2 and commercial workloads. The results show that HCC improves performance by 16% for single-threaded benchmarks and 9% for multi-threaded benchmarks. HCC is easy to implement, and the design ideas will be used in future multi-core processors of the Godson series.
With the widespread adoption of embedded microprocessor-based systems in safety-critical applications such as aircraft, spaceships and nuclear power plants, how to evaluate their fault-tolerant mechanisms rapidly, conveniently and at low cost is an important problem. The traditional method requires a detailed hardware protocol for evaluation, which lengthens the evaluation period and increases the cost. A new dependability evaluation technique based on a microprocessor function model is proposed, which can evaluate fault-tolerant mechanisms more rapidly, more conveniently and more economically than conventional systems. As a case study, the new system evaluates three fault-tolerant techniques: the software redundancy technique, the assertion validation technique, and the instruction re-fetching and re-execution technique. The results show that the evaluation is reasonable.
We propose a two-phase test generation method to generate patterns targeting the maximal path delay caused by multiple crosstalk effects. In the first phase, a timing analysis method based on a transition map manages the timing information of aggressor lines and victim lines; in the second phase, an ordinary ATPG engine with a few alterations is applied. This two-phase method avoids complex timing processing in the ATPG algorithm. By using a transition map instead of a timing window in timing analysis, our method can more efficiently calculate the accumulated crosstalk-induced delay and find the sub-paths that cause maximal coupling effects. Accuracy can be traded off against efficiency by controlling the size of the timescale used in the transition map, which makes this approach highly scalable.
Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits (ICs) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test application time. However, CSTP cannot reliably attain high fault coverage because random-pattern-resistant faults are difficult to test. This paper presents a deterministic CSTP (DCSTP) structure, consisting of a DCSTP chain and jumping logic, that attains high fault coverage with low area overhead. Experimental results on ISCAS’89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.