This chapter presents a formalization of functional and behavioural requirements, and a refinement of requirements to a design for CoCoME using the Relational Calculus of Object and Component Systems (rCOS). We give a...
An improved high fan-in domino circuit is proposed. The nMOS pull-down network of the circuit is divided into several blocks to reduce the capacitance of the dynamic node, and each block needs only a small keeper transistor to maintain the noise margin. Because the footer transistor is omitted, the circuit performs better than a standard domino circuit. A 64-input OR gate implemented with this structure is simulated in HSPICE under typical conditions for a 0.13 μm CMOS technology. The average delay of the circuit is 63.9 ps, the average power dissipation is 32.4 μW, and the area is 115 μm². Compared with compound domino logic, the proposed circuit reduces delay by 55% and power dissipation by 38%.
ISBN (print): 9781424421855
This paper designs a 64-bit floating-point reciprocal and reciprocal square root unit for a stream processor (FT64); the unit combines table look-up with functional iteration to implement division and square root operations. Implemented with two pipeline stages, the unit provides the initial value for the division and square root iterations. A mixed semi-custom and full-custom design method is adopted to improve its performance, and a mixed verification method is also proposed to verify the unit. Verification results show that the unit can run at 1 GHz under typical conditions in a 0.13 μm CMOS technology.
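The abstract does not say which iteration the table seed feeds or how large the table is. As a rough software illustration only, the sketch below assumes Newton-Raphson iteration and a hypothetical 256-entry seed table; the names `TABLE_BITS`, `RECIP_TABLE`, `RSQRT_TABLE`, `reciprocal` and `rsqrt` are illustrative, and the code works in ordinary double precision rather than modelling the FT64 hardware.

```python
import math

# Hypothetical seed-table size; the abstract does not state the real one.
TABLE_BITS = 8
TABLE_SIZE = 1 << TABLE_BITS

# Seed for 1/m, m in [1, 2): value at the midpoint of each table interval.
RECIP_TABLE = [1.0 / (1.0 + (i + 0.5) / TABLE_SIZE) for i in range(TABLE_SIZE)]
# Seed for 1/sqrt(m), m in [1, 4) (range widened so the exponent can be halved).
RSQRT_TABLE = [1.0 / math.sqrt(1.0 + 3.0 * (i + 0.5) / TABLE_SIZE) for i in range(TABLE_SIZE)]


def _normalize(x):
    """Split finite x > 0 into (m, e) with x = m * 2**e and 1 <= m < 2."""
    m, e = math.frexp(x)  # 0.5 <= m < 1
    return 2.0 * m, e - 1


def reciprocal(x, iterations=3):
    """Approximate 1/x (finite x > 0): table look-up seed, then Newton-Raphson."""
    m, e = _normalize(x)
    idx = min(int((m - 1.0) * TABLE_SIZE), TABLE_SIZE - 1)
    y = RECIP_TABLE[idx]
    for _ in range(iterations):
        y = y * (2.0 - m * y)  # Newton step for 1/y - m = 0; relative error roughly squares
    return math.ldexp(y, -e)  # 1/x = (1/m) * 2**-e


def rsqrt(x, iterations=3):
    """Approximate 1/sqrt(x) (finite x > 0): table look-up seed, then Newton-Raphson."""
    m, e = _normalize(x)
    if e % 2:  # make the exponent even so that it halves exactly
        m, e = 2.0 * m, e - 1  # now 1 <= m < 4
    idx = min(int((m - 1.0) / 3.0 * TABLE_SIZE), TABLE_SIZE - 1)
    y = RSQRT_TABLE[idx]
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)  # Newton step for 1/y**2 - m = 0
    return math.ldexp(y, -(e // 2))  # 1/sqrt(x) = (1/sqrt(m)) * 2**(-e/2)
```

With an 8-bit table the seed is accurate to roughly 8 to 9 bits, and because each Newton step about doubles the number of correct bits, three steps reach double precision. Division and square root then follow as `a * reciprocal(b)` and `x * rsqrt(x)`.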
The performance gap between processor and memory keeps widening, and memory access remains the crucial bottleneck of program performance. Traditionally, this problem is mitigated with caches. Stream processing is another approach to the problem and has proven effective at reducing the number of memory accesses in media applications. Whether it is equally effective at reducing the memory traffic of scientific applications is an open question, which this paper investigates. It first compares the memory hierarchy organization and data access patterns of the Imagine stream processor with those of conventional cache-based processors. It then runs five typical scientific programs on Imagine and, for comparison, on a cache-based general-purpose processor (Intel Pentium M). The data obtained on the two processors are compared, with special focus on data access efficiency. The results show that data traffic between the LRF (local register file) and the SRF (stream register file) is effectively reduced on Imagine, but the SRF alone cannot effectively reduce the number of off-chip memory accesses. For the programs evaluated, off-chip memory access still accounts for a large fraction of the total runtime on Imagine.
Fault tolerance is a critical issue in large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery through parallel recomputing; however, it complicates application programming. This paper uses compiler technology to automate the design of FTPA and describes the implementation of a tool called GiFT (Get it Fault-Tolerant). GiFT uses extended data-flow analysis to select the state needed for failure recovery, a parallel recomputing time model to compute the optimal number of recomputing processes, and parallelization techniques to generate the parallel recomputing code. Experimental results show that GiFT correctly transforms original MPI programs into their FTPA counterparts, and that the performance of GiFT-generated FTPA programs is comparable to that of hand-modified FTPA programs.
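The abstract does not give GiFT's parallel recomputing time model. To show only the general idea of picking a process count from a time model, the sketch below assumes a hypothetical toy model in which lost work is split evenly across `p` recomputing processes and each process adds a fixed coordination cost; `recovery_time` and `optimal_recomputing_processes` are illustrative names, not GiFT's.

```python
def recovery_time(p, lost_work, per_process_overhead):
    """Hypothetical toy model: lost work is split evenly over p recomputing
    processes, and each process adds a fixed coordination cost."""
    return lost_work / p + per_process_overhead * p


def optimal_recomputing_processes(lost_work, per_process_overhead, max_processes):
    """Return the process count in 1..max_processes that minimizes the modelled
    recovery time (exhaustive search; ties resolve to the smaller count)."""
    return min(
        range(1, max_processes + 1),
        key=lambda p: recovery_time(p, lost_work, per_process_overhead),
    )


# Example: 100 units of lost work and a per-process cost of 1 unit.
best = optimal_recomputing_processes(100.0, 1.0, max_processes=64)
```

Under this toy model the continuous optimum is `sqrt(lost_work / per_process_overhead)`, so the example picks 10 processes; when fewer processes are available, the largest available count wins because the modelled time is still decreasing.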
Most DHTs proposed so far assume that all nodes are homogeneous and route all messages with a common algorithm. In practice, however, nodes in large-scale systems may be heterogeneous in their capabilities, reputations, administrative-domain affiliations, and so on, so it is preferable to take this heterogeneity into account. To this end, this paper presents grouped tapestry (GTap), a novel Tapestry-based DHT that supports organizing nodes into groups and allows flexible DHT routing. Theoretical analysis and extensive simulations demonstrate the effectiveness of the proposal.
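The abstract does not describe GTap's routing rules. As a hypothetical illustration of how group membership could be layered onto Tapestry-style prefix routing, the simplified sketch below picks a next hop that matches more digits of the key than the current node does and, when asked, prefers nodes in a given group. `Node`, `next_hop` and `preferred_group` are invented for illustration, and real Tapestry resolves digits through per-level routing tables rather than a flat list of known nodes.

```python
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    node_id: str  # hex digits, e.g. "4a7f"
    group: str    # hypothetical group label (capability class, admin domain, ...)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: Node, key: str, known: Sequence[Node],
             preferred_group: Optional[str] = None) -> Optional[Node]:
    """Pick a known node that matches more of the key than `current` does,
    preferring nodes in `preferred_group` when any of them make progress."""
    here = shared_prefix_len(current.node_id, key)
    progress = [n for n in known if shared_prefix_len(n.node_id, key) > here]
    if not progress:
        return None  # no closer node known: current node acts as the key's root
    if preferred_group is not None:
        in_group = [n for n in progress if n.group == preferred_group]
        if in_group:
            progress = in_group
    # Among the remaining candidates, take the one with the longest match.
    return max(progress, key=lambda n: shared_prefix_len(n.node_id, key))
```

Restricting candidates to a group only when an in-group node still makes progress keeps routing convergent, since every hop matches at least one more digit of the key.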
This paper proposes an optimistic data consistency method that addresses data dependence in consistency maintenance. In the method, each data object is partitioned into fixed-size data blocks, which serve as the basic unit of data management. Updates are compressed with a Bloom filter and propagated along two paths. Negotiation algorithms detect and reconcile update conflicts, and dynamic data management algorithms handle dynamic data processing. Performance evaluation shows that, with an appropriately chosen block size, the method achieves consistency efficiently, behaves well under dynamic conditions, and is highly robust. The paper also proposes a practical way to choose an appropriate block size.
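The abstract says only that data objects are split into fixed-size blocks and that updates are compressed with a Bloom filter. A minimal sketch of one way these two ideas could fit together, assuming the filter summarizes the indices of changed blocks: the block size, filter size, hash count and SHA-256-based hashing below are all hypothetical choices, as are the names `BloomFilter`, `split_blocks` and `changed_block_summary`.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k positions by salting SHA-256 with the hash index (an assumption).
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


BLOCK_SIZE = 4  # hypothetical fixed block size in bytes


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_block_summary(old: bytes, new: bytes, num_bits: int = 256,
                          num_hashes: int = 3) -> BloomFilter:
    """Insert the index of every block that differs (or was added or removed)."""
    summary = BloomFilter(num_bits, num_hashes)
    old_blocks, new_blocks = split_blocks(old), split_blocks(new)
    for i in range(max(len(old_blocks), len(new_blocks))):
        before = old_blocks[i] if i < len(old_blocks) else None
        after = new_blocks[i] if i < len(new_blocks) else None
        if before != after:
            summary.add(i.to_bytes(8, "big"))
    return summary
```

A replica that receives the summary can test each local block index and refetch only the blocks that might have changed. A Bloom filter never reports a changed block as unchanged, so false positives cost an extra fetch but never cause a missed update.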
Updates to key attributes in P2P systems are characterized by simple descriptions, small update items, and weak dependence. Accordingly, an optimistic data consistency maintenance method based on key attributes is proposed. In this method, key-attribute updates are separated from user update requests. Key updates are propagated with a latency-overlay update propagation model: updates are always propagated to the nodes with maximum or minimum latency, and both assured and uncertain propagation paths are taken into account. Based on a classification of key-update conflicts, a two-level reconciliation mechanism, consisting of buffer preprocessing and update-log processing, detects and reconciles conflicts, which are then resolved by policies such as last-writer-wins and divide-and-rule. Finally, the paper discusses update-log management and the maintenance required after node failures and network partitions, both of which rely on the information stored in the update log. Because updates are handled optimistically, key-attribute updates are not delayed, so the efficiency of key-attribute-based resource location is not degraded; this makes the method well suited to Internet-scale P2P systems. Simulation results show that the method is effective, with low consistency-maintenance overhead, good resource-location and resource-access overhead, and strong robustness.
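Of the conflict-resolution policies named above, last-writer-wins is the most standard. A minimal sketch of it, assuming each key update carries a timestamp plus a writer identifier used to break ties; the `KeyUpdate` fields and the `last_writer_wins` function are hypothetical, and the paper's buffer preprocessing and update-log levels are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class KeyUpdate:
    key: str
    value: str
    timestamp: int  # hypothetical logical or loosely synchronized clock value
    writer_id: str  # breaks ties between updates with equal timestamps


def last_writer_wins(updates: Iterable[KeyUpdate]) -> Dict[str, str]:
    """Resolve conflicting updates per key by keeping the one with the largest
    (timestamp, writer_id) pair; the result does not depend on arrival order."""
    latest: Dict[str, KeyUpdate] = {}
    for u in updates:
        cur = latest.get(u.key)
        if cur is None or (u.timestamp, u.writer_id) > (cur.timestamp, cur.writer_id):
            latest[u.key] = u
    return {k: u.value for k, u in latest.items()}
```

Because the winner is the maximum of a total order, replicas that have seen the same set of updates converge to the same value regardless of propagation order, which is what makes the policy suitable for optimistic propagation.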
Recent studies of network traffic have shown that self-similarity is pervasive and that this property is preserved through buffering, switching, and transmission. Self-similarity must therefore be taken into account in network traffic prediction. This paper surveys research on self-similar traffic prediction in three areas: self-similar modeling, parameter estimation, and performance prediction. It also proposes a measurement-based equivalent-bandwidth algorithm for self-similar traffic prediction. Analysis shows that the algorithm effectively reduces computational and implementation complexity.
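The abstract does not give the proposed algorithm. For orientation only, a widely cited closed form for the equivalent bandwidth of self-similar traffic is Norros's fractional Brownian motion result, sketched below; it is not claimed to be the paper's method. The inputs (mean rate `m`, variance coefficient `a`, Hurst parameter `H`) are assumed to come from traffic measurements, and the function name is illustrative.

```python
import math


def fbm_equivalent_bandwidth(mean_rate: float, variance_coeff: float, hurst: float,
                             buffer_size: float, overflow_prob: float) -> float:
    """Norros's equivalent bandwidth for fractional Brownian traffic:

        C = m + (kappa(H) * sqrt(-2 ln eps))**(1/H) * a**(1/(2H))
              * x**(-(1-H)/H) * m**(1/(2H)),   kappa(H) = H**H * (1-H)**(1-H)

    m: mean rate, a: variance coefficient, H: Hurst parameter,
    x: buffer size, eps: target buffer-overflow probability.
    """
    if not 0.5 <= hurst < 1.0:
        raise ValueError("Hurst parameter must lie in [0.5, 1)")
    if not 0.0 < overflow_prob < 1.0:
        raise ValueError("overflow probability must lie in (0, 1)")
    kappa = hurst ** hurst * (1.0 - hurst) ** (1.0 - hurst)
    return (mean_rate
            + (kappa * math.sqrt(-2.0 * math.log(overflow_prob))) ** (1.0 / hurst)
            * variance_coeff ** (1.0 / (2.0 * hurst))
            * buffer_size ** (-(1.0 - hurst) / hurst)
            * mean_rate ** (1.0 / (2.0 * hurst)))
```

With `H = 0.5` the formula reduces to the short-range-dependent case. For `H > 0.5`, enlarging the buffer reduces the required capacity only slowly, which is one practical consequence of self-similarity.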