This paper explores high-performance central processing unit (CPU) design with VLSI CMOS. Workstations are the focus because they were the first to exploit the synergy of CMOS, VLSI, and reduced-instruction-set computing (RISC). But the advances of CMOS now encompass all computing system design and extend to newly created environments. We discuss CMOS extendibility in the highest-performance areas.
Exactly rounded results are required by many architectures, such as those that follow the IEEE 754 standard. For division and square root, rounding is easy to perform if a remainder is available. But for quadratically converging algorithms, the remainder is not typically calculated. Past implementations have either incurred the additional delay of calculating the remainder, calculated the approximate solution to twice the accuracy, or produced a close but not exact solution. This paper shows how the additional delay of calculating the remainder can be reduced if extra precision is available.
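The claim that rounding is easy once a remainder is available can be illustrated with a minimal sketch. It is not the paper's delay-reduction technique: the function name `round_quotient`, the integer encoding of the operands, and the choice of round-to-nearest-even are assumptions made for illustration. Given an estimate within one unit of the true quotient, the sign and size of the remainder are enough to pick the correctly rounded result.

```python
def round_quotient(n: int, d: int, q_est: int) -> int:
    """Round n/d to the nearest integer (ties to even).

    Hypothetical helper: q_est is assumed to be an integer estimate that
    differs from the exact quotient n/d by less than 1, as an iterative
    (for example quadratically converging) divider might produce.
    """
    if d <= 0:
        raise ValueError("divisor must be positive in this sketch")
    r = n - q_est * d              # remainder of the estimate
    if not -d < r < d:
        raise ValueError("estimate is not within one unit of n/d")
    if r < 0:                      # estimate was too high: step down
        q_est -= 1
        r += d                     # now 0 <= r < d, so q_est == floor(n/d)
    twice = 2 * r                  # compare the remainder with d/2 exactly
    if twice > d or (twice == d and q_est % 2 == 1):
        q_est += 1                 # round up; ties go to the even neighbour
    return q_est


if __name__ == "__main__":
    print(round_quotient(11, 3, 4))
```

Both an estimate that is slightly low and one that is slightly high lead to the same rounded result, because the remainder's sign reveals which side of the exact quotient the estimate lies on.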
The time taken for processor simulation can be drastically reduced by selecting simulation points, which are dynamic sections obtained from the simulation result of processors. The overall behavior of the program can ...
Proper distribution of operations among parallel processors in a large scientific computation executed on a distributed-memory machine can significantly reduce the total computation time. In this paper, we propose an operation called simultaneous parallel reduction (SPR) that is amenable to such optimization. SPR performs reduction operations in parallel, each operation reducing a one-dimensional consecutive section of a distributed array. Each element of the distributed array is used as an operand to many reductions executed concurrently over overlapping sections of the array. SPR is distinct from the more commonly considered parallel reduction, which concurrently evaluates a single reduction. We consider SPR on Single Instruction Multiple Data (SIMD) machines with different interconnection networks. We focus on SPR over sections whose size is not a power of 2, with the result shifted relative to the arguments. Several algorithms achieving some of the lower bounds on SPR complexity are presented under various assumptions about the properties of the binary operator of the reduction and about the communication cost of the target architectures.
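To make the SPR operation concrete, here is a minimal sequential sketch. It is not one of the paper's algorithms: the function names `spr_naive` and `spr_doubling`, the cyclic wrap-around at the array boundary, and the assumption that the operator is associative (commutativity is not needed) are illustrative choices. Each list comprehension stands for one data-parallel step in which every element combines with a value a fixed distance away, so a section of size m, power of 2 or not, needs on the order of log m such steps.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def spr_naive(a: List[T], m: int, op: Callable[[T, T], T]) -> List[T]:
    """Reference version: out[i] reduces a[i], ..., a[i+m-1] (cyclic)."""
    n = len(a)
    out = []
    for i in range(n):
        acc = a[i]
        for k in range(1, m):
            acc = op(acc, a[(i + k) % n])
        out.append(acc)
    return out


def spr_doubling(a: List[T], m: int, op: Callable[[T, T], T]) -> List[T]:
    """Same result using doubling; op is assumed to be associative."""
    n = len(a)
    if not 1 <= m <= n:
        raise ValueError("section size must satisfy 1 <= m <= len(a)")
    power = list(a)   # power[i]: reduction of `width` elements starting at i
    width = 1
    result = None     # result[i]: reduction of `covered` elements starting at i
    covered = 0
    remaining = m
    while remaining:
        if remaining & 1:
            if result is None:
                result = list(power)
            else:
                # Append the next block of `width` elements on the right.
                result = [op(result[i], power[(i + covered) % n])
                          for i in range(n)]
            covered += width
        remaining >>= 1
        if remaining:
            # Double the block size: combine each block with its right neighbour.
            power = [op(power[i], power[(i + width) % n]) for i in range(n)]
            width *= 2
    return result


if __name__ == "__main__":
    print(spr_doubling([1, 2, 3, 4, 5], 3, lambda x, y: x + y))
```

The binary digits of m decide which power-of-2 blocks are concatenated, which is one simple way to handle section sizes that are not powers of 2 while preserving operand order.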
ISBN: 9781424427086 (print)
In today's competitive semiconductor business environment, wafer manufacturers face continuous pressure to predict cycle time and tool utilization accurately, gauge the impact of changes in available capacity, assess the impact of changes in product mix and quantity, and determine action plans to improve operational performance. Discrete event simulation (DES) is widely used for such analysis. However, DES has some inherent shortcomings for these planning tasks. Analytical models such as queueing networks have much shorter response times and additional advantages compared to DES. But because of the complexity of semiconductor manufacturing systems (SMS), queueing models have not been able to capture all of their peculiarities. This paper provides an overview of the main features of the IBM Enterprise Production Planning and Optimization System (EPOS), a queueing-network-based system that closes this gap. EPOS has been in use in IBM's 300 mm fabrication facility in Fishkill for more than two years and has proved to be an invaluable tool for analyzing the trade-offs between cycle time and capacity in this complex environment.
ISBN: 9780897916905 (print)
A simple extension of the critical path method is presented which allows more accurate optimization of circuits with level-sensitive latches. The extended formulation provides a sufficient set of constraints to ensure that, when all slacks are non-negative, the corresponding circuit will be free of late signal timing problems. Cycle stealing is directly permitted by the formulation. However, moderate restrictions may be necessary to ensure that the timing constraint graph is acyclic. Forcing the constraint graph to be acyclic allows a broad range of existing optimization algorithms to be easily extended to better optimize circuits with level-sensitive latches. We describe the extension of two such algorithms, both of which attempt to solve the problem of selecting parts from a library to minimize area subject to a cycle time constraint.
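For readers unfamiliar with slack in the critical path method, the following minimal sketch computes it on an acyclic timing graph. It covers only the classic formulation, not the paper's extension to level-sensitive latches or cycle stealing; the function name `compute_slacks`, the per-node delay model, and the single `cycle_time` bound on all outputs are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Hashable, Iterable, Tuple


def compute_slacks(
    delays: Dict[Hashable, float],
    edges: Iterable[Tuple[Hashable, Hashable]],
    cycle_time: float,
) -> Dict[Hashable, float]:
    """Return slack = required time - arrival time for every node.

    Assumes an acyclic graph; TopologicalSorter raises CycleError otherwise.
    """
    preds = {node: [] for node in delays}
    succs = {node: [] for node in delays}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: latest time at which each node's output settles.
    arrival = {}
    for node in order:
        latest_input = max((arrival[p] for p in preds[node]), default=0.0)
        arrival[node] = latest_input + delays[node]

    # Backward pass: latest time each output may settle without violating
    # the cycle-time bound at any downstream output.
    required = {}
    for node in reversed(order):
        required[node] = min(
            (required[s] - delays[s] for s in succs[node]),
            default=cycle_time,
        )

    return {node: required[node] - arrival[node] for node in order}


if __name__ == "__main__":
    delays = {"a": 2, "b": 3, "c": 1, "d": 2}
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    print(compute_slacks(delays, edges, cycle_time=8))
```

When every slack is non-negative, no path in this simplified model exceeds the cycle time; a negative slack marks a node on a path that is too slow.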
The problem of analyzing simulations in a single replication is considered. This type of analysis would facilitate efficient examination of a number of simulated process control strategies over a short horizon. For such problems, a detailed simulation study would not be feasible, and ideally a single simulation replication per alternative would be desired. The ultimate goal is to establish a methodology for single-replication simulation using control or alteration of the underlying stochastic processes. The proposed methodologies consist of two approaches for altering the simulation model while adequately modeling the likely behavior of the real system. The first approach eliminates certain sources of variation; for instance, events of low probability that have substantial effect on the system output are ignored. The second approach simply alters other sources of variation of some probability distributions.