The routing problem of VLSI layout design is very compute intensive. Consequently, the routing task often turns out to be a bottleneck in the layout design of large circuits. Parallel processing of the routing problem...
详细信息
The routing problem of VLSI layout design is very compute intensive. Consequently, the routing task often turns out to be a bottleneck in the layout design of large circuits. Parallel processing of the routing problem holds promise for mitigating this situation. In this context, we present a parallel channel routing algorithm that is targetted to run on loosely coupled computers like hypercubes. The proposed parallel algorithm employs the simulated annealing technique for obtaining near- optimum solutions. Initially, the number of tracks in the channel is made equal to the number of nets, and partitions of the channel are appropriately assigned to the nodes of the hypercube. Each node carries out concurrent perturbations to obtain new channel states that satisfy the constraints for a given net list. The algorithm minimizes the number of tracks iteratively by using the simulated annealing technique. For efficient execution, we attempt to reduce the communication overheads by restricting the broadcast updates to cases of interprocessor net transfers only. Performance evaluation studies of the algorithm show promising results.
Lightweight threads are becoming increasingly useful for supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages for distributed me...
详细信息
ISBN:
(纸本)9780818666056
Lightweight threads are becoming increasingly useful for supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages for distributed memory systems have received little attention. We introduce the design of a runtime interface, called Chant, that supports communicating threads in a distributed memory environment. In particular, Chant is layered atop standard message passing and lightweight thread libraries, and supports efficient point-to-point and remote service request communication primitives. We examine the design issues of Chant, the efficiency of its point-to-point communication layer, and the evaluation of scheduling policies to poll for the presence of incoming messages.< >
We present a fast loop parallelization heuristic that assigns separate invocations of a loop to different processors. If the loop contains data dependences between iterations, later iterations can be delayed while awa...
We present a fast loop parallelization heuristic that assigns separate invocations of a loop to different processors. If the loop contains data dependences between iterations, later iterations can be delayed while awaiting a result computed in an earlier iteration. In this paper we study a scheduling problem, called the Delay Problem, that approximates the problem of minimizing the delay in the start time of loops with loop-carried dependences. Our major result is a fast (O(n log 2 n)) time algorithm for the case where the precedence constraints are a forest of in-trees or a forest of out-trees. Since most graphs for the Delay Problem that arise in practice are sparse and consist of such a forest with possibly a few additional edges, this is an important case. We prove that the Delay Problem becomes NP-Complete when the precedence constraints are a set of arbitrary trees. We also prove that the Delay Problem becomes NP-Complete for independent chains when it is generalized to allow either non-unit execution times or release times and deadlines.
In general, electrodynamics requires two independent three-dimensional potential vectors for each Fourier mode, analogous to the position and momentum vectors of particle mechanics. The scalar potential may then be co...
详细信息
In general, electrodynamics requires two independent three-dimensional potential vectors for each Fourier mode, analogous to the position and momentum vectors of particle mechanics. The scalar potential may then be completely eliminated from the theory and a charged-particle Lagrangian will quite naturally contain a spontaneous-symmetry-breaking potential.
A computer code for solving the Reynolds-averaged full Navier-Stokes equations has been developed and applied using H- and C-type grids. The Baldwin-Lomax eddy-viscosity model is used for turbulence closure. The integ...
详细信息
A computer code for solving the Reynolds-averaged full Navier-Stokes equations has been developed and applied using H- and C-type grids. The Baldwin-Lomax eddy-viscosity model is used for turbulence closure. The integration in time is based on an explicit four-stage Runge-Kutta scheme. Local time stepping, variable coefficient implicit residual smoothing, and a full multigrid method have been implemented to accelerate steady-state calculations. A grid independence analysis is presented for a transonic rotor blade. Comparisons with experimental data show that the code is an accurate viscous solver and can give very good blade-to-blade predictions for engineeringapplications.
作者:
BOHM, SELHAKEEM, AKHACHICHA, MDepartment of Electrical and Computer Engineering
Concordia University 1455 De Maisonneuve Blvd. West Montreal H3A 1M8 Canada Was born in Montreal
Canada on 14 September 1966. He received the B. Eng. degree in electrical engineering from Concordia University Montreal Canada in 1989. He is at present completing the M.A.Sc. degree in electrical engineering at Concordia University. (S'75–S'79–M'79–SM'86) received the Ph.D. degree from Southern Methodist University
Dallas TX in 1979. He spent the next two years working as a Visiting Professor in Egypt after which he moved to Ottawa Canada in 1982. He assumed teaching and research positions in Carleton and Manitoba Univerities and later moved to Concordia University Montreal Canada in 1983 where he is now a Professor in the Electrical and Computer Engineering Department. He has published numerous papers in IEEE and international journals in the areas of spread spectrum and networking. He is a well-known expert in these areas and serves as a consultant to many companies. His current research interests include wide-band metropolitan networks switching architectures and performance of on-board multibeam satellites acquisitionless CDMA networks code distribution and orthogonalization of CDMA signals responsive congestion control for ATM-based networks ARQ techniques and investigation of the novel SUGAR CDMA systems in fading channels. Dr. Elhakeem is a Senior Member of the Canadian Electrical Engineering Society and Armed Forces Association. He has chaired numerous technical sessions in IEEE Conferences was the Technical Program Chairman for IEEE Montech 1986 Montreal Canada. Dr. Elhakeem is the key guest editor of theIEEE Journal of Selected Areas in Communicationsfor the May June issues 1993 covering CDMA networks. Advanced Technology & Networks
VISTAR Telecommunications Inc. Ottawa Ontario K1G 3J4 Canada An Associate Director of Advanced Technology & Networks Group
VISTAR Telecommunications Inc. Ottawa Canada. He is also an Adjunct Pr
In this paper, we study the performance of a prioritized on-board baseband switch in conjunction with a multibeam satellite handling integrated services. The services considered for the analysis include voice, video, ...
详细信息
In this paper, we study the performance of a prioritized on-board baseband switch in conjunction with a multibeam satellite handling integrated services. The services considered for the analysis include voice, video, file transfer and interactive data. The prioritized switch uses both input and output buffering, switch speed-up as well as a two-phase head-of-line resolution algorithm, in order to reduce the buffer loss while maintaining acceptable user delays. The minimum required buffer capacity and switch speed-up for each service in a prioritized environment are found under uniform traffic conditions. It is shown that under uniform traffic conditions, only minimal buffering and switch speed-up are needed even for the lowest priority users. The performance dependence on the switch size is also substantially reduced with head of line resolution and buffering even in a prioritized environment.
In this paper we summarize the results of our theoretical investigation into the costs and benefits of extending the conservative simulation window established in a non-aggressive windowing algorithm. There are two pr...
详细信息
ISBN:
(纸本)078031381X
In this paper we summarize the results of our theoretical investigation into the costs and benefits of extending the conservative simulation window established in a non-aggressive windowing algorithm. There are two primary costs incurred by the non-aggressive algorithm: the cost of global synchronization and the cost of blocking due to pessimistic synchronization constraints. As the conservative simulation window is extended processors are required to synchronize less often and parallelism is increased. However, the increased aggressiveness increases the costs associated with state saving and rollbacks. This is the fundamental trade-off we capture analytically.
The authors address the feasibility of a nondedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks. They develop an analytical model to predict parallel ...
详细信息
The authors address the feasibility of a nondedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks. They develop an analytical model to predict parallel job response times. The model provides insight into how significantly the workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that the task ratio is a useful metric for determining how large the demand of a parallel applications must be in order to make efficient use of a nondedicated distributed system.
We present a parallel rendering algorithm targeted to MIMD distributed-memory message-passing architectures. For maximum performance, the algorithm exploits both object-level and image level parallelism. The behavior ...
详细信息
We present a parallel rendering algorithm targeted to MIMD distributed-memory message-passing architectures. For maximum performance, the algorithm exploits both object-level and image level parallelism. The behavior of the algorithm is examined both analytically and experimentally. The results show that the choice of message size has a significant impact on performance. Scalability to large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 confirms the analytical results and demonstrates increasing performance from 1 to 128 processors across a wide range of scene complexities.
The TPC-C benchmark is a new benchmark approved by the TPC council intended for comparing database platforms running a medium complexity transaction processing workload. Some key aspects in which this new benchmark di...
ISBN:
(纸本)9780897915922
The TPC-C benchmark is a new benchmark approved by the TPC council intended for comparing database platforms running a medium complexity transaction processing workload. Some key aspects in which this new benchmark differs from the TPC-A benchmark are in having several transaction types, some of which are more complex than that in TPC-A, and in having data access skew. In this paper we present results from a modelling study of the TPC-C benchmark for both single node and distributed database management systems. We simulate the TPC-C workload to determine expected buffer miss rates assuming an LRU buffer management policy. These miss rates are then used as inputs to a throughput model. From these models we show the following: (i) We quantify the data access skew as specified in the benchmark and show what fraction of the accesses go to what fraction of the data. (ii) We quantify the resulting buffer hit ratios for each relation as a function of buffer size. (iii) We show that close to linear scale-up (about 3% from the ideal) can be achieved in a distributed system, assuming replication of a read-only table. (iv) We examine the effect of packing hot tuples into pages and show that significant price/performance benefit can be thus achieved. (v) Finally, by coupling the buffer simulations with the throughput model, we examine typical disk/memory configurations that maximize the overall price/performance.
暂无评论