In this paper, dilated embedding and precise embedding of K-ary complete trees into hypercubes are studied. For dilated embedding, a nearly optimal algorithm is proposed which embeds a K-ary complete tree of height h,...
详细信息
ISBN:
(纸本)0818656026
In this paper, dilated embedding and precise embedding of K-ary complete trees into hypercubes are studied. For dilated embedding, a nearly optimal algorithm is proposed which embeds a K-ary complete tree of height h, TK(h), into an (h-1)[logK] + [log(K+2)] dimensional hypercube with dilation max {2, φ(K)*), φ(K+2)}. For precise embedding, we show a (K-1)h+1 dimensional hypercube is large enough to contain TK(h) as its subgraph for K>3.
Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered ...
详细信息
ISBN:
(纸本)0818656026
Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. In this paper, we investigate the problem of redundant synchronization elimination in DOACROSS loops and preset an algorithm that identifies redundant synchronizations in doubly-nested DOACROSS loops with multiple statements. Nonuniformity in redundancy at the boundaries of the loop iteration space, which is caused primarily by the back-ward dependence directions in some inner loops, is addressed. these results allow a compiler to generate efficient data synchronization instructions for DOACROSS loops.
We introduce a technique for optimizing interprocessor communication in programs for distributed memory multi-processors. Our basic approach is to combine message withthe explicit goal of reducing the over all execut...
详细信息
ISBN:
(纸本)0818656026
We introduce a technique for optimizing interprocessor communication in programs for distributed memory multi-processors. Our basic approach is to combine message withthe explicit goal of reducing the over all execution time, taking into account direct and indirect dependencies among the concurrent units. We first establish that combining messages between a pair of isolated processors is not necessarily useful in reducing the overall execution time of the program because of complex inter-processor dependencies. the conditions under which message combining is profitable are then established. We then search for such conditions along chains of dependences that exist across several processors and combine messages that satisfy these conditions.
this paper proposes the lookahead idea in job allocation in a mesh-connected parallel processor, which has become popular in the design of massively parallel computers. the allocation heuristic can select its strategy...
详细信息
ISBN:
(纸本)0818656026
this paper proposes the lookahead idea in job allocation in a mesh-connected parallel processor, which has become popular in the design of massively parallel computers. the allocation heuristic can select its strategy by lookahead, i.e., looking into the queue of waiting jobs. this approach is shown to be superior to the existing heuristics which do not utilize and lookahead. For efficient implementation, we propose a compact rectangular set representation of the free submesh area which allows the complete submesh detection (also allocation) algorithm(s) to be an order of magnitude improved (in the average case) over the existing approach(es). the proposed representation is also beneficial when considering some currently unexplored issues in processor allocation and practitioner's approach.
this paper we present formal models for message-passing systems that provide both synchronous and asynchronous sends, both blocking and nonblocking sends and receives, and a variety of ordering properties. In addition...
详细信息
ISBN:
(纸本)0818656026
this paper we present formal models for message-passing systems that provide both synchronous and asynchronous sends, both blocking and nonblocking sends and receives, and a variety of ordering properties. In addition, the receive primitives are very general in that they can specify the desired source and/or tag value of a message. Our models apply to all message-passing programs, including ones with errors, and they apply to parallel computers with arbitrary amounts of buffering. To the best of our knowledge, this is the first time that such rich message-passing models have been defined formally. In addition to presenting the formal models, we also give a number of theorems that capture the properties of the models.
As parallel and distributed computers become more widely available and used, the already important process of understanding and debugging concurrent programs will take on even greater importance. We believe that visua...
详细信息
ISBN:
(纸本)0818656026
As parallel and distributed computers become more widely available and used, the already important process of understanding and debugging concurrent programs will take on even greater importance. We believe that visualization can help in the process. In this paper we discuss heretofore unaddressed issues in the visualization of concurrent programs, and present the Animation Choreographer. the Animation choreographer allows users to view, manipulate, and explore the set o alternate feasible orderings of the program execution under study, boththrough the Choreographer interface and in the context of the selected visualizations, thus providing the user with a variety of temporal perspectives on the computation.
Control and data parallelism are two complementary but often mutually exclusive paradigms used to program massively parallel systems. We propose to encapsulate both control and data parallelism in regular classes of a...
详细信息
ISBN:
(纸本)0818656026
Control and data parallelism are two complementary but often mutually exclusive paradigms used to program massively parallel systems. We propose to encapsulate both control and data parallelism in regular classes of a sequential Object Oriented language: a SPMD programming model is used and thus no language extensions are needed, provided a Shared Virtual Memory is available. We show how these ideas are implemented in EPEE, our Eiffel parallel Execution Environment. As an example, we present the implementation of both paradigms on a toy linear of both paradigms on a toy linear algebra example and show how they can interoperate. We conclude with some performance results and prospective remarks.
Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. the retiming technique is a common and valuable tool in one-dimensional problems, represented by Data F...
详细信息
ISBN:
(纸本)0818656026
Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. the retiming technique is a common and valuable tool in one-dimensional problems, represented by Data Flow Graphs (DFGs) such as DSP filters, which can maximize the parallelism of a loop body represented by a DFG. Since most scientific or DSP applications are recursive or iterative, to increase the parallelism of the loop body can substantially decrease the overall computation time. this paper develops a novel multi-dimensional retiming technique that considers the final schedule as part of the process. the description and the correctness of our algorithm are presented in the paper.
Information users today are hampered by inadequate access to the information they want and need. Broadcast and cable TV offer inflexible scheduling, while dial-up services are still fairly primitive, offering poor vis...
详细信息
ISBN:
(纸本)0818656026
Information users today are hampered by inadequate access to the information they want and need. Broadcast and cable TV offer inflexible scheduling, while dial-up services are still fairly primitive, offering poor visual quality and arcane user interfaces. Videotape rental is time consuming and frustrating and on-line shopping is in its infancy. the network infrastructure now exists to deliver vastly improved versions of these services. What has been lacking is a server system sophisticated enough to cost effectively pump information to the delivery networks. Together, Oracle and nCUBE provide a high-performance, cost-effective solution that enables information providers and users to realize the benefits of interactive multimedia services.
this work presents a parallel implementation of a model-based spectral estimation method, based on the modified covariance parametric algorithm. this method has been implemented in order to take advantage of its resol...
详细信息
ISBN:
(纸本)0818656026
this work presents a parallel implementation of a model-based spectral estimation method, based on the modified covariance parametric algorithm. this method has been implemented in order to take advantage of its resolution benefits when it is applied to spectral estimation in Doppler blood flow instrumentation. As the algorithm is computationally intensive and must be executed in realtime, parallelprocessing techniques have been utilized. Two different approaches are presented, these being implemented on a transputer-based system. Performances analysis of these approaches is presented together with concluding remarks. the results are being applied in the development of a real-time spectrum analyzer for pulsed Doppler blood flow instrumentation.
暂无评论