Authors: E. Kraemer, J.T. Stasko
Graphics, Visualization and Usability Center, College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
As parallel and distributed computers become more widely available and used, the already important process of understanding and debugging concurrent programs will take on even greater importance. We believe that visualization can help in this process. We discuss previously unaddressed issues in the visualization of concurrent programs and present the Animation Choreographer. The Animation Choreographer allows users to view, manipulate, and explore the set of alternate feasible orderings of the program execution under study, both through the Choreographer interface and in the context of the selected visualizations, thus providing the user with a variety of temporal perspectives on the computation.
ISBN: (print) 0818656026
Reconfiguring a faulty hypercube into a maximal incomplete cube tends to lower potential performance degradation, because a hypercube so reconfigured often yields a much larger system than is attained by any conventional reconfiguration scheme that identifies only complete subcubes. This paper proposes an efficient strategy for identifying all the maximal incomplete subcubes present in a faulty hypercube. The proposed strategy is distributed in that every healthy node executes the same identification algorithm independently at the same time.
Describes the design of ScaLAPACK, a scalable software library for performing dense and banded linear algebra computations on distributed memory concurrent computers. The specification of the data distribution has important consequences for interprocessor communication and load balance, and hence is a major factor in determining the performance and scalability of the library routines. The block cyclic data distribution is adopted as a simple, yet general purpose, way of decomposing block-partitioned matrices. Distributed memory versions of the Level 3 BLAS provide an easy and convenient way of implementing the ScaLAPACK routines.
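As an illustration of the block cyclic distribution mentioned in this abstract, the following minimal sketch maps a global matrix index to its owning process and local index. It is not ScaLAPACK code: the function names `block_cyclic_owner` and `owner_2d` are hypothetical, indices are 0-based, and the first block is assumed to start on process 0.

```python
def block_cyclic_owner(g, nb, p):
    """Map 0-based global index g to (process, local index).

    Assumes blocks of nb consecutive indices are dealt out
    round-robin to p processes, starting at process 0.
    """
    block = g // nb                  # which block the index falls in
    proc = block % p                 # blocks are assigned cyclically
    local = (block // p) * nb + g % nb
    return proc, local


def owner_2d(i, j, mb, nb, prow, pcol):
    """Apply the 1-D rule independently to rows and columns
    of a prow x pcol process grid (hypothetical helper)."""
    pr, li = block_cyclic_owner(i, mb, prow)
    pc, lj = block_cyclic_owner(j, nb, pcol)
    return (pr, pc), (li, lj)


if __name__ == "__main__":
    # Owners of 8 indices with block size 2 on 2 processes.
    print([block_cyclic_owner(g, 2, 2)[0] for g in range(8)])
```

Because consecutive blocks go to different processes, each process holds pieces from all regions of the matrix, which is what ties the distribution to load balance in the sense the abstract describes.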
Associating sensor measurements with target tracks is a fundamental and challenging problem in multi-target tracking. The problem is even more challenging in the context of sensor networks, since association is couple...
ISBN: (print) 0818656026
We address the problem of finding a maximum matching for a convex bipartite graph on a mesh-connected computer (MCC). We show that this can be done in optimal time on an MCC by designing efficient merge and division schemes, applied bottom-up and top-down respectively.
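For readers unfamiliar with the problem, the sketch below shows the classical sequential greedy rule for convex bipartite matching (often attributed to Glover). It is not the mesh algorithm of the paper. The representation is an assumption: each left vertex `a` is adjacent to a contiguous range `[lo, hi]` of right vertices numbered `0..m-1`, and the name `convex_matching` is hypothetical.

```python
import heapq


def convex_matching(intervals, m):
    """Return a maximum matching {left vertex: right vertex}.

    intervals[a] = (lo, hi): left vertex a is adjacent to right
    vertices lo..hi inclusive (the convexity assumption).
    """
    # Visit left vertices in order of the start of their range.
    order = sorted(range(len(intervals)), key=lambda a: intervals[a][0])
    heap = []      # candidates available at b, keyed by range end
    match = {}
    k = 0
    for b in range(m):
        # Add every left vertex whose range has started by b.
        while k < len(order) and intervals[order[k]][0] <= b:
            a = order[k]
            heapq.heappush(heap, (intervals[a][1], a))
            k += 1
        # Discard candidates whose range ended before b.
        while heap and heap[0][0] < b:
            heapq.heappop(heap)
        # Greedy choice: match b to the candidate ending soonest.
        if heap:
            _, a = heapq.heappop(heap)
            match[a] = b
    return match


if __name__ == "__main__":
    print(convex_matching([(0, 2), (0, 0), (1, 1)], 3))
```

Choosing the candidate whose range ends soonest never hurts, because any other candidate can still be matched later wherever the chosen one could have been.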
The paper demonstrates the advantages of having two processors in the node of a distributed memory architecture, one for computation and one for communication. The architecture of such a dual-processor node is discussed. To fully exploit the potential for parallel execution of computation threads and communication threads, a novel, compiler-optimized IPC mechanism allows for an unbuffered no-wait send and a prefetched receive without the danger of semantics violation. It is shown how an optimized parallel operating system can be constructed such that the application processor's involvement in communication is kept to a minimum while the utilization of both processors is maximized. The MANNA implementation results in an effective message start-up latency of only 1...4 microseconds. It is also shown how the dual-processor node is utilized to efficiently realize virtual shared memory.
Machine learning algorithms have been among the fastest-growing fields of interest in recent years. In this work, a semi-supervised learning algorithm based on complex networks is adapted to exploit distributed processing of ver...
Presents a new view of routing messages in interconnection networks based on the known compact interval labeling. The authors propose simple algorithms, encapsulating networks and routing, suitable for a large class of topologies. They define a floating rule that unifies the notions of virtual channels and multiple-interval labeling. The introduced approach is applied to some common structures.
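To make the idea of interval labeling concrete, here is a minimal sketch of single-interval routing on a ring, one of the usual structures for which such labelings are known. It does not implement the authors' floating rule. The ring topology, the node labels `0..n-1`, and the helper names are all assumptions chosen for illustration: each node labels its clockwise link with one cyclic interval of destinations and sends everything else counter-clockwise.

```python
def in_cyclic_interval(x, lo, hi):
    """True if x lies in the cyclic interval [lo, hi]."""
    if lo <= hi:
        return lo <= x <= hi
    return x >= lo or x <= hi       # interval wraps past n-1


def next_hop(u, dest, n):
    """Pick the outgoing link of node u whose interval contains dest."""
    cw_lo, cw_hi = (u + 1) % n, (u + n // 2) % n
    if in_cyclic_interval(dest, cw_lo, cw_hi):
        return (u + 1) % n          # clockwise link
    return (u - 1) % n              # counter-clockwise link


def route(src, dest, n):
    """Follow interval labels hop by hop from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest, n))
    return path


if __name__ == "__main__":
    print(route(0, 3, 6))
    print(route(0, 5, 6))
```

Each node stores only one interval per link rather than a full routing table, which is the compactness the labeling is meant to provide.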
The efforts of EuroSys, the professional society for the European systems community, to advance research on distributed systems are discussed. As far as publishing about distributed systems is concerned, Europeans are far behind the US. The Symposium on Operating Systems Design and Implementation, ACM Transactions on Computer Systems, and the ACM Symposium on Operating Systems Principles rank 1st, 8th, and 11th on the CiteSeer impact list. The ACM SIGOPS European Workshop is 216th, the International Conference on Distributed Computing Systems is 217th, the International Symposium on Fault-Tolerant Computing is 270th, and the European Conference on Parallel Processing is 491st. Researchers are trying to change the system from within by creating the EuroSys professional society, holding workshops, launching the EuroSys conference series, networking and exchanging information, and so on.
Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for non-uniform memory access (NUMA) architectures, but less work has been done for cache-only memory access (COMA) or attraction memory architectures such as the KSR-1. We present two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of the parallel communication offered by most multiprocessor interconnection networks. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.