We parallelize the 'go withthe winners' algorithm of Aldous and Vazirani (in: proceedings of the 35th IEEE symposium on the Foundations of Computer Science, IEEE Computer Society Press, Silver Spring., MD, 19...
详细信息
We parallelize the 'go withthe winners' algorithm of Aldous and Vazirani (in: proceedings of the 35th IEEE symposium on the Foundations of Computer Science, IEEE Computer Society Press, Silver Spring., MD, 1994, pp. 492-501) and analyze the resulting parallel algorithm in the LogP-model (in: proceedings of the Fourthacmsigplansymposium on principles & practice of parallelprogramming, 1993, pp. 1-12). the main issues in the analysis are load imbalances and communication delays. the result of the analysis is a practical algorithm which, under reasonable assumptions, achieves linear speedup. Finally, we analyze our algorithm for a concrete application: generating models of amorphous chemical structures. (C) 2003 Elsevier Inc. All rights reserved.
Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and in some cases, actuate control mechanisms...
详细信息
ISBN:
(纸本)9781581135886
Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and in some cases, actuate control mechanisms in response. Like traditional parallel systems, communication between nodes is of fundamental importance, but is typically accomplished via wireless transceivers. One further key attribute of sensor networks is that they are almost always long-running systems, intended to operate in situ, with minimal direct human intervention, for months or years. this requirement for long-running autonomy mandates careful design of the runtime system that manages applications on each node, to ensure reliability and ease of upgrades over the life of the system. this paper describes Impala, a middleware architecture that enables application modularity, adaptivity, and repair-ability in wireless sensor networks. Impala allows software updates to be received via the node's wireless transceiver and to be applied to the running system dynamically. In addition, Impala also provides an interface for on-the-fly application adaptation in order to improve the performance, energy-efficiency, and reliability of the software system. Impala has been designed to be a part of the ZebraNet mobile sensor network, but we are also prototyping it within HP/Compaq iPAQ Pocket PC handhelds. We present performance data for both real system measurements of the Pocket PC version as well as simulations of a full mobile sensor system deployment. Overall, Impala is a lightweight runtime system that can greatly improve system reliability, performance, and energy-efficiency. the ideas introduced here for sensor networks have applicability more broadly in other long-running autonomous parallel systems as well.
the proceedings contains 14 papers from the conference on the proceedings of the acmsigplansymposium on principles and practice of parallelprogramming, PPOPP. Topics discussed include: reference idempotency analysi...
详细信息
the proceedings contains 14 papers from the conference on the proceedings of the acmsigplansymposium on principles and practice of parallelprogramming, PPOPP. Topics discussed include: reference idempotency analysis: a framework for optimizing speculative execution;pointer and escape analysis for multithread programs;language support for motion-order matrices;efficient load balancing for wide-area divide-and-conquer applications;scalable queue-based spin locks with timeout;contention ellimination by replication of sequential sections in distributed shared memory programs;and accurate data redistribution cost estimation in software distributes shared memory systems.
this paper presents a new combined pointer and escape analysis for multithreaded programs. the algorithm uses a new abstraction called parallel interaction graphs to analyze the interactions between threads and extrac...
详细信息
ISBN:
(纸本)9781581133462
this paper presents a new combined pointer and escape analysis for multithreaded programs. the algorithm uses a new abstraction called parallel interaction graphs to analyze the interactions between threads and extract precise points-to, escape, and action ordering information for objects accessed by multiple threads. the analysis is compositional, analyzing each method or thread once to extract a parameterized analysis result that can be specialized for use in any context. It is also capable of analyzing programs. that use the unstructured form of multithreading present in languages such as Java and standard threads packages such as POSIX threads. We have implemented the analysis in the MIT Flex compiler for Java and used the extracted information to 1) verify that programs correctly use region-based allocation constructs, 2) eliminate dynamic checks associated withthe use of regions, and 3) eliminate unnecessary synchronization. Our experimental results show that analyzing the interactions between threads significantly increases the effectiveness of the region analysis and region check elimination, but has little effect for synchronization elimination.
We present a new parallel computational model, named Log-GPS, which captures synchronization. the LogGPS model is an extension of the LogGP model, which abstracts communication on parallel platforms. Although the LogG...
详细信息
ISBN:
(纸本)9781581133462
We present a new parallel computational model, named Log-GPS, which captures synchronization. the LogGPS model is an extension of the LogGP model, which abstracts communication on parallel platforms. Although the LogGP model captures long messages with one bandwidth parameter (G), it does not capture synchronization that is needed before sending a long message by high-level communication libraries. Our model has one additional parameter, S, defined as the threshold for message length, above which synchronous messages are sent. We also present some experimental results using both models. the results include (1) a verification of the LogGPS model, (2) an example of synchronization analysis using an MPI program and (3) a comparison of the models. the results indicate that the LogGPS model is more accurate than the LogGP model, and analyzing synchronization costs is important when improving parallel program performance.
Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. this unprecedented degree of parallelism ...
详细信息
ISBN:
(纸本)9781581133462
Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. this unprecedented degree of parallelism will undoubtedly expose scalability limitations in existing applications, where scalability is the ability of a parallel algorithm on a parallel architecture to effectively utilize an increasing number of processors. Users will need precise and automated techniques for detecting the cause of limited scalability. this paper addresses this dilemma. First, we argue that users face numerous challenges in understanding application scalability: managing substantial amounts of experiment data, extracting useful trends from this data, and reconciling performance information withtheir application's design. Second, we propose a solution to automate this data analysis problem by applying fundamental statistical techniques to scalability experiment data. Finally, we evaluate our operational prototype on several applications, and show that statistical techniques offer an effective strategy for assessing application scalability. In particular, we find that non-parametric correlation of the number of tasks to the ratio of the time for communication operations to overall communication time provides a reliable measure for identifying communication operations that scale poorly.
Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, axe an attractive parallel computing platform since they offer high performance at low cost. the autonom...
详细信息
ISBN:
(纸本)9781581133462
Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, axe an attractive parallel computing platform since they offer high performance at low cost. the autonomous nature of the environment, however, often results in inefficient utilization due to load imbalances caused by three primary factors: 1) unequal load (compute or communication) assignment to equally-powerful compute nodes, 2) unequal resources at compute nodes, and 3) multiprogramming. these load imbalances result in idle waiting time on cooperating processes that need to synchronize or communicate data. Additional waiting time may result due to local scheduling decisions in a multiprogrammed environment. In this paper, we present a combined approach of compile-time analysis, run-time load distribution, and operating system scheduler cooperation for improved utilization of available resources in an autonomous NOW. the techniques we propose allow efficient resource utilization by taking into consideration all three causes of load imbalance in addition to locality of access in the process of load distribution. the resulting adaptive load distribution and cooperative scheduling system allows applications to take advantage of parallel resources when available by providing better performance than when the loaded resources axe not used at all.
In shared memory programs contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by...
详细信息
ISBN:
(纸本)9781581133462
In shared memory programs contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the sequential section, causing a flurry of data requests to converge on that processor. We address this problem in a software distributed shared memory system by replicating the execution of the sequential sections on all processors. Communication during this replicated sequential execution is reduced by using multicast. We have implemented replicated sequential execution with multicast support in OpenMP/NOW, a version of of OpenMP that runs on networks of workstations. We do not rely on compile-time data analysis, and therefore we can handle irregular and pointer-based applications. We show significant improvement for two pointer-based applications that suffer from severe contention without replicated sequential execution.
暂无评论