ISBN: 1595933883 (print)
The proceedings contain 24 papers. The topics discussed include: typed polyadic pi-calculus in bigraphs; combining fuzzy logic and behavioral similarity for non-strict program validation; type-oriented construction of web user interfaces; an abductive framework for a-priori verification of web services; an efficient algorithm for XML type projection; type inference for spreadsheets; polymorphic algebraic data type reconstruction; extracting programs from type class proofs; computing constructor forms with non-terminating rewrite programs; open data types and open functions; a historic functional and object-oriented calculus; combining algorithmic debugging and program slicing; practical type inference based on success typings; collection analysis for Horn clause programs; a model type system for multi-level generating extensions with persistent code; a type system equivalent to static single assignment; and poly-controlled partial evaluation.
ISBN: 1595931899 (print)
In 2002, we first brought High Performance Computing (HPC) methods to the college classroom as a way to enrich Computational Science education. Through the years, we have continued to help college faculty in science, technology, engineering, and mathematics (STEM) disciplines stay current with HPC methodologies. We have accomplished this by designing and delivering faculty workshops, hosted in a variety of lab settings, and by developing tools that support the technical infrastructure necessary for HPC education, all without requiring access to traditional HPC computing platforms. In all, we have so far presented 16 professional development workshops for close to 400 predominantly undergraduate STEM faculty. This paper presents the results of internal formative evaluation by workshop instructors, together with the materials and tools developed during that process. We did this work as part of the National Computational Science Institute (NCSI) and in collaboration with the following groups: the Minority Serving Institutions - High Performance Computing (MSI-HPC) program of the National Computational Science Alliance; the Consortium for Computing Sciences in Colleges (CCSC); the Center for Excellence in High Performance Computing; the Oklahoma University Supercomputing symposium series; and the Super Computing (SC) conference series education program. We presented learners with a sequence of interactive, "run it, modify it, build it" open-ended lab exercises drawn from a variety of disciplines. Interactivity means having the ability to change parallel and algorithmic parameters, e.g., running software on more than one machine, using different models, refining the model, changing the problem scale, or using different parallel algorithms. There is a lack of scientific parallel curricula suitable for illustrating Computational Science principles in the classroom. We addressed this need by locating, and where necessary creating, suitable open source software, data sets, and cur
ISBN: 9781581135886 (print)
In programming high performance applications, shared address-space platforms are preferable for fine-grained computation, while distributed address-space platforms are more suitable for coarse-grained computation. However, currently only distributed address-space systems scale beyond the low hundreds of processors. In this paper we introduce a hybrid architecture that allows users to trade off local memory usage for coherence communication, making possible larger-scale shared memory architectures. We introduce a programming model and examine possible implementations of hardware mechanisms, evaluating some of the trade-offs inherent in each. Preliminary experiments on an application with particularly fine-grained communication requirements indicate that effective placement of directives can reduce coherence communication by more than a factor of 10 for 64 processors.
We parallelize the 'go with the winners' algorithm of Aldous and Vazirani (in: Proceedings of the 35th IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Silver Spring, MD, 1994, pp. 492-501) and analyze the resulting parallel algorithm in the LogP model (in: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, 1993, pp. 1-12). The main issues in the analysis are load imbalances and communication delays. The result of the analysis is a practical algorithm which, under reasonable assumptions, achieves linear speedup. Finally, we analyze our algorithm for a concrete application: generating models of amorphous chemical structures. (C) 2003 Elsevier Inc. All rights reserved.
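For orientation, the sequential 'go with the winners' scheme that the paper parallelizes can be sketched roughly as follows. This is a minimal illustration, not the authors' parallel algorithm: it assumes the usual description in which particles descend a tree, and any particle stuck at a leaf is replaced by a copy of a randomly chosen surviving particle. The function names, the toy tree, and `DEPTH` are hypothetical.

```python
import random

DEPTH = 8  # hypothetical depth of the toy search tree


def toy_children(node):
    # Hypothetical toy tree: a node is a bit string; a node ending in "11"
    # is a dead end, and nodes at full depth have no children either.
    if len(node) == DEPTH or node.endswith("11"):
        return []
    return [node + "0", node + "1"]


def go_with_the_winners(root, children, num_particles, max_steps, rng):
    """Sequential sketch: stuck particles are re-seeded from surviving ones."""
    particles = [root] * num_particles
    for _ in range(max_steps):
        # Particles on a non-leaf node can still move.
        alive = [p for p in particles if children(p)]
        if not alive:
            break  # every particle is stuck at a leaf
        # Replace each stuck particle by a copy of a random surviving one.
        particles = [p if children(p) else rng.choice(alive) for p in particles]
        # Every particle then steps to a uniformly random child.
        particles = [rng.choice(children(p)) for p in particles]
    return particles
```

Calling `go_with_the_winners("", toy_children, 50, DEPTH, random.Random(0))` returns 50 node labels. In a parallel version, the re-seeding step is where particles would have to move between processors, which is one plausible source of the load imbalances and communication delays the abstract mentions.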
ISBN: 9781581135886 (print)
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message-passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user-transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP and MPI implementations and in a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.
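The idea of aggregating outgoing communication can be illustrated with a small sketch. This is not ARMI's API: the class name, `async_call`, `flush`, and the `batch_size` parameter are all hypothetical, and the example only shows how many fine-grain requests might be coalesced into fewer, coarser messages with a tunable threshold.

```python
class AggregatingChannel:
    """Hypothetical sketch: buffers outgoing requests and sends them in batches."""

    def __init__(self, send_batch, batch_size):
        self.send_batch = send_batch  # callable that transmits a list of requests
        self.batch_size = batch_size  # tunable per platform (assumed knob)
        self.pending = []

    def async_call(self, method_name, *args):
        # Queue the request instead of sending it immediately.
        self.pending.append((method_name, args))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send everything queued so far as one coarser message.
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []
```

With `batch_size=3`, seven calls to `async_call` trigger two batch sends, and a final `flush()` sends the remaining request. A larger threshold trades latency for fewer messages, which is the kind of platform-dependent tuning the abstract describes.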
ISBN: 9781581135886 (print)
Collecting a program's execution profile is important for many reasons: code optimization, memory layout, program debugging and program comprehension. Path-based execution profiles are more detailed than count-based execution profiles, since they present the order of execution of the various blocks in a program: modules, procedures, basic blocks, etc. Recently, online string compression techniques have been employed for collecting compact representations of sequential program executions. In this paper, we show how a similar approach can be taken for shared memory parallel programs. Our compaction scheme yields one to two orders of magnitude compression compared to the uncompressed parallel program trace on some of the SPLASH benchmarks. Our compressed execution traces contain detailed information about synchronization and control/data flow which can be exploited for post-mortem analysis. In particular, information in our compact execution traces is useful for accurate data race detection (detecting unsynchronized shared variable accesses that occurred in the execution).
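To make "unsynchronized shared variable accesses" concrete, here is a deliberately simplified post-mortem check over an uncompressed trace. It is not the paper's method: it assumes a hypothetical trace format of `(thread_id, op, variable, locks_held)` tuples and uses a lockset-style test that ignores other ordering constraints such as barriers.

```python
from itertools import combinations


def find_unsynchronized_pairs(trace):
    """Return index pairs of conflicting accesses that share no common lock.

    trace: list of (thread_id, op, variable, locks_held), op is "r" or "w".
    """
    races = []
    for (i, a), (j, b) in combinations(enumerate(trace), 2):
        thread_a, op_a, var_a, locks_a = a
        thread_b, op_b, var_b, locks_b = b
        if var_a != var_b or thread_a == thread_b:
            continue  # different variables or same thread: no conflict
        if op_a == "r" and op_b == "r":
            continue  # two reads never conflict
        if set(locks_a) & set(locks_b):
            continue  # a common lock protects both accesses
        races.append((i, j))
    return races
```

The pairwise scan is quadratic in the trace length, so it is only meant for small examples. A real analysis would work on the compressed representation and take synchronization order into account.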
ISBN: 9781581135886 (print)
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a re-usable artifact. Parallel design patterns reflect commonly occurring parallel communication and synchronization structures. Our tools, CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) and MetaCO2P3S, use generative design patterns. A programmer selects the parallel design patterns that are appropriate for an application, and then adapts the patterns for that specific application by selecting from a small set of code-configuration options. CO2P3S then generates a custom framework for the application that includes all of the structural code necessary for the application to run in parallel. The programmer is only required to write simple code that launches the application and to fill in some application-specific sequential hook routines. We use generative design patterns to take an application specification (parallel design patterns + sequential user code) and use it to generate parallel application code that achieves good performance in shared memory and distributed memory environments. Although our implementations are for Java, the approach we describe is tool- and language-independent. This paper describes generalizing CO2P3S to generate distributed-memory parallel solutions.
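The split between framework-owned structural code and user-written sequential hooks can be sketched as below. This is a hand-written illustration in Python, not CO2P3S output (the authors' implementations are for Java); the class names, the hook names `process_item` and `combine`, and the thread pool are all assumptions chosen for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


class ParallelMapFramework:
    """Stands in for generated structural code; the user does not edit run()."""

    def __init__(self, workers=4):
        self.workers = workers

    # Sequential hook routines the programmer is expected to fill in.
    def process_item(self, item):
        raise NotImplementedError

    def combine(self, left, right):
        raise NotImplementedError

    def run(self, items):
        # Structural code: distributes work and merges results in input order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            results = list(pool.map(self.process_item, items))
        if not results:
            raise ValueError("run() needs at least one item")
        total = results[0]
        for partial in results[1:]:
            total = self.combine(total, partial)
        return total


class SumOfSquares(ParallelMapFramework):
    # Application-specific sequential hooks only; no threading code here.
    def process_item(self, item):
        return item * item

    def combine(self, left, right):
        return left + right
```

Launching the application is then a single call such as `SumOfSquares().run([1, 2, 3, 4])`. Because the hooks contain no parallel code, the structural part could in principle be regenerated for a different target, such as distributed memory, without touching them.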
ISBN: 9781581135886 (print)
Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and in some cases, actuate control mechanisms in response. As in traditional parallel systems, communication between nodes is of fundamental importance, but it is typically accomplished via wireless transceivers. One further key attribute of sensor networks is that they are almost always long-running systems, intended to operate in situ, with minimal direct human intervention, for months or years. This requirement for long-running autonomy mandates careful design of the runtime system that manages applications on each node, to ensure reliability and ease of upgrades over the life of the system. This paper describes Impala, a middleware architecture that enables application modularity, adaptivity, and repairability in wireless sensor networks. Impala allows software updates to be received via the node's wireless transceiver and to be applied to the running system dynamically. In addition, Impala provides an interface for on-the-fly application adaptation in order to improve the performance, energy efficiency, and reliability of the software system. Impala has been designed to be a part of the ZebraNet mobile sensor network, but we are also prototyping it within HP/Compaq iPAQ Pocket PC handhelds. We present performance data from both real system measurements of the Pocket PC version and simulations of a full mobile sensor system deployment. Overall, Impala is a lightweight runtime system that can greatly improve system reliability, performance, and energy efficiency. The ideas introduced here for sensor networks also apply more broadly to other long-running autonomous parallel systems.
The proceedings contain 14 papers from the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Topics discussed include: reference idempotency analysis: a framework for optimizing speculative execution; pointer and escape analysis for multithreaded programs; language support for Morton-order matrices; efficient load balancing for wide-area divide-and-conquer applications; scalable queue-based spin locks with timeout; contention elimination by replication of sequential sections in distributed shared memory programs; and accurate data redistribution cost estimation in software distributed shared memory systems.