We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs ( sequences of possibly imperfectly nested loops) for parallelism and lo...
详细信息
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs ( sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model-far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collec...
详细信息
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques, each of which leverages characteristics of the streaming domain. First, we employ a novel evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, we incorporate an extensible system for rewrite rules to capture algebraic properties in specific domains (such as signal processing). We have used our language to build and deploy a sensor-network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines, including distributed execution and substantial parallel speedups. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success.
When scripts in untyped languages grow into large programs, maintaining them becomes difficult. A lack of types in typical scripting languages means that programmers must (re)discover critical pieces of design informa...
详细信息
Microfluidics has enabled lab-on-a-chip technology to miniaturize and integrate biological and chemical analyses to a single chip comprising channels, valves, mixers, heaters, separators, and sensors. Recent papers ha...
详细信息
Microfluidics has enabled lab-on-a-chip technology to miniaturize and integrate biological and chemical analyses to a single chip comprising channels, valves, mixers, heaters, separators, and sensors. Recent papers have proposed programmable labs-on-a-chip as an alternative to traditional application-specific chips to reduce design effort, time, and cost. While these previous papers provide the basic support for programmability, this paper identifies and addresses a practical issue, namely, fluid volume management. Volume management addresses the problem that the use of a fluid depletes it and unless the given volume of a fluid is distributed carefully among all its uses, execution may run out of the fluid before all its uses are complete. Additionally, fluid volumes should not overflow (i.e., exceed hardware capacity) or underflow (i.e., fall below hardware resolution). We show that the problem can be formulated as a linear programming problem ( LP). Because LP's complexity and slow execution times in practice may be a concern, we propose another approach, called DAGSolve, which over-constrains the problem to achieve linear complexity while maintaining good solution quality. We also propose two optimizations, called cascading and static replication, to handle cases involving extreme mix ratios and numerous fluid uses which may defeat both LP and DAGSolve. Using some real-world assays, we show that our techniques produce good solutions while being faster than LP.
Developers commonly build contemporary enterprise applications using type-safe, component-based platforms, such as J2EE, and architect them to comprise multiple tiers, such as a web container, application server, and ...
详细信息
Developers commonly build contemporary enterprise applications using type-safe, component-based platforms, such as J2EE, and architect them to comprise multiple tiers, such as a web container, application server, and database engine. Administrators increasingly execute each tier in its own managed runtime environment (MRE) to improve reliability and to manage system complexity through the fault containment and modularity offered by isolated MRE instances. Such isolation, however, necessitates expensive cross-tier communication based on protocols such as object serialization and remote procedure calls. Administrators commonly co-locate communicating MREs on a single host to reduce communication overhead and to better exploit increasing numbers of available processing cores. However, state-of-the-art MREs offer no support for more efficient communication between co-located MREs, while fast inter-process communication mechanisms, such as shared memory, are widely available as a standard operating system service on most modern platforms. To address this growing need, we present the design and implementation of XMem - type-safe, transparent, shared memory support for co-located MREs. XMem guarantees type-safety through coordinated, parallel, multi-process class loading and garbage collection. To avoid introducing any level of indirection, XMem manipulates virtual memory mapping. In addition, object sharing in XMem is fully transparent: shared objects are identical to local objects in terms of field access, synchronization, garbage collection, and method invocation, with the only difference being that shared-to-private pointers are disallowed. XMem facilitates easy integration and use by existing communication technologies and software systems, such as RMI, JNDI, JDBC, serialization/XML, and network sockets. We have implemented XMem in the open-source, production-quality HotSpot Java Virtual Machine. Our experimental evaluation, based on core communication technologies un
In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous p...
详细信息
In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve increased energy and performance efficiency. The Merge framework provides (1) a predicate dispatch-based library system for managing and invoking function variants for multiple architectures;(2) a high-level, library-oriented parallel language based on map-reduce;and (3) a compiler and runtime which implement the map-reduce language pattern by dynamically selecting the best available function implementations for a given input and machine configuration. Using a generic sequencer architecture interface for heterogeneous accelerators, the Merge framework can integrate function variants for specialized accelerators, offering the potential for to-the-metal performance for a wide range of heterogeneous architectures, all transparent to the user. The Merge framework has been prototyped on a heterogeneous platform consisting of an Intel Core 2 Duo CPU and an 8-core 32-thread Intel Graphics and Media Accelerator X3000, and a homogeneous 32-way Unisys SNIP system with Intel Xeon processors. We implemented a set of benchmarks using the Merge framework and enhanced the library with X3000 specific implementations, achieving speedups of 3.6x - 8.5x using the X3000 and 5.2x - 22x using the 32-way system relative to the straight C reference implementation on a single IA32 core.
EventScript is a simple but powerful language for programming reactive processes. A stream of incoming events is matched against a regular expression. Actions embedded within the regular expression are executed in res...
详细信息
ISBN:
(纸本)9781605581040
EventScript is a simple but powerful language for programming reactive processes. A stream of incoming events is matched against a regular expression. Actions embedded within the regular expression are executed in response to the matching of patterns of events. These actions include assigning computed values to variables and emitting output events. The definition of EventScript presented a number of novel and interesting language-design choices. EventScript has an efficient implementation, and has been used in a development environment for complex event-based applications. We have used EventScript to program both small examples and large industrial applications. Readers of EventScript programs find them easy to understand, and are comfortable with the familiar model of matching regular expressions.
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collec...
详细信息
ISBN:
(纸本)9781605581040
Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques, each of which leverages characteristics of the streaming domain. First, we employ a novel evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, we incorporate an extensible system for rewrite rules to capture algebraic properties in specific domains (such as signal processing). We have used our language to build and deploy a sensor-network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines, including distributed execution and substantial parallel speedups. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success.
I worked on Lisp design and implementation from the late 1960s almost until I retired about 5 years ago - and since then I've remained in the community by helping organize Lisp conferences. This means I've bee...
详细信息
This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C+...
详细信息
暂无评论