Most Java-based systems that support portable parallel and distributed computing either require the programmer to deal with intricate low-level details of Java, which can be a tedious, time-consuming, and error-prone task, or prevent the programmer from controlling locality of data. In this paper we describe JavaSymphony, a programming paradigm for distributed and parallel computing that provides a software infrastructure for wide classes of heterogeneous systems ranging from small-scale cluster computing to large-scale wide-area meta-computing. The software infrastructure is written entirely in Java and runs on any standard-compliant Java virtual machine. In contrast to most existing systems, JavaSymphony provides the programmer with the flexibility to control data locality and load balancing by explicit mapping of objects to computing nodes. Virtual architectures are specified to impose a virtual hierarchy on a distributed system of physical computing nodes. Objects can be mapped and dynamically migrated to arbitrary components of virtual architectures. A high-level API to hardware/software system parameters is provided to control mapping, migration, and load balancing of objects. Objects can interact through synchronous, asynchronous, and one-sided method invocation. Selective remote class loading may reduce the overall memory requirement of an application. Moreover, objects can be made persistent by explicitly storing and loading objects to/from external storage. A prototype of the JavaSymphony software infrastructure has been implemented. Preliminary experiments on a heterogeneous cluster of workstations are described that demonstrate reasonable performance values.
This paper will outline various types of parallel test, discuss an adaptation of Amdahl's law to parallel test, and discuss possible extensions to ATML for parallel test. Amdahl's law is an equation in computer science that is used to derive the speedup gained through parallelizing software; it expresses the speedup as a function of the number of processors. Parallel test increases the throughput and efficiency of test systems. Amdahl's law can be adapted to analyze the speedup of a test system that makes use of parallelism. Parallel test is most commonly thought of as multiple units under test (UUTs) being tested concurrently. Other types of parallel test include a single UUT that has multiple tests run on it concurrently, and a third type of parallelism that can be exploited is a system that concurrently executes its own actions. The test community has been moving to create more portable test information with ATML. One component of the ATML family of standards is the ATML Test Description (IEEE P1671.1). It is possible to extend the Test Description to explicitly define parallelism in test. The directed graph is a technique that can be used to express all of the test information for a TPS. Extending the ATML Test Description can be accomplished by utilizing aspects of other XML-based standards for directed graphs, such as GraphML, which have better capabilities for conveying the information.
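As a point of reference for the abstract above, the classic form of Amdahl's law can be sketched in a few lines of Python. This is the textbook formula only; the paper's own adaptation to parallel test is not given in the abstract. The function name `amdahl_speedup` and the reading of `workers` as either processors or concurrently tested UUTs are assumptions made for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Classic Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p): share of the total work that can run in parallel.
    workers (n): number of processors. Reading this as the number of
    concurrently tested UUTs is an assumption, not taken from the paper.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Show how the speedup saturates as workers are added when 10% is serial.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 3))
```

The serial fraction bounds the achievable speedup: with 10% serial work, the speedup can never exceed 10 regardless of how many workers are added.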
In a rural radio telephony system design it is proposed that each village be provided with one radio unit consisting of two transceivers, two omnidirectional antennas, at least one telephone, and a control unit. With such equipment a cluster of villages will form a local area radio network. Clusters of villages will be linked via gateways, with each gateway linking such a local network to the existing PSTN. This paper concentrates on the software side, describing the proposed call setup and routing schemes, how the control program is envisaged to work, and the network simulation using transputers.
ISBN: (print) 9781424456581
Transactional memory (TM) has been proposed to address some of the programmability issues of chip multiprocessors. Hardware implementations of transactional memory (HTMs) have made significant progress in providing support for features such as long transactions that spill out of the cache, and context switches, page and thread migration in the middle of transactions. While essential for the adoption of HTMs in real products, supporting these features has resulted in significant state overhead. For instance, TokenTM adds at least 16 bits per block in the caches, which is significant in absolute terms, and steals 16 of 64 (25%) memory ECC bits per block, weakening error protection. Also, the state bits nearly double the tag array size. These significant and practical concerns may impede the adoption of HTMs, squandering the progress achieved by HTMs. The overhead comes from tracking the thread identifier and the transactional read-sharer count at the L1-block granularity. The thread identifier is used to identify the transaction, if only one, to which an L1-evicted block belongs. The read-sharer count is used to identify conflicts involving multiple readers (i.e., write to a block with non-zero count). To reduce this overhead, we observe that the thread identifiers and read-sharer counts are not needed in a majority of cases. (1) Repeated misses to the same blocks are rare within a transaction (i.e., locality holds). (2) Transactional read-shared blocks that both are evicted from multiple sharers' L1s and are involved in conflicts are rare. Exploiting these observations, we propose a novel HTM, called LiteTM, which completely eliminates the count and identifier and uses software to infer the lost information. Using simulations of the STAMP benchmarks running on 8 cores, we show that LiteTM reduces TokenTM's state overhead by about 87% while performing within 4%, on average, and 10%, in the worst case, of TokenTM.
Most legacy software systems were developed in imperative languages with traditional design approaches. Instead of continually maintaining these legacy systems at high cost, re-engineering them into new systems with good design and architecture can improve their understandability, reusability, and maintainability. Design patterns (DPs) integrate the concepts of standardization and expert experience into a set of inter-related components that can perform certain behaviors with a more flexible structure. Software development with DPs provides easier understanding and standardization, which makes system evolution much more effective. We use a parallel program generation environment (PPGE) as a case study for the re-engineering of a traditional software system into a pattern-based software system. An architecture with the Dynamic-Packing Component Library (ADPCL), which is composed of existing well-known design patterns, and a pattern-based re-engineering approach for transformation systems are also proposed.
ISBN: (digital) 9781728142913
ISBN: (print) 9781728142920
Two years ago Utah Valley University (UVU) began offering a Master of Computer Science (MCS) degree. This program needed to be distinct from other similar programs in our service area, and needed to align with the workforce development goals of UVU. The CS 6150 Advanced Algorithms course, one of the fundamental core courses students take in the MCS program, exhibits this distinctiveness. This paper describes five problems taught to students when they take CS 6150. They are: Balancing a Two-wheeled Robot, Stable Marriage Problem, Lemoine's Conjecture, Largest Triangle, and Blockchains. These problems are an eclectic set that is not commonly taught in data structures and algorithms textbooks and courses, but they provide enough theory to be rigorous while giving experience with real-world, practical problems around which to develop new professional skills. Students are required to produce performant, working code while learning about the algorithms and related theories, concepts, and mathematics involved. This blend supports the unique missions of UVU and the MCS program. Student feedback is that the course is difficult, for reasons such as new advanced material and higher expectations of graduate students; however, students also enjoy the challenging projects and use the knowledge and skills they develop in school and at work.
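To make one of the listed problems concrete, the Stable Marriage Problem is commonly solved with the Gale-Shapley deferred-acceptance algorithm. The sketch below is a standard textbook version and is not taken from the course materials; the function name `stable_matching` and the dictionary-based input format are assumptions, and it assumes two groups of equal size with complete preference lists.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance.

    proposer_prefs / receiver_prefs: dicts mapping each participant to a
    list of everyone on the other side, most preferred first.
    Assumes equal group sizes and complete preference lists.
    Returns a dict mapping each proposer to the receiver it is matched with.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer tries
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # receiver was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)               # rejected: will try the next choice

    return {p: r for r, p in engaged_to.items()}


if __name__ == "__main__":
    proposers = {"a": ["x", "y"], "b": ["x", "y"]}
    receivers = {"x": ["b", "a"], "y": ["a", "b"]}
    print(stable_matching(proposers, receivers))
```

Each proposer makes at most one proposal per receiver, so the loop runs at most n² times for n participants per side.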
We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implemen...
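The idea named in this abstract, dequantizing 4-bit weights on the fly inside the multiply-accumulate loop while splitting the reduction (K) dimension across independent workers, can be illustrated with a small pure-Python sketch. This is not the authors' kernel: the packing layout (eight 4-bit values per integer, lowest nibble first), the single `scale` and `zero_point`, and the function name `fused_dequant_dot_splitk` are all assumptions for illustration, and the sequential loop over splits stands in for work that would run in parallel on a GPU.

```python
def fused_dequant_dot_splitk(activations, packed_weights, scale, zero_point, split_k):
    """Dot product of activations with one column of 4-bit quantized weights.

    Assumed layout (hypothetical): each integer in packed_weights holds eight
    unsigned 4-bit values, lowest nibble first. Weights are dequantized as
    (q - zero_point) * scale inside the loop ("fused"), so no full-precision
    copy of the weights is ever materialized.
    """
    k = len(activations)
    chunk = (k + split_k - 1) // split_k  # ceil(k / split_k) elements per split
    partials = []
    for s in range(split_k):              # each split would be a separate worker
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        acc = 0.0
        for i in range(lo, hi):
            q = (packed_weights[i // 8] >> (4 * (i % 8))) & 0xF  # unpack nibble
            w = (q - zero_point) * scale                          # dequantize
            acc += activations[i] * w
        partials.append(acc)
    # On a GPU the partial sums would typically be combined with atomic adds
    # or a second reduction pass; here they are simply summed.
    return sum(partials)


if __name__ == "__main__":
    acts = [1.0, 2.0, 3.0, 4.0]
    packed = [0x4321]  # nibbles 1, 2, 3, 4 (lowest first)
    print(fused_dequant_dot_splitk(acts, packed, 0.5, 0, 2))
```

Splitting along K exposes more parallel work when the output matrix is small, at the cost of an extra reduction over the partial sums.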
This paper introduces MultiSim, a prototype, user-oriented tool specifically designed to automate the model development process for parallel simulation models. Targeted toward the simulationist and written in Ada for high transportability among different numbers of processors, MultiSim combines discrete-event simulation knowledge, parallel programming knowledge, and target language knowledge and represents this knowledge in frame-like constructs. Through user interaction, knowledge of the system to be modeled is abstracted and a parallel Ada simulation model is automatically generated based on the knowledge resident within MultiSim.
Parallel programming models should attempt to satisfy two conflicting goals. On one hand, they should hide architectural details so that algorithm designers can write simple, portable programs. On the other hand, models must expose architectural details so that designers can evaluate and optimize the performance of their algorithms. In this paper we experimentally examine the trade-offs made by a simple shared-memory model, QSM, to address this dilemma. The results indicate that analysis under the QSM model yields quite accurate results for reasonable input sizes and that algorithms developed under QSM achieve performance close to that obtainable through more complex models, such as BSP and LogP.