ISBN (print): 0818607033
A fault-tolerant computer architecture, FTCX, is an experimental computer architecture intended to serve as a general-purpose real-time computing system for fault-sensitive supervisory and control applications. FTCX uses tightly synchronous triplex computation in its core to detect and mask all first faults. Synchronization, fault detection, and fault correction are all performed in hardware. Novel to this architecture are the means by which interrupt requests and data are exchanged between the simplex local or remote industry-standard bus (VMEbus) environments and the triplexed core environment. These exchanges are software-transparent, yet fully implement all of the necessary algorithms to maintain data consistency and synchronization across the three channels of the core, even in the face of Byzantine faults.
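FTCX performs the triplex voting in hardware; purely as an illustration of the fault-masking principle (not FTCX's actual mechanism — the function name and error handling here are assumptions), a software majority voter over three redundant channel outputs can be sketched as:

```python
def vote(a, b, c):
    """Majority-vote the outputs of three redundant channels.

    A single faulty channel is outvoted by the two agreeing ones,
    so any first fault is masked.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    # No two channels agree: more than one channel is faulty.
    raise RuntimeError("no majority among the three channels")

# One disagreeing channel is masked, whichever position it is in.
assert vote(42, 42, 7) == 42
assert vote(7, 42, 42) == 42
```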
ISBN (print): 9781479961238
Cloud technology has made it easy for businesses to use infrastructure or software as a service, saving the huge costs associated with acquiring and managing the resources to run a data center, or with software licenses, which is particularly vital for small and medium enterprises. This has changed the dynamics of performance management for cloud service providers, as control of day-to-day business operations is passed to them. This paper addresses a number of related issues and discusses the break points, or criteria to look for, in the performance management of applications, servers, and systems in a cloud environment architecture. It recommends a semi-automated, performance-based dynamic cloud resource management approach for both cloud clients and providers serving small and medium enterprises.
ISBN (print): 9781595937155
The proceedings contain 11 papers. The topics discussed include: integrating existing scientific workflow systems: the Kepler/Pegasus example; workflow adaptation as an autonomic computing problem; supporting large-scale science with workflows; on the black art of designing computational workflows; WS-VLAM: towards a scalable workflow system on the grid; a semantic workflow authoring tool for programming grids; a workflow approach to designed reservoir study; myExperiment: social networking for workflow-using e-scientists; GRIDCC: real-time workflow system; and cache for workflows.
ISBN (print): 0769517722
Task scheduling is a key element in achieving high performance from multicomputer systems. To be efficient, scheduling algorithms must be based on a cost model appropriate for the computing systems in use. The optimal scheduling of tasks is NP-hard, and a large number of heuristic algorithms have been proposed for a variety of scheduling conditions (graph types, granularities, or cost models). This paper studies the problem of task scheduling under the LogP model and presents both theoretical and experimental results for a cluster-based, task-duplication methodology.
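Under the LogP model, point-to-point communication is characterized by latency L, per-message overhead o, and gap g. As a minimal sketch of the trade-off that motivates task duplication (this is a simplified illustration, not the paper's heuristic), the decision to recompute a parent task locally rather than receive its result can be expressed as:

```python
def comm_time(L, o, g, k=1):
    """Time to deliver k single-word messages between two processors under
    LogP: successive injections are spaced by max(g, o), and the last
    message pays send overhead, latency, and receive overhead."""
    return (k - 1) * max(g, o) + o + L + o

def should_duplicate(task_cost, L, o, g):
    """Duplicate a parent task on the consumer's processor when recomputing
    it locally is cheaper than receiving its result over the network."""
    return task_cost < comm_time(L, o, g)

# With L=5, o=2, g=3, one message costs 2 + 5 + 2 = 9 time units,
# so a parent task cheaper than that is worth duplicating.
assert comm_time(L=5, o=2, g=3) == 9
assert should_duplicate(4, 5, 2, 3)
assert not should_duplicate(20, 5, 2, 3)
```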
ISBN (print): 9781509012336
For a long time the Instruction Set Architecture (ISA) has been the firm contract between software and hardware. This firm contract plays an important role by decoupling the development of software from hardware micro-architectural features, enabling both to evolve independently. Nonetheless, it also condemns the ISA to become larger, more cluttered, and less efficient as new instructions are incorporated over the years and deprecated instructions are left untouched to preserve legacy compatibility. In this work we propose OpenISA, a flexible ISA that enables both the software and the hardware to evolve independently, and discuss how OpenISA 1.0 was designed to enable efficient OpenISA software emulation on alien ISAs, which is key to freeing the user from hardware lock-in. Our results show that software compiled to OpenISA can later be emulated on x86 and ARM processors with very little overhead, achieving near-native performance, with under 10% overhead for the majority of programs.
ISBN (print): 9781479984480
For most applications, taking full advantage of the memory system is key to achieving good performance on GPUs. In this paper, we introduce register caching, a novel idea where the registers of multiple threads are combined and used as a shared, last-level, manually managed cache for the contributing threads. This method is enabled by the shuffle instruction recently introduced in Nvidia's Kepler GPU architecture, which allows threads in the same warp to exchange data directly, something previously only possible by going through shared memory. We evaluate our proposal with a stencil computation benchmark, achieving speedups of up to 2.04 compared to using shared memory on a GTX680 GPU. Stencil computations form the core of many scientific applications, which can therefore benefit from our proposal. Furthermore, our method is not limited to stencil computations, but is applicable to any application with a predictable memory access pattern suitable for manual caching.
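The idea can be illustrated in plain Python by emulating the shuffle primitive over one small "warp" (this is a conceptual sketch, not CUDA: the tiny warp width and clamped boundary handling are assumptions made for readability):

```python
def shfl(regs, src_lane):
    """Emulate the warp shuffle primitive: lane i reads the register value
    held by lane src_lane[i], with no trip through shared memory."""
    return [regs[s] for s in src_lane]

def stencil_via_register_cache(data):
    """3-point average stencil over one 'warp': each lane caches one input
    element in a register, then obtains its neighbours' values by shuffling
    instead of re-reading memory (boundaries are clamped)."""
    regs = list(data)  # each lane loads exactly one element into a register
    n = len(regs)
    left = shfl(regs, [max(i - 1, 0) for i in range(n)])
    right = shfl(regs, [min(i + 1, n - 1) for i in range(n)])
    return [(l + c + r) / 3 for l, c, r in zip(left, regs, right)]

# Each element is averaged with its (clamped) neighbours.
assert stencil_via_register_cache([0, 3, 6, 9]) == [1, 3, 6, 8]
```

Each input element is read from memory exactly once; all further accesses are register-to-register exchanges, which is what makes the combined registers act as a manually managed cache.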
ISBN (print): 0769520464
Commodity-built clusters, a low-cost alternative for distributed parallel processing, brought high-performance computing to a wide range of users. However, the existing widespread tools for distributed parallel programming, such as message-passing libraries, do not address the new software engineering requirements that have emerged due to the increasing complexity of applications. Haskell# is a parallel programming language intended to reconcile higher abstraction and modularity with scalable performance. This paper demonstrates the use of Haskell# in the programming of three SPMD benchmark programs, which have lower-level MPI implementations available.
ISBN (print): 0769527043
State-of-the-art software distributed shared-memory systems (SDSMs) provide a cost-effective solution for running single-program-multiple-data (SPMD) applications on clusters of distributed-memory computers. However, SDSMs are unsuitable for running applications with dynamic, highly asynchronous task parallelism (ATP), such as graphics, simulators, and decision support systems. In ATP-based applications, the execution of tasks depends not only on the input data but also on the variable amount of data that each task produces at runtime, which generates high load imbalance and communication traffic that drastically degrades the performance of SDSM systems. In this work, we propose a new load balancing (LB) mechanism to enable SDSM systems to support dynamic task scheduling as required by ATP applications. To evaluate the benefits of our LB mechanism, we developed Clik, a new multithreaded SDSM system with automatic load balancing. Our preliminary performance results for Clik running five ATP applications on a 16-node Linux SMP cluster show that Clik attains significant speedups. For four of our five applications, the speedups ranged from 7.2 up to 13.8 on 16 processors.
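The abstract does not specify Clik's LB mechanism; as a generic illustration of dynamic scheduling under task costs that are only known as tasks are produced, a greedy least-loaded assignment can be sketched as follows (all names and the policy itself are hypothetical, not Clik's):

```python
import heapq

def schedule_dynamic(tasks, n_procs):
    """Assign each arriving (task, cost) pair to the currently least-loaded
    processor, tracked with a min-heap of (load, processor) entries.
    Returns the task-to-processor assignment and the resulting makespan."""
    heap = [(0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    for task, cost in tasks:
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        assignment[task] = p
        heapq.heappush(heap, (load + cost, p))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

tasks = [("a", 4), ("b", 3), ("c", 2), ("d", 2)]
assignment, makespan = schedule_dynamic(tasks, 2)
assert makespan == 6  # e.g. {a, d} on one processor, {b, c} on the other
```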
ISBN (print): 0769516262
Distributed-memory parallel computers are parallel computers in which each processor has its own private memory. In such a system, processors communicate by exchanging messages via the interconnection network rather than storing data in shared memory. One of the important communication methods in such systems is data broadcasting. All-to-all broadcasting is the process in which each processor sends its message to all other processors and receives messages from all other processors in the system. Two complexity measures are usually considered when evaluating the performance of a distributed-memory parallel model: time complexity and message complexity. In this paper, we develop an efficient communication scheme to reduce both the communication time and the message complexity in the star network model under half-duplex and full-duplex communication capabilities. The complexity measures of our scheme are compared against known bounds to verify the efficiency of the suggested scheme.
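As a baseline for what such schemes improve upon, the naive all-to-all broadcast sends every message directly, costing n(n-1) messages for n processors; a minimal simulation of that baseline (an illustration only, not the paper's star-network scheme) looks like:

```python
def all_to_all(n):
    """Simulate naive all-to-all broadcasting among n processors: every
    processor sends its message directly to each of the other n-1.
    Returns each processor's inbox and the total message count."""
    inbox = {p: {p} for p in range(n)}  # each processor holds its own message
    messages = 0
    for src in range(n):
        for dst in range(n):
            if src != dst:
                inbox[dst].add(src)  # one point-to-point message
                messages += 1
    return inbox, messages

inbox, messages = all_to_all(4)
assert messages == 4 * 3                # n(n-1) messages in total
assert all(inbox[p] == {0, 1, 2, 3} for p in range(4))
```

Schemes like the paper's reduce this by routing combined messages over the network topology, trading message count against the number of communication rounds.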
ISBN (print): 1595936734
This paper describes a novel approach to generating an optimized schedule for running threads on distributed shared memory (DSM) systems. The approach relies upon a binary instrumentation tool to automatically acquire the memory-sharing relationship between user-level threads by analyzing their memory traces. We introduce the concept of an Affinity Graph to model this relationship. Expensive I/O for large trace files is completely eliminated by using an online graph-creation scheme. We apply hierarchical graph partitioning and thread reordering to the affinity graph to determine an optimal thread schedule. We have performed experiments on an SGI Altix system. The experimental results show that our approach is able to reduce the total execution time by 10% to 38% for a variety of applications through maximization of data reuse within a single processor, minimization of data sharing between processors, and good load balance. Copyright 2007 ACM.
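The core data structure can be sketched as follows: nodes are threads and an edge weight counts the addresses two threads both touch (this shows only graph construction under assumed trace shapes; the paper's hierarchical partitioning step is not reproduced here):

```python
from collections import defaultdict
from itertools import combinations

def affinity_graph(traces):
    """Build a thread-affinity graph from per-thread memory traces.

    traces maps a thread name to the list of addresses it accessed;
    the result maps a thread pair to the number of shared addresses,
    omitting pairs that share nothing.
    """
    graph = defaultdict(int)
    for t1, t2 in combinations(traces, 2):
        shared = len(set(traces[t1]) & set(traces[t2]))
        if shared:
            graph[(t1, t2)] = shared
    return dict(graph)

# t0 and t1 both touch address 0x14; t2 shares nothing with either.
traces = {"t0": [0x10, 0x14], "t1": [0x14, 0x18], "t2": [0x20]}
g = affinity_graph(traces)
assert g == {("t0", "t1"): 1}
```

A partitioner would then group heavily connected threads onto the same processor, maximizing data reuse within a processor and minimizing sharing across processors, as the abstract describes.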