Handling very large datasets has been a key problem addressed in real-time distributed rendering research. With the advent of the programmable graphics processing unit (GPU), it is now possible and even profitable to move many application-specific computations to be carried out by the GPU. It has been shown that modern GPUs outperform the standard PC-platform CPUs on a broad class of computations by over a factor of seven. Given the low costs and high processing speeds of GPUs, there is a trend towards using clusters of CPU/GPU systems. Configuring and programming these clusters for efficient distribution of data and computations is a major challenge. What are the computations that can be offloaded from the CPU to a GPU? The answer to this question is not simple as it depends on the following four factors: GPU's processing capacity, GPU's internal bandwidth, GPU-CPU communication bandwidth and the external network bandwidth. All these factors are subject to change with every generation of hardware. But additions and alternatives to the traditional data-parallel architectures are now needed to exploit the full capability of such clusters using functional parallelism. In this paper, we present a number of architectural configurations that could be adapted on such clusters. Specifically, we demonstrate use of one such architecture: application of a GPU-based pipelined architecture to our work on real-time processing and rendering of large-point datasets, which demands complex computations. We have also introduced a list of application and system parameters that are necessary to determine an optimal distribution of computation on the GPUs of a graphics cluster.
ISBN: 9780780390379 (print)
Scheduling strategies for parallel and distributed computing have mostly been oriented toward performance, while striving to achieve some notion of fairness. With the increase in size, complexity, and heterogeneity of today's computing environments, we argue that, in addition to performance metrics, scheduling algorithms should be designed for robustness. That is, they should have the ability to maintain performance under a wide variety of operating conditions. Although robustness is easy to define, there are no widely used metrics for this property. To this end, we present a methodology for characterizing and measuring the robustness of a system to a specific disturbance. The methodology is easily applied to many types of computing systems and it does not require sophisticated mathematical models. To illustrate its use, we show three applications of our technique to job scheduling: one supporting a previous result with respect to backfilling, one examining overload control in a streaming video server, and one comparing two different scheduling strategies for a distributed network service. The last example also demonstrates how consideration of robustness leads to better system design, as we were able to devise a new and effective scheduling heuristic.
We examine the task of constructing bounded-time self-stabilizing rule-based systems that take their input from an external environment. Bounded response-time and self-stabilization are essential for rule-based programs that must be highly fault-tolerant and perform in a real-time environment. We present an approach for solving this problem using the OPS5 programming language as it is one of the most expressive and widely used rule-based programming languages. Bounded response-time of the program is ensured by constructing the state space graph so that the programmer can visualize the control flow of the program execution. Potential infinite firing sequences, if any, should be detected and the involved rules should be revised to ensure bounded termination. Both the input variables and internal variables are made fault-tolerant from corruption caused by transient faults via the introduction of new self-stabilizing rules in the program. Finally, the timing analysis of the self-stabilizing OPS5 program is shown in terms of the number of rule firings and the comparisons performed in the Rete network.
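The abstract mentions detecting potential infinite firing sequences in the state space graph. One common way to check for this, shown here only as a minimal sketch and not as the authors' procedure, is a depth-first search for a cycle reachable from the initial state. The graph encoding (a dictionary mapping each state to the states reachable by one rule firing) and the name `find_firing_cycle` are assumptions made for illustration.

```python
def find_firing_cycle(graph, start):
    """Return a list of states forming a cycle reachable from `start`, or None.

    `graph` is a hypothetical encoding: state -> iterable of successor states,
    where each edge stands for one rule firing.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []                      # states on the current DFS path

    def visit(state):
        color[state] = GRAY
        path.append(state)
        for nxt in graph.get(state, ()):
            c = color.get(nxt, WHITE)
            if c == GRAY:
                # Back edge: the firing sequence can loop forever.
                return path[path.index(nxt):] + [nxt]
            if c == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[state] = BLACK
        return None

    return visit(start)
```

A returned cycle names the states whose rules would need revising; `None` means every firing sequence from `start` terminates in this finite graph.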
ISBN: 0769521320 (print)
Verification of distributed systems is one of the main issues in software engineering and is considered the major application field of formal specification techniques. However, many difficulties remain. The principal problem is producing a coherent specification and providing a fully integrated semantics. Because formal methods are mathematical description models that try to answer questions about the reliability of a system, they remain hard for designers to use. We therefore present an open environment for the integration of formal methods in the description and verification of distributed and concurrent systems. The system currently uses UML notation and provides rewriting logic, model checking, theorem proving, and simulation techniques.
ISBN: 0769521320 (print)
Coarse-grain reconfigurable systems offer high performance and energy-efficiency, provided an efficient run-time reconfiguration mechanism is available. From an embedded software vantage point, we define three levels of reconfigurability for such systems, each with a different degree of coupling between embedded software and reconfigurable hardware. We classify reconfigurable systems starting with tightly-coupled coprocessors and evolving to processor networks. This results in a gradual increase of energy-efficiency when compared to software-only systems, at the cost of increasing programming complexity. Using several sample applications including signal-, crypto-, and network-processing acceleration units, we demonstrate energy-efficiency improvements over software ranging from 12 times for tightly-coupled systems to 84 times for network-on-chip systems.
ISBN: 0769521320 (print)
Performance prediction is necessary and crucial in order to deal with multi-dimensional performance effects on parallel systems. The increasing use of parallel supercomputers and cluster systems to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. In this paper, we describe a compiler tool to automate performance prediction for execution times of parallel programs by runtime formulas in closed form. For an arbitrary parallel MPI source program the tool generates a corresponding runtime function modeling the CPU execution time and the message passing overhead. The environment is proposed to support the development process and the performance engineering activities that accompany the whole software life cycle. The performance prediction tool is shown to be effective in analyzing a representative application for varying problem sizes on several platforms using different numbers of processors.
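To make the idea of a closed-form runtime function concrete, the sketch below combines a CPU term and a message-passing term for a hypothetical program that splits n operations evenly over p processes and then performs one tree-shaped reduction. The model form and every parameter value (`t_op`, `latency`, `per_byte`, `msg_bytes`) are illustrative assumptions, not output of the tool described in the abstract.

```python
import math

def predicted_runtime(n, p, t_op=1e-8, latency=5e-6, per_byte=1e-9, msg_bytes=8):
    """Hypothetical closed-form runtime in seconds for problem size n on p processes.

    CPU term:     n / p operations, each costing t_op seconds.
    Message term: ceil(log2 p) reduction steps, each costing
                  latency + per_byte * msg_bytes (a linear cost model).
    """
    compute = t_op * n / p
    steps = math.ceil(math.log2(p)) if p > 1 else 0
    communicate = steps * (latency + per_byte * msg_bytes)
    return compute + communicate
```

Evaluating such a function over a range of n and p gives the kind of scalability trend the abstract refers to: for large n the CPU term dominates and more processes help, while for small n the message term dominates.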
The proceedings contain 48 papers from the Eighth IEEE International Symposium on High Assurance Systems Engineering. The topics discussed include: decomposition of fairness and performance aspects for high-assurance continuous process-control systems; software fault tree analysis for product lines; structural analysis of explicit fault-tolerant programs; assessing reliability risk using fault correction profiles; a formal specification-based approach to distributed parallel programming; and resource-sensitive intrusion detection models for network traffic.
ISBN: 0769521320 (print)
Computational Grids have been proposed as the next generation computing platform for solving large-scale problems in science, engineering, and commerce. There is an enormous amount of interest in applications, called Grid Workflows, in which a number of otherwise independent programs are run in a "pipeline". In practice, there are a number of different mechanisms that can be used to couple the models, ranging from loosely coupled file-based IO to tightly coupled message passing. In this paper we propose a flexible IO architecture that provides a wide range of mechanisms for building Grid Workflows without the need for any source code modification and without the need to fix them at design time. Further, the architecture works with legacy applications. We evaluate the performance of our prototype system using a workflow in computational mechanics.
ISBN: 0769521320 (print)
Much research has been devoted to designing appropriate concurrency control algorithms for real-time database systems, which must not only satisfy consistency requirements but also meet transaction timing constraints as far as possible. Optimistic concurrency control protocols have the attractive properties of being non-blocking and deadlock-free, but they suffer from late conflict detection and transaction restarts. Although dynamic adjustment of serialization order (DASO) reduces the number of transaction restarts in real-time database systems, some unnecessary restarts remain. In this paper, we first propose a new method called dynamic adjustment of execution order (DAEO) and a new optimistic concurrency control algorithm based on DAEO, which reduces the number of unnecessary restarts to nearly zero and outperforms previous algorithms; we then discuss the experiments and their results.
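For readers unfamiliar with why optimistic protocols detect conflicts late, the following is a minimal single-threaded sketch of classic optimistic concurrency control with validation at commit time. It is not DASO or DAEO, whose details the abstract does not give; the `Store` and `Transaction` classes and the per-key version counters are assumptions chosen for illustration.

```python
class Store:
    """Hypothetical key-value store with a version counter per key."""
    def __init__(self):
        self.data = {}      # key -> committed value
        self.version = {}   # key -> number of committed writes to that key


class Transaction:
    """Reads and writes proceed without locks; conflicts surface only at commit."""
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version observed at first read
        self.writes = {}         # key -> value buffered until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version.get(key, 0))
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation phase: fail if any key we read has been overwritten since.
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                return False  # caller must restart the transaction
        # Write phase: install buffered writes and bump their versions.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True
```

A transaction that has done all its work may still fail validation and restart, which is the cost that restart-reduction techniques such as those in the abstract aim to lower.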
ISBN: 0780385934 (print)
In this paper, a 1D DCT processor with a parallel pipelined VLSI architecture is designed for MPEG visual and audio applications. The processor is based on distributed arithmetic to obtain low power and high computation efficiency. Simulation with EDA software shows that the pipelined parallel architecture can reach an efficient compromise between hardware cost and computing speed for real-time MPEG-related applications.
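Distributed arithmetic replaces the multipliers in a fixed-coefficient inner product (each 1D DCT output is one such product) with a lookup table and a shift-accumulate loop over the input bits. The sketch below shows the general technique in software for integer coefficients and signed two's-complement inputs; it is not the processor's actual design, and the function names and the 8-bit default width are assumptions.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern, the sum of the selected coefficients."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(c * x) bit-serially for signed `bits`-bit integers xs.

    Each x must lie in [-2**(bits-1), 2**(bits-1) - 1].
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]          # two's-complement encoding
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] * (1 << b)         # shift by the bit weight
        # The most significant bit carries negative weight in two's complement.
        acc = acc - term if b == bits - 1 else acc + term
    return acc
```

The loop performs `bits` table lookups and additions regardless of the number of coefficients, which is why the approach suits low-power hardware; the table size, however, grows as 2 to the number of coefficients.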