ISBN: 0769523129 (print)
In a few short years, computers capable of over one Petaflops performance will become a reality. The most likely approach for first successfully reaching this performance level will involve several thousands of parallel processing elements. What are the key considerations for building such systems? What are the software requirements and demands? How will applications scale? How reliable are they likely to be? What will they be good for? We will address these questions and more based on early experience with the BlueGene system.
ISBN: 0769523129 (print)
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. It is proven that the list scheduling algorithm can achieve a reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. It is also observed that communication heterogeneity has a significant impact on schedule lengths.
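As a rough illustration of the list scheduling idea mentioned above, the following minimal sketch assigns jobs, in list order, to whichever machine becomes free first. It assumes identical machines and ignores communication costs, so it is the classic textbook form rather than the grid model analyzed in the paper; the function name `list_schedule` and its parameters are hypothetical.

```python
import heapq


def list_schedule(job_times, num_machines):
    """Greedy list scheduling on identical machines (illustrative assumption).

    Each job, taken in list order, goes to the machine with the smallest
    current load. Returns (assignment, makespan).
    """
    # Min-heap of (current load, machine index); every machine starts idle.
    loads = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(loads)

    assignment = []
    for t in job_times:
        load, m = heapq.heappop(loads)        # least-loaded machine
        assignment.append(m)
        heapq.heappush(loads, (load + t, m))  # machine is busy for t longer

    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    print(list_schedule([3, 1, 4, 1, 5], 2))
```

The heap keeps each placement decision at O(log m) for m machines. Modeling heterogeneous communication, which the abstract reports as significant, would require a richer cost model than this sketch uses.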
ISBN: 0780388348 (print)
Real-time spatio-temporal VLSI 3D IIR digital filters may be used for imaging or beamforming applications employing 3D input signals from synchronously sampled multi-sensor arrays. Such filters have high computational complexity and often require arithmetic throughputs of hundreds of millions of floating-point operations per second, especially in the case of potential radio-frequency beamforming applications. A novel high-throughput distributed parallel processor (DPP) architecture is proposed that is suitable for on-chip real-time VLSI/FPGA direct-form 3D IIR digital filter implementations. Using the proposed architecture and Matlab/Simulink and Minx simulation software, the design and bit-level simulation of a first-order highly selective FPGA-based 3D IIR frequency-planar filter circuit is reported for 3D plane-wave filtering.
ISBN: 0769523129 (print)
This paper extends the previous work on the maximal allowable workload (MAW) problem [2] by investigating a resource allocation problem for distributed real-time systems that contain replicable applications. The systems may use multiple resources of a single type and be affected by multiple environmental factors. The approach searches for a feasible allocation that maximizes a user-defined metric of stability. Several algorithms were developed, and experiments were conducted to demonstrate their relative strengths. The results showed that Simulated Annealing provides results that are the closest to the optimal for maximizing environmental parameter settings. In addition, a modified greedy first-fit algorithm is shown to be the best-performing algorithm for finding feasible allocations.
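To make the notion of a first-fit feasibility search concrete, here is a minimal sketch of plain first fit: each application is placed on the first host with enough remaining capacity. The abstract does not describe the authors' modifications, so none are reproduced here; the single-resource model and the names `first_fit`, `demands`, and `capacities` are assumptions for illustration only.

```python
def first_fit(demands, capacities):
    """Plain first-fit allocation over a single resource type (assumed model).

    demands[i] is the resource need of application i; capacities[h] is the
    capacity of host h. Returns a list mapping each application to a host
    index, or None when some application fits on no host.
    """
    remaining = list(capacities)  # copy so the caller's list is untouched
    placement = []
    for need in demands:
        for host, free in enumerate(remaining):
            if need <= free:
                remaining[host] -= need
                placement.append(host)
                break
        else:
            # No host could take this application: allocation is infeasible
            # under first fit (another heuristic might still succeed).
            return None
    return placement


if __name__ == "__main__":
    print(first_fit([4, 3, 2, 5], [6, 8]))
```

First fit is fast but order-sensitive: a `None` result means only that this greedy pass failed, not that no feasible allocation exists.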
ISBN: 0769525091 (print)
This paper describes a Global Computing (GC) environment called Xtrem Web-CH (XWCH). XWCH is an improved version of a GC tool called Xtrem Web (XW). XWCH enriches XW in order to match P2P concepts: distributed scheduling, distributed communication, and the development of symmetrical models. Two versions of XWCH were developed. The first, called XWCH-sMs, manages inter-task communications in a centralized way. The second, called XWCH-p2p, allows direct communication between "workers". XWCH is evaluated on a real high-performance genetic application.
ISBN: 0769523129 (print)
One of the first steps in starting a program on a cluster is to get the executable, which generally resides on some network file server. This not only creates contention on the network but also puts unnecessary strain on the network file system, which is busy serving other requests at the same time. This approach does not scale as clusters grow larger. We present a new approach that uses a high-speed interconnect, novel network features, and a scalable design. We provide a fast, efficient, and scalable solution to the distribution of executable files on production parallel machines.
ISBN: 0769523129 (print)
This paper presents PPerfGrid, a tool that addresses the challenges involved in the exchange of heterogeneous parallel computing performance data. Parallel computing performance data exists in a wide variety of schemas and formats, from basic text files to relational databases to XML, and it is stored on geographically dispersed host systems of various platforms. PPerfGrid uses Grid services to address these challenges. PPerfGrid exposes Application and Execution semantic objects as Grid services and publishes their location and characteristics in a registry. PPerfGrid clients access this registry, locate the PPerfGrid sites with performance data they are interested in, and bind to a set of Grid services that represent this data. This set of Application and Execution Grid services provides a uniform, virtual view of the data available in a particular PPerfGrid session. PPerfGrid addresses scalability by allowing specific questions to be asked about a data store, thereby narrowing the scope of the data returned to a client. In addition, by using a Grid services approach, the Application and Execution Grid services involved in a particular query can be dynamically distributed across several hosts, thereby taking advantage of parallelism and improving scalability. We describe our PPerfGrid prototype and include data from preliminary prototype performance tests.
ISBN: 0769523129 (print)
This paper studies distributed scheduling of parallel I/O data transfers on systems that provide data replication. In our previous work, we proposed a centralized algorithm for solving this problem in systems where data transfer information is centrally available. This algorithm finds the optimal schedule by constructing augmenting paths in the data transfer bipartite graph, requiring O(nm log n + n^2 log^(3/2) n) time, with n nodes and m edges in the bipartite graph. In this paper, we investigate this scheduling problem in distributed systems where data transfer information may not be centrally available. We propose a distributed scheduling algorithm, Highest Degree Lowest Workload First (HDLWF), which approximates the augmenting path algorithm in distributed environments. HDLWF is based on a distributed, two-step scheme that determines an appropriate execution order of data requests through a small number of rounds of bidding between clients and I/O servers. Our experimental results indicate that HDLWF yields schedules close to the centralized optimal solution, in some cases within 3% of it.
ISBN: 0769523129 (print)
State-space-based techniques are a powerful tool for analyzing discrete-event systems. One way to face the state-space explosion is to exploit the behavioral symmetries of distributed systems. Well-Formed Coloured Petri Nets (WN) allow the direct construction of a symbolic reachability graph (SRG) that captures symmetries suitably encoded in WN syntax. Most real systems, however, mix symmetric and asymmetric behaviors. The SRG, and more generally all approaches based on a static description of symmetries, have been shown not to be effective in such cases. In this paper, two quotient graphs are proposed as effective analysis frameworks for asymmetric systems. Both rely on WN syntax extended with relational operators. The first is an extension of the SRG that exploits local symmetries. The second uses linear constraints and substate inclusion in order to aggregate states. An asymmetric distributed leader-election algorithm is used as a running example.
ISBN: 0769523129 (print)
Reducing the energy consumption of modern processors and providing fault tolerance have become major concerns, because high power consumption increases heat dissipation, which decreases system reliability. Similarly, faults in running tasks also reduce system reliability. The algorithms proposed in this paper are based on the shortest-task-first policy, combined with other efficient techniques such as shared slack reclamation and checkpointing. Consequently, real-time tasks can be completed before their deadlines while global power consumption is reduced and fault tolerance is provided dynamically. We present four algorithms for scheduling independent task sets and task sets with precedence relationships in homogeneous and heterogeneous systems, respectively. Moreover, we present a dynamic fault-tolerant algorithm. Compared to the efficient algorithms presented so far, our algorithms show lower communication complexity and much better scheduling performance in terms of makespan and energy consumption.