A parallel algorithm for EDT transform on linear array with reconfigurable pipeline bus system (LARPBS) is presented For an image with n × n pixels, the algorithm can complete the EDT transform in O(nlogn/(c(n)lo...
详细信息
ISBN:
(纸本)0769521320
A parallel algorithm for EDT transform on linear array with reconfigurable pipeline bus system (LARPBS) is presented For an image with n × n pixels, the algorithm can complete the EDT transform in O(nlogn/(c(n)logd(n))) time using n.d(n).c(n) processors, where c(n) and d(n) are parameters satisfying 1 &le c(n) &le n, and 1&le, the algorithm can be completed in O(1) time using n2+e processors. To our best knowledge, this is the most efficient constant-time EDT algorithm on LARPBS.
This paper introduces the design and complexity analysis of a parallel genetic algorithm to generate a "best" path for a robot arm to follow, given a starting position and a goal in three-dimensional space. ...
详细信息
ISBN:
(纸本)0769521320
This paper introduces the design and complexity analysis of a parallel genetic algorithm to generate a "best" path for a robot arm to follow, given a starting position and a goal in three-dimensional space. Path generation takes into account any obstacles near the arm. This algorithm uses multiple optimization criteria, independent cross-pollinating populations, and handles multiple hard constraints. Individuals in the population consist of multiple chromosomes. The complexity of the algorithm is the number of generations processed times O(N2) where N is the total number of individuals used for path generation on all of the optimizations. This research is being sponsored by NASA grant NAG 9-1401.
The major issue today on cluster and grid computing is the efficient resource management. The evaluation of scheduling strategies is hard because of the generation of jobs under realistic scenario. This is true for ri...
详细信息
ISBN:
(纸本)0769521320
The major issue today on cluster and grid computing is the efficient resource management. The evaluation of scheduling strategies is hard because of the generation of jobs under realistic scenario. This is true for rigid jobs (where the number of processors is fixed) and even more for moldable ones. This paper presents an approach to generate realistic workloads for this kind of jobs. The model we propose is based on the analysis of one year of utilization of the I-cluster, a 225 processors cluster. From this log we extract a typical load for this kind of parallel machines and introduce a way to generate synthetic realistic workloads in an automatic way. This work was done as a way to test scheduling strategies taking into account both rigid and moldable jobs so as the workload generator may handle moldable jobs.
Performance prediction is necessary and crucial in order to deal with multi-dimensional performance effects on parallel systems. The increasing use of parallel supercomputers and cluster systems to solve large-scale s...
详细信息
ISBN:
(纸本)0769521320
Performance prediction is necessary and crucial in order to deal with multi-dimensional performance effects on parallel systems. The increasing use of parallel supercomputers and cluster systems to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. In this paper, we describe a compiler tool to automate performance prediction for execution times of parallel programs by runtime formulas in closed form. For an arbitrary parallel MPI source program the tool generates a corresponding runtime function modeling the CPU execution time and the message passing overhead. The environment is proposed to support the development process and the performance engineering activities that accompany the whole software life cycle. The performance prediction tool is shown to be effective in analyzing a representative application for varying problem sizes on several platforms using different numbers of processors.
This paper describes the development of a fine-grained meta-heuristic for solving large strip packing problems with guillotine layouts. aCe, an architecture-adaptive environment, and the aCe C parallel programming lan...
详细信息
ISBN:
(纸本)0769521320
This paper describes the development of a fine-grained meta-heuristic for solving large strip packing problems with guillotine layouts. aCe, an architecture-adaptive environment, and the aCe C parallel programming language are used to implement a massively parallel genetic simulated annealing (GSA) algorithm. The parallel GSA combines the temperature schedule of simulated annealing with the crossover and mutation operators that are applied to chromosome populations in genetic algorithms. For our problem, chromosomes are normalized postfix expressions that represent guillotine strip packings. Preliminary results for some benchmark data sets are reported and indicate that the parallel GSA method holds promise as a technique for solving the strip packing problem.
AIAC algorithms (Asynchronous Iterations Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in...
详细信息
ISBN:
(纸本)0769521320
AIAC algorithms (Asynchronous Iterations Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in numerous cases as has already been shown in previous works. The first goal of this article is to compare several parallel programming environments in order to see if there is one of them which is best suited to efficiently implement AIAC algorithms. The main criterion for this comparison consists in the performances achieved in a global context of grid computing for two classical scientific problems. Nevertheless, we also take into account two secondary criteria which are the ease of programming and the ease of deployment. The second goal of this study is to extract from this comparison the important features that a parallel programming environment must have in order to be suited for the implementation of AIAC algorithms.
This paper deals with the definition and implementation of a distributed Web-based architecture for the on-line power system security analysis. The architecture employs field data measurements to dynamically estimate ...
详细信息
ISBN:
(纸本)0769521320
This paper deals with the definition and implementation of a distributed Web-based architecture for the on-line power system security analysis. The architecture employs field data measurements to dynamically estimate the actual state of an electrical grid and solves, for each possible contingency, the power system state equations to classify critical contingencies that could lead to system vulnerabilities. To identify the critical contingencies a computational engine able to execute the power system state equations is implemented according to a variant of the master/slave pattern. This pattern is defined to easily leverage the hierarchical network topology of resources connected to the Internet. The pattern is implemented with both RMI and ProActive, while the resulting framework is deployed on a network of heterogeneous clusters and used to compute the contingency analysis of a realistic electrical grid. The experimental results demonstrate that on-line security analysis can be executed in few minutes even for large network complexity.
Real-time distributed applications complexity is steadily increasing. A well-known technique used to manage such complexity consists in decomposing the whole system in different quasi-independent distributed subsystem...
详细信息
ISBN:
(纸本)0769521320
Real-time distributed applications complexity is steadily increasing. A well-known technique used to manage such complexity consists in decomposing the whole system in different quasi-independent distributed subsystems. Inter-subsystem communication, when necessary, is performed via gateway nodes that filter in and outgoing traffic. For real-time systems, this architecture poses additional design challenges, since it becomes necessary to consider both intra and inter-network message exchanges with real-time constraints. In the work carried out so far, the FTT communication paradigm has been provided with tools for supporting flexible real-time communication on isolated networks. This work presents a first approach to incorporate multi-segment support into the FTT protocol family. Particularly, two approaches are presented, analyzed and compared, which allow breaking end-to-end deadlines into parameters that are local to each one of the interconnected networks.
In this paper we present a new decider mechanism for the self-tuning dynP job scheduler for modern resource management systems. This scheduler switches the active scheduling policy dynamically during run time, in orde...
详细信息
ISBN:
(纸本)0769521320
In this paper we present a new decider mechanism for the self-tuning dynP job scheduler for modern resource management systems. This scheduler switches the active scheduling policy dynamically during run time, in order to reject changing characteristics of waiting jobs. The new decider explicitly prefers a single scheduling policy instead of being fair to all available policies. We use discrete event simulations to evaluate the achieved slowdown and utilization and compare the results with the fair decider mechanism and the static basic scheduling policies. The evaluation shows, that the self-tuning dynP scheduler in combination with the preferred decider achieves good results and that it is superior to common static scheduling approaches, which use only a single policy.
The technique of replicating frequently-accessed files to other nodes has been widely used in a high-performance distributed system to reduce the load of the nodes hosting these files. Traditional file replication alg...
详细信息
ISBN:
(纸本)0769521320
The technique of replicating frequently-accessed files to other nodes has been widely used in a high-performance distributed system to reduce the load of the nodes hosting these files. Traditional file replication algorithms rely on the analysis of client-access logs to determine the location of the replicated nodes. In this paper, we present LessLog, a logless file replication algorithm, developed for a peer-to-peer distributed system. We first construct a lookup tree for each node. LessLog uses bitwise operations to determine the location of the replicated node without any client-access history. In addition, each replication is guaranteed to reduce the workload of the replicating node by half. A fault-tolerant LessLog model is also presented. The experimental results show that LessLog successfully and efficiently reduces the load of overloaded nodes.
暂无评论