This paper describes a number of different coarse-grain GAs, including various migration strategies and connectivity schemes that address the premature convergence problem. These approaches are evaluated on a graph partitioning problem. Our experiments showed, first, that the sequential GAs used are not as effective as parallel GAs for this graph partitioning problem. Second, for coarse-grain GAs, the results indicate that using a large number of nodes and exchanging individuals asynchronously among them is very effective. Third, GAs that exchange solutions based on population similarity instead of a fixed connection topology get better results without any degradation in speed. Finally, we propose a new coarse-grained GA architecture, the Injection Island GA (iiGA). Preliminary results show iiGAs to be a promising new approach to coarse-grain GAs.
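To make the coarse-grain (island) idea concrete, here is a minimal sketch of several subpopulations that evolve separately and periodically exchange their best individual. Everything in it is an assumption made for illustration: the function names, the balance penalty, the parameter values, and the synchronous fixed-ring migration. It does not implement the asynchronous or similarity-based exchange that the abstract reports as more effective, nor the iiGA, and the islands run one after another in a single process.

```python
import random

def cut_size(edges, part):
    """Number of edges whose endpoints lie in different halves."""
    return sum(1 for u, v in edges if part[u] != part[v])

def fitness(edges, part):
    """Lower is better: cut size plus a penalty for unbalanced halves (assumed form)."""
    imbalance = abs(2 * sum(part) - len(part))
    return cut_size(edges, part) + 2 * imbalance

def evolve(pop, edges, rng):
    """One generation: elitism, binary tournaments, uniform crossover, rare bit flip."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(edges, a) <= fitness(edges, b) else b
    best = min(pop, key=lambda p: fitness(edges, p))
    children = [best[:]]  # keep a copy of the current best
    while len(children) < len(pop):
        p, q = pick(), pick()
        child = [x if rng.random() < 0.5 else y for x, y in zip(p, q)]
        if rng.random() < 0.2:
            child[rng.randrange(len(child))] ^= 1  # flip one vertex to the other half
        children.append(child)
    return children

def island_ga(edges, n, islands=4, pop_size=20, epochs=15, gens_per_epoch=5, seed=0):
    """Evolve `islands` subpopulations, migrating along a fixed ring after each epoch."""
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
            for _ in range(islands)]
    for _ in range(epochs):
        for k in range(islands):
            for _ in range(gens_per_epoch):
                pops[k] = evolve(pops[k], edges, rng)
        # Ring migration: the best of island k replaces the worst of island k + 1.
        bests = [min(p, key=lambda ind: fitness(edges, ind))[:] for p in pops]
        for k in range(islands):
            target = pops[(k + 1) % islands]
            worst = max(range(pop_size), key=lambda i: fitness(edges, target[i]))
            target[worst] = bests[k]
    return min((ind for p in pops for ind in p), key=lambda ind: fitness(edges, ind))
```

Calling `island_ga(edges, n)` with an edge list over vertices `0..n-1` returns a 0/1 assignment of vertices to the two halves. Replacing the ring in the migration step with a different neighbour rule is where alternative connectivity schemes would plug in.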
ISBN: 0780318501 (print)
Fault tolerance to support mission life reliability is a key consideration in many system applications. Redundancy for defect tolerance, i.e., yield enhancement, and wafer-level reliability enhancement have been standard practice since the advent of wafer scale technology. Koren [1], [2], [3], [4], [5] and others have shown how effective static redundancy techniques can be for improving both yield and reliability. However, designers are often faced with the conflicting goals of maximizing system reliability and minimizing hardware. Not only are these two goals contradictory, but there is a point of diminishing returns relative to the cost effectiveness of increasing system reliability at the expense of adding spare hardware. At ICWSI '93, Samson [6] presented an approach for optimizing real-time fault tolerance design in WSI via the Reliability-Availability Product. The approach was based upon the identification of fundamental optimization metrics, represented by simple product and quotient (reciprocal product) relationships, which extend traditional cost/benefit analysis to fault tolerance in VLSI and wafer scale architectures and systems [7]. The Reliability-Hardware Quotient (RHQ) is an example of another fundamental composite metric which is useful for identifying the optimal design point in a VLSI or wafer scale system. In this paper, this metric is applied to the problem of optimizing a two-level distributed (parallel) processing architecture. In particular, a graphical optimization technique using the 3D and contour plot features of Mathematica™ [8] is introduced which characterizes the trade space and identifies the optimum design point. The constraints of wafer scale technology can be superimposed upon the optimal solution space either to identify the limits of a given wafer scale implementation or to show what level of wafer scale technology is needed to achieve the optimum design.
New synchronization constructs, termed synchronization expressions (SEs), have been developed as high-level language constructs for parallel programming languages. We introduce a new family of languages, named synchronization languages, which we use to give a precise semantic description of SEs. Under this description, relations such as equivalence and inclusion between SEs can be easily understood and tested. In practice, it also provides a systematic way to implement and simplify SEs in parallel programming languages. We also show that each synchronization language is closed under the following rewriting rules: (1) $a_s b_s \rightarrow b_s a_s$, (2) $a_t b_t \rightarrow b_t a_t$, (3) $a_s b_t \rightarrow b_t a_s$, (4) $a_t a_s b_t b_s \rightarrow b_t b_s a_t a_s$, and also $h(a_t a_s b_t b_s) \rightarrow h(b_t b_s a_t a_s)$ for any morphism $h$ that satisfies certain conditions, which are specified in the paper. We show that this property can be used to reduce the number of states of a finite automaton that describes a synchronization language.
ISBN: 0780312813 (print)
In this note, the method given in [13] is extended to certain distributed parameter systems in order to check the stability of convex combinations of irrational polynomials. Illustrative examples are included.
ISBN: 0780312813 (print)
The approximation of partial differential equations of hyperbolic type by a set of ordinary differential equations is presented. The method of weighted residuals is applied. The Galerkin method and the finite element method are presented as examples.
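As an illustration of how a weighted-residual method turns a hyperbolic PDE into a set of ODEs, consider the first-order advection equation below. The model equation, the basis functions $\phi_j$, and the symbols $M$ and $K$ are assumptions chosen for this sketch; the abstract does not say which equations the paper treats.

```latex
% Model problem (assumed): u_t + a u_x = 0 on a domain \Omega.
% Trial expansion with time-dependent coefficients:
u(x,t) \approx \sum_{j=1}^{n} c_j(t)\,\phi_j(x)

% Weighted-residual condition for weight functions w_i, i = 1,\dots,n:
\int_\Omega w_i(x)\left(\sum_{j=1}^{n} \dot{c}_j(t)\,\phi_j(x)
  + a \sum_{j=1}^{n} c_j(t)\,\phi_j'(x)\right) dx = 0

% Galerkin choice w_i = \phi_i gives a system of n ordinary differential equations:
M\,\dot{c}(t) + a\,K\,c(t) = 0, \qquad
M_{ij} = \int_\Omega \phi_i\,\phi_j\,dx, \qquad
K_{ij} = \int_\Omega \phi_i\,\phi_j'\,dx
```

In the finite element variant the $\phi_j$ are piecewise polynomials with local support, so $M$ and $K$ are sparse.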
ISBN: 0818634421 (print)
The proceedings contain 128 papers. The topics discussed include: C parallelizing compiler on local-network-based computer environment; OCCAM prototyping of massively parallel applications from colored Petri-nets; performance characteristics of the iPSC/860 and CM-2 I/O systems; automatic parallelization of LINPACK routines on distributed memory parallel processors; transformation of doacross loops on distributed memory systems; an efficient atomic multicast protocol for client-server models; a new horizon for sorting on mesh architectures; mapping of uniform dependence algorithm onto fixed size processor arrays; and towards understanding block partitioning for sparse Cholesky factorization.
ISBN: 0818635312 (print)
The proceedings contain 17 papers. The topics discussed include: a selection theory and methodology for heterogeneous supercomputing; partitioning problems in heterogeneous computer systems; experiments with a task partitioning model for heterogeneous computing; heuristics for mapping parallel computations to heterogeneous parallel architectures; load distribution optimization in heterogeneous multiple processor systems; problem representations for an automatic mapping algorithm on heterogeneous processing environments; towards a virtual multicomputer; developing applications for a heterogeneous computing environment; heterogeneous by design: an environment for exploiting heterogeneity; a case study in metacomputing: distributed simulations of mixing in turbulent convection; and design of a heterogeneous parallel processing system for beam forming.
ISBN: 081864222X (print)
Non-linear filters have been used in many signal processing applications, for example, to obtain optimum signal extraction or detection in the presence of random noise. The weighted median filter (WMF), of which the standard median is a special case, is a novel non-linear technique designed for 2D image processing. A major advantage of the WMF is its flexibility in design to deal with a wide variety of properties. This paper describes a commonly used class W(4,4,1) of the WMF. As with most non-linear methods, the computational demands of this technique are high and require a non-trivial number of 'expensive' operations. A data parallel approach for efficient implementation of the WMF is described and implemented on two architecturally dissimilar supercomputers, the Convex C3840 and the Connection Machine CM-200. An analysis of the performance obtained from these two high performance parallel platforms is presented.
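A weighted median can be read as the ordinary median of a window in which each pixel is repeated according to its weight. The sketch below follows that reading in plain sequential Python. The function names, the border handling, and the example weight masks are assumptions; the abstract does not give the actual weights of the W(4,4,1) class, and this is not the paper's data parallel implementation.

```python
def weighted_median(values, weights):
    """Median of the multiset in which values[i] appears weights[i] times.

    Assumes positive integer weights with an odd total, so the median is unique.
    """
    expanded = []
    for v, w in zip(values, weights):
        expanded.extend([v] * w)
    expanded.sort()
    return expanded[len(expanded) // 2]

def weighted_median_filter(image, weights):
    """Filter a 2D list of pixels with a square, odd-sized weight mask.

    Border pixels without a full window are copied through unchanged
    (an assumed convention).
    """
    k = len(weights)
    r = k // 2
    h, w = len(image), len(image[0])
    flat_w = [wt for row in weights for wt in row]
    out = [row[:] for row in image]
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [image[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = weighted_median(window, flat_w)
    return out
```

With a mask of all ones the filter reduces to the standard median, which is the special case the abstract mentions. Every output pixel depends only on its own input window, which is what makes the computation a natural fit for a data parallel machine.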
ISBN: 0780312813 (print)
The centralized computation of a global state in a distributed system creates a performance bottleneck. In order to overcome this problem for a hypercube distributed system, we first develop the concept of the nC-Tree and define a revolving permutation on the nodes of the nC-Tree. We define a mapping of the hypercube distributed system onto a sequence of 2C-Trees with a logical hierarchical structure. The logical hierarchical structure, in conjunction with the revolving permutation, defines the revolving hierarchy of the hypercube system. The repeated centralized computation of a global state is performed with uniform loading, thereby removing this performance bottleneck.
ISBN: 0780312813 (print)
Distributed Arithmetic (DA) is used as a method for efficient implementation of inner product computation, where the coefficients of one vector are fixed. In this paper, we compare different structures for the implementation of DA. The area-time tradeoff study includes processors based on 1) vectors with N = 4, 8, 16 or 32 variables, 2) four different adder circuits with and without pipelining and 3) two memory saving techniques. The architectures are implemented in a double metal 1.2 μm CMOS technology within a standard cell environment and are verified by simulations. This allows a comparison based on actual values of chip area and computation time.
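The core idea of DA can be sketched in software: because the coefficients are fixed, every possible sum of coefficients is precomputed in a lookup table, and the inner product is then accumulated one input bit position at a time. The function names, the 8-bit two's-complement word length, and the full (unpartitioned) table are assumptions made for illustration. The sketch does not model the adder circuits, pipelining, or memory saving techniques compared in the paper.

```python
def build_lut(coeffs):
    """LUT[a] = sum of coeffs[k] over every k whose bit is set in address a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << n)]

def da_inner_product(coeffs, xs, bits=8):
    """Bit-serial inner product of fixed coeffs with two's-complement inputs.

    Each x in xs must fit in `bits` bits, e.g. -128..127 for bits=8.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Bit b of every input forms one address into the table.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

For example, `da_inner_product([3, -5, 7, 2], [10, -20, 127, -128])` gives the same value as the direct sum of products. The full table has 2^N entries for N variables, so it grows quickly over the N = 4 to 32 range studied, which suggests why memory saving techniques are part of the comparison.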