A new interconnection network for massively parallel computing is introduced. This network is called an Optical Multi-Mesh Hypercube (OMMH) network. The OMMH integrates positive features of both hypercube (small diameter, high connectivity, symmetry, simple control and routing, fault tolerance, etc.) and mesh (constant node degree and scalability) topologies and at the same time circumvents their limitations (e.g., the lack of scalability of hypercubes, and the large diameter of meshes). The OMMH can maintain a constant node degree regardless of the increase in the network size. In addition, the flexibility of the OMMH network makes it well suited for optical implementations. This paper presents the OMMH topology, analyzes its architectural properties and potentials for massively parallel computing, and compares it to the hypercube. Moreover, it also presents a three-dimensional optical design methodology based on free-space optics. The proposed optical implementation has totally space-invariant connection patterns at every node, which enables the OMMH to be highly amenable to optical implementation using simple and efficient large space-bandwidth product space-invariant optical elements.
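As a rough illustration of how a combined mesh and hypercube topology can keep the node degree independent of the mesh size, the sketch below assumes each node has an address (i, j, k), where (i, j) is a position on an l x m mesh with wraparound links and k is an n-bit hypercube label. The addressing scheme, the wraparound links, and the helper name `ommh_neighbors` are assumptions made for illustration; the abstract does not specify them.

```python
def ommh_neighbors(i, j, k, l, m, n):
    """Neighbors of node (i, j, k) in an assumed l x m mesh of n-bit hypercube labels."""
    neighbors = []
    # Hypercube links: same mesh position, label differs in exactly one bit.
    for bit in range(n):
        neighbors.append((i, j, k ^ (1 << bit)))
    # Mesh links: same hypercube label, adjacent mesh position (wraparound assumed).
    neighbors.append(((i + 1) % l, j, k))
    neighbors.append(((i - 1) % l, j, k))
    neighbors.append((i, (j + 1) % m, k))
    neighbors.append((i, (j - 1) % m, k))
    return neighbors


# Growing the mesh from 4 x 4 to 64 x 64 leaves the degree unchanged.
small = ommh_neighbors(0, 0, 0, l=4, m=4, n=3)
large = ommh_neighbors(0, 0, 0, l=64, m=64, n=3)
print(len(set(small)), len(set(large)))
```

Under these assumptions the degree is n + 4 for any mesh with l, m >= 3, so the network can grow along the mesh dimensions without adding ports to a node.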
As a simplified representation of a geometric model, the medial axis (MA) has been used in a wide range of engineering applications. Because obtaining the true MA of a complicated CAD model is known to be difficult, current research predominantly focuses on computing an approximate MA instead. To improve the quality of such approximations, this work develops a novel and efficient method for obtaining a high-quality MA composed of MA faces for a CAD model. Specifically, an MA point is computed for each sample point using a dual-normal-tracing algorithm. This algorithm can be implemented through GPU-enabled parallel computing and executed iteratively until MA points have been found for all sample points. After the iteration is completed, the generated MA points are converted into the resultant MA by evaluating the topological connectivity of their corresponding sample points. Finally, the resultant MA is converted into MA faces using information from the boundary CAD faces. The proposed method is evaluated by analyzing its complexity and robustness, discussing its applicability, and testing its performance in a couple of computational experiments. As the evaluation shows, the method is easy to implement by exploiting parallel computing and supports effective, high-quality MA generation for a CAD model.
Increasing the number of computing nodes is a common approach to achieving higher performance in a parallel computing system. However, under the constraints of a fixed system architecture and a fixed algorithm structure, it is difficult to improve parallel performance simply by scaling the system up. To realize such an extension with a fixed structure, we analyze the key architectural and parallel-task factors that affect scalability, and then use weighted graphs to model both the architecture and the parallel task. For the case in which the architecture graph and the parallel task graph are homogeneous, we propose an extension method based on graph similarity; for the case in which they are heterogeneous, we propose a critical-path-unchanged scaling method. Neither method changes the structure of the graph; both only adjust the node weights and edge weights of the relevant graph. Furthermore, some conclusions about the new scaling methods are drawn through mathematical derivation. Finally, to verify their effectiveness, simulation experiments are conducted on the SimGrid platform. The experimental results show that the proposed methods can realize iso-speed-efficiency extension and can guide practical extensions of parallel computing systems.
The characterization of nonlinear dynamical systems and their attractors in terms of invariant measures, basins of attraction, and the structure of their vector fields is usually a task strongly constrained by the underlying computational cost. In this work, the practical aspects of using parallel computing, especially Graphics Processing Units (GPUs) and the Compute Unified Device Architecture (CUDA), are reviewed and discussed in the context of nonlinear dynamical systems characterization. Here such characterization is performed by obtaining both local and global Lyapunov exponents for the classical forced Duffing oscillator. The local divergence measure was employed in the computation of the Lagrangian Coherent Structures (LCSs), revealing the general organization of the flow according to the obtained separatrices, while the global Lyapunov exponents were used to characterize the attractors obtained under one or more bifurcation parameters. These simulation sets also illustrate the required computation time and the speedup gains provided by different parallel computing strategies, justifying the use and relevance of GPUs and CUDA in such an extensive numerical approach. Finally, more than simply providing an overview supported by a representative set of simulations, this work also aims to be a unified introduction to the use of these parallel computing tools in the context of nonlinear dynamical systems, providing codes and examples to be executed in MATLAB and using the CUDA environment, something that is usually fragmented across different scientific communities and restricted to specialists in parallel computing strategies. (C) 2016 Elsevier B.V. All rights reserved.
This article outlines the steps necessary to perform numerical orbit integrations based on a Lie series approach. Its implementation requires an efficient evaluation of the resulting series coefficients. As examples we treat the classical main problem in satellite orbit calculation (J2 only) and the case of a 4 x 4 gravity field. All calculations were performed in very high precision with up to 100 significant digits. In comparison to independent third-party computations, this approach led to superior results with respect to the verifiable constancy of various integrals of motion. To achieve a performance similar to classical numerical integrations in terms of acceptable computing time, at least for non-Keplerian motion problems, we exploited parallel computing capabilities. For our examples, run times were improved by several orders of magnitude, depending on the chosen precision level (up to a factor of 50,000 in the case of double precision). Here we present the mathematical framework of the proposed orbital integration scheme as well as the workflow for its application in a multi-core, parallel computing environment. (C) 2013 COSPAR. Published by Elsevier Ltd. All rights reserved.
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiment.
We propose lightweight middleware solutions that facilitate and simplify the execution of failure-resilient Message Passing Interface (MPI) programs across multidomain clusters. The system described in this paper leverages H2O, a distributed metacomputing framework, to route MPI message passing across heterogeneous aggregates located in different administrative or network domains. MPI communication is aided by a specially written H2O pluglet; messages that are destined for remote sites are intercepted and transparently forwarded to their final destinations. We demonstrate that the proposed technique is effective in enabling communication by MPI programs across distinct clusters and across firewalls. Only marginally lowered performance was observed in our tests, and we believe the substantially increased functionality would compensate for this overhead in most situations. In addition to enabling multicluster communications, we note that with the increasing size and distribution of metacomputing environments, fault tolerance aspects become critically important. We argue that the fault tolerance model proposed by FT-MPI fits well in geographically distributed environments, even though its current implementation is confined to a single administrative domain. We describe extensions to overcome these limitations by combining FT-MPI with the H2O framework. Our holistic approach allows users to run fault-tolerant MPI programs on heterogeneous, geographically distributed shared machines, without sacrificing performance and with minimal involvement of resource providers.
The mesh deformation method based on radial basis functions (RBF) has many advantages and is widely used. An RBF-based mesh deformation method has two main steps: data reduction and displacement interpolation. The data reduction step includes solving for the interpolation weight coefficients and searching for the node with the maximum interpolation error. A data reduction scheme based on a greedy algorithm is used to select an optimum reduced set of surface mesh nodes. In this paper, a parallel mesh deformation method based on parallel data reduction and parallel displacement interpolation is proposed. The proposed recurrence Cholesky decomposition method (RCDM) decreases the computational cost of solving for the interpolation weight coefficients from $O(N_c^4)$ to $O(N_c^3)$, where $N_c$ denotes the number of support nodes. Parallel computing is used to accelerate both the search for the node with the maximum interpolation error and the displacement interpolation. The combination of parallel data reduction and parallel interpolation can greatly improve the efficiency of mesh deformation. Two typical deformation problems, the ONERA M6 wing and the DLR-F6 wing-body-nacelle-pylon configuration, are taken as test cases to validate the proposed approach, which achieves up to a 19.57-fold performance improvement. Finally, the aeroelastic response of the HIRENASD wing-body configuration is used to verify the efficiency and robustness of the proposed method. (C) 2018 Elsevier Inc. All rights reserved.
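The cost reduction quoted above is consistent with updating a Cholesky factor incrementally instead of refactoring from scratch each time the greedy step adds a support node: a fresh factorization costs on the order of N^3 per step (N^4 over N steps), whereas extending an existing factor by one row costs on the order of N^2 per step (N^3 overall). The sketch below shows the standard bordering update in plain Python. It is an illustration of that general idea and is not claimed to be the paper's RCDM; the function names and the small test matrix are hypothetical.

```python
import math


def cholesky(a):
    """Full lower-triangular Cholesky factor of an SPD matrix, O(N^3)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = a[r][c] - sum(low[r][t] * low[c][t] for t in range(c))
            low[r][c] = math.sqrt(s) if r == c else s / low[c][c]
    return low


def cholesky_append(low, col, diag):
    """Extend the factor of A to the factor of [[A, col], [col^T, diag]] in O(N^2)."""
    n = len(low)
    w = [0.0] * n
    for r in range(n):  # forward substitution solves low * w = col
        w[r] = (col[r] - sum(low[r][t] * w[t] for t in range(r))) / low[r][r]
    s = math.sqrt(diag - sum(x * x for x in w))
    grown = [row + [0.0] for row in low]
    grown.append(w + [s])
    return grown


# Hypothetical 3 x 3 SPD matrix; the last row/column plays the newly added node.
full = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
factor_2x2 = cholesky([row[:2] for row in full[:2]])
updated = cholesky_append(factor_2x2, col=[2.0, 3.0], diag=6.0)
print(updated)
print(cholesky(full))
```

Both printed factors should agree, which is the property a greedy selection loop relies on when it grows the support set one node at a time.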
A cell-mapping approach is implemented and parallelized to analyze three-body problem orbits in the vicinity of icy moons (Europa and Enceladus). The cell-mapping method is developed for studying nonlinear dynamics with periodic motions. The method does not require previously known solutions as inputs, which is an essential requirement of continuation approaches, and does not impose symmetric constraints. As major strengths of the method, multiple-period periodic solutions and bifurcation studies can be easily performed. This method is especially applicable to a systematic periodic orbit search over a region of interest using an integration time of one period. The parallelized cell-mapping method facilitates a rapid understanding of the global dynamics.
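To make the idea concrete, the following is a minimal sketch of simple cell mapping on a toy one-dimensional map rather than the three-body problem treated in the paper. The state interval is divided into cells, each cell is mapped to the cell that contains the image of its center, and periodic cells are found by following those cell-to-cell links until a cell repeats. The function names, the toy map, and the use of -1 as a sink for states that leave the region are all assumptions made for illustration.

```python
def build_cell_map(f, lo, hi, n_cells):
    """Map each cell to the cell containing the image of its center (-1 = left the region)."""
    width = (hi - lo) / n_cells
    image = []
    for c in range(n_cells):
        x = f(lo + (c + 0.5) * width)
        idx = int((x - lo) // width)
        image.append(idx if 0 <= idx < n_cells else -1)
    return image


def periodic_cells(image):
    """Return groups of cells that form cycles under the cell-to-cell map."""
    state = [0] * len(image)  # 0 = unvisited, 1 = on the current path, 2 = finished
    groups = []
    for start in range(len(image)):
        path = []
        c = start
        while c != -1 and state[c] == 0:
            state[c] = 1
            path.append(c)
            c = image[c]
        if c != -1 and state[c] == 1:
            # The path ran into itself, so the tail from c onward is a cycle.
            groups.append(path[path.index(c):])
        for p in path:
            state[p] = 2
    return groups


# Toy contraction with a fixed point at x = 0.45 (a hypothetical example map).
cell_image = build_cell_map(lambda x: 0.2 * x + 0.36, 0.0, 1.0, 10)
print(periodic_cells(cell_image))
```

No known solution is supplied as a starting guess: every cell in the region is processed, which mirrors the property highlighted in the abstract. Because each cell's image is computed independently, the `build_cell_map` loop is also the natural place to parallelize.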