This paper developed a scalable parallel computing method that can be used for platoon simulations and controller validations. A scalable adaptive platooning control law was first designed, which accommodates a variety of vehicle-to-vehicle communication topologies. A road vehicle dynamics model that considered the Magic Formula tyre model and suspension dynamics was then derived and validated. The parallel computing method adopted the Message Passing Interface technique to allow fast and scalable simulations. Platoon length changes do not require controller and algorithm changes. An 11-vehicle platoon on a real-world 10 km long road section was simulated. Different localisation sensor errors, communication delays, heterogeneous vehicle masses and driving modes were considered. Results show that localisation errors have a negligible influence on space errors. Aggressive driving and heterogeneous vehicle masses slightly increase space errors (increases of less than 0.23 m). Communication delays have the greatest influence on space errors: the increases for 15, 45 and 75 ms delays were 0.43, 1.41 and 2.41 m, respectively. It is further shown that parallel computing can improve the computing speed by three times on personal computers and by seven to 12 times on workstations.
In this work, an improved domain decomposition method is developed to address workload imbalance when implementing the parallel computing of a four-dimensional lattice spring model (4D-LSM) to solve problems in rock engineering on a large scale. A cubic domain decomposition scheme is adopted and optimized by a simulated annealing algorithm (SAA) to minimize the workload imbalance among subdomains. The improved domain decomposition method is implemented in the parallel computing of the 4D-LSM. Numerical results indicate that the proposed domain decomposition method further improves the workload balance among processors, which helps to overcome the limit on computational scale when solving large-scale geotechnical problems and decreases the runtime of the parallel 4D-LSM by at most 40% compared to the original cubic decomposition method. This shows the practicability of the proposed method in parallel computing. Two types of target functions of the SAA are tested, and their influence on the performance of the parallel 4D-LSM is investigated. Finally, a computational model with one billion particles for an actual engineering application of the 4D-LSM is realized, and the result shows the advantages of parallel computing.
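The idea of using simulated annealing to even out subdomain workloads can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional row of cells with known positive per-cell weights, cuts it into contiguous slabs, and uses relative imbalance as a hypothetical target function. The names `imbalance`, `slab_loads` and `anneal_cuts` are made up for this illustration.

```python
import math
import random


def imbalance(loads):
    """Relative imbalance: (max load - mean load) / mean load."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean


def slab_loads(weights, cuts):
    """Sum the per-cell weights inside each slab delimited by sorted cut indices."""
    bounds = [0] + list(cuts) + [len(weights)]
    return [sum(weights[a:b]) for a, b in zip(bounds, bounds[1:])]


def anneal_cuts(weights, n_parts, steps=5000, t0=1.0, seed=0):
    """Shift one slab boundary by one cell per step. Worse moves are accepted
    with the Metropolis probability exp(-delta / T) under linear cooling."""
    rng = random.Random(seed)
    n = len(weights)
    # Start from the uniform decomposition (equal cell counts per slab).
    cuts = [i * n // n_parts for i in range(1, n_parts)]
    cost = imbalance(slab_loads(weights, cuts))
    best_cuts, best_cost = list(cuts), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9
        k = rng.randrange(len(cuts))
        trial = list(cuts)
        trial[k] += rng.choice((-1, 1))
        lo = trial[k - 1] if k > 0 else 0
        hi = trial[k + 1] if k + 1 < len(trial) else n
        if not lo < trial[k] < hi:
            continue  # reject moves that would leave a slab empty
        trial_cost = imbalance(slab_loads(weights, trial))
        delta = trial_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cuts, cost = trial, trial_cost
            if cost < best_cost:
                best_cuts, best_cost = list(cuts), cost
    return best_cuts, best_cost


if __name__ == "__main__":
    # Hypothetical workload: the right half of the domain is five times denser.
    weights = [1] * 20 + [5] * 20
    print(anneal_cuts(weights, n_parts=4))
```

Because the best decomposition seen so far is tracked separately, the returned imbalance is never worse than that of the uniform starting decomposition.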
The purpose of this work is to develop a mathematical apparatus and computational algorithms for implementation of parallel computing in geometric modeling and computer-aided design (CAD) systems. The analysis of existing approaches to parallel computing implementation in CAD systems is carried out. As a result, it is found that most information modeling and CAD systems do not support parallel computing at the level of the geometric kernel. A concept for the development of a CAD geometric kernel based on the invariants of parallel projection of geometric objects onto the axes of the global coordinate system is proposed. It combines the potential of constructive methods for geometric modeling, capable of parallelizing geometric constructions by tasks (message passing), and the mathematical apparatus of point calculus, capable of parallelization by data through coordinate-by-coordinate calculation (data parallel). The use of the coordinate-by-coordinate calculation for point equations not only makes it possible to parallelize computations along coordinate axes, but also ensures the consistency of computational operations with respect to threads, which significantly reduces the idle time and optimizes the CPU operation to achieve the maximum effect from the use of parallel computing.
Topology optimization is often used in the conceptual design stage as a preprocessing tool to obtain overall material distribution in the solution domain. The resulting topology is then used as an initial guess for shape optimization. It is always desirable to use fine computational grids to obtain high-resolution layouts that minimize the need for shape optimization and postprocessing (Bendsoe and Sigmund, Topology optimization theory, methods and applications. Springer, Berlin Heidelberg New York 2003), but this approach results in high computation cost and is prohibitive for large structures. In the present work, parallel computing in combination with domain decomposition is proposed to reduce the computation time of such problems. The power law approach is used as the material distribution method, and an optimality criteria-based optimizer is used for locating the optimum solution [Sigmund (2001) 21:120-127; Rozvany and Olhoff, Topology optimization of structures and composites continua. Kluwer, Norwell 2000]. The equilibrium equations are solved using a preconditioned conjugate gradient algorithm. These calculations have been done using a master-slave programming paradigm on a coarse-grain, multiple instruction multiple data, shared-memory architecture. In this study, by avoiding the assembly of the global stiffness matrix, the memory requirement and computation time have been reduced. The results of the current study show that the parallel computing technique is a valuable tool for solving computationally intensive topology optimization problems.
The grid-based Xin'anjiang model (GXM) has been widely applied to flood forecasting. However, when the model warm-up period is long and the amount of input data is large, the computational efficiency of the GXM is low. Therefore, a GXM parallel algorithm based on grid flow direction division is proposed from the perspective of spatial parallelism, which realizes the parallel computing of the GXM by extracting the parallel routing sequence of the watershed grids. To solve data skew, a DAG scheduling algorithm based on dynamic priority is proposed for task scheduling. The proposed GXM parallel algorithm is verified in the Qianhe River watershed of Shaanxi Province and the Tunxi watershed of Anhui Province. The results show that the GXM parallel algorithm based on grid flow direction division has good flood forecasting accuracy and higher computational efficiency than the traditional serial computing method. In addition, the DAG scheduling algorithm can effectively improve the parallel efficiency of the GXM.
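One plausible reading of "extracting the parallel routing sequence" is a level-by-level topological ordering of the flow-direction graph, in which a cell is routed only after all of its upstream cells. The sketch below shows that generic technique (Kahn's algorithm grouped by level). It is not the paper's algorithm and omits the dynamic-priority DAG scheduling. The function name `routing_levels` and the `downstream` mapping are assumptions made for illustration.

```python
def routing_levels(downstream):
    """Group grid cells into levels. Cells within one level do not depend on
    each other, so each level could be routed in parallel.

    `downstream` maps every cell id to the cell it drains into
    (None for an outlet). Every target must also appear as a key.
    """
    # Count how many upstream neighbours drain into each cell.
    indegree = {cell: 0 for cell in downstream}
    for target in downstream.values():
        if target is not None:
            indegree[target] += 1

    # Headwater cells have no upstream neighbour and form the first level.
    level = [cell for cell, deg in indegree.items() if deg == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        next_level = []
        for cell in level:
            target = downstream[cell]
            if target is None:
                continue
            indegree[target] -= 1
            if indegree[target] == 0:  # all upstream cells are done
                next_level.append(target)
        level = next_level
    return levels


if __name__ == "__main__":
    # Hypothetical five-cell watershed: a and b drain into c; c and d into e.
    flow = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
    print(routing_levels(flow))  # [['a', 'b', 'd'], ['c'], ['e']]
```

Levels with very different cell counts are one way the data skew mentioned in the abstract could arise, which is what a scheduler would then have to balance.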
To segment regions of interest (ROIs) from ultrasound images, a novel dynamic-texture-based algorithm is presented that combines the surfacelet transform, a hidden Markov tree (HMT) model and parallel computing. During the surfacelet transform, the image sequence is decomposed by a pyramid model, and the high-frequency 3D signals are decomposed by directional filter banks. During HMT modeling, the distribution of coefficients is described with a Gaussian mixture model (GMM), and the relationship between scales is described with a scale continuity model. From the HMT parameters estimated through expectation maximization, the joint probability density is calculated and taken as the feature value of the image sequence. Then ROIs and non-ROIs in collected sample videos are used to train a support vector machine (SVM) classifier, which is employed to identify the divided 3D blocks from the input video. To improve the computational efficiency, parallel computing is implemented on a multi-processor CPU. Our algorithm has been compared on ultrasound images with existing texture-based approaches, including the gray level co-occurrence matrix (GLCM), local binary pattern (LBP) and wavelet methods, and the experimental results demonstrate its advantages in processing noisy ultrasound images and segmenting ROIs more accurately.
ISBN: (print) 0819439444
This paper presents a parallel program for assessing the codetermination of gene transcriptional states from large-scale simultaneous gene expression measurements with cDNA microarrays. The parallel program is based on a nonlinear statistical framework recently proposed for the analysis of gene interaction via multivariate expression arrays. Parallel computing is key in the application of the statistical framework to a large set of genes because a prohibitive amount of computer time is required on a classical single-CPU machine. Our parallel program, named the Parallel Analysis of Gene Expression (PAGE) program, exploits inherent parallelism exhibited in the proposed codetermination prediction models. By running PAGE on 64 processors in Beowulf, a clustered parallel system, an analysis of melanoma cDNA microarray expression data has been completed within 12 days of computer time, an analysis that would have required about one and a half years on a single-CPU computing system. A data visualization program, named the Visualization of Gene Expression (VOGE) program, has been developed to help interpret the massive amount of quantitative information produced by PAGE. VOGE provides graphical data visualization and analysis tools with filters, histograms, and access to other genetic databanks for further analyses of the quantitative information.
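As a rough worked check of the figures quoted above, assuming "about one and a half years" means roughly 1.5 × 365 days and treating 12 days as the full parallel runtime (both are approximations, not values stated by the authors), the implied speedup and parallel efficiency on p = 64 processors are:

```latex
S \approx \frac{T_{\text{serial}}}{T_{\text{parallel}}}
  \approx \frac{1.5 \times 365\ \text{days}}{12\ \text{days}}
  \approx 45.6,
\qquad
E = \frac{S}{p} \approx \frac{45.6}{64} \approx 0.71
```

Under these assumptions the run would have achieved roughly 70% parallel efficiency.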
Parallel computing is currently used in many engineering problems. However, because of limitations in curriculum design, it is not always possible to offer students specific formal teaching in this topic. Furthermore, parallel machines are still too expensive for many institutions. The latest microprocessors, such as Intel's Pentium III and IV, embody single instruction multiple-data (SIMD) type parallel features, which make them a viable solution for introducing parallel computing concepts to students. Final year projects have been initiated utilizing SSE (streaming SIMD extensions) features, and it has been observed that students can easily learn parallel programming concepts after going through some programming exercises. They can now experiment with parallel algorithms on their own PCs at home.
The placement of elemental operations (as opposed to data) of a data-driven data-parallel computation in a network of processors is examined. A fast suboptimal algorithm is proposed for such placement which tends to minimise the overall network load when the computation is essentially nonlocal. The cases of grid, torus and hypercube topology are considered. It is shown that the proposed algorithm, while having moderate computational complexity, demonstrates up to a 50% reduction in required network throughput over some straightforward placement schemes in the practical range of network sizes.
The goal of ranking and selection (R&S) procedures is to identify the best stochastic system from among a finite set of competing alternatives. Such procedures require constructing estimates of each system's performance, which can be obtained simultaneously by running multiple independent replications on a parallel computing platform. Nontrivial statistical and implementation issues arise when designing R&S procedures for a parallel computing environment. We propose several design principles for parallel R&S procedures that preserve statistical validity and maximize core utilization, especially when large numbers of alternatives or cores are involved. These principles are followed closely by our parallel Good Selection Procedure (GSP), which, under the assumption of normally distributed output, (i) guarantees to select a system in the indifference zone with high probability, (ii) in tests on up to 1,024 parallel cores runs efficiently, and (iii) in an example uses smaller sample sizes compared to existing parallel procedures, particularly for large problems (over 10^6 alternatives). In our computational study we discuss three methods for implementing GSP on parallel computers, namely the Message-Passing Interface (MPI), Hadoop MapReduce, and Spark, and show that Spark provides a good compromise between the efficiency of MPI and robustness to core failures.