The modular power electronic transformer (PET) is difficult to simulate at the microsecond level in electromagnetic transient studies. This article provides a high-speed, high-precision simulation method that eliminates the internal nodes and reduces the order of the nodal admittance matrix. Meanwhile, parallel computing is integrated into the whole solution process, which achieves a significant simulation speedup. A physical prototype is established to prove that the detailed model (DM) is sufficient to reflect the dynamics of physical devices. Moreover, simulations in PSCAD/EMTDC are carried out to compare the proposed method with the DM in terms of accuracy and time efficiency. Simulation results show that the proposed method accurately simulates the external and internal dynamics of the PET while running several hundred times faster.
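The abstract does not spell out the elimination step, but internal-node elimination on a nodal admittance matrix is classically done by Kron reduction; a minimal numpy sketch (the node partition, matrix values, and function name are illustrative, not from the paper):

```python
import numpy as np

def kron_reduce(Y, ext, internal):
    """Eliminate internal nodes from a nodal admittance matrix.

    Y_red = Y_ee - Y_ei @ inv(Y_ii) @ Y_ie, computed with a solve
    instead of an explicit inverse.
    """
    Yee = Y[np.ix_(ext, ext)]
    Yei = Y[np.ix_(ext, internal)]
    Yie = Y[np.ix_(internal, ext)]
    Yii = Y[np.ix_(internal, internal)]
    return Yee - Yei @ np.linalg.solve(Yii, Yie)

# toy 3-node network: eliminate internal node 2
Y = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
Yr = kron_reduce(Y, [0, 1], [2])
```

The reduced matrix keeps the external-port behaviour exact while shrinking the system solved at every time step, which is where the speedup in such methods comes from.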
Solving linear equations and finding eigenvalues are essential tasks in many simulations for engineering applications, but these tasks often cause performance bottlenecks. In this work, the hierarchical subspace evolution method (HiSEM), a hierarchical iteration framework for solving scientific computing problems with solution locality, is proposed. In HiSEM, the original problem is converted to a corresponding minimization function and decomposed into a series of subsystems. Subspaces and their weights are established for the subsystems and evolve in each iteration. The subspaces are calculated from local equations and knowledge of the physical problem, while a small-scale minimization problem determines their weights. The solution system can be hierarchically established based on the subspaces. As the iterations continue, the degrees of freedom gradually converge to an accurate solution. Two parallel algorithms are derived from HiSEM: one for symmetric positive definite linear equations and one for generalized eigenvalue problems. The performance of the linear solver and eigensolver is evaluated using a series of benchmarks and a tower model with a complex topology. Algorithms derived from HiSEM can solve very large-scale problems with high performance and good scalability.
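HiSEM's exact subspace construction is more elaborate than the abstract can convey; the sketch below only illustrates the core loop for the SPD case — one local direction per block built from the current residual, combined through a small minimization that fixes the weights (the block layout and iteration limits are assumptions):

```python
import numpy as np

def subspace_evolve_solve(A, b, blocks, iters=200, tol=1e-10):
    """Sketch of a subspace-evolution iteration for SPD A x = b.

    Each iteration restricts the residual to every block to form a
    local direction, then solves the small system (V^T A V) w = V^T r
    so the energy-norm error is minimized over span(V).
    """
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        V = np.zeros((len(b), len(blocks)))
        for j, idx in enumerate(blocks):
            V[idx, j] = r[idx]                 # block-local direction
        G = V.T @ A @ V                        # small Gram system
        w = np.linalg.lstsq(G, V.T @ r, rcond=None)[0]  # subspace weights
        x += V @ w
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
A = M @ M.T + 12 * np.eye(12)                  # well-conditioned SPD matrix
b = rng.standard_normal(12)
blocks = [np.arange(0, 6), np.arange(6, 12)]
x = subspace_evolve_solve(A, b, blocks)
```

Because the residual itself lies in the span of the block directions, each step is at least as good as an exact line search, so the iteration converges for any SPD matrix.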
An angular superresolution algorithm based on the Cholesky decomposition, a modification of the Capon algorithm, is proposed. It is shown that the proposed algorithm makes it possible to avoid inverting the covariance matrix of the input signals. The proposed algorithm is compared with the Capon algorithm in terms of operation count. It is established that, for large problem dimensions, the proposed algorithm provides a gain in both single-threaded and multithreaded implementations. Numerical performance estimates of the proposed and original algorithms using NVIDIA's Compute Unified Device Architecture (CUDA) parallel computing technology are obtained. It is established that the proposed algorithm saves GPU computing resources and can construct the spatial spectrum for input-signal covariance matrices of nearly twice the dimension.
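The key identity behind such a modification is that with the Cholesky factor R = L L^H, the Capon denominator a^H R^-1 a equals ||L^-1 a||^2, so one triangular solve per steering vector replaces the explicit inversion; a numpy sketch (the array geometry and scan grid are illustrative):

```python
import numpy as np

def capon_spectrum_chol(R, steering):
    """Capon spatial spectrum without explicitly inverting R.

    With R = L L^H, a^H R^-1 a = ||L^-1 a||^2: one solve against the
    Cholesky factor per steering vector.
    """
    L = np.linalg.cholesky(R)
    P = np.empty(steering.shape[1])
    for k in range(steering.shape[1]):
        y = np.linalg.solve(L, steering[:, k])  # forward substitution in practice
        P[k] = 1.0 / np.real(y.conj() @ y)
    return P

# toy example: 8-element half-wavelength ULA, one source at 20 degrees
M, d = 8, 0.5
a_src = np.exp(2j * np.pi * d * np.arange(M) * np.sin(np.deg2rad(20.0)))
R = np.outer(a_src, a_src.conj()) + 0.1 * np.eye(M)   # signal + noise covariance
grid = np.deg2rad(np.linspace(-90, 90, 181))
A = np.exp(2j * np.pi * d * np.arange(M)[:, None] * np.sin(grid)[None, :])
P = capon_spectrum_chol(R, A)
```

The spectrum peaks at the source angle; a production version would use a dedicated triangular solve (trsv) rather than a general solve.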
In large-scale parallel computing systems, machines and the network suffer from non-negligible faults, often leading to system crashes. The traditional method to increase reliability is to restart the failed jobs. To avoid unnecessary time wasted on restarts, we propose an optimal scheduling strategy that enables fault-tolerant reliable computation and protects the integrity of computation. Specifically, we determine the optimal redundancy-failure rate tradeoff to incorporate redundancy into parallel computing units running multiple-precision arithmetic, such as the Chinese Remainder Theorem, which is useful for applications such as asymmetric cryptography and fast integer multiplication. Inspired by network coding in distributed storage for disk failures, we propose coding matrices that strategically map partial computations to available computing units, so that the central unit can reliably reconstruct the results of any failed machine without recalculation and yield the correct final output. We propose optimization-based algorithms to efficiently construct the optimal coding matrices subject to fault-tolerance specifications. Performance evaluation demonstrates that the optimal scheduling effectively reduces the overall running time of parallel computing while resisting a wide range of failure rates.
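The paper's coding matrices generalize the idea; the sketch below shows only the underlying redundant-residue mechanism — each unit computes one residue, extra moduli add redundancy, and any sufficiently large subset of surviving residues reconstructs the result via the CRT (the moduli, operands, and failure pattern are invented for illustration):

```python
from math import prod

def crt_reconstruct(residues, moduli):
    """Chinese Remainder Theorem: recover x mod prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(., -1, m) is the modular inverse
    return x % M

# redundant residue computation: 3 data moduli + 2 redundant moduli,
# chosen so the product of ANY 3 moduli exceeds the result range
moduli = [101, 103, 107, 109, 113]          # pairwise coprime primes
x, y = 321, 1234
partial = [(x * y) % m for m in moduli]     # each unit computes one residue
alive = [0, 2, 4]                           # two units failed
z = crt_reconstruct([partial[i] for i in alive],
                    [moduli[i] for i in alive])
```

Because the smallest product of three moduli (101·103·107) already exceeds x·y, losing any two units costs nothing: no recomputation is needed.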
Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing an efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical compute power and estimate the number of tasks needed to achieve load balancing between the CPU and the GPU of the heterogeneous system. We propose a method that describes a tensor's sparse structure as a graph and design a new graph neural network, SPT-GCN, to select a suitable sparse tensor format. Furthermore, we perform extensive experiments on real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM achieves an average speedup of 1.310x over the ParTI! library on a CPU-GPU heterogeneous system.
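As a point of reference for what SpTTM computes, here is a naive mode-n kernel over a COO tensor (formats and names are illustrative; real kernels use semi-sparse outputs and tuned layouts, which is exactly what the format-selection problem is about):

```python
import numpy as np

def spttm(coords, vals, shape, U, mode):
    """Mode-`mode` sparse tensor-times-matrix in COO form.

    Y[..., j, ...] = sum_k X[..., k, ...] * U[k, j]; the result is
    stored densely over the contracted mode here for simplicity.
    """
    out_shape = list(shape)
    out_shape[mode] = U.shape[1]
    Y = np.zeros(out_shape)
    for c, v in zip(coords, vals):
        idx = list(c)
        for j in range(U.shape[1]):
            idx[mode] = j
            Y[tuple(idx)] += v * U[c[mode], j]
    return Y

# toy 3rd-order 2x3x2 tensor with 3 nonzeros, contracted along mode 1
coords = [(0, 0, 0), (1, 2, 1), (1, 0, 1)]
vals = [2.0, 3.0, 1.0]
U = np.arange(6.0).reshape(3, 2)          # 3 x 2 factor matrix
Y = spttm(coords, vals, (2, 3, 2), U, mode=1)
```

Each nonzero touches a whole output fiber, so the work per nonzero is dense while the iteration space is sparse — the irregularity that makes load balancing across CPU and GPU nontrivial.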
We propose a nodal stochastic generation and transmission expansion planning model that incorporates the output from high-resolution global climate models through load and generation availability scenarios. We implement our model in Pyomo and perform computational studies on a realistically-sized test case of the California electric grid in a high performance computing environment. We propose model reformulations and algorithm tuning to efficiently solve this large problem using a variant of the Progressive Hedging Algorithm. We utilize the parallelization capabilities and overall versatility of mpi-sppy, exploiting its hub-and-spoke architecture to concurrently obtain inner and outer bounds on an optimal expansion plan. Initial results show that instances with 360 representative days on a system with over 8,000 buses can be solved to within 5% of optimality in under 4 h of wall clock time, a first step towards solving a large-scale power system expansion planning problem across a wide range of climate-informed operational scenarios.
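mpi-sppy drives Progressive Hedging over full Pyomo subproblems; the toy sketch below shows only the bare PH mechanics on a scalar two-stage problem with closed-form scenario solves (the objective, scenarios, and penalty rho are invented for illustration):

```python
def progressive_hedging(demands, rho=1.0, iters=100):
    """Toy Progressive Hedging for min_x sum_s (x - d_s)^2 / n.

    Each scenario solves its penalized subproblem independently
    (in closed form here); the dual weights w_s drive the scenario
    solutions x_s toward the nonanticipative average x_bar.
    """
    n = len(demands)
    w = [0.0] * n
    xbar = sum(demands) / n
    for _ in range(iters):
        # subproblem: argmin_x (x - d)^2 + w*x + (rho/2)*(x - xbar)^2
        xs = [(2.0 * d - wi + rho * xbar) / (2.0 + rho)
              for d, wi in zip(demands, w)]
        xbar = sum(xs) / n
        w = [wi + rho * (x - xbar) for wi, x in zip(w, xs)]
    return xbar, xs

xbar, xs = progressive_hedging([10.0, 20.0, 60.0])
```

The scenario solves inside each iteration are independent, which is what mpi-sppy's hub-and-spoke architecture parallelizes across ranks.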
ISBN:
(Print) 9798350359329; 9798350359312
Sparse matrix reordering is an important step in Cholesky decomposition. By reordering the rows and columns of the matrix, computation time and storage cost can be greatly reduced. With the proposal of various reordering algorithms, selecting a suitable reordering method for a given matrix has become an important research topic. In this paper, we propose a method to predict the optimal reordering method by visualizing sparse matrices in chunks in parallel and feeding the result into a deep convolutional neural network. The results show that the predicted orderings reach 95% of the optimal performance in theory, the prediction accuracy reaches up to 85%, the parallel framework achieves an average speedup of 11.35x over the serial framework, and performance is greatly improved compared with traversal-based selection on large sparse matrices.
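One plausible reading of "visualizing sparse matrices in chunks in a parallel manner" is a fixed-size block-density image computed from chunks of the nonzero list; a sketch of that preprocessing step (the grid size, threading scheme, and test pattern are assumptions, not the paper's pipeline):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def block_density_image(rows, cols, shape, grid=4, workers=4):
    """Map a sparse matrix (COO indices) to a grid x grid density image.

    Each pixel holds the fraction of nonzeros falling in one block;
    chunks of the coordinate list are counted in parallel and summed.
    """
    rs = np.asarray(rows)
    cs = np.asarray(cols)
    br = (rs * grid) // shape[0]      # block row of each nonzero
    bc = (cs * grid) // shape[1]      # block col of each nonzero

    def count(chunk):
        local = np.zeros((grid, grid))
        np.add.at(local, (br[chunk], bc[chunk]), 1)
        return local

    img = np.zeros((grid, grid))
    chunks = np.array_split(np.arange(len(rs)), workers)
    with ThreadPoolExecutor(workers) as ex:
        for part in ex.map(count, chunks):
            img += part
    return img / max(len(rs), 1)      # normalize to fractions of nnz

# tridiagonal-like pattern on a 16 x 16 matrix
rows = list(range(16)) + list(range(15))
cols = list(range(16)) + list(range(1, 16))
img = block_density_image(rows, cols, (16, 16))
```

The fixed-size image is what makes arbitrarily shaped matrices comparable inputs for a CNN.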
ISBN:
(Print) 9798350395839; 9798350395846
The advancements in computational techniques regarding parallel computing and machine learning are revolutionizing stock market prediction. This study explores the effectiveness of parallel computing architectures in predicting stock market movements. Existing literature reveals a marked shift towards machine learning models, especially for handling larger financial datasets, yet there remains a gap in understanding the full potential of parallel computing in this domain. Our research aims to bridge this gap through a comparative analysis of two Random Forest models: one using parallel processing and the other sequential computation. We employ a comprehensive dataset of 2018 financial data with 225 indicators of the US stock market, pre-processed to ensure its suitability for analysis. The methodology involves constructing and training both models, with the parallel model utilizing the multi-core capability of an Apple M1 chip, and evaluating them on accuracy and training time. The findings reveal that while both models achieve 100% accuracy, the parallel model significantly reduces training time, demonstrating the efficiency of parallel computing in rapid data processing. This research highlights the potential of parallel computing for fast and accurate financial market prediction, and it suggests avenues for future work, including deep learning models and a wider range of financial indicators.
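As a minimal illustration of why forest training parallelizes so well — trees are independent given their bootstrap samples — here is a toy threshold-stump ensemble trained via a thread pool (the data, stratified bootstrap, and stump learner are invented stand-ins for the study's Random Forest):

```python
import random
from concurrent.futures import ThreadPoolExecutor

# toy 1-D data: class False below 0.5, class True above 1.0
X = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
y = [False] * 4 + [True] * 4

def train_stump(seed):
    """Fit a threshold stump on a stratified bootstrap sample."""
    rng = random.Random(seed)
    # stratified bootstrap: resample within each class so every
    # sample stays mixed (plain bagging is used in real forests)
    idx = ([rng.randrange(0, 4) for _ in range(4)]
           + [rng.randrange(4, 8) for _ in range(4)])
    xs, ys = [X[i] for i in idx], [y[i] for i in idx]
    best = None
    for t in sorted(set(xs)):                  # candidate thresholds
        err = sum((x >= t) != lbl for x, lbl in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

def forest_predict(thresholds, x):
    votes = sum(x >= t for t in thresholds)    # majority vote
    return votes > len(thresholds) / 2

# trees are independent, so training maps cleanly onto a pool
with ThreadPoolExecutor(max_workers=4) as ex:
    thresholds = list(ex.map(train_stump, range(8)))
```

Swapping `ex.map` for a plain `map` gives the sequential baseline the study compares against; only wall-clock time changes, not the learned model.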
Purpose: The computational efficiency of numerical solutions in structural analysis is a critical concern for researchers and scientists. In this work, the author has integrated a parallel computing algorithm and MAPLE within MATLAB to analyse the asymmetric vibrations of multi-directional functionally graded annular nanoplates with linearly varying thickness under thermal environment. The temperature-dependent material properties and nonlinear temperature profile are assumed to vary in the radial and thickness directions. Being a functionally graded material, the contribution of the physical neutral surface has also been included. The thickness of the plate is assumed to vary linearly in the radial direction. Based on first-order shear deformation theory, Hamilton's principle produced the governing equations, which are discretized by Chebyshev polynomials to compute the fundamental frequencies. Further, the introduction of size-dependency also affects the boundary conditions; in particular, simply-supported boundary conditions have been modified to compute correct values of the fundamental frequencies. The adopted approach significantly reduced computational cost by employing the Chebyshev polynomials. The inclusion of MAPLE and parallel computing for symbolic computation drastically decreases the computational cost of the analysis. The investigation of the effect of the nonlocal parameter, non-uniformity parameter, graded indexes, temperature profile, and nodal lines on the frequency parameter has also been presented. Silicon Nitride (Si3N4) and Aluminium Alloy (6061-T6Al) are adopted as the ceramic and metal, respectively.
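The plate equations themselves are involved; as a small illustration of the Chebyshev discretization step, the sketch below builds the standard Chebyshev collocation differentiation matrix and recovers the eigenfrequencies of a 1-D vibration model problem with clamped ends (this model problem is not from the paper):

```python
import numpy as np

def cheb(N):
    """Chebyshev differentiation matrix and points (Trefethen's cheb)."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.ones(N + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    dX = X - X.T
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))            # negative-sum trick for the diagonal
    return D, x

# eigenvalues of -u'' = lam * u on [-1, 1] with u(+-1) = 0;
# the exact spectrum is (k*pi/2)^2, k = 1, 2, ...
N = 24
D, x = cheb(N)
D2 = (D @ D)[1:-1, 1:-1]                   # Dirichlet BCs by row/col deletion
lam = np.sort(np.linalg.eigvals(-D2).real)
```

The point of the spectral discretization is the same as in the paper: a few dozen collocation points reproduce the lowest frequencies to near machine precision, which is far cheaper than a comparably accurate finite-element mesh.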
ISBN:
(Print) 9798400707056
High-precision static analysis can effectively detect Null Pointer Dereference (NPD) vulnerabilities in C code, but its performance overhead is significant. In recent years, researchers have attempted to enhance the efficiency of static analysis by leveraging multicore resources. However, due to complex dependencies in the analysis process, parallelizing static value-flow NPD analysis for large-scale software still faces significant challenges: it is difficult to balance detection efficiency and accuracy, which limits its practicality. This paper presents PANDA, the first parallel high-precision static value-flow NPD detector for C. The core idea of PANDA is to use dependency analysis to preserve high precision while decoupling the strong dependencies between static value-flow analysis steps. This transforms the traditionally hard-to-parallelize NPD analysis into two parallelizable algorithms: function summarization and combined query-based vulnerability analysis. PANDA introduces a task-level parallel framework, enhanced with a dynamic scheduling method that schedules these two key steps in parallel, significantly improving the performance and scalability of memory vulnerability detection. Implemented within the LLVM framework (version 15.0.7), PANDA demonstrates a significant advantage in balancing accuracy and efficiency compared with popular open-source detection tools. In precision-targeted benchmark tests, PANDA keeps the false positive rate within 3.17% and the false negative rate within 5.16%; in historical CVE detection tests, its recall far exceeds that of the comparison tools. In performance evaluations, PANDA achieves up to an 11.23-fold speedup over its serial version on a 16-node server, exhibiting outstanding scalability.
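The function-summarization step parallelizes bottom-up over the call graph: a function becomes schedulable as soon as all of its callees are summarized. A toy dynamic scheduler in that spirit (the call graph and summarize function are invented stand-ins; PANDA's actual framework differs):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def parallel_summaries(callees, summarize, workers=4):
    """Compute per-function summaries bottom-up over a call graph.

    pending[f] counts unsummarized callees of f; when it hits zero,
    f is submitted to the pool, so independent functions run
    concurrently while dependencies are respected.
    """
    pending = {f: len(cs) for f, cs in callees.items()}
    callers = {f: [] for f in callees}
    for f, cs in callees.items():
        for c in cs:
            callers[c].append(f)
    summaries, lock = {}, threading.Lock()
    done = threading.Event()

    def run(f):
        summaries[f] = summarize(f, {c: summaries[c] for c in callees[f]})
        with lock:
            ready = []
            for caller in callers[f]:
                pending[caller] -= 1
                if pending[caller] == 0:
                    ready.append(caller)
            if len(summaries) == len(callees):
                done.set()
        for g in ready:                        # dynamic scheduling step
            pool.submit(run, g)

    pool = ThreadPoolExecutor(workers)
    for f in [f for f, n in pending.items() if n == 0]:
        pool.submit(run, f)                    # leaves start immediately
    done.wait()
    pool.shutdown()
    return summaries

# toy call graph: main -> {parse, eval}, eval -> {parse}
graph = {"main": ["parse", "eval"], "eval": ["parse"], "parse": []}
summ = parallel_summaries(
    graph, lambda f, subs: f + "(" + ",".join(sorted(subs)) + ")")
```

Recursion (cycles in the call graph) needs extra handling — e.g. collapsing strongly connected components into a single task — which a real analyzer must address.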