At present, the network model is a general framework for representing complex systems, and knowledge of network structure is a fundamental prerequisite for control and other applications of networked systems. With the advent of the Big Data era, network structures are growing sharply in scale. Traditional centralized reconstruction methods require high-performance computing resources and are hardly practical at such scales, so reconstructing large-scale networks with limited resources is a challenge. To address this problem, a distributed local reconstruction method is proposed for unweighted networks. Specifically, the local reconstruction problems of individual nodes are distributed across multiple computing units. ADMM is introduced into the compressed sensing framework to decompose the complex reconstruction problem into multiple subproblems, reducing the demand for computing resources. Through parallel computing, the network reconstruction subproblems are solved simultaneously. In addition, to further guarantee reconstruction accuracy, a binary constraint is introduced based on characteristics obtained by analyzing the network structure. Finally, extensive experiments demonstrate the superiority of the proposed method: compared with several state-of-the-art methods, it can reconstruct networks of different scales and types with limited computing resources, and it is accurate and robust against noise.
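To make the decomposition concrete, here is a minimal sketch of the pattern the abstract describes: each node's local links are recovered by an ADMM-based lasso solve, a binary projection enforces unweighted edges, and the per-node subproblems run in parallel. The toy sizes, penalty parameters, and the 0.5 threshold are assumptions for illustration, not the paper's settings.

```python
# Sketch: per-node compressed-sensing reconstruction via ADMM,
# with a final binary projection for unweighted networks.
import numpy as np
from multiprocessing import Pool

def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(args):
    """Solve min 0.5||Ax - b||^2 + lam||x||_1 for one node's local links."""
    A, b, lam, rho, iters = args
    n = A.shape[1]
    # Cache the Cholesky factor of the x-update system (A^T A + rho I).
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return (z > 0.5).astype(int)   # binary constraint for unweighted edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, m = 20, 15            # toy sizes, not from the paper
    tasks = []
    for _ in range(n_nodes):
        truth = (rng.random(n_nodes) < 0.1).astype(float)
        A = rng.standard_normal((m, n_nodes))
        tasks.append((A, A @ truth, 0.1, 1.0, 100))
    with Pool() as pool:           # each node's subproblem runs in parallel
        adjacency_rows = pool.map(admm_lasso, tasks)
```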
Efficiently solving the traffic assignment problem (TAP) for large-scale transport networks is a critical problem in transportation studies. Most existing algorithms for TAP are serial, single-computer ones, which inherently limits computational efficiency compared with parallel computing methods. This paper therefore proposes an efficient distributed multi-computer cluster resource allocation method for the parallel computing of TAP. Previous studies on the parallel computing of TAP have mainly considered a single mode; this paper extends them to the more complex combined modal split and traffic assignment (CMSTA) case. To decompose the CMSTA problem, we propose a block-decomposed model, and we design an optimal parallel computing resource schedule so that each block problem can be solved more quickly on a huge transportation network. We then implement a customized two-stage parallel (TP) algorithm that fully uses parallel resources: the first parallel stage is applied in the path generation phase, and the second in the path flow adjustment phase. In addition, a parallel slowdown is uncovered when parallel resources are used to compute each block problem of the path flow adjustment phase. Numerical examples validate the efficiency and robustness of the proposed TP algorithm.
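The two-stage parallel structure can be sketched as follows: stage 1 generates shortest paths per O-D pair in parallel, and stage 2 adjusts path flows per block in parallel. The graph, demands, and the simple proportional flow shift are toy assumptions standing in for the paper's CMSTA blocks.

```python
# Sketch of the two-stage parallel pattern for traffic assignment.
import heapq
from multiprocessing import Pool

GRAPH = {0: {1: 2.0, 2: 4.0}, 1: {2: 1.0, 3: 7.0}, 2: {3: 3.0}, 3: {}}

def shortest_path(od):
    """Stage 1 worker: Dijkstra from origin to destination."""
    origin, dest = od
    dist, prev, heap = {origin: 0.0}, {}, [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dest:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in GRAPH[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path, node = [dest], dest
    while node != origin:
        node = prev[node]
        path.append(node)
    return od, path[::-1]

def adjust_block(block):
    """Stage 2 worker: shift flow toward the cheaper paths within one block."""
    flows, costs, step = block
    avg = sum(costs) / len(costs)
    return [max(f - step * (c - avg), 0.0) for f, c in zip(flows, costs)]

if __name__ == "__main__":
    with Pool() as pool:
        paths = dict(pool.map(shortest_path, [(0, 3), (1, 3)]))   # stage 1
        blocks = [([50.0, 50.0], [3.0, 5.0], 5.0)]
        new_flows = pool.map(adjust_block, blocks)                # stage 2
```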
ISBN: 9781665496391 (print)
A large number of reads generated by next-generation sequencing platforms contain many repetitive subsequences. Effectively localizing and identifying genomic regions containing repetitive subsequences contributes to subsequent genomic data analysis. To accelerate the alignment between large-scale short reads and a reference genome with many repetitive subsequences, this paper develops a compact de Bruijn graph based short-read alignment algorithm on a distributed parallel computing platform. The algorithm uses resilient distributed datasets (RDDs) to perform in-memory computation, broadcasts the short reads and reference genome to the computing nodes to reduce data communication time on the cluster system, and tunes the number of RDD partitions to optimize the performance of the parallel alignment algorithm. Experimental results on real datasets show that, compared with the sequential compact de Bruijn graph based short-read alignment algorithm, our distributed parallel alignment algorithm achieves good acceleration while obtaining the same overall correct alignment percentage, and compared with existing distributed parallel alignment algorithms, it completes the alignment between large-scale short reads and a reference genome with highly repetitive subsequences more quickly.
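A minimal PySpark sketch of the distributed pattern described above: broadcast a reference index to all executors, partition the reads as an RDD, and align in parallel. The exact-match k-mer lookup is an assumed stand-in for the compact de Bruijn graph, and the k and partition values are illustrative.

```python
# Sketch: broadcast reference index + partitioned RDD of reads in PySpark.
from pyspark import SparkContext

K = 5  # toy k-mer length, an assumption for the example

def build_index(reference):
    """Map each k-mer of the reference to its positions."""
    index = {}
    for i in range(len(reference) - K + 1):
        index.setdefault(reference[i:i + K], []).append(i)
    return index

def align(read, index):
    """Report candidate positions where the read's first k-mer occurs."""
    return read, index.get(read[:K], [])

if __name__ == "__main__":
    sc = SparkContext(appName="toy-read-alignment")
    reference = "ACGTACGTTACGGACGT"
    reads = ["ACGTA", "TACGG", "GGGGG"]
    ref_index = sc.broadcast(build_index(reference))  # one copy per executor
    # numSlices controls the RDD partition count tuned in the paper.
    hits = (sc.parallelize(reads, numSlices=4)
              .map(lambda r: align(r, ref_index.value))
              .collect())
    sc.stop()
```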
Quantum state tomography (QST) allows for the reconstruction of quantum states through measurements and an inference technique, under the assumption of repeated state preparations. Bayesian inference provides a promising platform for achieving both efficient QST and accurate uncertainty quantification, yet it is generally plagued by the computational limitations associated with long Markov chains. In this work, we present a novel Bayesian QST approach that leverages modern distributed parallel computer architectures to efficiently sample a D-dimensional Hilbert space. Using a parallelized preconditioned Crank-Nicolson Metropolis-Hastings algorithm, we demonstrate our approach on simulated data and on experimental results from IBM Quantum systems of up to four qubits, showing significant speedups through parallelization. Although pooling independent Markov chains is highly unorthodox, our method proves remarkably practical, with ex post facto validation via diagnostics such as the intrachain autocorrelation time. We conclude by discussing scalability to higher-dimensional systems, offering a path toward efficient and accurate Bayesian characterization of large quantum systems.
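The preconditioned Crank-Nicolson (pCN) proposal is prior-preserving, so acceptance depends only on the likelihood ratio, which is what makes pooling independent chains so cheap. The following sketch shows independent pCN chains run in parallel and pooled; the Gaussian log-likelihood, N(0, I) prior, beta, and chain settings are assumptions, and a real QST run would target a density-matrix parameterization.

```python
# Sketch: parallel independent pCN Metropolis-Hastings chains, then pooling.
import numpy as np
from multiprocessing import Pool

DIM, BETA = 4, 0.3
DATA = np.array([0.5, -0.2, 0.1, 0.8])   # toy "measurements"

def log_like(x):
    return -0.5 * np.sum((x - DATA) ** 2)

def run_chain(seed, n_steps=5000):
    """One independent pCN chain; chains are later pooled across workers."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(DIM)
    ll = log_like(x)
    samples = []
    for _ in range(n_steps):
        # pCN proposal: preserves the N(0, I) prior, so acceptance
        # uses the likelihood ratio only.
        prop = np.sqrt(1 - BETA ** 2) * x + BETA * rng.standard_normal(DIM)
        ll_prop = log_like(prop)
        if np.log(rng.random()) < ll_prop - ll:
            x, ll = prop, ll_prop
        samples.append(x.copy())
    return np.array(samples)

if __name__ == "__main__":
    with Pool() as pool:                      # independent chains in parallel
        chains = pool.map(run_chain, range(8))
    pooled = np.concatenate(chains)           # pooled posterior samples
    print(pooled.mean(axis=0))
```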
Physiological signal acquisition and analysis are important for intelligent health services, human-computer interaction, and other applications. Due to the limited computing power of terminal devices, many physiological signal analysis methods operate in offline mode; however, in many applications physiological signals should be analyzed in real time. To overcome this problem, a real-time physiological signal acquisition and analysis method based on fractional calculus and stream computing is proposed. Mobile terminals read the physiological data from sensors and upload them to a stream computing platform, where a fractal index is used to estimate physiological status; this index is calculated via distributed parallel computing. The experimental results show that the method can distinguish heart health status and, to some extent, reflect driver mental status.
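The abstract does not name the fractal index, so as a stand-in the sketch below computes one common choice, the Higuchi fractal dimension; in the described pipeline such a function would run inside a stream-computing operator over windows of sensor samples.

```python
# Sketch: Higuchi fractal dimension of a 1-D signal window (assumed index).
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the Higuchi fractal dimension of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            # Curve length at scale k starting from offset m.
            dist = np.abs(np.diff(x[idx])).sum()
            lengths.append(dist * (n - 1) / ((len(idx) - 1) * k * k))
        log_k.append(np.log(1.0 / k))
        log_l.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_k, log_l, 1)
    return slope

if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 1000)
    print(higuchi_fd(np.sin(t)))          # smooth signal: FD near 1
    print(higuchi_fd(np.random.default_rng(0).standard_normal(1000)))  # near 2
```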
We present a variant of the immersed boundary method integrated with octree meshes for highly efficient and accurate Large Eddy Simulations (LES) of flows around complex geometries. We demonstrate the scalability of the proposed method up to Θ(32K) processors. This is achieved by (a) rapid in-out tests; (b) adaptive quadrature for an accurate evaluation of forces; (c) tensorized evaluation during matrix assembly. We showcase this method on two non-trivial applications: accurately computing the drag coefficient of a sphere across Reynolds numbers 1 to 10^6, encompassing the drag crisis regime; and simulating flow features across a semi-truck for investigating the effect of platooning on efficiency.
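The "rapid in-out tests" are the classification of mesh points as inside or outside the immersed geometry. The classic approach, sketched below, casts a ray and counts triangle crossings (Moller-Trumbore test, odd count = inside); this is an illustration of the idea, not the paper's optimized implementation, and a production code would prune candidate triangles via the octree.

```python
# Sketch: ray-casting in-out test against a toy triangle soup.
import numpy as np

def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore: does the ray cross this triangle?"""
    v0, v1, v2 = tri
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:             # ray parallel to triangle plane
        return False
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return np.dot(e2, q) * inv > eps   # intersection in front of the origin

def is_inside(point, triangles, direction=np.array([1.0, 0.0, 0.0])):
    """Odd crossing count along a fixed ray means the point is inside."""
    hits = sum(ray_hits_triangle(point, direction, t) for t in triangles)
    return hits % 2 == 1

if __name__ == "__main__":
    a, b, c, d = map(np.array, [(0., 0, 0), (1., 0, 0), (0., 1, 0), (0., 0, 1)])
    tetra = [(a, b, c), (a, b, d), (a, c, d), (b, c, d)]
    print(is_inside(np.array([0.1, 0.1, 0.1]), tetra))  # True
    print(is_inside(np.array([2.0, 2.0, 2.0]), tetra))  # False
```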
Short-term load forecasting (STLF) is the basis of smart distribution network operation, planning, and dispatching. The traditional linear regression prediction method suffers from slow prediction speed and low prediction accuracy. To solve this problem, an improved regression model based on mini-batch stochastic gradient descent is proposed in this paper. Combined with a big data analysis and processing platform, the collected data are integrated, and the MapReduce parallel computing model is used to parallelize the mini-batch stochastic gradient descent algorithm, improving its processing capacity for big data load forecasting and shortening the load forecasting time. Meanwhile, to clean up the duplicated data and bad data generated by smart meters and sensors before calculation, an adaptive sorted neighborhood method is proposed to detect repeatedly recorded data, and the K-means clustering method is used to eliminate noisy data. The experimental results show that the parallelized mini-batch stochastic gradient descent algorithm is much faster than the traditional regression analysis algorithm when the data volume is large. The mean absolute percentage error of the load forecasting model is 1.902% for Belgium and 2.058% for a transformer station in Baiyin city, Gansu Province, China, which satisfies the requirements of load forecasting.
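The map-reduce parallelization of mini-batch SGD can be sketched as follows: the map step computes a per-partition gradient on a random mini-batch, and the reduce step averages the gradients and updates the shared weights. The linear model, data, and learning rate are assumptions for illustration, not the paper's job.

```python
# Sketch: map-reduce-style parallel mini-batch SGD for a linear model.
import numpy as np
from multiprocessing import Pool

def map_gradient(task):
    """Map step: MSE gradient on a random mini-batch from one partition."""
    X, y, w, batch, seed = task
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(y), size=batch)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    parts = []                       # four partitions, as if on Map workers
    for _ in range(4):
        X = rng.standard_normal((256, 3))
        parts.append((X, X @ true_w + 0.01 * rng.standard_normal(256)))
    w, lr = np.zeros(3), 0.1
    with Pool(4) as pool:
        for step in range(200):
            tasks = [(X, y, w, 32, step * 4 + i)
                     for i, (X, y) in enumerate(parts)]
            grads = pool.map(map_gradient, tasks)        # map phase
            w -= lr * np.mean(grads, axis=0)             # reduce phase
    print(w)   # should approach true_w
```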
Distributed energy resources (DERs), including wind power, solar energy, and energy storage systems (ESSs), are connected to the active distribution network (ADN) in various combinations, making the distribution network interactive. As a bridge connecting the transmission grid (TG) and the microgrid (MG), the ADN breaks the traditional operation pattern of TG + ADN + MG. Considering the physical connections and shared information among the TG, ADN, and MG, this paper proposes a decentralized and parallel analytical target cascading (ATC) algorithm for interactive unit commitment (UC) in regional power systems. To explore the synergistic ability of the TG + ADN + MG in coping with DER uncertainties, i.e., wind power, the primary and secondary frequency regulation of the TG are implemented. Furthermore, the distributional uncertainty of wind power is modeled in a data-driven manner, as proposed in our previous work (Zhang et al., 2019) [1]. Both the startup/shutdown variables of the thermal units and the variables in the TG + ADN + MG are integrated into the multi-level interactive UC model and optimized simultaneously, realizing whole-network optimality, resource complementarity, and optimal allocation in the power system. An improved 6-bus system is used to test the proposed model; the numerical results show that the proposed decentralized algorithm is a fully parallelized procedure and that the parallel implementation significantly enhances the computational efficiency of the ATC algorithm.
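The decentralized, parallel coordination pattern of ATC can be illustrated with a toy loop: a parent level broadcasts a target for a shared variable (e.g., a boundary power flow), each subsystem solves its penalized subproblem in parallel and returns a response, and the penalty weight grows each round. The quadratic costs are assumptions, not the paper's UC model.

```python
# Sketch: a toy analytical target cascading (ATC) coordination loop.
from multiprocessing import Pool

SUBSYSTEMS = [(1.0, 3.0), (2.0, 1.0), (4.0, 2.0)]  # (cost weight a, ideal c)

def solve_subproblem(args):
    """argmin_x a*(x - c)^2 + w*(x - t)^2, available in closed form here."""
    a, c, w, t = args
    return (a * c + w * t) / (a + w)

if __name__ == "__main__":
    target, weight = 0.0, 1.0
    with Pool() as pool:
        for _ in range(30):
            tasks = [(a, c, weight, target) for a, c in SUBSYSTEMS]
            responses = pool.map(solve_subproblem, tasks)  # parallel solves
            target = sum(responses) / len(responses)       # parent update
            weight *= 1.2                                  # tighten penalty
    print(target, responses)
```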
This study investigated stochastic model updating in a Bayesian inference framework based on a frequency response function (FRF) vector without any post-processing such as smoothing or windowing. The statistics of the raw FRFs were inferred with a multivariate complex-valued Gaussian ratio distribution. The likelihood function was formulated by embedding the theoretical FRFs, which contain the model parameters to be updated, in the class of probability models of the raw FRFs. The Transitional Markov Chain Monte Carlo (TMCMC) method used to sample the posterior probability density function incurs a considerable computational toll because of the large batch of repetitive forward-model analyses and the growing expense of likelihood evaluations with large-scale loop operations. A vectorized formula was derived analytically to avoid the time-consuming loop operations involved in evaluating the likelihood function. Furthermore, a distributed parallel computing scheme was developed to allow the TMCMC stochastic simulation to run across multiple CPU cores on multiple computers in a network. The case studies demonstrated that the fast computational scheme can exploit high-performance computing facilities to drastically reduce the time-to-solution. Finally, parametric analysis was used to illustrate the uncertainty propagation properties of the model parameters under variations of the noise level, sampling time, and frequency bandwidth.
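A compact TMCMC sketch shows where the distributed parallelism enters: every stage's batch of log-likelihood evaluations is farmed out with pool.map. The Gaussian likelihood, uniform prior, and tempering schedule are assumptions for illustration, not the study's FRF-based likelihood or its vectorized formula.

```python
# Sketch: TMCMC stages with parallel batch likelihood evaluation.
import numpy as np
from multiprocessing import Pool

def log_like(theta):
    return -0.5 * np.sum((theta - 2.0) ** 2) / 0.1   # toy target

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    samples = rng.uniform(-5, 5, size=(n, 2))        # prior samples
    beta = 0.0
    with Pool() as pool:
        ll = np.array(pool.map(log_like, samples))
        while beta < 1.0:
            d_beta = min(0.2, 1.0 - beta)            # fixed schedule (toy)
            beta += d_beta
            w = np.exp(d_beta * (ll - ll.max()))     # tempered weights
            idx = rng.choice(n, size=n, p=w / w.sum())
            samples, ll = samples[idx], ll[idx]      # resample
            props = samples + 0.2 * rng.standard_normal(samples.shape)
            ll_p = np.array(pool.map(log_like, props))   # parallel batch
            accept = np.log(rng.random(n)) < beta * (ll_p - ll)
            samples[accept], ll[accept] = props[accept], ll_p[accept]
    print(samples.mean(axis=0))   # posterior mean near 2
```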
ISBN: 9781728199986 (print)
Training large deep neural networks is time-consuming and may take days or even weeks to complete. Although parameter-server-based approaches were initially popular in distributed training, scalability issues led the field to move toward all-reduce-based approaches. Recent developments in cloud networking technologies, however, such as the Elastic Fabric Adapter (EFA) and Scalable Reliable Datagram (SRD), motivate a rethinking of the parameter-server approach to address its fundamental inefficiencies. To this end, we introduce a novel communication library, Herring, designed to alleviate the performance bottlenecks in parameter-server-based training. We show that gradient reduction with Herring is twice as fast as all-reduce-based methods. We further demonstrate that training deep learning models like BERT-large using Herring outperforms all-reduce-based training, achieving 85% scaling efficiency on large clusters with up to 2048 NVIDIA V100 GPUs without accuracy drop.
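The parameter-server pattern that Herring revisits can be mocked in-process as follows: workers push gradients to a server that aggregates them and publishes updated parameters that workers then pull. This is an assumed illustration, not Herring's API; real systems replace the queues with RDMA/EFA transport.

```python
# Sketch: in-process mock of a synchronous parameter server.
import threading, queue
import numpy as np

N_WORKERS, STEPS, LR = 4, 50, 0.1
grad_q = queue.Queue()
params = np.zeros(3)
barrier = threading.Barrier(N_WORKERS + 1)

def server():
    global params
    for _ in range(STEPS):
        grads = [grad_q.get() for _ in range(N_WORKERS)]   # gather pushes
        params = params - LR * np.mean(grads, axis=0)      # aggregate+update
        barrier.wait()                                     # publish for pulls

def worker(seed):
    rng = np.random.default_rng(seed)
    for _ in range(STEPS):
        # Toy gradient pulling params toward [1, 2, 3], plus noise.
        grad = (params - np.array([1.0, 2.0, 3.0])) \
               + 0.01 * rng.standard_normal(3)
        grad_q.put(grad)                                   # push gradient
        barrier.wait()                                     # pull new params

if __name__ == "__main__":
    threads = [threading.Thread(target=server)]
    threads += [threading.Thread(target=worker, args=(i,))
                for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(params)   # approaches [1, 2, 3]
```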