ISBN (Print): 9781450368667
Traditionally, reducing complexity in Machine Learning promises benefits such as less overfitting. However, complexity control in Genetic Programming (GP) often means reducing the sizes of the evolving expressions, and past literature shows that size reduction does not necessarily reduce overfitting. In fact, whether size consistently represents complexity is itself debatable. Therefore, this paper proposes the evaluation time of an evolving model - the computational time required to evaluate a model on data - as the estimate of its complexity. Evaluation time depends upon the size, but crucially also on the composition of an evolving model, and can thus distil its underlying complexity. To discourage complexity, this paper takes an innovative approach that asynchronously evaluates multiple models concurrently. These models race to their completion; thus, models that finish earlier join the population earlier to breed further in a steady-state fashion. The computationally simpler models, even if less accurate, therefore get further chances to evolve before the more accurate yet expensive models join the population. Crucially, since evaluation times vary from one execution to another, this paper also shows how to significantly minimise this variation. The paper compares the proposed method on six challenging symbolic regression problems with both standard GP and GP with an effective bloat control method. The results demonstrate that the proposed asynchronous parallel GP (APGP) indeed produces individuals that are smaller, faster, and more accurate than those in standard GP. While GP with bloat control (GP+BC) produced smaller individuals, it did so at the cost of lower accuracy than APGP on both training and test data, thus questioning the overall benefits of bloat control. Also, while APGP took the fewest evaluations to match the training accuracy of GP, GP+BC took the most. These results, and the portability of evaluation time as an estimate of complexity, encourage f
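The racing idea can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: a model's "cost" field simulates its expression evaluation time, and models that finish evaluating first rejoin the steady-state population first.

```python
import concurrent.futures
import time

def evaluate(model):
    """Simulate evaluating a model; sleep time stands in for evaluation cost."""
    time.sleep(model["cost"])
    return model

def async_steady_state(models, workers=4):
    """Return models in the order their asynchronous evaluations finish."""
    order = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(evaluate, m) for m in models]
        for fut in concurrent.futures.as_completed(futures):
            order.append(fut.result())  # earlier finishers would breed first
    return order

# A cheap model finishes its race first, even though both start together.
models = [{"name": "expensive", "cost": 0.2}, {"name": "cheap", "cost": 0.01}]
finish_order = async_steady_state(models)
```

In a full APGP loop, each finished individual would immediately be inserted into the population and become eligible as a parent, which is how the time bias toward simpler models arises.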
ISBN (Print): 9781450372367
To solve the inter-node bandwidth bottleneck in parallel computing systems, we propose a wavelength-routing inter-node interconnect, "Optical Hub". The physical topology of Optical Hub is a star network, which leads to advantages in terms of throughput, size, energy consumption, and lifetime cost. The logical topology is a full-mesh network, which leads to advantages in terms of latency and reliability. We introduce multi-path routings, which expand the effective bandwidth of a full-mesh topology such as Optical Hub, by replacing conventional MPI functions with our wrapper functions. We simulated the execution time of parallel benchmarks on a parallel computing system with Optical Hub using the parallel computing simulator SimGrid. As a result, we confirmed that the parallel computing system with Optical Hub can achieve higher performance and lower energy consumption than conventional ones. We also examined the scalability of Optical Hub and showed that recursive hierarchical configurations of Optical Hub can drastically reduce cable count for large numbers of nodes compared with Dragonfly networks.
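The wrapper-function idea can be illustrated with a toy sketch. The names `multipath_send` and `multipath_recv` are hypothetical, and plain Python lists stand in for MPI buffers: a message is split into one chunk per logical full-mesh path and reassembled at the receiver.

```python
# Hedged sketch of multi-path routing over a full-mesh logical topology.
# Real code would wrap MPI_Send/MPI_Recv (e.g. via mpi4py); lists stand in.

def multipath_send(payload, n_paths):
    """Split the payload into n_paths chunks, one per logical path."""
    size = len(payload) // n_paths + (len(payload) % n_paths > 0)
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def multipath_recv(chunks):
    """Reassemble chunks received over the separate paths, in order."""
    out = []
    for c in chunks:
        out += c
    return out

msg = list(range(100))
chunks = multipath_send(msg, 4)     # four parallel paths carry 25 items each
restored = multipath_recv(chunks)
```

The bandwidth gain in the paper comes from the chunks traversing physically distinct wavelength routes concurrently; this sketch only shows the split/merge bookkeeping the wrappers must do.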
Background: Single-cell RNA-Sequencing (scRNA-Seq) has provided single-cell level insights into complex biological processes. However, the high frequency of gene expression detection failures in scRNA-Seq data makes it challenging to achieve reliable identification of cell types and Differentially Expressed Genes (DEG). Moreover, with the explosive growth of single-cell data using the 10x Genomics protocol, existing methods will soon reach the computation limit due to scalability issues. The single-cell transcriptomics field desperately needs new tools and frameworks to facilitate large-scale single-cell analysis. Results: In order to improve the accuracy, robustness, and speed of scRNA-Seq data processing, we propose a generalized zero-inflated negative binomial mixture model, "JOINT," that can perform probability-based cell-type discovery and DEG analysis simultaneously without the need for imputation. JOINT performs soft clustering for cell-type identification by computing the probability of individual cells, i.e., each cell can belong to multiple cell types with different probabilities. This is drastically different from existing hard-clustering methods, where each cell can belong to only one cell type. The soft-clustering component of the algorithm significantly improves the accuracy and robustness of single-cell analysis, especially when the scRNA-Seq datasets are noisy and contain a large number of dropout events. Moreover, JOINT is able to determine the optimal number of cell types automatically rather than requiring it to be specified empirically. The proposed model is an unsupervised learning problem, which is solved using the Expectation-Maximization (EM) algorithm. The EM algorithm is implemented in the TensorFlow deep learning framework, dramatically accelerating data analysis through parallel GPU computing. Conclusions: Taken together, the JOINT algorithm is accurate and efficient for large-scale scRNA-Seq data analysis via parallel computing. The Py
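The soft-clustering step can be illustrated with a generic EM sketch. A one-dimensional two-component Gaussian mixture here stands in for the paper's zero-inflated negative binomial model, so the distributions and parameter names are assumptions; the E-step/M-step structure and the per-cell membership probabilities are the shared idea.

```python
import math

# Hedged EM soft-clustering sketch (Gaussian stand-in, NOT the ZINB model).
def em_gmm(data, iters=50):
    mu = [min(data), max(data)]            # crude initialisation
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: soft responsibility of each component for each point.
        # The 1/sqrt(2*pi) constant cancels in the normalisation.
        resp = []
        for x in data:
            w = [pi[k] * math.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) / sigma[k]
                 for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate parameters from the soft counts.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            sigma[k] = max(1e-3, math.sqrt(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk))
            pi[k] = nk / len(data)
    return mu, resp

# Two clearly separated "expression levels"; each point keeps a probability
# of belonging to both clusters rather than a hard label.
data = [0.1, 0.2, 0.0, 5.0, 5.2, 4.9]
mu, resp = em_gmm(data)
```

In JOINT the same EM skeleton runs over the ZINB likelihood on GPUs via TensorFlow, which is where the scalability claim comes from.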
The integration of different energy resources from traditional power systems presents new challenges for real-time implementation and operation. In the last decade, a way has been sought to optimize the operation of small microgrids (SMGs) that have a great variety of energy sources (PV (photovoltaic) prosumers, Genset CHP (combined heat and power), etc.) with uncertainty in energy production that results in different market prices. For this reason, metaheuristic methods have been used to optimize the decision-making process for multiple players in local and external markets. Players in this network include nine agents: three consumers, three prosumers (consumers with PV capabilities), and three CHP generators. This article deploys metaheuristic algorithms with the objective of maximizing power market transactions and clearing price. Since metaheuristic optimization algorithms do not guarantee global optima, an exhaustive search is deployed to find global optima points. The exhaustive search algorithm is implemented using a parallel computing architecture to reach feasible results in a short amount of time. The global optimal result is used as an indicator to evaluate the performance of the different metaheuristic algorithms. The paper presents results, discussion, comparison, and recommendations regarding the proposed set of algorithms and performance tests.
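The parallel exhaustive search used as a ground-truth benchmark can be sketched as a chunked grid search. The objective below is a made-up stand-in for the market-clearing model, and threads stand in for the parallel computing architecture; only the chunk-and-reduce structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def objective(price):
    """Toy stand-in for the market objective; global maximum at 0.42."""
    return -(price - 0.42) ** 2

def best_in_chunk(chunk):
    return max(chunk, key=objective)

def parallel_exhaustive(lo, hi, steps, workers=4):
    """Evaluate every grid point, split across workers, reduce to the best."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    size = len(grid) // workers + 1
    chunks = [grid[i:i + size] for i in range(0, len(grid), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        champions = list(pool.map(best_in_chunk, chunks))
    return max(champions, key=objective)

best = parallel_exhaustive(0.0, 1.0, 10000)
```

The metaheuristics would then be scored by how close their best solution lands to this exhaustively found optimum.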
ISBN (Print): 9783030451820; 9783030451837
Parallel computing contributes significantly to most disciplines for solving scientific problems such as partial differential equations (PDEs), load balancing, and deep learning. The primary characteristic of parallelism is its ability to improve performance on many different sets of computers. Consequently, many researchers continually expend effort to produce efficient parallel solutions for various problems such as the heat equation. The heat equation describes a natural phenomenon studied in fields such as mathematics and physics, and its associated model is usually defined by a set of partial differential equations (PDEs). This paper primarily aims to present two parallel programs for solving the heat equation, which has been discretized using the finite difference method (FDM). These programs have been implemented on different parallel platforms, namely SkelGIS and the Compute Unified Device Architecture (CUDA).
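As a serial reference for what such programs compute, the 1-D heat equation u_t = alpha * u_xx under an explicit FDM scheme can be sketched as follows; the grid sizes and constants are illustrative, and the parallel versions would distribute the inner spatial loop across SkelGIS skeletons or CUDA threads.

```python
def heat_step(u, alpha, dx, dt):
    """One explicit FDM time step with fixed (Dirichlet) zero boundaries."""
    r = alpha * dt / dx ** 2          # stability requires r <= 0.5
    new = u[:]
    for i in range(1, len(u) - 1):    # this loop is what gets parallelized
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

# Initial heat spike in the middle of the rod, ends held at zero.
u = [0.0] * 21
u[10] = 1.0
for _ in range(100):
    u = heat_step(u, alpha=1.0, dx=1.0, dt=0.4)
```

Because each grid point depends only on its immediate neighbors from the previous step, every iteration of the inner loop is independent, which is exactly the structure GPU and skeleton frameworks exploit.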
Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
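The focal kernel at the heart of step (2) can be sketched with a 3x3 focal mean. In the article, each Spark task would run such a kernel on one tile with a one-cell halo borrowed from neighboring tiles; this illustrative version uses plain Python lists for a single small raster and clips the window at the edges, in the spirit of the dynamic calculation window.

```python
def focal_mean(grid):
    """3x3 focal mean; the window shrinks at raster edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

# Toy "DEM" tile: a single spike smoothed by its neighborhood.
dem = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
smoothed = focal_mean(dem)
```

A slope algorithm replaces the mean with a finite-difference gradient over the same window, but the tiling, halo exchange, and merge logic are identical.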
We study different parallelization schemes for the stochastic dual dynamic programming (SDDP) algorithm. We propose a taxonomy for these parallel algorithms, which is based on the concept of parallelizing by scenario and parallelizing by node of the underlying stochastic process. We develop a synchronous and asynchronous version for each configuration. The parallelization strategy in the parallel-scenario configuration aims at parallelizing the Monte Carlo sampling procedure in the forward pass of the SDDP algorithm, and thus generates a large number of supporting hyperplanes in parallel. On the other hand, the parallel-node strategy aims at building a single hyperplane of the dynamic programming value function in parallel. The considered algorithms are implemented using Julia and JuMP on a high-performance computing cluster. We study the effectiveness of the methods in terms of achieving tight optimality gaps, as well as the scalability properties of the algorithms with respect to an increasing number of CPUs. In particular, we study the effects of the different parallelization strategies on performance when increasing the number of Monte Carlo samples in the forward pass, and demonstrate through numerical experiments that such an increase may be harmful. Our results indicate that a parallel-node strategy presents certain benefits as compared to a parallel-scenario configuration.
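The parallel-scenario idea can be sketched as follows. The stage subproblem is replaced by a toy convex value function V(x) = x^2, so each simulated forward pass returns one supporting hyperplane (cut) linearized at its sampled point; the names and objective are illustrative, not the authors' Julia/JuMP code.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def forward_pass(seed):
    """One Monte Carlo forward pass: sample a trial point, return a cut.

    For V(x) = x**2, the cut at x0 is V(y) >= -x0**2 + 2*x0*y,
    returned as an (intercept, slope) pair.
    """
    rng = random.Random(seed)
    x0 = rng.uniform(0.0, 1.0)        # sampled scenario / trial state
    return (-x0 * x0, 2.0 * x0)

def parallel_scenario_pass(n_scenarios, workers=4):
    """Run the forward passes concurrently; each contributes one cut."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(forward_pass, range(n_scenarios)))

cuts = parallel_scenario_pass(8)
```

The parallel-node strategy, by contrast, would distribute the work of building a single one of these hyperplanes across workers rather than generating many hyperplanes at once.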
ISBN (Print): 9783030227418; 9783030227401
Large-scale scientific code plays an important role in scientific research. In order to facilitate module and element evaluation in scientific applications, we introduce a unit testing framework and describe the demand for module-based experiment customization. We then develop a parallel version of the unit testing framework to handle long-term simulations with large amounts of data. Specifically, we apply message-passing-based parallelization and I/O behavior optimization to improve the performance of the unit testing framework, and use profiling results to guide the parallel implementation. Finally, we present a case study on a litter decomposition experiment using a standalone module from a large-scale Earth System Model. This case study is also a good demonstration of the scalability, portability, and efficiency of the framework.
The real-time tracking of dim targets in space is mainly achieved through the correlation and prediction of detected points after the detection and calculation process. The on-board tracking calculation needs to be completed in milliseconds, and must reach the microsecond level at high frame rates. For real-time tracking of dim targets in space, it is necessary to achieve general-purpose acceleration of the tracking calculation across different space regions and complex backgrounds, which places high requirements on the engineering implementation architecture. This paper designs a Kalman filter calculation based on a digital-logic parallel acceleration architecture for the real-time, on-board solution of dim target tracking. A unified Vector Processing Element (VPE) architecture was established for the Kalman filtering matrix calculations, and a VPE-based array computing structure was designed to decompose the entire filtering process into a parallel, pipelined data stream. The prediction errors under different fixed-point bit widths were analyzed and derived, and guidance for selecting the optimal bit width based on the statistical results was provided. The entire design was implemented on Xilinx's XC7K325T, resulting in an energy efficiency improvement compared to previous designs. The single-iteration calculation time does not exceed 0.7 microseconds, which meets current high-frame-rate target tracking requirements. The effectiveness of this design has been verified through simulation of random trajectory data, which is consistent with the theoretical calculation error.
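A scalar sketch of the predict/update cycle that the VPE array pipelines is given below. It uses floating point and a 1-D constant-velocity model, whereas the paper's design computes the full matrix form in fixed-point arithmetic, so all names and constants here are illustrative.

```python
def kalman_step(x, v, p, z, dt=1.0, q=0.01, r=0.1):
    """One predict/update cycle for position x, velocity v, covariance p.

    z is a noisy position measurement; q and r are process and
    measurement noise variances (illustrative values).
    """
    # Predict: propagate the state and covariance forward one frame.
    x_pred = x + v * dt
    p_pred = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, v, p_new

# Track a target moving at unit speed; measurements match the motion model.
x, v, p = 0.0, 1.0, 1.0
for t in range(1, 6):
    x, v, p = kalman_step(x, v, p, z=float(t))
```

In the paper, the matrix multiplications and the gain computation inside this cycle are decomposed across the VPE array so that each iteration streams through a fixed-point pipeline in under 0.7 microseconds.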
ISBN (Print): 9781538670248
In this paper, we propose a scalable massively parallel algorithm to solve the general mapping problem in large-scale networks in real time. The proposed parallel algorithm takes advantage of the GPU architecture and launches millions of workers to calculate values on a target network simultaneously. Threads are managed through the SIMT execution model, and target values are updated through atomic operations. Our experiments show the proposed algorithm can accomplish network mapping (finding importance weights for links in a real-world large-scale shared-mobility network) with more than 2 million weights within 1.82 μs (microsecond level), which is truly real-time. The algorithm's performance suggests that mapping computations may no longer be the bottleneck in highly dynamic network-centered problems, as the computations can be completed faster than the solid-state drive (SSD) read access latency. Compared to serial algorithms, the speedup is more than 12,000 times. The proposed algorithm is also scalable. Results on simulated data show that even when the network size grows exponentially, microsecond-level computing performance can still be obtained, and more than a 190,000-times speedup can be achieved. The proposed algorithm can serve as a cornerstone for ultra-fast processing of highly dynamic large-scale networks.
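A CPU sketch of the concurrent-update idea follows. A lock stands in for the GPU's hardware atomic operations, and the trip data and link names are made up for illustration; the point is that many workers can accumulate into shared link weights without losing updates.

```python
import threading

# Shared link-weight table, updated concurrently by many workers.
weights = {}
lock = threading.Lock()

def worker(trips):
    """Each worker counts link traversals from its share of the trip data."""
    for link in trips:
        with lock:                    # CPU stand-in for GPU atomicAdd
            weights[link] = weights.get(link, 0) + 1

trip_logs = [["a-b", "b-c"], ["a-b"], ["b-c", "a-b"]]
threads = [threading.Thread(target=worker, args=(t,)) for t in trip_logs]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

On a GPU, each SIMT thread would process one trip record and call an atomic add on the weight array directly, which removes the lock contention this sketch would suffer at scale.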