Novel mathematics and mathematical modelling approaches, together with scalable algorithms, are needed to enable key applications at extreme scale. This is especially true as HPC systems continue to scale up in compute node and processor core count. Computational scientists are now at a critical threshold: the novel mathematics and the large-scale algorithm development, redesign, and implementation undertaken today will affect most application areas. The paper therefore focuses on the mathematical and algorithmic challenges and approaches towards exascale and beyond, in particular on stochastic and hybrid methods that lead to scalable scientific algorithms with minimal or no global communication, that hide network and memory latency, that achieve very high computation/communication overlap, and that have no synchronization points.
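The communication-avoiding pattern the abstract points to can be illustrated with a toy Monte Carlo computation (an illustrative sketch, not from the paper): each worker runs independently on its own seed with no synchronization, and the only global operation is a single final reduction.

```python
# Toy illustration of a communication-minimal stochastic method:
# independent Monte Carlo workers, one reduction at the end.
import random

def worker(seed, n):
    """Count random points in the unit quarter-circle; fully independent."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))

# Each "worker" runs with zero communication; one reduce at the end.
partials = [worker(seed, 50000) for seed in range(8)]
pi_est = 4.0 * sum(partials) / (8 * 50000)
print(round(pi_est, 2))  # close to pi
```

The same structure maps directly onto distributed hardware: no global barriers are needed until the final (and cheap) sum.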
Authors: Jun Zhu, Jianfei Chen, Wenbo Hu, Bo Zhang
TNList Lab; State Key Lab for Intelligent Technology and Systems; CBICR Center; Department of Computer Science and Technology, Tsinghua University
The explosive growth in data volume and the availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments on adaptive, flexible and scalable Bayesian learning. This article provides a survey of the recent advances in Big learning with Bayesian methods, termed Big Bayesian Learning, including non-parametric Bayesian methods for adaptively inferring model complexity, regularized Bayesian inference for improving flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for dealing with large-scale applications. We also provide various new perspectives on large-scale Bayesian modeling and inference.
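As a concrete instance of the stochastic-subsampling idea the survey covers, here is a minimal stochastic gradient Langevin dynamics (SGLD) sketch; the Gaussian model, prior, step size, and batch size are illustrative choices, not taken from the article.

```python
# Hedged sketch of subsampling-based Bayesian inference via SGLD:
# each step uses a minibatch gradient estimate plus injected noise.
import random, math

def sgld_gaussian_mean(data, n_steps=20000, batch=10, eps=1e-3):
    """Sample the posterior mean of a N(mu, 1) model with a N(0, 10) prior."""
    n = len(data)
    mu, samples = 0.0, []
    for _ in range(n_steps):
        mb = random.sample(data, batch)
        # minibatch estimate of the gradient of the log posterior
        grad = -mu / 10.0 + (n / batch) * sum(x - mu for x in mb)
        mu += 0.5 * eps * grad + math.sqrt(eps) * random.gauss(0, 1)
        samples.append(mu)
    return samples

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(1000)]
post = sgld_gaussian_mean(data)
print(sum(post[-5000:]) / 5000)  # close to the sample mean of the data
```

Because each step touches only a small minibatch, the per-iteration cost is independent of the data size, which is exactly what makes the approach scalable.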
We present analytical and experimental results for fine-grained list ranking algorithms. We compare the scalability of two representative algorithms on random lists, then address the question of how the locality properties of image edge lists can be used to improve the performance of this highly data-dependent operation. Starting with Wyllie's algorithm and Anderson and Miller's randomized algorithm as bases, we use the spatial locality of edge links to derive scalable algorithms designed to exploit the characteristics of image edges. Tested on actual and synthetic edge data, this approach achieves significant speedup on the MasPar MP-1 and MP-2, compared to the standard list ranking algorithms. The modified algorithms exhibit good scalability and are robust across a wide variety of image types. We also show that load balancing on fine-grained machines performs well only for large problem to machine size ratios.
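For reference, Wyllie's algorithm is classic pointer jumping; below is a sequential simulation of its synchronous rounds (a didactic sketch, not the MasPar implementation).

```python
# Sequential sketch of Wyllie's pointer-jumping list ranking.
# In the parallel algorithm every node updates in lockstep; here one
# synchronous round is simulated by a full pass over the nodes.

def wyllie_list_ranking(succ):
    """succ[i] is the successor of node i; the tail points to itself.
    Returns rank[i] = distance from node i to the tail."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # O(log n) pointer-jumping rounds
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        rank, nxt = ([rank[i] + rank[nxt[i]] for i in range(n)],
                     [nxt[nxt[i]] for i in range(n)])
    return rank

# Linked list 0 -> 1 -> 2 -> 3 (tail 3 points to itself)
print(wyllie_list_ranking([1, 2, 3, 3]))  # [3, 2, 1, 0]
```

The paper's contribution is to exploit the spatial locality of image edge links, which this generic version deliberately ignores.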
Experimental evidence suggests that the dynamics of many physical phenomena are significantly affected by the underlying uncertainties associated with variations in properties and fluctuations in operating conditions. Recent developments in stochastic analysis have opened the possibility of realistic modeling of such systems in the presence of multiple sources of uncertainty. These advances raise the possibility of solving the corresponding stochastic inverse problem: the problem of designing/estimating the evolution of a system in the presence of multiple sources of uncertainty given limited information. A scalable, parallel methodology for stochastic inverse/design problems is developed in this article. The representation of the underlying uncertainties and the resultant stochastic dependent variables is performed using a sparse grid collocation methodology. A novel stochastic sensitivity method is introduced based on multiple solutions to deterministic sensitivity problems. The stochastic inverse/design problem is transformed into a deterministic optimization problem in a larger-dimensional space that is subsequently solved using deterministic optimization algorithms. The design framework relies entirely on deterministic direct and sensitivity analysis of the continuum systems, thereby significantly enhancing the range of applicability of the framework for design in the presence of uncertainty of many other systems usually analyzed with legacy codes. Various illustrative examples with multiple sources of uncertainty, including inverse heat conduction problems in random heterogeneous media, are provided to showcase the developed framework. (C) 2008 Elsevier Inc. All rights reserved.
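The collocation idea can be shown in one stochastic dimension (a full-tensor Gauss-Hermite rule rather than the paper's sparse grids, and a hypothetical model function): the deterministic solver is evaluated only at a handful of collocation nodes, and statistics come from a weighted sum.

```python
# One-dimensional stochastic collocation sketch: estimate E[u(xi)]
# for u depending on a standard-normal input xi, sampling u only at
# probabilists' Gauss-Hermite nodes.
import numpy as np

def collocation_mean(u, n_pts=5):
    # nodes/weights for the weight function exp(-x^2/2)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_pts)
    weights = weights / np.sqrt(2 * np.pi)  # normalize to a probability measure
    return float(np.sum(weights * u(nodes)))

# u(xi) = (1 + xi)^2 has exact mean E[1 + 2*xi + xi^2] = 2
print(collocation_mean(lambda s: (1 + s) ** 2))  # 2.0 (exact for polynomials)
```

Sparse grids extend this to many stochastic dimensions while keeping the number of deterministic solves tractable, which is what makes the inverse/design loop affordable.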
Theoretical and experimental results concerning FETI-based algorithms for contact problems of elasticity are reviewed. A discretized model problem is first reduced by the duality theory of convex optimization to a quadratic programming problem with bound and equality constraints. The latter is then optionally modified by means of orthogonal projectors to the natural coarse space introduced by Farhat and Roux in the framework of their FETI method. The resulting problem is then solved either by special algorithms for bound constrained quadratic programming problems combined with a penalty that imposes the equality constraints, or by an augmented Lagrangian type algorithm with an inner loop for the solution of bound constrained quadratic programming problems. Recent theoretical results are reported that guarantee certain optimality and scalability of both algorithms. The results are confirmed by numerical experiments. The performance of the basic algorithm in the solution of more realistic engineering problems is demonstrated on 3D problems with large displacements or Coulomb friction. (C) 2004 Elsevier B.V. All rights reserved.
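The core subproblem here is a bound-constrained QP, min ½xᵀAx − bᵀx subject to l ≤ x ≤ u. Below is a plain projected-gradient sketch of it, a simplified stand-in for the specialized gradient-projection solvers the paper reviews, on a made-up two-variable problem.

```python
# Projected gradient descent for a bound-constrained QP with SPD A:
#   min 1/2 x'Ax - b'x   s.t.   lo <= x <= hi
import numpy as np

def projected_gradient_qp(A, b, lo, hi, step=None, iters=500):
    x = np.clip(np.zeros(len(b)), lo, hi)
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2)  # safe step: 1 / largest eigenvalue
    for _ in range(iters):
        grad = A @ x - b
        x = np.clip(x - step * grad, lo, hi)  # gradient step, then box projection
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, -3.0])
lo, hi = np.zeros(2), np.ones(2)
print(projected_gradient_qp(A, b, lo, hi))  # [1. 0.]
```

The unconstrained minimizer is A⁻¹b = (2, −3); the box [0, 1]² clips it to (1, 0), which the iteration reaches as a fixed point. The algorithms the paper analyzes add active-set logic and preconditioning to make this scalable.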
MR-Search is a framework for massively parallel heuristic search. Based on the MapReduce paradigm, it efficiently utilizes all available resources: processors, memories, and disks. MR-Search uses OpenMP on shared memory systems, Message Passing Interface on clusters with distributed memory, and a combination of both on clusters with multi-core processors. Large graphs that do not fit into the main memory can be efficiently processed with an out-of-core variant. We implemented two node expansion strategies in MR-Search: breadth-first frontier search and breadth-first iterative deepening A*. With breadth-first frontier search, we computed large and powerful table-driven heuristics, so-called pattern databases that exceed the main memory capacity. These pattern databases were then used to solve random instances of the 24-puzzle with breadth-first iterative deepening A* on systems with up to 4093 processor cores. MR-Search is conceptually simple. It takes care of data partitioning, process scheduling, out-of-core data merging, communication, and synchronization. Application developers benefit from the parallel computational capacity without having the burden of implementing parallel application code. Copyright (c) 2011 John Wiley & Sons, Ltd.
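The expansion pattern can be sketched in a few lines (a toy on integers, not MR-Search's distributed implementation): the "map" phase expands every frontier node, and the "reduce" phase merges duplicates against the visited set.

```python
# Toy map/reduce-style breadth-first frontier search.

def frontier_bfs(start, goal, neighbors):
    depth, frontier, visited = 0, {start}, {start}
    while frontier:
        if goal in frontier:
            return depth
        # map phase: expand every node in the frontier
        expanded = [m for n in frontier for m in neighbors(n)]
        # reduce phase: deduplicate and drop already-visited states
        frontier = {m for m in expanded if m not in visited}
        visited |= frontier
        depth += 1
    return None

# Toy state space: states are ints, moves are -1, +1, and *2
neighbors = lambda n: [n - 1, n + 1, n * 2]
print(frontier_bfs(1, 10, neighbors))  # 4  (1 -> 2 -> 4 -> 5 -> 10)
```

In MR-Search the same map and reduce steps are partitioned across cores and disks, which is what lets the frontier and the pattern databases exceed main memory.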
This paper considers a cell-free massive multiple-input multiple-output network (cfm-MIMO) with a massive number of access points (APs) distributed across an area to deliver information to multiple users. Based on only local channel state information, conjugate beamforming is used under both proper and improper Gaussian signaling. To accomplish the mission of cfm-MIMO in providing fair service to all users, the problem of power allocation to maximize the geometric mean (GM) of the users' rates (GM-rate) is considered. A new scalable algorithm, which iterates closed-form expressions of linear complexity and is thus practical regardless of the scale of the network, is developed for its solution. The problem of quality-of-service (QoS)-aware network energy efficiency is also addressed via maximizing the ratio of the GM-rate to the total power consumption, again by iterating closed-form expressions of linear complexity. Extensive simulations are provided to demonstrate the ability of GM-rate based optimization to achieve multiple targets such as a uniform QoS, a good sum rate, and a fair power allocation to the APs.
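The appeal of the GM-rate objective is easy to see numerically (the rate values below are illustrative, not simulation results from the paper): the geometric mean collapses when any one user's rate does, while the sum rate barely notices.

```python
# Why the geometric mean of user rates promotes fairness.
import math

def gm(rates):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

balanced = [2.0, 2.0, 2.0, 2.0]
skewed = [5.0, 2.6, 0.2, 0.2]   # same sum rate of 8.0, two starved users
print(sum(balanced), gm(balanced))          # 8.0 2.0
print(sum(skewed), round(gm(skewed), 3))    # 8.0, but a much smaller GM
```

Maximizing the sum rate would rate both allocations equally; maximizing the GM strongly prefers the balanced one, which is the fairness property the paper exploits.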
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an L2 boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
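The underlying idea, linear models in the leaves, can be sketched with a depth-one tree (a toy, not the PILOT algorithm, whose split rule and L2 boosting are more involved): pick the split threshold that minimizes squared error, then fit an ordinary least-squares line in each leaf.

```python
# Toy depth-1 linear model tree: one split, an OLS line per leaf.
import numpy as np

def fit_leaf(x, y):
    """OLS fit y ~ a*x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, 1)
    return a, b

def depth1_linear_tree(x, y):
    best = None
    for t in np.unique(x)[1:]:
        left, right = x < t, x >= t
        if left.sum() < 2 or right.sum() < 2:
            continue
        sse = 0.0
        for mask in (left, right):
            a, b = fit_leaf(x[mask], y[mask])
            sse += np.sum((y[mask] - (a * x[mask] + b)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, fit_leaf(x[left], y[left]), fit_leaf(x[right], y[right]))
    return best  # (sse, threshold, left (a, b), right (a, b))

x = np.linspace(0, 10, 50)
y = np.where(x < 5, 2 * x, 10.0)       # piecewise-linear target
sse, t, left, right = depth1_linear_tree(x, y)
print(round(t, 2), round(sse, 6))      # split near 5, near-zero error
```

A standard regression tree would need many constant leaves to approximate the sloped segment; the linear leaf captures it exactly, which is the advantage the abstract describes.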
We introduce HPCx - the U.K.'s new National HPC Service - which aims to deliver a world-class service for capability computing to the U.K. scientific community. HPCx is targeting an environment that will both result in world-leading science and address the challenges involved in scaling existing codes to the capability levels required. Close working relationships with scientific consortia and user groups throughout the research process will be a central feature of the service. A significant number of key user applications have already been ported to the system. We present initial benchmark results from this process and discuss the optimization of the codes and the performance levels achieved on HPCx in comparison with other systems. We find a range of performance with some algorithms scaling far better than others. Copyright (c) 2005 John Wiley & Sons, Ltd.