In this work the authors present several approaches to high-performance simulation of human disease propagation using hybrid two-component simulation models. The models under study were created by coupling compartmental and discrete-event submodels: the former simulates the demographic processes in a population, while the latter handles disease progression for an individual. The number and type of components used in a model may vary with the research aims and data availability. The proposed high-performance approaches are based on batch random number generation, distribution of simulation runs, and computation on graphics processing units (GPUs). The emphasis is on the ability to apply these approaches to various model types without substantial code refactoring for each particular model. The speedups were measured on simulation programs written in C++ and MATLAB for models of HIV and tuberculosis spread and for models of tumor screening for the prevention of colorectal cancer. The benefits and drawbacks of the described approaches, along with future directions for their development, are discussed.
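To make the batch random number generation idea concrete, the following minimal Python sketch pre-draws all uniform random numbers for a block of individuals in one call and then consumes them inside the per-individual loop, instead of calling the generator once per event. Python, the toy progression rule, and all parameter values here are illustrative assumptions; the papers' implementations are in C++ and MATLAB.

```python
import numpy as np

def simulate_individual(draws):
    """Toy disease-progression walk driven by pre-drawn uniforms (illustrative only)."""
    state = 0
    for u in draws:
        state += 1 if u < 0.3 else 0   # hypothetical transition probability
    return state

def run_batched(n_individuals=10_000, steps=365, seed=42):
    rng = np.random.default_rng(seed)
    # Batch generation: one call fills the whole block of uniforms,
    # instead of one generator call per event inside the simulation loop.
    draws = rng.random((n_individuals, steps))
    return [simulate_individual(draws[i]) for i in range(n_individuals)]

if __name__ == "__main__":
    results = run_batched()
    print(np.mean(results))
```

The same pattern generalizes to distributing simulation runs: each worker receives its own seed and generates its batch independently.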
ISBN (print): 9781479999255
High-performance computing (HPC) has been the dominant technology for seismic data processing in the petroleum industry. However, with growing data sizes and varieties, traditional computation-centric HPC faces new challenges. Researchers are looking for computing platforms that balance performance and productivity and also offer big-data analytics capabilities. Apache Spark is a big-data analytics platform that supports more than the map/reduce parallel execution mode, with good scalability and fault tolerance. In this paper, we ask whether Apache Spark, with its in-memory computation and data-locality features, can scale to process seismic data. We use several typical seismic data processing algorithms to study performance and productivity. Our contributions include customized seismic data distributions in Spark, extraction of commonly used templates for seismic data processing algorithms, and performance analysis of several typical seismic processing algorithms.
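As an illustration of how a customized seismic data distribution might look in Spark, the PySpark sketch below keys traces by gather id and partitions them so each gather stays on one executor before a per-trace operation is applied. The synthetic gather layout and the crude frequency-domain bandpass are assumptions for illustration, not the authors' code.

```python
import numpy as np
from pyspark.sql import SparkSession

def bandpass(trace, low=5.0, high=60.0, dt=0.004):
    """Crude frequency-domain bandpass on one trace (illustrative, not a production filter)."""
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(len(trace), d=dt)
    spec[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spec, n=len(trace))

if __name__ == "__main__":
    spark = SparkSession.builder.appName("seismic-sketch").getOrCreate()
    sc = spark.sparkContext

    # Synthetic stand-in for shot gathers: (gather_id, trace) pairs.
    gathers = [(g, np.random.randn(1000).tolist()) for g in range(8) for _ in range(50)]

    traces = sc.parallelize(gathers)
    # Custom distribution: partition by gather id so each gather lands on one partition.
    processed = (traces
                 .partitionBy(8, lambda gid: gid)
                 .mapValues(lambda t: bandpass(np.asarray(t)).tolist()))
    print(processed.count())
    spark.stop()
```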
ISBN (print): 9781450333177
As a fundamental tool for modeling and analyzing social and information networks, large-scale graph mining is an important component of any big-data analysis tool set. Processing graphs with hundreds of billions of edges is only possible by developing distributed algorithms under distributed graph mining frameworks such as MapReduce, Pregel, Giraph, and the like. For these distributed algorithms to work well in practice, several metrics must be taken into account, such as the number of rounds of computation and the communication complexity of each round. For example, given the popularity and ease of use of the MapReduce framework, developing practical algorithms with good theoretical guarantees for basic graph problems is of great importance. In this tutorial, we first discuss how to design and implement algorithms based on the traditional MapReduce architecture. In this regard, we discuss various basic graph-theoretic problems such as computing connected components, maximum matching, minimum spanning trees, triangle counting, and overlapping or balanced clustering. We discuss a computation model for MapReduce and describe the sampling, filtering, local random walk, and core-set techniques used to develop efficient algorithms in this framework. At the end, we explore the possibility of employing other distributed graph processing frameworks. In particular, we study the effect of augmenting MapReduce with a distributed hash table (DHT) service and also discuss the use of a new graph processing framework called ASYMP, based on asynchronous message passing. In particular, we show that using ASYMP one can improve CPU usage and achieve significantly better running times.
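A minimal sketch of connected components expressed as map and reduce rounds is shown below. It is a simplified label-propagation stand-in for the algorithms the tutorial covers; plain Python is used here in place of an actual MapReduce framework, and the adjacency structure is assumed to be symmetric.

```python
from collections import defaultdict

def map_phase(adjacency, labels):
    """Map: each vertex emits its current label to itself and to all neighbors."""
    emitted = []
    for v, neighbors in adjacency.items():
        emitted.append((v, labels[v]))
        for u in neighbors:
            emitted.append((u, labels[v]))
    return emitted

def reduce_phase(emitted):
    """Reduce: each vertex keeps the minimum label it received."""
    best = defaultdict(lambda: float("inf"))
    for v, label in emitted:
        best[v] = min(best[v], label)
    return dict(best)

def connected_components(adjacency):
    labels = {v: v for v in adjacency}           # initial label = vertex id
    while True:
        new_labels = reduce_phase(map_phase(adjacency, labels))
        if new_labels == labels:                 # no label changed: converged
            return labels
        labels = new_labels

if __name__ == "__main__":
    graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
    print(connected_components(graph))           # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 5}
```

The number of rounds until convergence is one of the metrics the tutorial emphasizes; more sophisticated variants (e.g. hash-to-min style algorithms) reduce that round count.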
In this dissertation we develop multiple algorithms for efficient parallel solution of structured nonlinear programming problems by decomposing the linear augmented system solved at each iteration of a nonlinear interior-point approach. In particular, we address large-scale, block-structured problems with a significant number of complicating, or coupling, variables. This structure arises in many important problem classes, including multi-scenario optimization, parameter estimation, two-stage stochastic programming, optimal control, and power network problems. The structure of these problems induces a block-angular structure in the augmented system, and parallel solution is possible using a Schur-complement decomposition. Three major variants are implemented: a serial, full-space interior-point method; serial and parallel versions of an explicit Schur-complement decomposition; and serial and parallel versions of an implicit PCG-based Schur-complement decomposition. All of these algorithms have been implemented in C++ in an extensible software framework for nonlinear optimization. The explicit Schur-complement decomposition is typically effective for problems with a few hundred coupling variables. We demonstrate the performance of our implementation on an important problem in optimal power grid operation, the contingency-constrained AC optimal power flow (ACOPF) problem. We present a rectangular IV formulation for the contingency-constrained ACOPF problem and demonstrate that the explicit Schur-complement decomposition can dramatically reduce solution times for a problem with a large number of contingency scenarios. Moreover, a comparison of the explicit Schur-complement decomposition implementation with the Progressive Hedging approach provided by Pyomo shows that the internal decomposition approach is computationally favorable to the external approach. However, the explicit Schur-complement decomposition approach is not appropriate for problems with a large number of coupling variables.
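The explicit Schur-complement decomposition can be illustrated with a small dense sketch: for a block-angular system in which each block A_i couples to the shared variables y only through B_i, the blocks' contributions to the Schur complement are independent and could be formed in parallel. The numpy code below is an illustration under these assumptions (dense blocks, made-up sizes), not the dissertation's C++ framework.

```python
import numpy as np

def schur_solve(blocks, C, d):
    """Solve a block-angular system via an explicit Schur complement.

    blocks: list of (A_i, B_i, b_i), where block i couples to the shared
            variables y only through B_i.  All blocks are dense here for brevity.
    """
    S = C.astype(float).copy()
    r = d.astype(float).copy()
    cached = []
    for A, B, b in blocks:
        AinvB = np.linalg.solve(A, B)        # A_i^{-1} B_i
        Ainvb = np.linalg.solve(A, b)        # A_i^{-1} b_i
        S -= B.T @ AinvB                     # accumulate Schur complement
        r -= B.T @ Ainvb                     # accumulate reduced right-hand side
        cached.append((AinvB, Ainvb))
    y = np.linalg.solve(S, r)                # coupling variables
    xs = [Ainvb - AinvB @ y for AinvB, Ainvb in cached]   # back-substitute per block
    return xs, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 5, 2, 3                        # block size, coupling variables, number of blocks
    blocks = [(rng.random((n, n)) + n * np.eye(n),   # well-conditioned diagonal block A_i
               rng.random((n, m)),                   # coupling block B_i
               rng.random(n))                        # block right-hand side b_i
              for _ in range(k)]
    C = rng.random((m, m)) + 5 * np.eye(m)
    d = rng.random(m)

    xs, y = schur_solve(blocks, C, d)
    # Residual of the coupling row: sum_i B_i^T x_i + C y - d should be ~0.
    residual = sum(B.T @ x for (_, B, _), x in zip(blocks, xs)) + C @ y - d
    print(np.max(np.abs(residual)))
```

Because S is dense and of dimension equal to the number of coupling variables, this explicit approach degrades as that number grows, which is what motivates the implicit PCG-based variant.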
Various sensors on airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other applications. However, it is challenging to efficiently store, query, and process such big data because of both data- and computing-intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be fetched directly from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo Toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide a rich set of image processing operations. With the integration of HDFS, the Orfeo Toolbox, and MapReduce, these remote sensing images can be processed directly in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process such big remote sensing data.
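One common way to drive an external image-processing tool from MapReduce is a streaming-style mapper that receives one tile path per input record; the sketch below follows that pattern. The specific Orfeo Toolbox application and its arguments are placeholders, since the abstract does not give the framework's actual integration code.

```python
#!/usr/bin/env python
import subprocess
import sys

def process_tile(path):
    """Invoke an external image-processing command on one tile.

    The command below is a placeholder assumption: substitute the Orfeo
    Toolbox application and parameters your pipeline actually uses.
    """
    out_path = path + ".out.tif"
    cmd = ["otbcli_Smoothing", "-in", path, "-out", out_path]   # assumed OTB CLI usage
    return subprocess.run(cmd, capture_output=True).returncode == 0

if __name__ == "__main__":
    # Hadoop Streaming mapper: one tile path per stdin line, tab-separated output.
    for line in sys.stdin:
        tile = line.strip()
        if tile:
            print(f"{tile}\t{'ok' if process_tile(tile) else 'failed'}")
```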
ISBN (print): 9781479982523
Parallel computing is the simultaneous use of multiple compute resources, for example processors, to solve complex computational problems. It has been used in high-end computing areas such as pattern recognition, medical diagnosis, national defense, and web search engines. This paper focuses on the implementation of a pattern classification technique, the Support Vector Machine (SVM), using a vector-processor approach. We carried out a performance analysis benchmarking the sequential SVM program against a Graphics Processing Unit (GPU) optimization. The results show that parallelizing SVM training achieves better performance than the sequential code, with a speedup of 6.4.
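The data-parallel portion of SVM training that benefits from vector or GPU execution is dominated by kernel-matrix arithmetic. The sketch below contrasts a scalar loop with a vectorized formulation of the same RBF kernel computation; the synthetic data and numpy-only implementation are assumptions for illustration and do not reproduce the paper's GPU code.

```python
import time
import numpy as np

def rbf_kernel_naive(X, gamma=0.1):
    """Scalar, loop-based RBF kernel matrix (stand-in for unoptimized sequential code)."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-gamma * np.sum((X[i] - X[j]) ** 2))
    return K

def rbf_kernel_vectorized(X, gamma=0.1):
    """Vectorized RBF kernel matrix: the same arithmetic mapped onto SIMD/GPU-friendly ops."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

if __name__ == "__main__":
    X = np.random.default_rng(0).random((500, 20))
    t0 = time.perf_counter(); K1 = rbf_kernel_naive(X); t1 = time.perf_counter()
    K2 = rbf_kernel_vectorized(X); t2 = time.perf_counter()
    print(f"naive {t1 - t0:.2f}s, vectorized {t2 - t1:.4f}s, "
          f"max diff {np.max(np.abs(K1 - K2)):.2e}")
```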
ISBN (print): 9781479999422
Fast numerical solutions of the Riesz fractional equation have a computational cost of O(NM log M), where M and N are the numbers of grid points and time steps. In this paper, we present a GPU-based fast solution for the Riesz space fractional equation. The GPU-based fast solution, which builds on the fast FFT-based method and is implemented with the CUDA programming model, consists of parallel FFT, vector-vector addition, and vector-vector multiplication on the GPU. The experimental results show that the GPU-based fast solution compares well with the exact solution. Compared to a known parallel fast solution on an 8-core Intel E5-2670 CPU, the overall speedup reaches 2.12x on an NVIDIA GTX 650 GPU and 10.93x on an NVIDIA K20c GPU.
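The O(M log M) kernel behind such fast solvers is a Toeplitz matrix-vector product computed through circulant embedding and FFTs, since the discretized Riesz fractional operator yields Toeplitz matrices. The numpy sketch below checks this product against a dense reference; the random test matrix is an assumption for illustration, and the paper's CUDA implementation is not reproduced here.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec_fft(c, r, x):
    """Multiply a Toeplitz matrix (first column c, first row r) by x via circulant embedding.

    Embedding the M x M Toeplitz matrix in a circulant of size 2M lets the
    product be computed with FFTs in O(M log M).
    """
    m = len(x)
    # First column of the embedding circulant: [c, 0, reversed tail of r].
    col = np.concatenate([c, [0.0], r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(np.concatenate([x, np.zeros(m)])))
    return y[:m].real

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m = 1024
    c = rng.random(m)                                  # first column of a test Toeplitz matrix
    r = np.concatenate([[c[0]], rng.random(m - 1)])    # first row, sharing the corner entry
    x = rng.random(m)

    dense = toeplitz(c, r) @ x                         # O(M^2) reference product
    fast = toeplitz_matvec_fft(c, r, x)                # O(M log M) FFT-based product
    print(np.max(np.abs(dense - fast)))                # ~1e-12, floating-point roundoff
```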
As the number of cores increases in chip multiprocessor (CMP), or multicore, architectures, we often observe performance degradation due to complex memory behavior on such systems. To mitigate such inefficiencies, we develop schemes that can be used to characterize and improve the memory behavior of a multicore node for scientific computing applications that require high performance. We leverage the fact that such scientific computing applications often comprise code blocks that are repeated, leading to certain periodic properties. We conjecture that their periodic properties, and their observable impacts on cache performance, can be characterized in sufficient detail by simple 'alpha + beta*sine' models. Additionally, starting from such a model of the observable reuse distances, we develop a predictive cache miss model, followed by appropriate extensions for predictive capability in the presence of [...]. We consider the utilization of our reuse distance and cache miss models for accelerating scientific workloads on multicore systems. We use our cache miss model to determine a set of preferred applications to be co-scheduled with a given application to minimize performance degradation from interference. Further, we propose a reuse-distance-reducing ordering that improves the performance of Laplacian mesh smoothing. We reorder mesh vertices based on the initial quality of each node and its neighboring nodes so that we can improve both temporal and spatial locality. The reordering results show that a 38.75% performance improvement in Laplacian mesh smoothing can be obtained by our reuse-distance-reducing ordering when running on a single core, and a 75x speedup is obtained when scaling up to 32 cores.
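A minimal sketch of fitting the 'alpha + beta*sine' reuse-distance model is shown below, using scipy's curve_fit on a synthetic periodic trace. The trace, noise level, and initial guesses are assumptions; the dissertation's actual fitting procedure is not given in the abstract.

```python
import numpy as np
from scipy.optimize import curve_fit

def reuse_model(t, alpha, beta, omega, phi):
    """'alpha + beta*sine' reuse-distance model; parameter fit is illustrative."""
    return alpha + beta * np.sin(omega * t + phi)

if __name__ == "__main__":
    # Synthetic stand-in for a measured reuse-distance trace from a periodic code block.
    t = np.arange(0, 200, dtype=float)
    rng = np.random.default_rng(3)
    trace = 120.0 + 35.0 * np.sin(0.31 * t + 0.8) + rng.normal(0, 5.0, t.size)

    p0 = [trace.mean(), trace.std(), 2 * np.pi / 20.0, 0.0]   # rough initial guess
    params, _ = curve_fit(reuse_model, t, trace, p0=p0)
    print(dict(zip(["alpha", "beta", "omega", "phi"], np.round(params, 3))))
```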
ISBN (print): 9781467375627
The E-Biothon platform [1] is an experimental cloud platform that helps speed up and advance research in biology, health, and the environment. It is based on a Blue Gene/P system and a web portal that allows members of the bioinformatics community to easily launch their scientific applications. In this paper we describe the technical capacities of the platform, the different applications supported, and a set of user experiences on the platform.
ISBN (print): 9781450329286
MapReduce is a popular data-parallel programming model for varied analyses of huge volumes of data. While multicore, many-CPU HPC infrastructures can be used to increase the parallelism of MapReduce tasks, I/O-bandwidth limitations may make them ineffective. I/O-intensive activities are essential in any MapReduce cluster. On HPC nodes, I/O-intensive jobs get queued at the I/O resources while the CPUs remain underutilized, resulting in poor performance, high power consumption, and thus energy inefficiency. In this paper, we investigate, through a thorough empirical study, which power management settings can improve the energy efficiency of I/O-intensive MapReduce jobs. Our analysis indicates that a constant CPU frequency can reduce the energy consumption of an I/O-intensive job while improving its performance. Consequently, we build a set of regression models to predict the energy consumption of I/O-intensive jobs at a given CPU frequency and input data volume. We obtained the same set of models, with different coefficients, for two different types of I/O-intensive jobs, which substantiates the suitability of the identified models. These models predict their respective outcomes with 80% accuracy for 80% of the new test cases.
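A regression model of the kind described, predicting energy from CPU frequency and input data volume, can be sketched as follows. The linear model form, the synthetic data, and all coefficients below are assumptions for illustration only; the paper does not state its model structure in the abstract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

if __name__ == "__main__":
    rng = np.random.default_rng(7)

    # Synthetic training data: (CPU frequency in GHz, input volume in GB) -> energy in kJ.
    freq = rng.uniform(1.2, 2.6, 200)
    volume = rng.uniform(10, 500, 200)
    energy = 5.0 + 2.3 * volume + 40.0 * freq + rng.normal(0, 20.0, 200)

    X = np.column_stack([freq, volume])
    model = LinearRegression().fit(X, energy)

    pred = model.predict([[2.0, 128.0]])[0]   # energy estimate at 2.0 GHz, 128 GB input
    print(f"coef={model.coef_}, intercept={model.intercept_:.1f}, pred={pred:.1f} kJ")
```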