The spatial distribution of the parameters that characterize the subsurface is never known to the level of accuracy required to solve the governing PDEs of multiphase flow or species transport through porous media. This paper presents a computationally inexpensive yet efficient, accurate, and parallel framework for estimating reservoir parameters, such as medium permeability, using sensor information from measurements of solution variables such as phase pressures, phase concentrations, and fluxes, as well as seismic and well-log data. Numerical results are presented to demonstrate the method.
We present practical parallel algorithms, based on prefix computations, for various problems that arise in the pairwise comparison of biological sequences. We consider both constant and affine gap penalty functions, full-sequence and subsequence matching, and space-saving algorithms. The best known sequential algorithms solve these problems in O(mn) time and O(m+n) space, where m and n are the lengths of the two sequences. All the algorithms presented in this paper are time-optimal with respect to the best known sequential algorithms and can use O(n/log n) processors, where n is the length of the longer sequence. While optimal parallel algorithms for many of these problems are already known, we use a simple framework and demonstrate how these problems can be solved systematically using repeated parallel prefix operations. We also present a space-saving algorithm that uses O(m+n/p) space and runs in optimal time, where p is the number of processors used.
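To illustrate how a row of the alignment table reduces to a prefix computation, here is a minimal sketch for global alignment with a constant gap penalty only (affine gaps, subsequence matching, and the space-saving variant are not shown). With x[j] collecting the terms that depend only on row i-1, the recurrence A[i][j] = max(x[j], A[i][j-1] - gap) unrolls to a prefix maximum over x[k] + k*gap. The scoring function `match_score` and all function names are hypothetical, and `itertools.accumulate` is a sequential stand-in for the parallel prefix operation.

```python
from itertools import accumulate


def match_score(p, q):
    # Hypothetical scoring scheme: +1 for a match, -1 for a mismatch.
    return 1 if p == q else -1


def next_row(prev_row, a_char, b, i, gap, score):
    """Row i of the global-alignment table from row i-1, constant gap penalty `gap`."""
    n = len(b)
    # Step 1: contributions from row i-1 only; every j is independent (data parallel).
    x = [-i * gap]  # A[i][0]: i gaps
    for j in range(1, n + 1):
        x.append(max(prev_row[j - 1] + score(a_char, b[j - 1]), prev_row[j] - gap))
    # Step 2: A[i][j] = max_{k<=j}(x[k] - (j-k)*gap) = max_{k<=j}(x[k] + k*gap) - j*gap,
    # i.e. a single prefix-maximum scan over z[k] = x[k] + k*gap.
    z = [x[k] + k * gap for k in range(n + 1)]
    prefix_max = list(accumulate(z, max))  # sequential stand-in for a parallel scan
    return [prefix_max[j] - j * gap for j in range(n + 1)]


def alignment_score(a, b, gap=1, score=match_score):
    row = [-j * gap for j in range(len(b) + 1)]  # row 0: j gaps
    for i in range(1, len(a) + 1):
        row = next_row(row, a[i - 1], b, i, gap, score)
    return row[-1]


def reference_score(a, b, gap=1, score=match_score):
    # Plain O(mn) dynamic program, used only to cross-check the prefix formulation.
    m, n = len(a), len(b)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = -i * gap
    for j in range(n + 1):
        A[0][j] = -j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = max(A[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          A[i - 1][j] - gap, A[i][j - 1] - gap)
    return A[m][n]
```

Only the scan in step 2 carries a dependency along the row, which is why replacing it with a parallel prefix makes each row parallel.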
In this paper, we propose a parallel convolution algorithm for estimating the partial derivatives of 2D and 3D images on distributed-memory MIMD architectures. Exploiting the separability of the Gaussian filter, the algorithm consists of multiple phases, each corresponding to one of the separated 1D filters. It exploits both task and data parallelism and reduces communication through data redistribution. We implemented the algorithm on the Intel Paragon and obtained a substantial speedup using more than 100 processors. The performance of the algorithm is also evaluated analytically; the analytical results, which agree with the experimental results, indicate that the algorithm scales very well with the problem size and the number of processors. We have also applied the algorithm to the design and implementation of an efficient parallel scheme for 3D surface tracking. Although our focus is on 3D image data, the algorithm is also applicable to 2D image data and can be useful in a wide range of applications, including medical imaging, magnetic resonance imaging, ultrasonic imagery, scientific visualization, and image sequence analysis.
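The phase structure enabled by separability can be sketched in 2D: the x-derivative of a Gaussian-smoothed image is one 1D pass with the derivative kernel along x followed by one 1D pass with the Gaussian along y, and this matches the full 2D convolution. This is a minimal sequential sketch under assumed choices (zero padding, sigma=1, radius=4, all function names hypothetical); the column pass is written as a transpose plus row pass, loosely mirroring a data-redistribution step, but no distribution across processors is shown.

```python
import math


def gaussian_kernels(sigma, radius):
    """Sampled, normalized 1D Gaussian g and its derivative dg on [-radius, radius]."""
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    total = sum(g)
    g = [v / total for v in g]
    dg = [-x / (sigma * sigma) * v for x, v in zip(xs, g)]  # d/dx of the Gaussian
    return g, dg


def conv_rows(img, k):
    """1D convolution of every row with kernel k (zero padding at the borders)."""
    r = len(k) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for t, kv in enumerate(k):
                xx = x + r - t  # out[x] = sum_u k[u] * img[x - u], with u = t - r
                if 0 <= xx < w:
                    acc += kv * img[y][xx]
            out[y][x] = acc
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def conv_cols(img, k):
    # Column pass = row pass on the transposed image.
    return transpose(conv_rows(transpose(img), k))


def d_dx_separable(img, sigma=1.0, radius=4):
    g, dg = gaussian_kernels(sigma, radius)
    return conv_cols(conv_rows(img, dg), g)  # phase 1: along x, phase 2: along y


def d_dx_direct(img, sigma=1.0, radius=4):
    """Same result with the full 2D kernel g(y) * dg(x), for comparison."""
    g, dg = gaussian_kernels(sigma, radius)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ty, gy in enumerate(g):
                yy = y + radius - ty
                if not 0 <= yy < h:
                    continue
                for tx, gx in enumerate(dg):
                    xx = x + radius - tx
                    if 0 <= xx < w:
                        acc += gy * gx * img[yy][xx]
            out[y][x] = acc
    return out
```

The separable form costs O(2(2r+1)) multiplications per pixel instead of O((2r+1)^2), and in 3D a third 1D pass along z is added in the same way.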
Model checking has long been used to verify formal specifications. It is a verification technique for dynamic systems that explores all possible states of a system to determine whether the system satisfies its specification. The technique suffers from the state explosion problem when traversing all possible states, and parallel and/or distributed approaches are used to cope with it. In this article, we propose a synchronized parallel exploration algorithm based on a fixed number of threads. We present several experiments comparing our parallel approach with the algorithm proposed for parallel exploration in SPIN, and this experimental study shows that our approach gives encouraging results.
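One common way to organize a synchronized exploration with a fixed number of threads is level by level: the current frontier is split among the workers, each expands its share, and the next level starts only after all results have been merged into the visited set. The sketch below is an assumption about such a scheme, not the article's algorithm; `parallel_explore` and the helper `check_invariant` are hypothetical names, and the toy transition systems stand in for a real model.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_explore(initial, successors, num_threads=4):
    """Level-synchronous reachability with a fixed pool of `num_threads` workers."""
    visited = {initial}
    frontier = [initial]

    def expand(chunk):
        # Worker: generate the successors of its share of the current level.
        return [s for state in chunk for s in successors(state)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        while frontier:
            chunks = [frontier[i::num_threads] for i in range(num_threads)]
            next_frontier = []
            # Synchronization point: the next level starts only after every
            # worker's results for this level have been merged.
            for succ_list in pool.map(expand, chunks):
                for s in succ_list:
                    if s not in visited:  # duplicate detection in the merge step
                        visited.add(s)
                        next_frontier.append(s)
            frontier = next_frontier
    return visited


def check_invariant(initial, successors, invariant, num_threads=4):
    """Hypothetical helper: True if `invariant` holds in every reachable state."""
    return all(invariant(s) for s in parallel_explore(initial, successors, num_threads))
```

Merging in a single thread keeps the visited set free of races at the cost of a sequential step per level; a real model checker would more likely use a concurrent hash table for it.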
ISBN (print): 9781467359467
With the rapid increase in the size and number of jobs processed in the MapReduce framework, efficiently scheduling jobs under this framework is becoming increasingly important. We consider the problem of minimizing the total flow-time of a sequence of jobs in the MapReduce framework, where jobs arrive over time and must be processed by both Map and Reduce procedures before leaving the system. We show that, for non-preemptive tasks, no online algorithm can achieve a constant competitive ratio (defined as the ratio of the completion time of the online algorithm to the completion time of the optimal non-causal offline algorithm). We then construct a slightly weaker performance metric called the efficiency ratio: an online algorithm is said to achieve an efficiency ratio of γ when the flow-time incurred by that scheduler, divided by the minimum flow-time achieved over all possible schedulers, is almost surely less than or equal to γ. Under some weak assumptions, we then show the surprising property that, for the flow-time problem, any work-conserving scheduler has a constant efficiency ratio in both preemptive and non-preemptive scenarios. More importantly, we develop an online scheduler with a very small efficiency ratio (2), and simulations show that it outperforms state-of-the-art schedulers.
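The quantities in this definition can be made concrete on a deliberately simplified model: a single machine, non-preemptive jobs whose Map work is immediately followed by their Reduce work, and a work-conserving FIFO scheduler. The optimum is found by brute force over job orders, which is valid here because, for a fixed order, starting each job as early as possible never increases flow time. This is a hypothetical toy, not the paper's multi-machine model, and it computes a ratio for one instance rather than the almost-sure bound the paper defines.

```python
from itertools import permutations


def total_flow_time(jobs, order):
    """Total flow time on ONE machine when jobs run non-preemptively in `order`.

    jobs: list of (arrival, map_work, reduce_work). Each job's Map work is followed
    immediately by its Reduce work, and a job starts as soon as it has arrived and
    the machine is free.
    """
    clock, flow = 0, 0
    for j in order:
        arrival, map_work, reduce_work = jobs[j]
        start = max(clock, arrival)
        clock = start + map_work + reduce_work
        flow += clock - arrival  # flow time = completion time - arrival time
    return flow


def fifo_order(jobs):
    # Work-conserving FIFO: the machine never idles while an arrived job waits.
    return sorted(range(len(jobs)), key=lambda j: jobs[j][0])


def efficiency_ratio(jobs):
    """Flow time of FIFO divided by the optimum, found by brute force (tiny inputs only)."""
    fifo = total_flow_time(jobs, fifo_order(jobs))
    best = min(total_flow_time(jobs, p) for p in permutations(range(len(jobs))))
    return fifo / best
```

For example, a long job arriving just before two short ones makes FIFO pay a flow time of 30 where running the short jobs first achieves 15.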
ISBN (print): 0769509304
Feasibility analysis of a hard-real-time system refers to the process of determining off-line whether the specified system will meet all deadlines at runtime. For many important (and interesting) task models and scheduling algorithms, feasibility analysis is provably very expensive computationally. A framework is established for speeding up the feasibility analysis of uniprocessor real-time systems by implementing these analysis algorithms on parallel machines. The viability of this framework is validated by developing a parallel algorithm for the feasibility analysis of systems of asynchronous periodic tasks scheduled with the preemptive earliest-deadline-first (EDF) algorithm, and by implementing it and testing its performance.
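A standard way to test EDF feasibility of asynchronous periodic tasks is a processor-demand check: utilization must not exceed 1, and for every interval [t1, t2] up to max offset + 2 × hyperperiod (the bound commonly used for this task model), the execution demand of jobs released at or after t1 with deadlines at or before t2 must not exceed t2 - t1. It suffices to try release times for t1 and deadlines for t2. The sketch below assumes this test and parallelizes it by farming out the candidate t1 values to a fixed worker pool; it is one plausible decomposition, not necessarily the paper's, and all names are hypothetical. Tasks are assumed to have integer parameters.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction
from math import lcm


def expand_jobs(tasks, horizon):
    """tasks: (offset, wcet, relative_deadline, period) tuples with integer values.
    Returns (release, absolute_deadline, wcet) for every job released before horizon."""
    jobs = []
    for offset, wcet, deadline, period in tasks:
        release = offset
        while release < horizon:
            jobs.append((release, release + deadline, wcet))
            release += period
    return jobs


def windows_ok_from(t1, jobs, horizon):
    """Check demand(t1, t2) <= t2 - t1 for every absolute deadline t2 <= horizon."""
    inside = sorted((d, c) for r, d, c in jobs if r >= t1 and d <= horizon)
    demand = 0
    for d, c in inside:
        demand += c
        if demand > d - t1:
            return False
    return True


def edf_feasible(tasks, workers=4):
    if sum(Fraction(c, t) for _, c, _, t in tasks) > 1:  # utilization must not exceed 1
        return False
    hyperperiod = lcm(*(t for *_, t in tasks))
    horizon = max(o for o, *_ in tasks) + 2 * hyperperiod
    jobs = expand_jobs(tasks, horizon)
    starts = sorted({r for r, _, _ in jobs})  # candidate interval starts t1
    # Each interval start is checked independently, so the t1 values are
    # distributed over a fixed pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(lambda t1: windows_ok_from(t1, jobs, horizon), starts))
```

In CPython, threads do not speed up this CPU-bound loop because of the global interpreter lock; a `ProcessPoolExecutor` with a top-level worker function, or an MPI implementation, would be needed for actual parallel speedup.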
ISBN (print): 9781728166773
k-Nearest Neighbor (k-NN) search is one of the most commonly used approaches to similarity search and finds extensive applications in machine learning and data mining. The era of big data calls for efficiently scaling k-NN search algorithms to billion-scale, high-dimensional datasets. In this paper, we propose a solution to this end that uses vantage-point trees to partition the dataset across multiple processes and exploits an existing graph-based sequential approximate k-NN search algorithm, HNSW (Hierarchical Navigable Small World), to search locally within each process. Our hybrid MPI-OpenMP solution uses MPI one-sided communication to reduce communication time and partition replication to improve load balancing across processes. Using our approach, we compute k-NN for 10,000 queries on the order of seconds on ~8000 cores for a dataset of a billion points in a 128-dimensional space. We also show a 10X speedup over a purely k-d tree-based solution on the same dataset, demonstrating that our solution is better suited to high-dimensional datasets. Our solution shows almost linear strong scaling.
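The partitioning step can be sketched as recursive vantage-point splits: pick a vantage point, sort the points by distance to it, and split at the median distance into an inner ball and an outer shell, giving 2^levels balanced partitions (for example one per process). This is a minimal sketch assuming random vantage-point selection and Euclidean distance; the function name is hypothetical, and the local HNSW search, replication, and MPI communication are not shown.

```python
import math
import random


def vp_partition(points, levels, rng):
    """Recursively split `points` into up to 2**levels parts with vantage-point splits."""
    if levels == 0 or len(points) < 2:
        return [points]
    vantage = points[rng.randrange(len(points))]
    # Sort by distance to the vantage point; the median distance is the split radius.
    by_dist = sorted(range(len(points)), key=lambda i: math.dist(vantage, points[i]))
    half = len(points) // 2
    inner = [points[i] for i in by_dist[:half]]  # inside the median ball
    outer = [points[i] for i in by_dist[half:]]  # outside it
    return vp_partition(inner, levels - 1, rng) + vp_partition(outer, levels - 1, rng)


rng = random.Random(0)
data = [tuple(rng.random() for _ in range(8)) for _ in range(64)]
parts = vp_partition(data, 3, rng)  # e.g. one partition per process
```

Splitting at the median keeps partition sizes balanced regardless of the data distribution, which matters for load balance when each partition is owned by one process.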
ISBN (print): 9728865694
Solving a target problem with a single algorithm, or writing portable programs that perform well, is not always efficient across parallel environments, owing to the increasing diversity of existing computational platforms, whose new characteristics influence the execution of parallel applications. The inherent heterogeneity and the diversity of networks in such environments make it a great challenge to implement parallel applications efficiently for high-performance computing. Our objective in this work is to propose a generic framework, based on adaptive techniques, for solving a class of numerical problems on cluster-based heterogeneous hierarchical platforms. To this end, we use adaptive approaches to better fit a given application to a target parallel system. We apply this methodology to a basic numerical problem, matrix multiplication, and determine an adaptive execution scheme that minimizes the overall execution time depending on the problem and architecture parameters.
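One simple ingredient of adapting matrix multiplication to a heterogeneous platform is to size each processor's share of the work by its measured speed. The sketch below assigns row blocks of A in proportion to hypothetical relative speeds (which, in practice, might come from a short benchmark) using largest-remainder rounding, then checks that the blocked product equals the plain product. It is an illustrative assumption, not the paper's execution scheme, and the blocks run sequentially here.

```python
def proportional_rows(n, speeds):
    """Split n rows among processors in proportion to their relative speeds."""
    total = sum(speeds)
    quotas = [n * s / total for s in speeds]
    counts = [int(q) for q in quotas]
    # Largest-remainder rounding hands out the rows lost to truncation.
    leftover = n - sum(counts)
    by_remainder = sorted(range(len(speeds)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def partitioned_matmul(a, b, speeds):
    """Row-block matrix multiplication; each block would run on one processor."""
    counts = proportional_rows(len(a), speeds)
    result, start = [], 0
    for count in counts:
        result.extend(matmul(a[start:start + count], b))  # this processor's share
        start += count
    return result, counts
```

Proportional row blocks equalize compute time only when communication is negligible; an adaptive scheme on a hierarchical platform would also have to weigh network costs when choosing the decomposition.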
ISBN (print): 1595930108
In this paper we present an evaluation of selected parallel strategies for Simulated Annealing and Simulated Evolution, identifying the impact of various issues on the effectiveness of parallelization. The issues considered are the characteristics of these algorithms, the problem instance, and the implementation environment. Observations are presented on the impact of parallel strategies on runtime and achievable solution quality. Effective parallel algorithm design choices are identified, along with pitfalls to avoid. We further attempt to generalize our assessments to other heuristics.
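A common baseline among parallel strategies for Simulated Annealing is the multiple-independent-runs approach: several annealing chains with different random seeds run concurrently and the best result is kept. The sketch below shows that strategy only (the paper evaluates a selection of strategies that are not detailed here); the cooling parameters, the toy objective `toy_cost`, and the neighbor move `toy_neighbor` are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def anneal(cost, neighbor, start, seed, steps=2000, t0=10.0, alpha=0.995):
    """One simulated-annealing run with geometric cooling; returns (best, best_cost)."""
    rng = random.Random(seed)  # private RNG per run keeps runs independent and reproducible
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha
    return best, best_cost


def multistart(cost, neighbor, start, seeds, workers=4):
    """Independent runs in parallel; the best result over all runs is returned."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda s: anneal(cost, neighbor, start, s), seeds))
    return min(runs, key=lambda run: run[1]), runs


def toy_cost(x):
    return (x - 37) ** 2  # hypothetical 1D objective on the integers 0..100


def toy_neighbor(x, rng):
    return min(100, max(0, x + rng.choice((-1, 1))))
```

This strategy needs no communication between chains, but it does not shorten any single chain; strategies that parallelize moves within one chain trade that independence for more synchronization. For real CPU parallelism in CPython, the threads would be replaced by processes.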
ISBN (print): 9781424489350
The OpenCL framework supports the SIMD capabilities available in general-purpose processors, which have been used to pursue performance improvements in several applications. In this paper we propose efficient algorithms for linear image processing that exploit the SIMD extensions provided on AMD and Intel processors. The efficiency of the SIMD-based code generated by the OpenCL compiler is also evaluated experimentally. Starting from a reference algorithm and implementation, we propose several optimizations that yield progressively higher performance. Experimental results suggest an average 4-fold performance improvement when the vectorization of the operations is tuned, and a speedup of more than 10 times when efficient data organization is applied. The experimental work also suggests that the SIMD-based OpenCL implementations are, on average, 1.8 times slower than equivalent implementations that directly use the SIMD intrinsics supported by the Intel Compiler. Moreover, it is shown that real-time image processing is achieved when SIMD instructions are used.
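A typical data-organization change for SIMD image processing is converting interleaved pixels (array of structures, RGBRGB...) into separate contiguous planes (structure of arrays), so that a linear operation applies the same multiply-add to consecutive elements of one array. The Python sketch below only illustrates that layout change and the shape of the per-plane loops; Python itself does not emit SIMD instructions, and whether this matches the paper's data organization is an assumption. Function names are hypothetical.

```python
def split_planes(interleaved):
    """Array-of-structures [r0, g0, b0, r1, ...] -> structure-of-arrays (R, G, B planes)."""
    return interleaved[0::3], interleaved[1::3], interleaved[2::3]


def scale_plane(plane, gain, offset):
    # Same multiply-add on consecutive elements of one contiguous plane: the access
    # pattern a vectorizing compiler can map onto SIMD lanes.
    return [gain * v + offset for v in plane]


def to_gray(r, g, b):
    # A linear combination of planes (weights from ITU-R BT.601 luma).
    return [0.299 * x + 0.587 * y + 0.114 * z for x, y, z in zip(r, g, b)]
```

In an OpenCL or C implementation, the planar layout lets a single vector load fill all lanes with the same channel, avoiding shuffles that interleaved data would require.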