A mathematical model is developed and numerical modeling is performed to solve a scientific and industrial problem in the field of studying mass transfer processes in the "fracture set - matrix" system in a ...
This paper explores the application of parallel algorithms and high-performance computing (HPC) in the processing and forecasting of large-scale water demand data. Building upon prior work, which identified the need for more robust and scalable forecasting models, this study integrates parallel computing frameworks such as Apache Spark for distributed data processing, Message Passing Interface (MPI) for fine-grained parallel execution, and CUDA-enabled GPUs for deep learning acceleration. These advancements significantly improve model training and deployment speed, enabling near-real-time data processing. Apache Spark's in-memory computing and distributed data handling optimize data preprocessing and model execution, while MPI provides enhanced control over custom parallel algorithms, ensuring high performance in complex simulations. By leveraging these techniques, urban water utilities can implement scalable, efficient, and reliable forecasting solutions critical for sustainable water resource management in increasingly complex environments. Additionally, expanding these models to larger datasets and diverse regional contexts will be essential for validating their robustness and applicability in different urban settings. Addressing these challenges will help bridge the gap between theoretical advancements and practical implementation, ensuring that HPC-driven forecasting models provide actionable insights for real-world water management decision-making.
Given a set of vectors X = {x1, . . ., xn} ⊂ Rd, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new alg...
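The objective stated above can be made concrete with a brute-force sketch. This is not the algorithm the abstract refers to (its description is truncated here); it only enumerates every bipartition of a tiny, made-up point set and evaluates the sum of crossing distances. The function names and the sample points are assumptions for illustration.

```python
from itertools import product
from math import dist

def cut_value(points, side):
    """Sum of Euclidean distances over pairs that lie on opposite sides."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if side[i] != side[j]:
                total += dist(points[i], points[j])
    return total

def brute_force_max_cut(points):
    """Try every bipartition (exponential time; only for tiny inputs)."""
    n = len(points)
    best_side, best_value = None, -1.0
    # Pin point 0 to side 0 so mirror-image partitions are not tried twice.
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        value = cut_value(points, side)
        if value > best_value:
            best_side, best_value = side, value
    return best_side, best_value

# Hypothetical sample: two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
best_side, best_value = brute_force_max_cut(points)
```

On this sample the best partition separates the left pair from the right pair, since that puts all four long distances across the cut.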
The minimum cut and minimum length linear arrangement problems usually occur in solving wiring problems and have a lot in common with job sequencing questions. Both problems are NP-complete for general graphs and in P for trees. We present here two parallel algorithms for the CREW PRAM. The first solves the minimum length linear arrangement problem for trees and the second solves the minimum cut arrangement for trees. We prove that the first problem belongs to NC for trees, and the second problem is in NC for bounded degree trees. To the best of our knowledge, these are the first parallel algorithms for the minimum length and the minimum cut linear arrangement problems.
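To make the two objectives concrete, the following sketch evaluates the total edge length and the maximum cut of a given vertex ordering, then finds the optimum for a small tree by exhaustive search. It is a sequential illustration of the problem definitions only, not the CREW PRAM algorithms of the paper; the function names and the example star tree are assumptions.

```python
from itertools import permutations

def arrangement_length(edges, order):
    """Total length: sum of |pos(u) - pos(v)| over all edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def arrangement_cut(edges, order):
    """Maximum number of edges crossing any gap between consecutive positions."""
    pos = {v: i for i, v in enumerate(order)}
    worst = 0
    for gap in range(len(order) - 1):
        crossing = sum(
            1 for u, v in edges
            if min(pos[u], pos[v]) <= gap < max(pos[u], pos[v])
        )
        worst = max(worst, crossing)
    return worst

# Hypothetical example: a star with center "c" and four leaves.
edges = [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")]
vertices = ["a", "b", "c", "d", "e"]

# Exhaustive search over all 5! orderings (feasible only for tiny trees).
min_length = min(arrangement_length(edges, p) for p in permutations(vertices))
min_cut = min(arrangement_cut(edges, p) for p in permutations(vertices))
```

For the star, placing the center in the middle is optimal for both objectives, while placing it first makes every edge cross the first gap.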
We present efficient (parallel) algorithms for two hierarchical clustering heuristics. We point out that these heuristics can also be applied to solving some algorithmic problems in graphs, including split decomposition. We show that efficient parallel split decomposition induces an efficient parallel parity graph recognition algorithm. This is a consequence of the result of S. Cicerone and D. Di Stefano [7] that parity graphs are exactly those graphs that can be split decomposed into cliques and bipartite graphs. (C) 2000 Academic Press.
A bus system whose configuration can be dynamically changed is called a reconfigurable bus system. In this paper, parallel algorithms for generating combinations, subsets, and binary trees on linear processor arrays with reconfigurable bus systems (PARBS) are presented.
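As a point of reference for the combination-generation task, here is a sequential sketch of the standard lexicographic successor rule for k-subsets of {1, ..., n}. It is not the PARBS algorithm from the paper, whose details are not given in the abstract; the function names are assumptions.

```python
def next_combination(comb, n):
    """Return the lexicographic successor of a sorted k-subset of {1..n},
    or None if comb is already the last one."""
    k = len(comb)
    comb = list(comb)
    i = k - 1
    # Find the rightmost position that has not reached its maximum value.
    while i >= 0 and comb[i] == n - k + i + 1:
        i -= 1
    if i < 0:
        return None
    comb[i] += 1
    # Reset everything to the right to the smallest increasing run.
    for j in range(i + 1, k):
        comb[j] = comb[j - 1] + 1
    return comb

def all_combinations(n, k):
    """Enumerate all k-subsets of {1..n} in lexicographic order."""
    comb = list(range(1, k + 1))
    out = []
    while comb is not None:
        out.append(tuple(comb))
        comb = next_combination(comb, n)
    return out

pairs = all_combinations(4, 2)
```

Each step inspects and rewrites at most k positions, one element per position, which is the kind of per-element work a linear array with one processor per position could plausibly share out.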
We present new local-memory multiprocessor algorithms for solving sparse triangular systems of equations that arise in the context of Cholesky factorization. Unlike in the existing algorithms, we use the notion of the elimination tree and achieve significant improvement in the performance of both the forward and backward substitution phases. Our algorithms also incorporate the generalization of an important technique of Li and Coleman that gave rise to the best performance for dense triangular system solution.
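The role of the elimination tree can be illustrated with a small sequential sketch. It assumes the usual definition (the parent of column j is the row index of the first off-diagonal nonzero in column j of the Cholesky factor L) and a column-oriented forward substitution. The storage layout, function names, and the 4x4 example are assumptions, and the code is not the multiprocessor algorithm of the paper.

```python
def elimination_tree(cols, n):
    """parent[j] = smallest row index i > j with L[i][j] != 0, or -1 for a root."""
    parent = [-1] * n
    for j in range(n):
        below = [i for i in cols[j] if i > j]
        if below:
            parent[j] = min(below)
    return parent

def forward_substitution(cols, b):
    """Solve L x = b where L is lower triangular, stored by columns:
    cols[j] maps row index -> value for the nonzeros of column j."""
    x = [float(v) for v in b]
    for j in range(len(b)):
        x[j] /= cols[j][j]
        # Column j only updates rows below it that hold a nonzero.
        for i, lij in cols[j].items():
            if i > j:
                x[i] -= lij * x[j]
    return x

# Hypothetical factor L = [[2,0,0,0],[0,1,0,0],[1,3,1,0],[0,0,2,4]].
cols = [
    {0: 2.0, 2: 1.0},
    {1: 1.0, 2: 3.0},
    {2: 1.0, 3: 2.0},
    {3: 4.0},
]
parent = elimination_tree(cols, 4)
x = forward_substitution(cols, [2, 2, 10, 22])
```

In this example columns 0 and 1 are leaves with the common parent 2, so their steps do not depend on each other and could run on different processors before column 2 is handled.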
A couple of approximate inversion techniques are presented which provide a parallel enhancement to several iterative methods for solving linear systems arising from the discretization of boundary value problems. In particular, the Jacobi, Gauss‐Seidel, and successive overrelaxation methods can be improved substantially in a parallel environment by the extensions considered. A special case convergence proof is presented. The use of our approximate inverses with the preconditioned conjugate gradient method is examined and comparisons are made with some recently proposed algorithms in this area that also employ approximate inverses. The methods considered are compared under sequential and parallel hardware assumptions.
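A minimal sketch of the underlying idea: a stationary iteration x <- x + M(b - Ax), where M is an approximate inverse of A. Choosing M as the inverse of the diagonal of A gives the classical Jacobi method. The approximate inverses proposed in the paper are not specified in the abstract, so only this baseline is shown; the function names and the 2x2 test system are assumptions.

```python
def approximate_inverse_iteration(A, b, apply_M, iterations=50):
    """Iterate x <- x + M (b - A x), where apply_M applies the approximate inverse M."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each residual component is independent of the others, so this
        # step can be split across processors.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        dx = apply_M(r)
        x = [x[i] + dx[i] for i in range(n)]
    return x

def jacobi_inverse(A):
    """M = D^{-1}: the inverse of the diagonal of A (classical Jacobi)."""
    return lambda r: [r[i] / A[i][i] for i in range(len(r))]

# Hypothetical diagonally dominant system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = approximate_inverse_iteration(A, b, jacobi_inverse(A))
```

Because applying M is just a matrix-vector product (here a diagonal scaling), the whole update avoids the sequential triangular solves that Gauss-Seidel and SOR require in their usual form.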
We present work-optimal PRAM algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet. For a string of length n, the depth of the compression algorithm is O(log^2 n), and the depth of the corresponding decompression algorithm is O(log n). These appear to be the first polylogarithmic-time work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: (1) a novel O(log n)-time, O(n)-work PRAM algorithm for Huffman decoding; (2) original insights into the stages of the BW compression and decompression problems, bringing out parallelism that was not readily apparent, allowing them to be mapped to elementary parallel routines that have O(log n)-time, O(n)-work solutions, such as: (i) prefix-sums problems with an appropriately-defined associative binary operator for several stages, and (ii) list ranking for the final stage of decompression. Follow-up empirical work suggests potential for considerable practical speedups on a PRAM-driven many-core architecture, against a backdrop of negative contemporary results on common commercial platforms. (C) 2013 Elsevier B.V. All rights reserved.
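The two elementary routines named above can be sketched as round-by-round simulations in which every update within a round is independent of the others. These are the simple textbook variants (a doubling scan and pointer jumping), which take a logarithmic number of rounds but O(n log n) work, so they are not the work-optimal O(n)-work versions the paper relies on. The function names and inputs are assumptions.

```python
def parallel_scan(values, op):
    """Inclusive prefix 'sums' under any associative operator op.
    Each while-iteration simulates one parallel round."""
    x = list(values)
    n = len(x)
    step = 1
    while step < n:
        # Position i combines with the partial result 'step' places to its left.
        x = [x[i] if i < step else op(x[i - step], x[i]) for i in range(n)]
        step *= 2
    return x

def list_rank(succ):
    """succ[i] is the node after i; the tail points to itself.
    Returns each node's distance to the tail via pointer jumping."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    for _ in range(max(1, n.bit_length())):
        # Both lists are rebuilt from the previous round's values only.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

sums = parallel_scan([1, 2, 3, 4, 5], lambda a, b: a + b)
joined = parallel_scan(["a", "b", "c"], lambda a, b: a + b)
# Linked list 3 -> 0 -> 2 -> 1, with node 1 as the tail.
ranks = list_rank([2, 1, 1, 0])
```

The string example shows that the scan only needs associativity, not commutativity, which is what allows a custom operator to be plugged in for the individual compression stages.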