We present new parallel algorithms for testing pattern involvement for all length-4 permutations. Our algorithms run in O(log n) time with n/log n processors on the CREW PRAM model, and in O(log log log n) time with n/log log log n processors or in constant time with n log^3 n processors on the CRCW PRAM model. Parallel algorithms had not previously been designed for some of these patterns, and for the other patterns the previous best algorithms require O(log n) time and n processors on the CREW PRAM model.
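For reference, the sketch below is a minimal sequential brute-force test of pattern involvement: does a permutation contain a subsequence order-isomorphic to a given pattern? It is only an O(n^4) baseline for length-4 patterns, not one of the PRAM algorithms described in the abstract.

```python
from itertools import combinations

# Brute-force pattern involvement check: `perm` contains `pattern` if some
# subsequence of `perm` has the same relative order as `pattern`.
def contains_pattern(perm, pattern):
    k = len(pattern)
    # indices of the pattern sorted by value encode its relative order
    pat_order = tuple(sorted(range(k), key=lambda i: pattern[i]))
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if tuple(sorted(range(k), key=lambda i: sub[i])) == pat_order:
            return True
    return False

print(contains_pattern([2, 5, 3, 6, 1, 4], [1, 3, 2, 4]))  # True, e.g. subsequence 2, 5, 3, 6
print(contains_pattern([1, 2, 3, 4, 5], [4, 3, 2, 1]))     # False: no decreasing subsequence of length 4
```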
ISBN (print): 9781634391313
Multi-agent systems represent a powerful tool to model several interesting real-world problems. Unfortunately, the limited scalability of many state-of-the-art algorithms hinders their applicability in practical situations: in fact, complex dynamics and interactions among a large number of agents often make the search for an optimal solution an unfeasible task. Against this background, the study and design of new, highly parallel computational models could greatly improve solution techniques in the above-mentioned fields. In particular, I will introduce two parallel approaches to the coalition formation problem in the context of multi-agent systems, detailing how their performance can benefit from the use of modern parallel architectures.
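As an illustration of the coalition formation problem mentioned above, the sketch below exhaustively enumerates coalition structures (partitions of the agent set) and picks the one maximizing a toy, hypothetical characteristic function. The value function and agent names are invented; real solvers, including the parallel approaches discussed here, prune and distribute this search rather than enumerating it.

```python
# Brute-force coalition structure generation over a toy characteristic function.
def partitions(items):
    """Yield every partition of `items` into coalitions."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i, coalition in enumerate(smaller):              # join an existing coalition
            yield smaller[:i] + [[first] + coalition] + smaller[i + 1:]
        yield [[first]] + smaller                            # or start a new one

def v(coalition):
    # hypothetical characteristic function: pairs are worth the most
    return len(coalition) ** 2 if len(coalition) <= 2 else 0

agents = ["a1", "a2", "a3", "a4"]
best = max(partitions(agents), key=lambda cs: sum(v(c) for c in cs))
print(best, sum(v(c) for c in best))   # two pairs, total value 8
```

The branches of this search are independent, which is what makes it amenable to parallel exploration.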
In this article, some local and parallel finite element methods are proposed and investigated for the time-dependent convection-diffusion problem. With the backward Euler scheme for the temporal discretization, the basic idea of the present methods is that, for a solution to the considered equations, low-frequency components can be approximated well on a relatively coarse grid, while high-frequency components can be computed on a fine grid by a local and parallel procedure at each time step. A partition of unity is used to collect the local high-frequency components and assemble a global continuous approximation. Theoretical results are obtained and numerical tests are reported to support the theoretical findings.
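As a deliberately simplified illustration of the temporal discretization mentioned here, the snippet below takes backward Euler steps for a 1D convection-diffusion equation with a central-difference spatial discretization. It is a finite-difference stand-in under assumed coefficients, not the local and parallel finite element method of the paper.

```python
import numpy as np

# One backward-Euler step for u_t + a*u_x = nu*u_xx on a uniform 1D grid
# with homogeneous Dirichlet boundaries (boundary rows stay identity).
def backward_euler_step(u, dt, dx, a=1.0, nu=0.1):
    n = u.size
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        # central differences for convection and diffusion
        A[i, i - 1] = -a / (2 * dx) - nu / dx**2
        A[i, i]     = 2 * nu / dx**2
        A[i, i + 1] =  a / (2 * dx) - nu / dx**2
    # implicit step: (I + dt*A) u_new = u_old
    return np.linalg.solve(np.eye(n) + dt * A, u)

x = np.linspace(0.0, 1.0, 101)
u = np.exp(-200 * (x - 0.3) ** 2)        # initial pulse
for _ in range(50):
    u = backward_euler_step(u, dt=1e-3, dx=x[1] - x[0])
```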
Contour tracing is an important pre-processing step in many image-processing applications such as feature recognition, biomedical imaging, security and surveillance. As single-processor architectures reach their performance limits, parallel processing architectures offer energy-efficient and high-performance solutions for real-time applications, and are therefore used in several real-time image-processing applications. Among the many interconnection schemes available, Cayley graph-based interconnections offer easy routing and symmetric implementation capabilities. For parallel processing systems with a Cayley graph-based interconnection scheme, the torus, we developed three accelerated algorithms corresponding to three existing families of contour tracing algorithms. We simulated these algorithms on a parallel processing framework to quantify the normalized speed-up achievable in any torus-connected parallel processing system, and compared our best-performing algorithm with existing parallel implementations for Nvidia GPUs. We observed a speed-up of up to 468 times with our algorithms on a parallel processing architecture compared with the corresponding algorithm on a single-processor architecture, and measured speed-ups of 194 and 47 compared with existing parallel contour tracing implementations on Tesla K40c and Quadro RTX 5000 GPU hardware, respectively. For torus-connected parallel processing architectures used for image processing, our algorithms thus speed up contour tracing without any hardware modification.
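The snippet below is a minimal sequential baseline for contour extraction: a foreground pixel is kept if any of its 4-neighbours is background. It only illustrates the per-pixel test (which torus-connected processors could apply to their local image tiles); it is not one of the accelerated algorithms from the paper.

```python
import numpy as np

# Mark foreground pixels that touch the background as contour pixels.
def contour_mask(img):
    img = np.asarray(img, dtype=bool)
    padded = np.pad(img, 1, constant_values=False)
    up    = padded[:-2, 1:-1]
    down  = padded[2:,  1:-1]
    left  = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    has_bg_neighbour = ~(up & down & left & right)
    return img & has_bg_neighbour

blob = np.zeros((7, 7), dtype=bool)
blob[2:5, 2:6] = True
print(contour_mask(blob).astype(int))   # ones only on the rectangle's border
```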
Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-li...
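As a small point of reference for the entry above, average-linkage HAC can be run sequentially with SciPy as shown below. The data and cluster count are arbitrary; this is only a usage example of the standard routine, not the large-scale parallel method the paper studies.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Average-linkage hierarchical agglomerative clustering on two random blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                    rng.normal(3.0, 0.3, (20, 2))])
Z = linkage(points, method="average")            # average-linkage dendrogram
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
print(labels)
```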
One type of renewable energy is geothermal energy, obtained from the depths of the Earth by using the heat of water pumped from underground reservoirs. As a rule, the produced water is highly saline and contains chemical compounds that can be hazardous to the environment. Mathematical models that describe the propagation of temperature fields in a geothermal reservoir are therefore of practical importance. The paper presents one such mathematical model of an open geothermal system consisting of production and injection wells. The novelty of the proposed model lies in accounting for the most significant technical features of the wells and the specific thermophysical parameters of the geothermal reservoir. On the basis of an implicit finite-difference scheme, a numerical algorithm for modeling non-stationary heat transfer processes in a three-dimensional aquifer region has been developed, and a justification of the applicability of the numerical method is given. Since a complete numerical simulation requires significant computer time, a parallel version of the computational algorithm and a program for multicore processors using OpenMP technology were also developed. The results of a series of numerical experiments and an evaluation of the efficiency of the parallel algorithm are presented. Article Highlights: A mathematical model is proposed to study how the location of the wells in a geothermal system affects its effectiveness. A parallel numerical algorithm for solving the systems of difference equations for heat and mass transfer is constructed and implemented on multicore CPUs. The efficiency and speedup of the parallel algorithm are investigated.
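The abstract reports efficiency and speedup measurements for the OpenMP version; the snippet below simply spells out how those two metrics are computed from wall-clock times. The timing numbers are invented placeholders, not results from the paper.

```python
# Speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p
timings = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}   # processors -> wall time (s), hypothetical
t1 = timings[1]
for p, tp in sorted(timings.items()):
    speedup = t1 / tp
    efficiency = speedup / p
    print(f"p={p:2d}  speedup={speedup:5.2f}  efficiency={efficiency:4.2f}")
```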
Numerous research efforts have been proposed for efficient processing of range queries in high-dimensional space, either by redesigning the R-tree access structure to exploit massive parallelism on a single GPU or by exploring distributed R-tree frameworks. However, none of the existing efforts integrates the parallelization of the R-tree on a single GPU with a distributed R-tree framework, and designing an efficient multi-GPU indexing method that effectively combines parallelism maximization with distributed processing of the R-tree remains an open challenge. In this article, we present a novel, efficient parallel and distributed multi-GPU indexing method called the LBPG-tree. The rationale of the LBPG-tree is to combine the advantages of the CPU instruction pipeline with the massive parallel processing potential of multiple GPUs by introducing two new optimization strategies: first, we exploit the GPU L2 cache to accelerate both index search and index node access on GPUs; second, we further improve L2 cache utilization on GPUs by compacting and sorting candidate nodes, a step called Compact-and-Sort. Our experimental results show that the LBPG-tree outperforms the G-tree, the previous representative GPU index method, and effectively supports multiple GPUs in providing an efficient high-dimensional range query service.
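The following CPU-side sketch conveys the gist of the Compact-and-Sort idea described above: candidate node identifiers that survive a filtering pass are compacted and then sorted so that subsequent accesses touch memory (or cache lines) in order. The array contents are invented, and the real operation runs on GPUs with very different machinery.

```python
import numpy as np

# Compact (drop pruned entries, marked -1) and sort candidate node ids
# so that the next access pass is cache-friendly.
candidate_ids = np.array([907, -1, 13, -1, 512, 14, -1, 511])   # -1 = pruned
compacted = candidate_ids[candidate_ids >= 0]   # compact
ordered = np.sort(compacted)                    # sort for locality
print(ordered)                                  # [ 13  14 511 512 907]
```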
Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale to emerging large-scale data, and scalable parallel algorithms are necessary to process large graph datasets. In this work, we present a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup when scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and obtain better or comparable performance. We identify the challenges in developing distributed-memory algorithms and provide an optimized solution, DPLAL, with a performance analysis of the algorithm on large-scale real-world networks from different domains.
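For context, the sketch below implements the greedy "local moving" passes at the heart of the Louvain method on a tiny toy graph. Full Louvain (and its parallel variants such as DPLAL) additionally aggregates communities into super-nodes and iterates, so treat this only as an illustration of the sequential kernel that the parallel versions distribute.

```python
from collections import defaultdict

# Louvain-style local moving on a small undirected graph (adjacency sets).
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
m = sum(len(nbrs) for nbrs in adj.values()) / 2               # number of edges
degree = {v: len(nbrs) for v, nbrs in adj.items()}
community = {v: v for v in adj}                               # start as singletons
tot = defaultdict(float, {v: float(degree[v]) for v in adj})  # degree sum per community

moved = True
while moved:
    moved = False
    for v in adj:
        tot[community[v]] -= degree[v]                        # take v out of its community
        links = defaultdict(int)                              # v's edges into each community
        for u in adj[v]:
            links[community[u]] += 1
        best_c, best_gain = community[v], 0.0
        for c, k_in in links.items():                         # pick the best modularity gain
            gain = k_in - degree[v] * tot[c] / (2 * m)
            if gain > best_gain:
                best_c, best_gain = c, gain
        moved |= best_c != community[v]
        community[v] = best_c
        tot[best_c] += degree[v]

print(community)   # nodes 0-2 end up in one community, nodes 3-5 in another
```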
Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were designed and developed when the amount of data that needed to be processed was smaller in scale. In this paper, we prove that the communication bound reached by the existing parallel algorithms is Omega(mn + 2rq/p), where m and n are the dimensions of the theoretical database matrix, q and r are the dimensions of the spectra, and p is the number of processors. We further prove that a communication-optimal strategy with fast memory sqrt(M) = mn + 2qr/p can achieve Omega(2mnq/p), which has not been achieved by any existing parallel proteomics algorithm to date. To validate our claim, we performed a meta-analysis of published parallel algorithms and their performance results, and show that the sub-optimal speedups with increasing numbers of processors are a direct consequence of not achieving the communication lower bounds. We further validate our claim by performing experiments that demonstrate the communication bounds proved in this paper. Consequently, we assert that a next generation of provably and demonstrably superior parallel algorithms is urgently needed for MS-based large systems-biology studies, especially for meta-proteomics, proteogenomics, microbiome, and proteomics for non-model organisms. Our hope is that this paper will excite the parallel computing community to further investigate parallel algorithms for highly influential MS-based omics problems. (C) 2021 Elsevier Inc. All rights reserved.
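Reading the stated lower bound literally as printed, Omega(mn + 2rq/p), the toy calculation below evaluates the expression for a few processor counts. All problem sizes are invented solely to show how the expression behaves; they are not taken from the paper.

```python
# Evaluate the printed communication lower bound mn + 2rq/p for several p.
m, n = 10_000, 5_000      # theoretical database matrix dimensions (hypothetical)
q, r = 2_000, 1_000       # spectra dimensions (hypothetical)
for p in (1, 16, 256, 1024):
    bound = m * n + 2 * r * q / p
    print(f"p={p:5d}  communication lower bound ~ {bound:.3e} words")
```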
Large-scale graphs with billions and trillions of vertices and edges require efficient parallel algorithms for common graph problems, one of which is single-source shortest paths (SSSP). Bulk-synchronous parallel algo...
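Although only the preview of this entry is shown, the sequential reference point for SSSP is repeated edge relaxation. The sketch below runs Bellman-Ford-style relaxation rounds on a small made-up weighted digraph; each full round corresponds roughly to one superstep of a bulk-synchronous parallel SSSP computation.

```python
import math

# Bellman-Ford-style relaxation rounds from source "s" on a tiny digraph.
edges = [("s", "a", 2.0), ("s", "b", 7.0), ("a", "b", 3.0),
         ("b", "c", 1.0), ("a", "c", 8.0)]
dist = {v: math.inf for e in edges for v in e[:2]}
dist["s"] = 0.0
for _ in range(len(dist) - 1):          # at most |V|-1 rounds are needed
    changed = False
    for u, v, w in edges:               # relax every edge once per round
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            changed = True
    if not changed:
        break
print(dist)   # {'s': 0.0, 'a': 2.0, 'b': 5.0, 'c': 6.0}
```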