We use the reconfiguration framework to analyze problems that involve the rearrangement of items among groups. In various applications, a group of items could correspond to the files or jobs assigned to a particular m...
ISBN (digital): 9798350363760
ISBN (print): 9798350363777
This paper implements the Fast Fourier Transform (FFT) algorithm for signal data processing using Open Computing Language (OpenCL). A parallel algorithm model suitable for staged FFT across different GPUs is proposed, including methods for execution and memory model settings. The characteristics of the OpenCL model and specific data structures are applied to optimize the logical structure of the parallel algorithm. Finally, the proposed method is applied and implemented in the Synthetic Aperture Radar (SAR) imaging RD algorithm. Experimental data confirm that the computational speed of the parallel algorithm in this paper is significantly higher than that of a serial CPU-based algorithm. Compared to FFTW, the fastest FFT algorithm on the current CPU platform, it achieves substantially better performance. Additionally, compared to the CUDA-based CUFFT parallel algorithm, the performance of the algorithm in this paper is notably improved. In the SAR imaging RD algorithm, based on classical airborne SAR imaging parameters, it shows a significant improvement over FFTW.
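The staged structure mentioned in this abstract can be illustrated with a textbook iterative radix-2 Cooley-Tukey FFT. This is a serial Python sketch and not the paper's OpenCL kernel; the function name `fft_radix2` is hypothetical. Each pass of the `while size <= n` loop is one stage, and the butterflies inside a stage are independent of each other, which is the part a GPU implementation would run in parallel.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; butterflies within one stage do not depend on each other.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```

Calling `fft_radix2([1, 0, 0, 0])` returns four values equal to `1+0j`, the flat spectrum of a unit impulse.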
ISBN (digital): 9798331516925
ISBN (print): 9798331516932
This paper describes the adaptation to a distributed computational setting of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression. Additionally, we extend the algorithm to efficiently compute connected components in distributed structured and unstructured grids, based either on the connectivity of the underlying mesh or a feature mask. Our implementation is seamlessly integrated with the distributed extension of the Topology ToolKit (TTK), ensuring robust performance and scalability. To demonstrate the practicality and efficiency of our algorithms, we conducted a series of scaling experiments on large-scale datasets, with sizes of up to $4096^3$ vertices on up to 64 nodes and 768 cores.
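As a rough illustration of connected components on a structured grid with a feature mask, the sketch below uses a serial union-find with path compression. This is a stand-in technique chosen for brevity, not the distributed algorithm the abstract describes or anything from TTK; the function name `label_components` and the 2D, 4-connected setting are assumptions.

```python
def label_components(mask):
    """Label 4-connected components of True cells in a 2D boolean grid.

    Returns a dict mapping each True cell (row, col) to a representative cell.
    """
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    parent = {}

    def find(x):
        # Walk up to the root, then point every visited cell directly at it.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            parent[(r, c)] = (r, c)
            # Only the upper and left neighbours have been visited so far.
            for nb in ((r - 1, c), (r, c - 1)):
                if nb in parent:
                    ra, rb = find(nb), find((r, c))
                    if ra != rb:
                        parent[rb] = ra

    return {cell: find(cell) for cell in parent}
```

Two cells belong to the same component exactly when they map to the same representative.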
Correlation Clustering is a classic clustering objective arising in numerous machine learning and data mining applications. Given a graph G = (V, E), the goal is to partition the vertex set into clusters so as to mini...
ISBN (digital): 9781665497473
ISBN (print): 9781665497480
Welcome to the Sixth IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems (ParSocial 2022). This year the workshop highlights novel algorithms and models that leverage parallel computing with applications in social network and social media analysis. The first set of papers focuses on the key individual identification problem in social network analysis. The paper by Vandromme et al. entitled “Efficient Parallel PageRank Algorithm for Network Analysis” proposes a more efficient parallel algorithm for PageRank that has been shown to improve the time complexity by a factor of two. In a similar vein, the paper by Sahu et al. entitled “Dynamic Batch Parallel Algorithms for Updating PageRank” proposes two parallel algorithms for recomputing PageRank of nodes in a dynamic social network that can scale across various architectures. A related research problem is identifying opinion leaders that can improve information dissemination within communities. The paper entitled “Effect of Community-based Opinion Leaders on Guideline Dissemination in Large-Scale Physician Networks” by Murugappan et al. focuses on the problem of the dissemination of medical guidelines. The authors propose a culturally infused agent-based model to analyze the effectiveness of various opinion leader selection strategies and the tradeoffs between the reach and rate of spread of medical guideline information. The next set of papers focuses on social media analysis. Systems for large-scale ingestion of social media data sets can support a wide range of research problems in computational social systems. A step in this direction is taken by Huber et al., who have proposed a parallel system for large-scale processing of Reddit data in their paper entitled “A Streaming System for Large-scale Mining of Reddit Data”. On the other hand, authors Abeysinghe et al. in their short research paper entitled “Unsupervised User Stance Detection on Tweets Against Web Articles Using Sentence
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
Core decomposition is a critical metric for evaluating vertex importance and analyzing graph structure. Given a graph $G$, a $k$-core is the largest subgraph of $G$ where each vertex has at least $k$ neighbors. Most existing works focus on homogeneous graphs, in which all edges are of the same type, and cannot be applied directly to heterogeneous information networks (HINs). However, most real-world networks are HINs, which consist of different vertex types and edge types. To reveal the cohesive subgraphs with hierarchical relations on HINs, we adopt the well-known $(k,\mathcal{P})$-core model to compute coreness over HINs, where $\mathcal{P}$ is a meta-path, i.e., a sequence of relations defined between different types of vertices. Hence, the $(k,\mathcal{P})$-core is a subgraph where each vertex is connected to at least $k$ other vertices via instances of $\mathcal{P}$. Based on two kinds of sparse matrix products, we propose two kinds of algebraic core decomposition algorithms, which are suitable for general HINs and locally dense HINs, respectively. We have performed extensive empirical evaluations of our algorithms on six large real-world HINs. The results show that the proposed solutions are highly efficient for core decomposition and achieve up to $258.84\times$ speedup over the state-of-the-art parallel algorithm on 20 cores. Moreover, other HIN tasks that involve homogeneous graph construction can also benefit from our algorithms.
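The $k$-core definition above can be made concrete with the classic serial peeling procedure, sketched below in Python. This is not the algebraic, sparse-matrix algorithm the abstract proposes; it assumes the neighbor relation (for an HIN, the vertices reachable via instances of $\mathcal{P}$) has already been materialized as a symmetric adjacency dictionary, and the name `core_numbers` is hypothetical.

```python
def core_numbers(adj):
    """Return the coreness of every vertex.

    adj maps each vertex to the set of its neighbours (symmetric, no self-loops).
    A vertex has coreness k if it belongs to the k-core but not the (k+1)-core.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree (linear scan for simplicity).
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

For a triangle `a`, `b`, `c` with one extra vertex `d` attached to `c`, the triangle vertices get coreness 2 and `d` gets coreness 1. The linear scan for the minimum keeps the sketch short; bucket-based peeling would bring the cost down to linear in the graph size.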
ISBN (digital): 9798331529048
ISBN (print): 9798331529055
Exploring the structural and functional properties of real-world large graphs, such as detecting community structure in social networks and assessing the connectivity of different brain regions in brain graphs, is an increasingly prominent research area. Quantitative graph theory has been developed to quantify both structural and functional aspects of graphs. Typically, nodal and global graph measures are employed to estimate the information content of a graph. There is currently a pronounced interest in parallel graph processing, driven by the imperative to quickly analyze the large graphs available today. Modern desktop and laptop computers are equipped with multicore processors featuring shared memory architecture. The utilization of the OpenMP API offers numerous advantages for shared memory systems. In this study, parallel algorithms for four global graph measures have been designed and implemented on multicore shared memory systems using both task-centric and data-centric parallel techniques. We assess performance across varying numbers of cores and for different sizes of random graphs, as well as numerous real brain graphs, comparing the results against serial algorithms within the same hardware environment. Experimental results demonstrate a significant enhancement in the parallel algorithms across multiple cores, effectively meeting the demand for accelerated computation of graph measures.
ISBN (digital): 9798331505349
ISBN (print): 9798331505356
Two-electron repulsion integrals (ERIs) are among the most foundational quantities for numerically solving the Schrödinger equation, accounting for the Coulomb interactions between electrons in a molecule. Calculating ERIs is a computationally intensive task of determining integral values for every combination of four atomic orbitals, which dominates the execution time of fundamental frameworks in quantum chemistry such as the Hartree-Fock method. It is known that the number of ERI calculations can be significantly reduced using the Schwarz inequality. However, Schwarz screening requires evaluating the upper bound for every integral value, leading to a time complexity comparable to that of the ERI calculations themselves. This paper proposes a dynamic screening algorithm that minimizes the number of upper bound evaluations using the asynchronous parallelism of GPUs. Our parallel algorithm, performed by dynamically scheduled CUDA blocks, can discard most ERI calculations without even evaluating their upper bounds. Furthermore, by screening ERIs at the level of the finest integral units, primitive integrals, we eliminate more integral calculations than coarse-grained screening does. Experimental results for various molecules using an NVIDIA A100 GPU demonstrate that the proposed dynamic screening achieves up to an 18.0-fold speedup compared to the non-screening ERI computation while keeping the energy error below $1.0 \times 10^{-7}$ hartree.
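Schwarz screening rests on the bound $|(ij|kl)| \le \sqrt{(ij|ij)}\,\sqrt{(kl|kl)}$, so an integral can be skipped whenever the product of the two factors falls below a threshold. The Python sketch below shows one simple serial way to avoid evaluating most bounds, by sorting the factors and stopping early; it is an illustration under assumptions, not the paper's dynamically scheduled CUDA algorithm. The function name `surviving_quartets`, the pair labels, and the input dictionary of precomputed factors are all hypothetical.

```python
def surviving_quartets(q, threshold):
    """Return the (ij, kl) combinations that pass Schwarz screening.

    q maps an orbital-pair label ij to its Schwarz factor sqrt((ij|ij)).
    A combination is kept when q[ij] * q[kl] >= threshold.
    """
    # Largest factors first, so products only shrink along the inner loop.
    pairs = sorted(q, key=q.get, reverse=True)
    kept = []
    for a, ij in enumerate(pairs):
        for kl in pairs[a:]:
            if q[ij] * q[kl] < threshold:
                # Every later kl has an even smaller factor: skip them unseen.
                break
            kept.append((ij, kl))
    return kept
```

Because of the early `break`, the bound is never computed for the tail of negligible combinations, which is the general effect the abstract aims for by other means.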
ISBN (digital): 9798331518066
ISBN (print): 9798331518073
The increasing scale of power systems necessitates corresponding enhancements in the speed and accuracy of power flow calculations. Leveraging the study of Graphics Processing Unit (GPU) architecture, parallel algorithms, and power flow calculation methods, we exploit the high parallelism offered by GPU. In this paper, we optimize the Newton method under traditional polar coordinates to develop parallel programs and utilize Compute Unified Device Architecture (CUDA) for data parallel computing. Simulation results demonstrate that as the number of nodes in the test system increases, our proposed program exhibits more pronounced advantages, effectively improving power flow calculation speed and providing a promising approach for further research in this field.
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
We develop a distributed-memory parallel algorithm for performing batch updates on streaming graphs, where vertices and edges are continuously added or removed. Our algorithm leverages distributed sparse matrices as the core data structures, utilizing equivalent sparse matrix operations to execute graph updates. By reducing unnecessary communication among processes and employing shared-memory parallelism, we accelerate updates of distributed graphs. Additionally, we maintain a balanced load in the output matrix by permuting the resultant matrix during the update process. We demonstrate that our streaming update algorithm is at least 25 times faster than alternative linear-algebraic methods and scales linearly up to 4,096 cores (32 nodes) on a Cray EX supercomputer.
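The idea of expressing graph updates as sparse matrix operations can be sketched on a single process with a dictionary-of-keys adjacency matrix: a batch of insertions acts like an element-wise addition, and a batch of deletions acts like masking out entries. This Python sketch is only an analogy under stated assumptions; it does not reproduce the paper's distributed data structures, communication scheme, or load-balancing permutation, and the name `apply_batch` is hypothetical.

```python
def apply_batch(adj, inserts=(), deletes=()):
    """Apply one batch of edge updates to a sparse adjacency matrix.

    adj     : dict {(u, v): weight} holding only the nonzero entries
    inserts : iterable of (u, v, weight), added element-wise (A + U)
    deletes : iterable of (u, v), masked out after the additions
    Returns a new dict; the input matrix is left unchanged.
    """
    updated = dict(adj)
    for u, v, w in inserts:
        # Element-wise addition: existing entries accumulate, new ones appear.
        updated[(u, v)] = updated.get((u, v), 0) + w
    for u, v in deletes:
        # Masking: drop the entry if present, ignore it otherwise.
        updated.pop((u, v), None)
    return updated
```

Applying deletions after insertions within a batch is a design choice of this sketch; a real system would need to define the ordering semantics explicitly.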