Influence maximization (IM) is a well-studied problem in the domain of social and complex networks that has gained widespread importance in recent years due to the growth of viral marketing and targeted advertising through online social networks. There have been several enriching contributions focusing on efficient algorithms for IM; a fundamental question remains as to how IM computations can be done efficiently enough to support real-time evaluation. In this work, we present a novel mechanism for IM computation, using two case studies, HBUTA and HSSA, that utilizes all the computing hardware available in a current-generation high-end computing system. We present techniques for efficiently partitioning work, computing IM in a distributed manner, consolidating results, and addressing the challenges of ensuring maximum coverage of the input graph. With our initial implementations, we obtain maximum performance improvements of 29.05% and 35.67% from our HBUTA and HSSA algorithms, respectively, on a graph consisting of 1.6 million vertices and 30 million edges.
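The abstract does not spell out the IM formulation; as background, a minimal sketch of the classical greedy seed-selection baseline under the independent-cascade model is shown below. This is not the HBUTA or HSSA algorithm from the paper; the function names, the activation probability `p`, and the Monte Carlo trial count are illustrative assumptions.

```python
import random

def simulate_ic(graph, seeds, p=0.1, rng=random):
    """One Monte Carlo run of the independent-cascade model; returns the spread size."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node tries to activate each neighbor once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph, k, p=0.1, trials=200):
    """Greedily pick k seeds, each time adding the node with the best estimated spread."""
    seeds = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in graph:
            if v in seeds:
                continue
            spread = sum(simulate_ic(graph, seeds + [v], p) for _ in range(trials)) / trials
            if spread > best_spread:
                best, best_spread = v, spread
        seeds.append(best)
    return seeds
```

The inner Monte Carlo loop is exactly the cost that distributed IM schemes such as those in the paper aim to spread across heterogeneous hardware.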
ISBN (print): 9780769561493
Graph algorithms play an important role in several fields of science and engineering. Prominent among them are the All-Pairs-Shortest-Paths (APSP) and related problems. Indeed, there are several efficient implementations for such problems on a variety of modern multi- and many-core architectures. It can be noticed that for several graph problems, parallelism offers only limited success, as current parallel architectures have severe shortcomings when deployed for most graph algorithms. At the same time, some of these graphs exhibit clear structural properties due to their sparsity. This calls for particular solution strategies aimed at scalable processing of large, sparse graphs on modern parallel architectures. In this paper, we study the applicability of an ear decomposition of graphs to problems such as all-pairs-shortest-paths and minimum cost cycle basis. Through experimentation, we show that the resulting solutions are scalable in terms of both memory usage and speedup over the best known current implementations. We believe that our techniques have the potential to be relevant for designing scalable solutions for other computations on large sparse graphs.
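For reference, the dense APSP baseline that structure-aware approaches like the ear-decomposition method try to beat is the classic Floyd-Warshall recurrence; a minimal sketch is below. This is the textbook algorithm, not the paper's decomposition-based solution, and the edge-list input format is an assumption.

```python
def floyd_warshall(n, edges):
    """Textbook O(n^3) APSP on n vertices from a (u, v, w) directed edge list."""
    INF = float('inf')
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    # Relax all pairs through each intermediate vertex k.
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist
```

The O(n^3) work and O(n^2) memory of this baseline are what make sparsity-exploiting strategies attractive for large graphs.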
ISBN (print): 9781467376846
Multiplying two sparse matrices, denoted spmm, is a fundamental operation in linear algebra with several applications. Hence, efficient and scalable implementation of spmm has been a topic of immense research. Recent efforts are aimed at implementations on GPUs, multicore architectures, FPGAs, and other emerging computational platforms. Owing to the highly irregular nature of spmm, it is observed that GPUs and CPUs can offer comparable performance (Lee et al. [12]). In this paper, we study CPU+GPU heterogeneous algorithms for spmm where the matrices exhibit a scale-free nature. Focusing on such matrices, we propose an algorithm that multiplies two sparse matrices exhibiting scale-free nature on a CPU+GPU heterogeneous platform. Our experiments on a wide variety of real-world matrices from standard datasets show an average of 25% improvement over the best possible algorithm on a CPU+GPU heterogeneous platform. We show that our approach is both architecture-aware and workload-aware.
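To make the irregularity concrete, a minimal sketch of the standard row-by-row (Gustavson-style) sparse-sparse multiply in CSR format follows. It illustrates why spmm is hard to parallelize, since per-row work varies wildly on scale-free matrices; it is not the paper's heterogeneous algorithm, and the CSR argument layout is an assumption.

```python
def spmm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Row-wise CSR x CSR multiply; returns C = A*B in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for t in range(a_ptr[i], a_ptr[i + 1]):
            k, av = a_idx[t], a_val[t]
            # Row i of A selects and scales row k of B.
            for s in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[s]
                acc[j] = acc.get(j, 0.0) + av * b_val[s]
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

On scale-free inputs, a few hub rows dominate the accumulator work, which is precisely the imbalance a workload-aware CPU+GPU split exploits.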
ISBN (print): 9789819756919; 9789819756926
In recent decades, sequencing techniques have undergone rapid development, leading to a significant reduction in costs. Concurrently, the popularity of personalized medicine and portable devices has surged. These advancements have precipitated an exponential increase in genomic data volume, posing formidable challenges in terms of storage and transmission. Compression emerges as a viable solution to these challenges. To date, the bulk of research within this domain has concentrated on refining compression algorithms. However, investigations into hardware acceleration of these algorithms have been scant, primarily focusing on Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs). Given the prevalence of multiprocessor system-on-chips (MPSoCs) in portable devices, there is a pressing need to design algorithms that can leverage computing resources effectively, enhancing both performance and energy efficiency, which are crucial for portable devices. To address these challenges, our study introduces a novel heterogeneous compression algorithm that significantly advances the performance and energy efficiency of genomic data processing on mobile devices. By implementing a dual-stage pipeline approach for Gzip and leveraging both OpenCL and OpenMP, our framework optimally utilizes the disparate computational resources of MPSoCs. The empirical results underscore a robust performance enhancement, achieving an average increase of 15.7% in data processing speed compared to conventional CPU-based methods. This substantial leap in efficiency alleviates the computational burdens typically associated with mobile genomic applications.
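As a rough illustration of the dual-stage pipeline idea (not the paper's OpenCL/OpenMP implementation), the sketch below overlaps a chunk-feeding stage with a gzip-compression stage using a bounded queue. All names are hypothetical; `wbits=31` selects the gzip container in Python's `zlib`.

```python
import queue
import threading
import zlib

def pipelined_gzip(chunks):
    """Two-stage pipeline: stage 1 feeds chunks, stage 2 compresses them concurrently."""
    q = queue.Queue(maxsize=4)  # bounded buffer decouples the two stages
    out = []

    def producer():
        for c in chunks:
            q.put(c)
        q.put(None)  # sentinel: no more input

    def consumer():
        comp = zlib.compressobj(wbits=31)  # wbits=31 -> gzip framing
        while True:
            c = q.get()
            if c is None:
                break
            out.append(comp.compress(c))
        out.append(comp.flush())

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return b"".join(out)
```

On an MPSoC the two stages would map to different compute resources (e.g. I/O plus accelerator kernels), which is where the overlap buys throughput.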
Differential evolution (DE) algorithms constitute an efficient class of evolutionary algorithm (EA) for the global optimization domain. Although it is well known that the population structure has a major influence on the behavior of EAs, there are few works studying its effect in DE algorithms. In this paper, we propose and analyze several DE variants using different panmictic and decentralized population schemes. As with other EAs, we demonstrate that the population scheme has a marked influence on the behavior of DE algorithms, too. Additionally, a new operator for generating the mutant vector is proposed and compared against a classical one on all the proposed population models. After that, a new heterogeneous decentralized DE algorithm, combining the two studied operators in the best-performing studied population structure, has been designed and evaluated. In total, 13 new DE algorithms are presented and evaluated in this paper. Summarizing our results, all the studied algorithms are highly competitive with state-of-the-art DE algorithms taken from the literature on most of the considered problems, and the best ones implement a decentralized population. With respect to the population structure, the proposed decentralized versions clearly provide better performance than the panmictic ones. The new mutation operator demonstrates faster convergence on most of the studied problems than a classical operator taken from the DE literature. Finally, the new heterogeneous decentralized DE is shown to improve the previously obtained results and outperform the compared state-of-the-art DEs.
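For readers unfamiliar with DE mutation, a minimal sketch of the classical DE/rand/1 mutant-vector construction with binomial crossover follows. This is the standard textbook operator, not the new operator proposed in the paper; the parameter defaults `f_scale=0.5` and `cr=0.9` are common conventions, not values from the paper.

```python
import random

def de_rand_1_bin(pop, f_scale=0.5, cr=0.9, rng=random):
    """Build one trial vector via DE/rand/1 mutation + binomial crossover."""
    i = rng.randrange(len(pop))  # target individual
    # Three distinct individuals, all different from the target.
    r1, r2, r3 = rng.sample([k for k in range((len(pop))) if k != i], 3)
    target = pop[i]
    dim = len(target)
    # Mutant: base vector plus scaled difference of two others.
    mutant = [pop[r1][d] + f_scale * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
    jrand = rng.randrange(dim)  # guarantee at least one mutant component survives
    trial = [mutant[d] if (rng.random() < cr or d == jrand) else target[d]
             for d in range(dim)]
    return i, trial
```

In a full DE loop, the trial replaces `pop[i]` only if it has better fitness; decentralized schemes restrict `r1`, `r2`, `r3` to a neighborhood instead of the whole population.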
ISBN (print): 9780769547497
The Fast Multipole Method (FMM) allows O(N) evaluation, to any arbitrary precision, of the N-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures [1]. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong scalability. Further, the algorithms were not demonstrated on the nonuniform distributions of particles that arise in practice. In this paper, we develop an efficient, scalable version of the FMM that scales well across many heterogeneous nodes for nonuniform data. Key contributions of our work are data structures that allow uniform work distribution over multiple computing nodes and that minimize the communication cost. These new data structures are computed using a parallel algorithm and require only a small additional computation overhead. Numerical simulations on a heterogeneous cluster empirically demonstrate the performance of our algorithm.
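To fix ideas, the direct O(N^2) pairwise evaluation that FMM reduces to O(N) is sketched below for a Coulomb-style 1/r potential. This is only the brute-force baseline, not the FMM itself or the paper's distributed data structures; the potential form is an illustrative assumption.

```python
import math

def direct_potentials(positions, charges):
    """Brute-force O(N^2) pairwise 1/r potential sums -- what FMM approximates in O(N)."""
    n = len(positions)
    phi = [0.0] * n
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # skip self-interaction
            xj, yj, zj = positions[j]
            r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            phi[i] += charges[j] / r
    return phi
```

FMM replaces the inner loop over far-away particles with truncated multipole expansions, and the nonuniform-distribution challenge is balancing the resulting tree across nodes.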
ISBN (print): 9783319194196; 9783319194189
The authors present a new approach to finding stochastic bounds for a Markov chain, namely using the GPU to compute the bounds. A known algorithm [1,2] is used and rewritten to suit the GPU architecture, in cooperation with the CPU. The authors conduct experiments with matrices from various models as well as some random matrices. The tests are analyzed, and some directions for future work are given.