ISBN (print): 9781665475068
Datalog, a bottom-up declarative logic programming language, has a wide variety of uses for deduction, modeling, and data analysis across application domains. Datalog can be efficiently implemented using relational algebra primitives such as join, projection, and union. While several multithreaded and multi-core implementations of Datalog exist for CPU-based systems, our work makes an inroad towards a Datalog implementation for GPUs. We demonstrate the feasibility of a high-performance relational algebra backend, for a subset of Datalog applications, that can effectively leverage GPU parallelism using cuDF. cuDF is a library from the RAPIDS suite that uses the NVIDIA CUDA programming model for GPU parallelism and provides functionality similar to Pandas, a popular data analysis library. In this paper, we analyze and evaluate the performance of cuDF versus Pandas for two graph-mining problems implemented in Datalog: (1) triangle counting and (2) transitive-closure computation.
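To make the join/projection/union framing concrete, here is a minimal sketch of transitive closure written with Pandas DataFrames. It is an illustration, not the authors' backend. It assumes a two-column edge table with hypothetical column names `src` and `dst` and uses semi-naive evaluation, meaning that only the tuples derived in the previous round are joined again. cuDF mirrors much of the Pandas DataFrame API. Whether every call used here behaves identically in cuDF, for example `merge(..., indicator=True)`, should be checked against the cuDF documentation.

```python
import pandas as pd


def transitive_closure(edges: pd.DataFrame) -> pd.DataFrame:
    """Semi-naive evaluation of the Datalog program
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    `edges` is assumed to have integer columns "src" and "dst".
    """
    edge_yz = edges[["src", "dst"]].rename(columns={"src": "y", "dst": "z"})
    path = edges[["src", "dst"]].drop_duplicates().reset_index(drop=True)
    delta = path  # tuples first derived in the previous round
    while not delta.empty:
        # Join: delta(x, y) with edge(y, z) on the shared variable y.
        joined = delta.rename(columns={"src": "x", "dst": "y"}).merge(edge_yz, on="y")
        # Projection: keep (x, z) and drop duplicate derivations.
        derived = (
            joined[["x", "z"]]
            .rename(columns={"x": "src", "z": "dst"})
            .drop_duplicates()
        )
        # Set difference: keep only tuples not already in path.
        marked = derived.merge(path, on=["src", "dst"], how="left", indicator=True)
        delta = marked.loc[marked["_merge"] == "left_only", ["src", "dst"]].reset_index(drop=True)
        # Union: append the genuinely new tuples to the result.
        path = pd.concat([path, delta], ignore_index=True)
    return path
```

Each loop iteration is one round of rule application. The loop stops at the fixpoint, when a round derives no new `path` tuples.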
ISBN (print): 9781665411264
Applications in which continuous streams of data pass through large data structures are becoming increasingly important. However, their execution on conventional architectures is highly inefficient, especially when parallelism is used to boost performance. The primary issue is often the need to stream large numbers of disparate data items through the equivalent of very large hash tables distributed across many nodes. This paper builds on prior work on the Firehose streaming benchmark, where an emerging architecture using threads that can migrate through memory has been shown to be much more efficient at such problems. This paper extends that work to a second-generation system. It shows the same improved efficiency (10X) at larger core counts and also significantly higher raw performance, with FPGA-based cores running at 1/10th the clock rate of conventional systems. Further, the additional data yields insight into which resources are the bottlenecks to even higher performance. It also supports a reasonable projection that implementing such an architecture in current technology would yield a 10X performance gain on an apples-to-apples basis with conventional systems.
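The problematic access pattern is that each streamed item is routed to whichever node owns its key in a distributed hash table. That pattern can be modeled in a few lines. This toy sketch is not the Firehose benchmark itself. The class name, the multiplicative hash used for placement, and the `owner_switches` counter are all illustrative assumptions. The counter records how often consecutive items land on a different node than their predecessor. Roughly, that is how often a conventional system would need a remote message, or how often a migrating thread would move to another node.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # 64-bit golden-ratio constant (assumed hash choice)


class PartitionedTable:
    """Toy model of a hash table sharded across num_nodes nodes (illustrative only)."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.shards = [defaultdict(int) for _ in range(num_nodes)]
        self.owner_switches = 0  # consecutive datums whose owners differ
        self._last_owner = None

    def owner(self, key: int) -> int:
        # Assumed placement policy: multiplicative hash, high bits mod node count.
        h = (key * GOLDEN) & MASK64
        return (h >> 32) % self.num_nodes

    def ingest(self, key: int, value: int) -> None:
        node = self.owner(key)
        if self._last_owner is not None and node != self._last_owner:
            self.owner_switches += 1  # work must move to (or message) another node
        self._last_owner = node
        self.shards[node][key] += value  # the update happens on the owning node

    def lookup(self, key: int) -> int:
        # .get does not insert missing keys into the defaultdict.
        return self.shards[self.owner(key)].get(key, 0)
```

For a stream of unrelated keys, `owner_switches` approaches the stream length as `num_nodes` grows. This is the locality problem the abstract describes.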
ISBN (print): 9781665411264
Molecular docking is a key method in computer-aided drug design, where the rapid identification of drug candidates is crucial for combating diseases. AutoDock is a widely used molecular docking program with an irregular structure characterized by divergent control flow and compute-intensive calculations. This work investigates porting AutoDock to the SX-Aurora TSUBASA vector engine and evaluates the achievable performance on a number of real-world input compounds. In particular, we discuss the platform-specific coding styles required to handle the high degree of irregularity in both local-search methods employed by AutoDock, Solis-Wets and ADADELTA, which take up a large part of the total computation time. In our experiments, runtimes on the SX-Aurora TSUBASA VE 20B are on average 3x faster than on modern dual-socket 64-core CPU nodes. Our solution is also competitive with V100 GPUs, even though these use newer chip fabrication technology (12 nm vs. 16 nm for the VE 20B).
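The divergent control flow mentioned above shows up clearly in Solis-Wets local search. Each iteration tries a random step, may try the opposite step, and then updates its bias and step size depending on which branch succeeded. The sketch below is the textbook form of the method on a generic objective `f`. The bias coefficients (0.2/0.4/0.5), the expand/contract thresholds, and all parameter names are commonly cited defaults assumed here, not AutoDock's actual implementation or settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def solis_wets(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    rho: float = 1.0,        # step-size scale (assumed default)
    rho_min: float = 1e-3,   # stop once steps become this small
    max_iters: int = 300,
    max_succ: int = 4,       # consecutive successes before expanding rho
    max_fail: int = 4,       # consecutive failures before contracting rho
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    bias = [0.0] * len(x)
    succ = fail = 0
    for _ in range(max_iters):
        if rho < rho_min:
            break
        # Random step = bias + Gaussian deviation scaled by rho.
        step = [b + rng.gauss(0.0, rho) for b in bias]
        cand = [xi + s for xi, s in zip(x, step)]
        fc = f(cand)
        if fc < fx:  # branch 1: forward step improves
            x, fx = cand, fc
            bias = [0.2 * b + 0.4 * s for b, s in zip(bias, step)]
            succ, fail = succ + 1, 0
        else:
            cand = [xi - s for xi, s in zip(x, step)]
            fc = f(cand)
            if fc < fx:  # branch 2: opposite step improves
                x, fx = cand, fc
                bias = [b - 0.4 * s for b, s in zip(bias, step)]
                succ, fail = succ + 1, 0
            else:  # branch 3: both fail, shrink the bias
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        # Adapt the step size after runs of successes or failures.
        if succ >= max_succ:
            rho, succ = rho * 2.0, 0
        elif fail >= max_fail:
            rho, fail = rho * 0.5, 0
    return x, fx
```

When many ligand poses run this loop in lockstep on vector lanes or GPU threads, different lanes take different branches and stop at different iterations. This is the irregularity the port has to manage.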
ISBN (print): 9781450351362
The proceedings contain 11 papers. The topics discussed include: overcoming load imbalance for irregular sparse matrices; optimizing Word2Vec performance on multicore systems; parallel depth-first search for directed acyclic graphs; progressive load balancing of asynchronous algorithms; a case for migrating execution for irregular applications; pressure-driven hardware managed thread concurrency for irregular applications; an efficient data layout transformation algorithm for locality-aware parallel sparse FFT; spherical region queries on multicore architectures; evaluation of Knights Landing high bandwidth memory for HPC workloads; enabling work-efficiency for high performance vertex-centric graph analytics on GPUs; and accelerating energy games solvers on modern architectures.
ISBN (print): 9781665475075
Datalog, a bottom-up declarative logic programming language, has a wide variety of uses for deduction, modeling, and data analysis across application domains. Datalog can be efficiently implemented using relational algebra primitives such as join, projection, and union. While several multi-threaded and multi-core implementations of Datalog exist for CPU-based systems, our work makes an inroad towards a Datalog implementation for GPUs. We demonstrate the feasibility of a high-performance relational algebra backend, for a subset of Datalog applications, that can effectively leverage GPU parallelism using cuDF. cuDF is a library from the RAPIDS suite that uses the NVIDIA CUDA programming model for GPU parallelism and provides functionality similar to Pandas, a popular data analysis library. In this paper, we analyze and evaluate the performance of cuDF versus Pandas for two graph-mining problems implemented in Datalog: (1) triangle counting and (2) transitive-closure computation.
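Triangle counting reduces to joins over the edge relation. The Datalog rule `triangle(x, y, z) :- edge(x, y), edge(y, z), edge(x, z), x < y, y < z.` becomes two joins: one that builds open wedges and one that closes them. The sketch below is a hypothetical Pandas version, not the paper's code. It assumes an undirected edge list with columns `src` and `dst`. It orients every edge from the smaller to the larger vertex id so that each triangle is counted exactly once. This is a standard technique, assumed here rather than taken from the paper.

```python
import pandas as pd


def count_triangles(edges: pd.DataFrame) -> int:
    """Count triangles for the Datalog rule
         triangle(x, y, z) :- edge(x, y), edge(y, z), edge(x, z), x < y, y < z.
    `edges` is an undirected edge list with assumed columns "src" and "dst".
    """
    pair = edges[["src", "dst"]]
    # Orient each edge low -> high, then drop self-loops and duplicate orientations.
    e = pd.DataFrame({"lo": pair.min(axis=1), "hi": pair.max(axis=1)})
    e = e[e["lo"] != e["hi"]].drop_duplicates()
    xy = e.rename(columns={"lo": "x", "hi": "y"})
    yz = e.rename(columns={"lo": "y", "hi": "z"})
    xz = e.rename(columns={"lo": "x", "hi": "z"})
    # Join edge(x, y) with edge(y, z) on y: open wedges with x < y < z.
    wedges = xy.merge(yz, on="y")
    # Join with edge(x, z) on (x, z): keep only closed wedges, i.e. triangles.
    triangles = wedges.merge(xz, on=["x", "z"])
    return int(len(triangles))
```

The intermediate `wedges` table can be much larger than the edge list on skewed graphs. That is one reason join-heavy workloads like this are candidates for GPU acceleration.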
Welcome to the 2021 edition of IA³, the Workshop on Irregular Applications: Architectures and Algorithms, co-located with SC21. While the situation finally appears to be improving, we are still experiencing the long tail of the COVID-19 pandemic. However, although our entire community may not yet be able to fully reconvene at an in-person meeting, our area of research is more vital than ever. Accelerating data analysis has become a key issue, to be addressed with domain-specific architectures and new computing paradigms. Algorithmic research into new methods and approximations that can provide scalability and speed on these novel architectures is more needed than ever. The rest of the software stack, from languages to compilers to runtimes, is also the subject of intense interest as a way to enable a whole-system view.