ISBN: 9781728127941 (print)
Various partitioned global address space (PGAS) languages that provide global-view programming environments on multi-node computer systems have been proposed to improve programming productivity in high-performance computing. However, several PGAS languages require a detailed description of remote data access, similar to the descriptions used in Message Passing Interface (MPI) one-sided communication. Some PGAS languages also restrict remote data access and, for performance reasons, recommend their local-view programming models rather than the global-view ones. In this study, we propose SMint, an application programming interface that provides a global-view programming model with the software distributed shared memory system mSMS as its runtime. Using stencil computation as a representative workload, we compare the performance and programmability of SMint with those of XcalableMP and Unified Parallel C, two well-known PGAS languages based on C. SMint achieved the best performance under the ideal global-view programming model.
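The contrast between global-view and local-view programming for a stencil can be sketched in plain Python. This is an illustrative sketch only: it does not use SMint, mSMS, XcalableMP, or Unified Parallel C, and the function names `stencil_global` and `stencil_local` are hypothetical. The global-view version indexes one logically shared array, while the local-view version splits the array into blocks and fetches neighbor edge values explicitly, which stands in for the remote data access a programmer would otherwise have to describe by hand.

```python
def stencil_global(u, steps):
    """3-point averaging stencil over one logically shared array (global view)."""
    u = list(u)
    for _ in range(steps):
        nxt = u[:]  # the two boundary cells keep their values
        for i in range(1, len(u) - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u


def stencil_local(u, steps, parts):
    """Same stencil in a local view: each part owns a block plus explicit halos.

    Assumes len(u) >= parts so that no block is empty.
    """
    n = len(u)
    bounds = [(p * n // parts, (p + 1) * n // parts) for p in range(parts)]
    blocks = [list(u[lo:hi]) for lo, hi in bounds]
    for _ in range(steps):
        # Explicit "communication": read each neighbor's edge value into a halo.
        halos = []
        for p in range(parts):
            left = blocks[p - 1][-1] if p > 0 else None
            right = blocks[p + 1][0] if p < parts - 1 else None
            halos.append((left, right))
        new_blocks = []
        for p, block in enumerate(blocks):
            left, right = halos[p]
            nxt = block[:]
            for j in range(len(block)):
                lo_val = block[j - 1] if j > 0 else left
                hi_val = block[j + 1] if j < len(block) - 1 else right
                if lo_val is None or hi_val is None:
                    continue  # global boundary cell stays fixed
                nxt[j] = (lo_val + block[j] + hi_val) / 3.0
            new_blocks.append(nxt)
        blocks = new_blocks
    return [x for block in blocks for x in block]
```

Both functions compute the same values; the local-view version simply makes the halo exchange visible, which is the bookkeeping a global-view model aims to hide from the programmer.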
ISBN: 9798331541378 (print)
Today's quantum computers (QCs) face significant engineering challenges that limit their size and fidelity. To execute realistic applications, researchers must develop ways to move past these constraints. CutQC enables small QCs to run larger quantum circuits by cutting circuits into smaller subcircuits and then using classical resources and Kronecker products to reproduce the desired output. In this work, we enhance the reconstruction process. In particular, we develop a distributed PyTorch implementation of CutQC's classical post-processing step, which can run across multiple nodes on either GPU or CPU devices. Our results show that our PyTorch implementation executes circuits of up to 35 qubits on 10- and 15-qubit QCs with efficient reconstruction. We use single-node computation as a baseline and compare it with the runtimes of our PyTorch workflows. By exploiting parallelism and explicitly chosen devices, the reconstruction step of hybrid quantum-classical circuit execution can be improved. The original CutQC paper demonstrated a 60x to 8600x runtime speedup over classical simulation, and our work increases this speedup. These results show that large quantum circuits can be run on smaller QCs by combining circuit cutting with data-parallel techniques to maintain a reasonable runtime. This allows researchers to retain the high fidelity of smaller machines while executing larger circuits.
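The idea of reconstructing an output from Kronecker products, and of spreading that work across workers, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the CutQC code or the PyTorch implementation described above: it assumes a single cut producing two subcircuits, assumes the output is the sum of Kronecker products over a list of (left, right) vector pairs, and uses the hypothetical names `kron`, `reconstruct`, and `reconstruct_chunked`. The chunked version runs sequentially here; each chunk stands in for the share of terms one node or device would process.

```python
def kron(a, b):
    """Kronecker product of two vectors given as lists."""
    return [x * y for x in a for y in b]


def reconstruct(terms):
    """Sum the Kronecker products of each (left, right) pair of vectors."""
    size = len(terms[0][0]) * len(terms[0][1])
    total = [0.0] * size
    for left, right in terms:
        for i, v in enumerate(kron(left, right)):
            total[i] += v
    return total


def reconstruct_chunked(terms, workers):
    """Split terms into chunks, reduce each one independently, add the partial sums."""
    chunks = [terms[w::workers] for w in range(workers)]
    partials = [reconstruct(c) for c in chunks if c]
    return [sum(vals) for vals in zip(*partials)]
```

Because the terms are independent and the final step is a plain sum, the chunks can be reduced in any order and on any device, which is what makes the reconstruction step a natural fit for data parallelism.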