ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
Experimental development of gate-all-around silicon nanowire field-effect transistors (NWFETs), a viable replacement for FinFETs, can be complemented by technology computer-aided design. This requires the availability of advanced device simulators relying on a quantum transport (QT) approach without any empirical parameters as inputs. Concretely, all material properties should be described from first-principles, and the whole physics at play should be accurately modeled, particularly the strong electron-electron interactions occurring in highly confined structures such as NWFETs. To shed light on these many-body effects, we implement them within the self-consistent GW approximation into an ab initio QT solver called QuaTrEx, based on density functional theory and the Non-equilibrium Green's Function formalism. We then simulate transistors made of up to 10,560 atoms on the LUMI supercomputer's GPU partition, reaching a parallel efficiency of 74% (60%) in weak (strong) scaling and an overall computational performance of 69.3 Pflop/s in double precision on 1,800 nodes.
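As a reading aid for the weak- and strong-scaling figures quoted in this abstract, the following minimal sketch spells out the standard textbook definitions of parallel efficiency. The function names and all timings are hypothetical placeholders chosen for illustration; they are not measurements or code from QuaTrEx.

```python
def weak_scaling_efficiency(t_base, t_n):
    """Work per node is fixed, so the ideal runtime stays constant as nodes are added."""
    return t_base / t_n


def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Total work is fixed, so the ideal runtime shrinks by the factor n_base / n."""
    return (t_base * n_base) / (t_n * n)


# Hypothetical timings in seconds (illustrative only).
weak = weak_scaling_efficiency(t_base=100.0, t_n=125.0)
strong = strong_scaling_efficiency(t_base=100.0, n_base=1, t_n=12.5, n=16)
print(f"weak: {weak:.0%}, strong: {strong:.0%}")
```

With these made-up numbers, the weak-scaling run takes 25% longer than the baseline at constant work per node, and the strong-scaling run is 8 times faster on 16 times as many nodes.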
ISBN: (print) 9798350371000; 9798350370997
Named data networking (NDN) is a typical solution for next-generation internet systems, and its applications and architectures have been widely demonstrated in both wired and wireless communications. Optical named data networking (ONDN) is an NDN architectural scheme proposed in recent years for optical transmission networks, and it is an important step for NDN toward the future high-speed broadband internet. Although ONDN data interaction based on I/R/D protocols has been proposed, network nodes cannot yet aggregate or divide prioritized packets. In this paper, building on the I/R/D communication protocols and taking into account the characteristics of the NDN architecture on an optical transmission network, we present a feasible method for aggregating or dividing prioritized packets and analyze the performance of a network that adopts more than one strategy. Using a simulation platform, we demonstrate the feasibility of the proposed method and its improvement of network performance, providing experimental ideas for the future of the NDN architecture on optical transmission networks.
ISBN: (print) 9781728186740
The proceedings contain 13 papers. The topics discussed include: quantum algorithms and simulation for parallel and distributed quantum computing; tensor network circuit simulation at exascale; Illinois express quantum network for distributing and controlling entanglement on metro-scale; exploring affine abstractions for qubit mapping; scalable programming workflows for validation of quantum computers; and mapping constraint problems onto quantum gate and annealing devices.
ISBN: (print) 9781665411264
The proceedings contain 6 papers. The topics discussed include: project 38: innovative architectures for high-performance computing systems; implementing performance portable graph algorithms using task-based execution; greatly accelerated scaling of streaming problems with a migrating thread architecture; sparse exact factorization update; no more leaky PageRank; and towards scalable data processing in Python with CLIPPy.
ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors exhibit substantially higher throughput, making them a better fit for today's HPC applications. However, existing GPU-based compressors are critically limited by their low compression ratios and reconstruction quality, which severely restricts their applicability. To overcome these limitations, we introduce a new GPU-based error-bounded scientific lossy compressor named CUSZ-i, with the following contributions: (1) A novel GPU-optimized interpolation-based prediction method significantly improves the compression ratio and the quality of the decompressed data. (2) The Huffman encoding module in CUSZ-i is optimized for better efficiency. (3) CUSZ-i is the first to integrate the NVIDIA Bitcomp-lossless as an additional compression-ratio-enhancing module. Evaluations show that CUSZ-i significantly outperforms other recent GPU-based lossy compressors in compression ratio under the same error bound (hence, the desired quality), showing a 476% advantage over the second-best. This advantage translates into improved performance for CUSZ-i in several real-world use cases.
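To make the idea of interpolation-based prediction under an error bound more concrete, here is a minimal CPU-side sketch in Python with NumPy. It is a generic one-dimensional, two-level toy and not the CUSZ-i GPU algorithm: the function names, the choice of even-indexed anchor points, and the linear interpolation are all assumptions made for illustration. The Huffman and Bitcomp stages mentioned in the abstract are omitted; only the prediction and quantization step is shown.

```python
import numpy as np


def _predict_odd(recon, odd):
    """Linear interpolation of odd-indexed points from reconstructed even neighbours."""
    n = recon.size
    left = recon[odd - 1]
    has_right = (odd + 1) < n
    # At the right boundary there is no right neighbour, so fall back to the left one.
    right = np.where(has_right, recon[np.minimum(odd + 1, n - 1)], left)
    return 0.5 * (left + right)


def decompress_1d(codes, eb):
    """Rebuild the data from integer quantization codes and the error bound eb."""
    step = 2.0 * eb
    recon = np.zeros(codes.size, dtype=np.float64)
    recon[0::2] = codes[0::2] * step              # anchors: predicted as zero
    odd = np.arange(1, codes.size, 2)
    recon[odd] = _predict_odd(recon, odd) + codes[odd] * step
    return recon


def compress_1d(data, eb):
    """Quantize prediction residuals so that |data - reconstruction| <= eb."""
    data = np.asarray(data, dtype=np.float64)
    step = 2.0 * eb
    codes = np.zeros(data.size, dtype=np.int64)
    codes[0::2] = np.rint(data[0::2] / step).astype(np.int64)
    # Predict from *reconstructed* anchors so the decompressor sees the same values.
    recon = np.zeros(data.size, dtype=np.float64)
    recon[0::2] = codes[0::2] * step
    odd = np.arange(1, data.size, 2)
    residual = data[odd] - _predict_odd(recon, odd)
    codes[odd] = np.rint(residual / step).astype(np.int64)
    return codes


# Hypothetical smooth test signal.
signal = np.sin(np.linspace(0.0, 6.0, 101))
eb = 1e-3
codes = compress_1d(signal, eb)
restored = decompress_1d(codes, eb)
max_err = np.max(np.abs(restored - signal))
```

On smooth data, the interpolated points tend to produce small residual codes, which is what makes a subsequent entropy coder effective; the rounding to multiples of `2 * eb` is what enforces the pointwise error bound.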
ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for SpGEMM have mainly focused on sparsity-oblivious approaches that use 2D and 3D partitioning. Sparsity-aware 1D algorithms can theoretically reduce communication by not fetching nonzeros of the sparse matrices that do not participate in the multiplication. Here, we present a distributed-memory 1D SpGEMM algorithm and implementation. It uses MPI RDMA operations to mitigate the cost of packing/unpacking submatrices for communication, and it uses a block fetching strategy to avoid excessive fine-grained messaging. Our results show that our 1D implementation outperforms state-of-the-art 2D and 3D implementations within CombBLAS for many configurations, inputs, and use cases, while remaining conceptually simpler.
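The sparsity-aware idea described in this abstract, fetching only the nonzeros that actually participate in the product, can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the paper's implementation: it assumes a 1D row partitioning of both matrices with the same block size, stores matrices as nested dictionaries, and simulates the "processes" in a loop with no MPI or RDMA. All function names are hypothetical.

```python
from collections import defaultdict


def needed_rows(a_local):
    """Rows of B referenced locally: the column indices of A's local nonzeros."""
    return {k for row in a_local.values() for k in row}


def local_spgemm(a_local, b_rows):
    """Row-wise (Gustavson-style) product of local A rows with the available B rows."""
    c = {}
    for i, a_row in a_local.items():
        acc = defaultdict(float)
        for k, a_ik in a_row.items():
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            c[i] = dict(acc)
    return c


def spgemm_1d(a, b, n_rows, n_procs):
    """Simulate a 1D row-partitioned SpGEMM and count remote rows of B fetched."""
    block = -(-n_rows // n_procs)  # ceiling division: rows per simulated process
    c, fetched = {}, 0
    for p in range(n_procs):
        a_local = {i: row for i, row in a.items() if i // block == p}
        need = needed_rows(a_local)
        # Only nonempty rows of B owned by another process count as communication.
        fetched += sum(1 for k in need if k in b and k // block != p)
        c.update(local_spgemm(a_local, {k: b[k] for k in need if k in b}))
    return c, fetched


# Hypothetical 4x4 inputs in {row: {col: value}} form.
A = {0: {0: 1.0, 3: 2.0}, 1: {1: 3.0}, 2: {2: 1.0}, 3: {0: 4.0}}
B = {0: {1: 1.0}, 1: {0: 2.0}, 2: {2: 5.0}, 3: {3: 1.0}}
C, remote_rows = spgemm_1d(A, B, n_rows=4, n_procs=2)
```

In this toy input each simulated process needs only one remote row of B, whereas a sparsity-oblivious scheme would transfer every remote row regardless of whether it is used.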
ISBN: (print) 9781665411325
The proceedings contain 4 papers. The topics discussed include: distributing higher-dimensional simulations across compute systems: a widely distributed combination technique; benchmarking and extending SYCL hierarchical parallelism; did the GPU obfuscate the load imbalance in my MPI simulation?; and PPIR: parallel pattern intermediate representation.
ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers. It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant. The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our paper, we present lessons learned to benefit HPC practitioners developing similar digital twins. We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.
ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
Existing GPU lossy compressors suffer from expensive data movement overheads, inefficient memory access patterns, and high synchronization latency, resulting in limited throughput. This work proposes CUSZP2, a generic single-kernel error-bounded lossy compressor that runs purely on GPUs and is designed for applications that require high speed, such as large-scale GPU simulation and large language model training. In particular, CUSZP2 introduces a novel lossless encoding method, optimizes memory access patterns, and hides synchronization latency, achieving extreme end-to-end throughput and an optimized compression ratio. Experiments on an NVIDIA A100 GPU with 9 real-world HPC datasets demonstrate that, even with higher compression ratios and data quality, CUSZP2 delivers on average 332.42 GB/s and 513.04 GB/s end-to-end throughput for compression and decompression, respectively, which is around 2x that of existing pure-GPU compressors and 200x that of CPU-GPU hybrid compressors.
ISBN: (digital) 9798350352917
ISBN: (print) 9798350352924; 9798350352917
In the field of computational science, effectively supporting researchers necessitates a deep understanding of how they utilize computational resources. Building upon a decade-old survey that explored the practices and challenges of research computation, this study aims to bridge the understanding gap between providers of computational resources and researchers who rely on them. This study revisits key survey questions and gathers feedback on open-ended topics from over a hundred interviews. Quantitative analyses of present and past results illuminate the landscape of research computation. Qualitative analyses, including careful use of large language models, highlight trends and challenges with concrete evidence. Given the rapid evolution of computational science, this paper offers a toolkit with methodologies and insights to simplify future research and ensure ongoing examination of the landscape. This study, with its findings and toolkit, guides enhancements to computational systems, deepens understanding of user needs, and streamlines reassessment of the computational landscape.