ISBN (print): 0769520464
The proceedings contain 30 papers. The topics discussed include: adaptive compressed caching: design and implementation; enabling dual-core mode in BlueGene/L: challenges and solutions; complex branch profiling for dynamic conditional execution; the limits of speculative trace reuse on deeply pipelined processors; performance analysis of DECK collective communication service; a modeling methodology and pre-runtime scheduling for embedded real-time software; performance issues of bandwidth reservations for grid computing; an evaluation of cJava system architecture; ProGrid: a proxy-based architecture for grid operation and management; optimizing packet capture on symmetric multiprocessing machines; a parallel implementation of the LTSn method for a radiative transfer problem; and parallel implementation of a lattice-gauge-theory code: studying quark confinement on PC clusters.
This paper presents a low-power, clock-gated integrated CRC-BCH (Cyclic Redundancy Check, Bose-Chaudhuri-Hocquenghem) error correction code (ECC) architecture designed to address single-event upsets (SEUs) and multi-...
In this paper, we study the power consumption of quantum computing platforms when integrated into high-performance computing (HPC) centers. We analyze the key components of leading quantum computers (superconducting ci...
In biomedical data analysis, feature selection is crucial, particularly for high-dimensional datasets where redundant or irrelevant features might affect model performance. Biomedical datasets introduce additional cha...
Misinformation, especially during global crises like the COVID-19 pandemic, has posed significant challenges by spreading harmful and baseless claims at an unprecedented scale. Addressing this critical issue requires ...
ISBN (print): 9798400714436
GPU-based fast Fourier transform (FFT) is extremely important for scientific computing and signal processing. However, we find that existing FFT libraries are inefficient and lack fault tolerance against soft errors. To address these issues, we introduce TurboFFT, a new FFT prototype co-designed for high performance and online fault tolerance. For FFT, we propose an architecture-aware, padding-free, and template-based prototype to maximize hardware resource utilization, achieving performance competitive with or superior to the state-of-the-art closed-source library, cuFFT. For fault tolerance, we 1) explore algorithm-based fault tolerance (ABFT) at the thread and threadblock levels to reduce the additional memory footprint, 2) address error propagation by introducing a two-side ABFT with location encoding, and 3) further modify the threadblock-level FFT from 1-transaction to multi-transaction to expose more parallelism for ABFT. Our two-side strategy enables online correction without additional global memory, while our multi-transaction design amortizes the expensive threadblock-level reduction in ABFT with zero additional operations. Experimental results on an NVIDIA A100 server GPU and a Tesla Turing T4 GPU demonstrate that TurboFFT without fault tolerance is comparable to or up to 300% faster than cuFFT and outperforms VkFFT. TurboFFT with fault tolerance maintains an overhead of 7% to 15%, even under tens of error injections per minute, for both FP32 and FP64.
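The idea behind a checksum-based ABFT with location encoding can be illustrated on the CPU with a small sketch. It is not TurboFFT's GPU implementation; it only assumes that the transform is linear, so a weighted sum of the inputs transforms into the same weighted sum of the outputs. The function names (`dft`, `weighted_sum`, `check_and_correct`), the weights (all ones, and 1, 2, ..., B as the location code), the tolerance, and the single-error assumption are all choices made for this illustration.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, standing in for a real FFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def weighted_sum(signals, weights):
    """Element-wise weighted sum over a batch of equal-length signals."""
    n = len(signals[0])
    return [sum(w * s[i] for w, s in zip(weights, signals)) for i in range(n)]

def check_and_correct(inputs, outputs, tol=1e-6):
    """Detect, locate, and repair one corrupted output element in a batch.

    Assumes at most one faulty element. Returns (signal_index,
    element_index) of the repaired element, or None if no error is found.
    """
    b = len(inputs)
    ones = [1.0] * b
    code = [float(j + 1) for j in range(b)]  # location encoding: 1..B

    # Reference checksums: transform the weighted sums of the inputs.
    ref_plain = dft(weighted_sum(inputs, ones))
    ref_coded = dft(weighted_sum(inputs, code))

    # Observed checksums: weighted sums of the (possibly faulty) outputs.
    got_plain = weighted_sum(outputs, ones)
    got_coded = weighted_sum(outputs, code)

    for k in range(len(ref_plain)):
        d_plain = ref_plain[k] - got_plain[k]
        if abs(d_plain) > tol:
            d_coded = ref_coded[k] - got_coded[k]
            # The ratio of the two discrepancies recovers the signal index.
            j = round((d_coded / d_plain).real) - 1
            outputs[j][k] += d_plain  # undo the injected error in place
            return (j, k)
    return None
```

In this sketch the plain checksum detects an error and gives its magnitude, while the coded checksum identifies which signal in the batch was hit, so the faulty element can be repaired without recomputing the transform. Each checksum costs one extra transform per batch, which is why the overhead shrinks as the batch grows.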
The proceedings contain 32 papers from the 16th Symposium on Computer Architecture and High Performance Computing. The topics discussed include: self-monitored adaptive cache warm up for microprocessor simulation; the eDRAM based L3-Cache of the BlueGene/L supercomputer processor node; multi-profile instruction based compression; a study of errant pipeline flushes caused by value misspeculation; design space exploration using T&D-bench; value predictors for reuse through speculation on traces; optimizations for compiled simulation using instruction type information; and high performance communication system based on generic programming.
The article discusses various reports published within the issue, including one on a dual-thread speculation system and another on a parallel version of the Tricluster algorithm.
Kidney-related diseases such as tumors, cysts, and stones are common, so prompt and precise diagnosis is essential for better patient outcomes. In this work, we propose developing an automatic Deep Learning (DL) model...