The increasing demand for real-time data analysis in Internet of Things (IoT) ecosystems has created several challenges, particularly in environments where resources are limited, and minimizing data processing latency...
In the era of Big Data, the computational demands of machine learning (ML) algorithms have grown exponentially, necessitating the development of efficient parallel computing techniques. This research paper delves into...
In recent years, large-scale pretrained natural language processing models such as BERT and GPT-3 have achieved good results on language processing tasks. However, in daily applications, these large-scale language models usually e...
Distributed deep learning systems commonly use synchronous data parallelism to train models. However, communication overhead can be costly in distributed environments with limited communication bandwidth. To reduce co...
Pixel-level sea-land segmentation on high-resolution remote sensing images is a basic task in remote sensing applications and is of great significance for coastline extraction and near-shore marine target detection. T...
ISBN: (Print) 9798400708435
Structured dense matrices arise from boundary integral problems in electrostatics and geostatistics, and also as Schur complements in sparse preconditioners such as multifrontal methods. Exploiting the structure of such matrices can reduce the time for dense direct factorization from $O(N^3)$ to $O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low-rank matrix format; it can be factorized using a Cholesky-like algorithm called ULV factorization. The HSS-ULV algorithm is highly parallel because it removes the dependency on trailing submatrices at each HSS level. However, a key merge step that links two successive HSS levels remains a challenge for efficient parallelization. In this paper, we combine the HSS-ULV algorithm with the asynchronous runtime system PaRSEC. We compare our work with STRUMPACK and LORAPO, both state-of-the-art implementations of dense direct low-rank factorization, and achieve up to 2x faster factorization for matrices arising from a diverse set of applications on up to 128 nodes of Fugaku, with similar or better accuracy for all the problems that we survey.
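Hierarchical low-rank formats such as HSS pay off because off-diagonal blocks that couple well-separated groups of points are numerically low rank. The minimal Python sketch below estimates that rank with Gaussian elimination under complete pivoting, a simple stand-in for a compression step; it is not the paper's HSS construction, its ULV factorization, or its PaRSEC implementation. The kernel 1/|x - y|, the point sets, the tolerance, and the helper names `numerical_rank` and `kernel_block` are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def numerical_rank(a: Matrix, tol: float = 1e-8) -> int:
    """Estimate the numerical rank of `a` by Gaussian elimination with complete pivoting.

    Elimination stops once the largest remaining entry is at most
    tol * (largest entry of the original matrix).
    """
    r = [row[:] for row in a]  # work on a copy; the input is not modified
    m, n = len(r), len(r[0])
    scale = max(abs(v) for row in r for v in row)
    if scale == 0.0:
        return 0
    rank = 0
    for _ in range(min(m, n)):
        # Complete pivoting: pick the largest-magnitude entry of the residual.
        pi, pj, pv = 0, 0, 0.0
        for i in range(m):
            for j in range(n):
                if abs(r[i][j]) > abs(pv):
                    pi, pj, pv = i, j, r[i][j]
        if abs(pv) <= tol * scale:
            break
        rank += 1
        # Rank-1 (Schur complement) update: zeroes row pi and, up to rounding, column pj.
        col = [r[i][pj] for i in range(m)]
        row = r[pi][:]
        for i in range(m):
            f = col[i] / pv
            if f != 0.0:
                for j in range(n):
                    r[i][j] -= f * row[j]
    return rank


def kernel_block(xs: List[float], ys: List[float]) -> Matrix:
    """Interaction block K[i][j] = 1 / |x_i - y_j| between two disjoint point sets."""
    return [[1.0 / abs(x - y) for y in ys] for x in xs]


# Two well-separated clusters of 64 points each (hypothetical geometry).
xs = [i / 63 for i in range(64)]        # points in [0, 1]
ys = [2.0 + j / 63 for j in range(64)]  # points in [2, 3]
print(numerical_rank(kernel_block(xs, ys)))
```

For two well-separated clusters like these, the estimated rank should come out much smaller than the block size of 64. Storing such a block as a thin low-rank product instead of a dense array is the property that helps hierarchical formats avoid the $O(N^3)$ cost of dense factorization.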
Reconstructing damaged images with perspective views has extensive applications in the field of image inpainting. However, most existing methods generate insufficiently realistic restored images. Accomplishing this pro...
ISBN: (Print) 9783031488023; 9783031488030
Processing-in-Memory (PIM) architectures have emerged as a promising solution for data-intensive applications, providing significant speedup by processing data directly within the memory. However, the impact of PIM on energy efficiency is not well characterized. In this paper, we provide a comprehensive review of workloads ported to the first PIM product available on the market, namely the UPMEM architecture, and quantify the impact on each workload in terms of energy efficiency. Fewer than half of the reviewed papers provide insights into the impact of PIM on energy efficiency, and evaluation methods vary from paper to paper. To provide a comprehensive overview, we propose a methodology for estimating energy consumption and efficiency for both the PIM and baseline systems at the data center level, enabling a direct comparison of the two systems. Our results show that PIM can provide significant energy savings for data-intensive workloads. We also identify key factors that impact the energy efficiency of UPMEM PIM, including workload characteristics. Overall, this paper provides valuable insights for researchers and practitioners looking to optimize energy efficiency in data-intensive applications using the UPMEM PIM architecture.
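The abstract does not spell out the estimation methodology. As a rough illustration of comparing a PIM system with a baseline on energy rather than speed alone, the sketch below assumes a first-order model in which energy is constant power multiplied by runtime, optionally scaled by a data-center power usage effectiveness (PUE) factor. The class `SystemRun`, the idle/active power split, and every number in the usage lines are hypothetical, not measurements of UPMEM hardware.

```python
from dataclasses import dataclass


@dataclass
class SystemRun:
    """One workload execution on one system; all values are hypothetical inputs."""
    name: str
    idle_power_w: float    # power drawn by the system regardless of load
    active_power_w: float  # additional power while the workload runs
    runtime_s: float       # measured or modeled execution time
    work_units: float      # amount of work done, e.g. elements processed
    pue: float = 1.0       # data-center overhead factor (assumption; 1.0 = IT energy only)

    def energy_j(self) -> float:
        # First-order model: E = (P_idle + P_active) * t * PUE, assuming constant power.
        return (self.idle_power_w + self.active_power_w) * self.runtime_s * self.pue

    def efficiency(self) -> float:
        # Work done per joule consumed.
        return self.work_units / self.energy_j()


def energy_savings(baseline: SystemRun, pim: SystemRun) -> float:
    """Fraction of baseline energy saved by the PIM run (negative if PIM uses more)."""
    return 1.0 - pim.energy_j() / baseline.energy_j()


# Hypothetical example: the PIM run draws more power but finishes sooner.
baseline = SystemRun("cpu-baseline", idle_power_w=100.0, active_power_w=300.0,
                     runtime_s=10.0, work_units=1e9)
pim = SystemRun("pim", idle_power_w=200.0, active_power_w=300.0,
                runtime_s=4.0, work_units=1e9)
print(energy_savings(baseline, pim))
```

Because energy is power multiplied by time, a PIM run can save energy even while drawing more power than the baseline, provided it finishes proportionally faster; the made-up numbers above (500 W for 4 s versus 400 W for 10 s) illustrate that trade-off.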
ISBN: (Print) 9798350390155; 9798350390162
Light Field (LF) image Super-Resolution (SR) requires leveraging the spatial-angular relationship to super-resolve low-resolution LF images into their high-resolution counterparts. Recently, many Transformer-based methods have been proposed for LFSR. However, these methods struggle to recover sharp edges and intricate structures because the Self-Attention (SA) mechanism is intrinsically weak at capturing high-frequency information. Additionally, most of them fail to exploit global spatial-angular information across all views, because SA is computationally expensive on 4D LF data. To tackle these issues, we introduce the Trident Transformer (TriFormer) with three parallel branches: the high-frequency branch, which uses convolution and max-pooling to recover fine-grained textures; the low-frequency branch, which adopts vanilla SA to preserve the low-frequency component; and the interactive-frequency branch, which lets frequency information interact and enhances full-frequency features, aiding in capturing global information across all angular views. A progressive feature fusion approach is then applied to integrate all of this distinct information. Experimental results on five benchmarks demonstrate TriFormer's superiority over leading LFSR methods, while maintaining a compact model size and computational efficiency. The code is publicly available at https://***/wziqi/TriFormer.
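To make the three-branch layout concrete, the sketch below is a deliberately tiny 1D toy in plain Python that operates on a short sequence of feature vectors rather than on 4D light fields; it is not the TriFormer implementation. The high-frequency branch applies a per-channel convolution with a Laplacian-style kernel followed by max-pooling, and the low-frequency branch applies vanilla scaled dot-product self-attention with identity projections. The abstract does not specify how the interactive-frequency branch or the progressive fusion work, so the elementwise product and step-by-step summation used here are hypothetical placeholders, as are all function names.

```python
import math
from typing import List

Vec = List[float]


def conv1d(x: Vec, kernel: Vec) -> Vec:
    """Same-length 1D cross-correlation with zero padding at both ends."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for t, w in enumerate(kernel):
            j = i + t - pad
            if 0 <= j < len(x):
                s += w * x[j]
        out.append(s)
    return out


def max_pool1d(x: Vec, width: int = 3) -> Vec:
    """Stride-1 max-pooling with a centered window, truncated at the edges."""
    half = width // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def self_attention(tokens: List[Vec]) -> List[Vec]:
    """Vanilla scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, tokens)) / z for c in range(d)])
    return out


def tri_branch(tokens: List[Vec]) -> List[Vec]:
    """Toy three-branch block: high-frequency, low-frequency, and a hypothetical interaction."""
    n, d = len(tokens), len(tokens[0])
    # High-frequency branch: Laplacian-style kernel per channel, then max-pooling.
    channels = [[t[c] for t in tokens] for c in range(d)]
    high_ch = [max_pool1d(conv1d(ch, [-1.0, 2.0, -1.0])) for ch in channels]
    high = [[high_ch[c][i] for c in range(d)] for i in range(n)]
    # Low-frequency branch: attention outputs are weighted averages of the inputs.
    low = self_attention(tokens)
    # Interactive branch (hypothetical): elementwise product of the other two branches.
    inter = [[h * l for h, l in zip(hv, lv)] for hv, lv in zip(high, low)]
    # Progressive fusion (hypothetical): start from the input, then add low, high, interaction.
    fused = []
    for x, lv, hv, iv in zip(tokens, low, high, inter):
        acc = list(x)
        for branch in (lv, hv, iv):
            acc = [a + b for a, b in zip(acc, branch)]
        fused.append(acc)
    return fused


out = tri_branch([[1.0, 0.5], [0.2, -0.3], [0.0, 1.0], [0.7, 0.1]])
```

Self-attention returns a convex combination of its inputs, which tends to smooth them, while the Laplacian-style kernel responds only to local changes; running both in parallel and fusing the results mirrors, at toy scale, the intuition the abstract describes.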
ISBN: (Print) 9781665469586
Strong consistency and stateful workflows are seen as major factors limiting parallel I/O performance because of the need for locking and state management. While the POSIX-based I/O model dominates modern HPC storage infrastructure, emerging object storage technology can potentially improve I/O performance by eliminating these bottlenecks. Despite its wide deployment in the cloud, its adoption in HPC remains low. We argue that one reason is the lack of a suitable programming interface for parallel I/O in scientific applications. In this work, we introduce NoaSci, a Numerical Object Array library for scientific applications. NoaSci supports different data formats (e.g., HDF5 and binary) and focuses on supporting node-local burst buffers and object stores. We demonstrate for the first time how scientific applications can perform parallel I/O on Seagate's Motr object store through NoaSci. We evaluate NoaSci's preliminary performance using the iPIC3D space weather application and position it against existing I/O methods.
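The abstract does not describe NoaSci's API, so the sketch below only illustrates the general idea behind storing numerical arrays as objects: an array is split into contiguous chunks, each chunk is serialized to binary and written as an independent object under a flat key, and a small metadata object records the layout. Because each chunk is its own object, parallel writers need no shared file or lock. `InMemoryObjectStore`, `write_array`, `read_array`, and the key naming scheme are hypothetical; the store is an in-memory dictionary standing in for a real object store such as Motr, whose client API is not shown here.

```python
import struct
from typing import Dict, List


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: flat keys mapped to immutable bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def write_array(store: InMemoryObjectStore, name: str, values: List[float],
                num_chunks: int) -> None:
    """Store a 1D float64 array as one metadata object plus `num_chunks` chunk objects."""
    n = len(values)
    # Metadata: total length and chunk count as two little-endian 64-bit integers.
    store.put(f"{name}/meta", struct.pack("<qq", n, num_chunks))
    for i in range(num_chunks):
        # Contiguous, nearly equal partition; chunk i could be written by process i alone.
        lo = i * n // num_chunks
        hi = (i + 1) * n // num_chunks
        chunk = values[lo:hi]
        store.put(f"{name}/chunk/{i}", struct.pack(f"<{len(chunk)}d", *chunk))


def read_array(store: InMemoryObjectStore, name: str) -> List[float]:
    """Reassemble the array from the metadata object and every chunk object."""
    n, num_chunks = struct.unpack("<qq", store.get(f"{name}/meta"))
    out: List[float] = []
    for i in range(num_chunks):
        blob = store.get(f"{name}/chunk/{i}")
        out.extend(struct.unpack(f"<{len(blob) // 8}d", blob))
    if len(out) != n:
        raise ValueError("chunk objects do not match the recorded array length")
    return out


store = InMemoryObjectStore()
field = [float(i) for i in range(10)]
write_array(store, "field", field, num_chunks=3)
restored = read_array(store, "field")
```

In a parallel run, each process would write only the chunk object for its own index, so no writer has to coordinate with another; that independence is what lets object stores sidestep the locking and state management the abstract identifies as bottlenecks of POSIX-based I/O.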