Existing block-based parallel file systems, which are deployed in the storage area network (SAN), blend metadata with data in underlying disks. Unfortunately, such a symmetric architecture is prone to system-level failures, as metadata on shared disks can be damaged by a malfunctioning client. In this paper, we present an asymmetric block-based parallel file system, Redbud, which isolates the metadata storage in the metadata server (MDS) access domain. Although centralized metadata management can effectively improve the reliability of the system, it faces challenges in providing high performance and availability. Towards this end, we introduce an embedded directory mechanism to exploit the disk bandwidth of the metadata storage; we also introduce adaptive layout operations to deliver high I/O throughput for various file access patterns. In addition, by taking the MDS's load into consideration, we propose an adaptive timeout algorithm that makes MDS failure detection adaptive to evolving workloads, improving system availability. Measurements of a wide range of workloads demonstrate the benefits of our design and show that Redbud achieves good scalability.
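A minimal sketch of the idea behind a load-adaptive failure-detection timeout follows; the scaling rule, constants, and names are illustrative assumptions, not Redbud's published algorithm:

    # Sketch: scale an MDS failure-detection timeout with observed load,
    # so a heavily loaded server is not declared dead prematurely.
    # All names, constants, and the scaling rule are assumptions.

    BASE_TIMEOUT_S = 2.0   # timeout under an idle MDS (assumed)
    MAX_TIMEOUT_S = 30.0   # cap so real failures are still detected promptly

    def adaptive_timeout(recent_rtts, load_factor):
        """recent_rtts: recent request round-trip times in seconds;
        load_factor: MDS load in [0, 1], e.g. queue depth / capacity."""
        if not recent_rtts:
            return BASE_TIMEOUT_S
        avg_rtt = sum(recent_rtts) / len(recent_rtts)
        # Stretch the timeout as the server gets busier.
        timeout = max(BASE_TIMEOUT_S, 4 * avg_rtt) * (1 + 3 * load_factor)
        return min(timeout, MAX_TIMEOUT_S)

    print(adaptive_timeout([0.5, 0.8, 0.6], load_factor=0.7))  # ~7.9 s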
Parallel file systems are serving more and more applications from various fields. Different applications have different I/O workload characteristics, which place diverse requirements on access to storage resources. However, parallel file systems often adopt a "one-size-fits-all" solution, which fails to meet specific application needs and hinders the full exploitation of potential performance. This paper presents a framework that enables dynamic file I/O path selection at fine granularity at runtime. The framework adopts a file handle-rich scheme that allows file systems to choose corresponding optimizations to serve I/O requests. Consistency control algorithms are proposed to ensure data consistency while changing optimizations at runtime. One case study on our prototype shows that choosing proper optimizations can improve the I/O performance for small files and large files by up to 40% and 64.4%, respectively. Another case study shows that the data prefetch performance for real-world application traces can be improved by up to 193% by selecting correct prefetch patterns. Simulations in a large-scale environment also show that our method is scalable and that both the memory consumption and the consistency control overhead are negligible.
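A minimal sketch of what handle-rich dispatch might look like, assuming hypothetical field and strategy names (the paper's actual handle layout is not shown here):

    # Sketch: a "rich" file handle carrying the optimization chosen for
    # the file, letting the file system dispatch each request to the
    # matching I/O path at runtime. Fields and strategies are invented.
    from dataclasses import dataclass

    @dataclass
    class RichHandle:
        inode: int
        strategy: str   # e.g. "small_file_pack" or "large_file_stripe"
        epoch: int      # bumped when the strategy changes, for consistency

    def serve_read(handle: RichHandle, offset: int, length: int) -> str:
        # Requests carrying a stale epoch would be revalidated and retried.
        if handle.strategy == "small_file_pack":
            return f"read {length}B via packed small-file path"
        return f"read {length}B via striped large-file path"

    h = RichHandle(inode=42, strategy="small_file_pack", epoch=1)
    print(serve_read(h, 0, 4096))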
The imbalanced I/O load on large parallel file systems affects the parallel I/O performance of high-performance computing (HPC) applications. One of the main reasons for I/O imbalances is the lack of a global view of system-wide resource consumption. While approaches to address the problem already exist, the diversity of HPC workloads combined with different file striping patterns prevents widespread adoption of these approaches. In addition, load-balancing techniques should be transparent to client applications. To address these issues, we propose Tarazu, an end-to-end control plane where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load statistics for global data placement on distributed storage servers, while our design model employs trace-based optimization techniques to minimize latency for I/O load requests between clients and servers and to handle multiple striping patterns in files. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR and the scientific application I/O kernel HACC-I/O. We also use a discrete-time simulator with real HPC application traces from emerging workloads running on the Summit supercomputer to validate the effectiveness and scalability of Tarazu in large-scale storage environments. The results show improvements in load balancing and read performance of up to 33% and 43%, respectively, compared to the state-of-the-art.
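As a rough illustration of load-driven placement, the sketch below picks the least-loaded servers for a file's stripes; the load metric and selection rule are assumptions, not Tarazu's actual policy:

    # Sketch: choose the k least-loaded I/O servers for a new file's
    # stripes, based on real-time load statistics reported by servers.
    import heapq

    def place_stripes(server_load, stripe_count):
        """server_load: {server_id: pending bytes}; returns stripe targets."""
        return heapq.nsmallest(stripe_count, server_load, key=server_load.get)

    load = {"ost0": 900, "ost1": 120, "ost2": 430, "ost3": 50}
    print(place_stripes(load, stripe_count=2))  # ['ost3', 'ost1']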
The semantics of HPC storage systems are defined by the consistency models by which they abide. Storage consistency models have been less studied than their counterparts in memory systems, with the exception of the POSIX standard and its strict consistency model. The use of POSIX consistency imposes a performance penalty that becomes more significant as the scale of parallel file systems increases and the access time to storage devices, such as node-local solid-state devices, decreases. While some efforts have been made to adopt relaxed storage consistency models, these models are often defined informally and ambiguously, as by-products of a particular implementation. In this work, we establish a connection between memory consistency models and storage consistency models and revisit the key design choices of storage consistency models from a high-level perspective. Further, we propose a formal and unified framework for defining storage consistency models and a layered implementation that can be used to easily evaluate their relative performance for different I/O workloads. Finally, we conduct a comprehensive performance comparison of two relaxed consistency models on a range of commonly seen parallel I/O workloads, such as checkpoint/restart of scientific applications and random reads of deep learning applications. We demonstrate that for certain I/O scenarios, a weaker consistency model can significantly improve the I/O performance. For instance, for the small random reads typically found in deep learning applications, session consistency achieved a 5x improvement in I/O bandwidth over commit consistency, even at small scales.
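The difference can be made concrete with a toy model, sketched below under invented semantics (not the paper's formal framework): a session pins the snapshot visible when it opens, so writes published later stay invisible to it:

    # Toy sketch contrasting relaxed models: writes become visible only
    # once published (commit / session close), and a reader session sees
    # the snapshot current at the time it opened. API is invented.

    class Store:
        def __init__(self):
            self.snapshots = [{}]   # snapshots[v] = visible data at version v
            self.pending = {}       # writes not yet published

        def write(self, key, value):
            self.pending[key] = value

        def publish(self):          # commit() / session close() both end here
            snap = dict(self.snapshots[-1])
            snap.update(self.pending)
            self.snapshots.append(snap)
            self.pending.clear()

        def open_session(self):     # a session pins the snapshot at open time
            return self.snapshots[-1]

    store = Store()
    reader = store.open_session()   # session opened before the write
    store.write("ckpt/rank0", b"data")
    store.publish()
    print("old session sees:", reader.get("ckpt/rank0"))                 # None
    print("new session sees:", store.open_session().get("ckpt/rank0"))   # b'data'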
ISBN (Print): 9798400701559
Log-based anomaly detection has been extensively studied to help detect complex runtime anomalies in production systems. However, existing techniques exhibit several common issues. First, they rely heavily on expert-labeled logs to discern anomalous behavior patterns, but manually labeling enough log data to effectively train deep neural networks may take too long. Second, they rely on numeric model predictions over numeric vector inputs, which makes model decisions largely non-interpretable by humans and further rules out targeted error correction. In recent years, we have witnessed groundbreaking advancements in large language models (LLMs) such as ChatGPT. These models have proven their ability to retain context and formulate insightful responses over entire conversations. They also offer few-shot and in-context learning with reasoning ability. In light of these abilities, it is only natural to explore their applicability to understanding log content and classifying anomalies in parallel file system logs.
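As a rough sketch of the few-shot idea, the snippet below assembles a classification prompt from labeled example logs; the logs, labels, and prompt wording are invented, and the call to the model itself is omitted:

    # Sketch: build a few-shot prompt asking an LLM to classify a
    # parallel file system log line as normal or anomalous. Example
    # logs and labels are invented for illustration.

    FEW_SHOT = [
        ("OST0003 completed recovery in 12s", "normal"),
        ("LustreError: 11-0: MGC timed out, evicting client", "anomalous"),
    ]

    def build_prompt(log_line: str) -> str:
        parts = ["Classify each log line as 'normal' or 'anomalous'.\n"]
        for text, label in FEW_SHOT:
            parts.append(f"Log: {text}\nLabel: {label}\n")
        parts.append(f"Log: {log_line}\nLabel:")
        return "\n".join(parts)

    print(build_prompt("ldlm_lock_enqueue failed: rc = -107"))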
Driven by the increasing requirements of high-performance computing applications, supercomputers are prone to containing more and more computing nodes. Applications running on such a large-scale computing system are likely to spawn millions of parallel processes, which usually generate a burst of I/O requests, introducing a great challenge into the metadata management of the underlying parallel file system. The traditional method used to overcome such a challenge is adopting multiple metadata servers in a scale-out manner, which inevitably confronts serious network and consistency problems. This work instead pursues enhancing metadata performance in a scale-up manner. Specifically, we propose to improve the performance of each individual metadata server by employing a GPU to handle metadata requests in parallel. Our proposal designs a novel metadata server architecture, which employs the CPU to interact with file system clients while offloading the computing tasks about metadata onto the GPU. To take full advantage of the parallelism available in the GPU, we redesign the in-memory data structure for the namespace of the file system. The new data structure fits the memory architecture of the GPU well, and thus helps to exploit the large number of parallel threads within the GPU to serve bursty metadata requests concurrently. We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal, and the experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50% under typical metadata workloads. This superiority is strengthened further in highly concurrent scenarios, e.g., high-performance computing systems supporting millions of parallel threads.
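To give a flavor of a GPU-friendly namespace layout, the sketch below (in Python for brevity; a real implementation would target GPU memory) uses a flat open-addressing table with no pointers, which one thread per request could probe independently; the sizes and hash are assumptions, not the paper's design:

    # Sketch: flat, fixed-size open-addressing table mapping
    # (parent_inode, name) -> inode. A contiguous, pointer-free array
    # suits GPU memory, with one thread probing per metadata request.

    CAP = 1 << 10          # table slots (power of two for cheap masking)
    EMPTY = None
    table = [EMPTY] * CAP  # each slot: (parent_inode, name, inode)

    def slot_hash(parent, name):
        return hash((parent, name)) & (CAP - 1)

    def insert(parent, name, inode):
        i = slot_hash(parent, name)
        while table[i] is not EMPTY:       # linear probing on collision
            i = (i + 1) & (CAP - 1)
        table[i] = (parent, name, inode)

    def lookup(parent, name):
        i = slot_hash(parent, name)
        while table[i] is not EMPTY:
            p, n, ino = table[i]
            if p == parent and n == name:
                return ino
            i = (i + 1) & (CAP - 1)
        return None

    insert(1, "home", 2)
    insert(2, "user", 3)
    print(lookup(2, "user"))  # 3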
ISBN (Print): 9781510860162
For modern HPC systems, failures are treated as the norm rather than the exception. To avoid rerunning applications from scratch, checkpoint/restart techniques are employed to periodically checkpoint intermediate data to parallel file systems. To increase HPC checkpointing speed, distributed burst buffers (DBB) have been proposed to use node-local NVRAM to absorb the bursty checkpoint data. However, without proper coordination, DBB is prone to suffer from low resource utilization. To solve this problem, we propose an NVRAM-based burst buffer coordination system, named collaborative distributed burst buffer (CDBB). CDBB coordinates all the available burst buffers, based on their priorities and states, to help overburdened burst buffers and maximize resource utilization. We built a proof-of-concept prototype and tested CDBB at the Minnesota Supercomputing Institute. Compared with a traditional DBB system, CDBB can speed up checkpointing by up to 8.4x under medium and heavy workloads while introducing only negligible overhead.
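A minimal sketch of such coordination, under an assumed utilization threshold and a least-utilized-first rule rather than CDBB's actual priority scheme:

    # Sketch: redirect overflow checkpoint data from an overloaded
    # node-local burst buffer to the idlest peers. Threshold and
    # selection rule are assumptions.

    def choose_helpers(buffers, local_id, overflow_bytes, threshold=0.8):
        """buffers: {id: (used_bytes, capacity_bytes)}; returns helpers."""
        helpers = []
        for bid, (used, cap) in sorted(buffers.items(),
                                       key=lambda kv: kv[1][0] / kv[1][1]):
            if bid == local_id or used / cap >= threshold:
                continue  # skip self and already-busy buffers
            take = min(overflow_bytes, int(cap * threshold) - used)
            helpers.append((bid, take))
            overflow_bytes -= take
            if overflow_bytes <= 0:
                break
        return helpers

    bufs = {"n0": (95, 100), "n1": (10, 100), "n2": (50, 100)}
    print(choose_helpers(bufs, "n0", overflow_bytes=60))  # [('n1', 60)]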
Modern High-Performance Computing (HPC) environments face mounting challenges due to the shift from large to small file datasets, along with an increasing number of users and parallelized applications. As HPC systems rely on parallel file systems (PFS), such as Lustre, for data processing, performance bottlenecks stemming from Object Storage Target (OST) contention have become a significant concern. Existing solutions, such as LADS with its object-level scheduling approach, fall short in large-scale HPC environments due to their inability to effectively address metadata I/O bottlenecks and the growing number of I/O processes. This study highlights the pressing need for a comprehensive solution that tackles both OST contention and metadata I/O challenges in diverse HPC workloads. To address these challenges, we propose SwiftLoad, an object-level I/O scheduling framework that leverages a metadata catalog to enhance the performance and efficiency of parallel HPC utilities. The adoption of the metadata catalog mitigates the metadata I/O bottlenecks that commonly occur in HPC utilities, a challenge that is particularly pronounced in object-level I/O scheduling. SwiftLoad addresses OST contention and the uneven distribution of I/O processes across different OSTs through mathematical modeling and incorporates a Loader Configuration Module to regulate the number of I/O processes. Evaluated with two representative utilities, data deduplication profiling and data augmentation, SwiftLoad achieved performance improvements of up to 5.63x and 11.0x, respectively, on a production supercomputer.
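As an illustration of regulating loader processes, the sketch below splits a capped process budget across OSTs in proportion to the bytes each must serve; the proportional rule is an assumption, not SwiftLoad's exact model:

    # Sketch: assign a capped number of loader processes to OSTs in
    # proportion to the bytes each OST holds, so no OST is oversubscribed.

    def assign_loaders(ost_bytes, max_procs):
        total = sum(ost_bytes.values())
        plan = {ost: max(1, round(max_procs * b / total))
                for ost, b in ost_bytes.items()}
        # Trim if rounding overshot the global process budget.
        while sum(plan.values()) > max_procs:
            busiest = max(plan, key=plan.get)
            plan[busiest] -= 1
        return plan

    print(assign_loaders({"ost0": 600, "ost1": 300, "ost2": 100},
                         max_procs=8))  # {'ost0': 5, 'ost1': 2, 'ost2': 1}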
ExSeisDat is designed using the standard message passing interface (MPI) library for seismic data processing on high-performance supercomputing clusters. These clusters are generally designed for efficient execution of complex tasks, including large-size I/O. I/O performance degradation arises when multiple processes try to access data from parallel networked storage. These complications are caused by the restrictive protocols run by the parallel file system (PFS) controlling the disks, as well as by limited advancement in the storage hardware itself. Addressing them requires tuning specific configuration parameters to optimize I/O performance, which users focused on writing parallel applications commonly do not consider. Even when considered, the configuration parameters must be changed from case to case, adding further I/O performance degradation for a large SEG-Y format seismic data file scaling to petabytes. SEG-Y I/O and file sorting are two of the main operations of ExSeisDat. This paper proposes a technique to optimize these SEG-Y operations based on artificial neural networks (ANNs). The optimization involves auto-tuning the related configuration parameters, using I/O bandwidth predictions made by trained ANN models through a machine learning (ML) process. Furthermore, we discuss the impact of varying the hidden-layer node configuration of the ANNs on prediction accuracy, together with a statistical analysis of the auto-tuned bandwidth results. The results show overall bandwidth improvements of up to 108.8% and 237.4% in the combined SEG-Y I/O and file sorting test cases, respectively. This paper thus demonstrates a significant gain in SEG-Y seismic data bandwidth performance by auto-tuning the parameter settings at runtime using an ML approach.
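A minimal sketch of such an auto-tuning loop, with the trained ANN replaced by a stub predictor and an invented parameter space:

    # Sketch: auto-tune I/O parameters by predicting bandwidth for each
    # candidate setting with a trained model and keeping the best one.
    from itertools import product

    def predict_bandwidth(stripe_count, stripe_size_mb, procs):
        # Stand-in for the trained ANN; a real model would be loaded here.
        return stripe_count * stripe_size_mb * 0.5 + procs * 1.2

    def autotune():
        candidates = product([4, 8, 16],     # stripe counts (assumed)
                             [1, 4, 16],     # stripe sizes in MB (assumed)
                             [8, 16, 32])    # process counts (assumed)
        return max(candidates, key=lambda c: predict_bandwidth(*c))

    print(autotune())  # parameter triple with the highest predicted bandwidth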
Burst Buffer is widely used in supercomputer centers to bridge the performance gap between computational power and the I/O system. The primary role of a Burst Buffer is to temporarily absorb bursty I/O and reduce the heavy load on the parallel file system (PFS). However, the job resource manager on High-Performance Computing (HPC) systems prefers a dedicated Burst Buffer allocation approach, which eventually leaves the Burst Buffer resource severely underutilized. To improve the efficiency of using the expensive Burst Buffer resource, we analyze the I/O patterns on Burst Buffer in depth. We propose BBOS, a Burst Buffer over-subscription allocation method that improves Burst Buffer utilization by allowing each job to access the Burst Buffer only during its I/O phases, so that jobs can overlap each other. Furthermore, we develop a new I/O congestion-aware scheduler and a transparent data management system between the Burst Buffer and the PFS. Our approach also reduces the memory overhead and improves the data persistence of the data management system by adopting persistent memory. With the proposed approach, not only can Burst Buffer utilization be improved, but HPC applications can also achieve high I/O performance by exploiting the powerful Burst Buffer hardware capabilities. Experimental results show that BBOS can improve Burst Buffer utilization by up to 120% while guaranteeing more stable and higher checkpoint performance even under high I/O loads, compared to other state-of-the-art schedulers. In addition, our approach can improve the hit ratio of restart requests by up to 96.4% and provides up to 210% higher restart throughput on the Burst Buffer.
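A toy sketch of the over-subscription idea, admitting a job's I/O phase only while in-flight buffer usage stays under capacity; the admission rule is an assumption, not BBOS's actual scheduler:

    # Sketch: over-subscribe a burst buffer by granting jobs access only
    # during their I/O phases, so one job's compute phase overlaps
    # another job's I/O phase.

    def admit(active_io_bytes, request_bytes, bb_capacity):
        """Admit an I/O phase if in-flight burst-buffer usage stays under
        capacity; otherwise the job waits until an earlier phase drains."""
        return active_io_bytes + request_bytes <= bb_capacity

    in_flight = 60
    for job, need in [("jobA", 30), ("jobB", 50)]:
        if admit(in_flight, need, bb_capacity=100):
            in_flight += need
            print(job, "admitted; in-flight =", in_flight)
        else:
            print(job, "queued until an earlier I/O phase completes")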