Efficient resource allocation methods are crucial in cloud computing environments during the Internet of Things (IoT) era. Significant challenges in processing, analyzing, and storing data arise from the vast amounts ...
ISBN:
(Print) 9798400701559
Gzip is a file compression format, which is ubiquitously used. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet, pugz cannot decompress arbitrary gzip files. It requires the decompressed stream to only contain byte values 9-126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements on the file contents posed by pugz can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle faulty decompression results, which can appear when threads start decompressing in the middle of a gzip file by using trial and error. Using 128 cores, our implementation reaches 8.7 GB/s decompression bandwidth for gzip-compressed base64-encoded data, a speedup of 55 over the single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip.
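The cache-and-prefetcher architecture described above can be illustrated with a much-simplified sketch: a pool of worker threads speculatively decompresses chunks ahead of the consumer (the "prefetcher"), and the in-order results are held until joined (the "cache"). This toy uses a file made of independent concatenated gzip members, which sidesteps the hard part the paper actually solves (starting mid-stream with trial and error inside a single deflate stream); it only shows the producer/consumer shape of the scheme, not pugz's method.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Simplification: treat the input as independent gzip members that can
# each be decompressed in isolation. The real scheme parallelizes within
# a single deflate stream and must validate speculative results.
def make_members(chunks):
    return [gzip.compress(c) for c in chunks]

def parallel_decompress(members, workers=4):
    # "Prefetcher": submit every member eagerly; "cache": the futures
    # hold finished results until the consumer joins them in order.
    with ThreadPoolExecutor(workers) as pool:
        futures = [pool.submit(gzip.decompress, m) for m in members]
        return b"".join(f.result() for f in futures)

chunks = [bytes([i % 256]) * 1000 for i in range(8)]
data = parallel_decompress(make_members(chunks))
assert data == b"".join(chunks)
```

Because results are joined in submission order, output bytes stay in sequence even when workers finish out of order.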
With the increasing use of high-performance computing, users are turning to concurrent program execution to enhance the speed and overall performance of large-scale programs. This trend is supported through the use ...
Sorting algorithms are fundamental tools in data processing. Sorting has been a deep area for algorithmic researchers, and many resources have been invested in more work on sorting algorithms. For this purpose, many existing sorting algorithms h...
The development of cloud computing has led to an explosion of network traffic. Switches in cloud computing environments struggle to process large-scale network traffic. Prior approaches proposed flow-rule compression met...
This poster investigates the challenges of dynamic memory allocation in a hierarchical parallel context for the GYSELA code, a gyrokinetic simulation tool for studying plasma turbulence. Using the SYCL 2020 programmin...
ISBN:
(Print) 9783031488023; 9783031488030
In modern HPC systems with deep hierarchical architectures, large-scale applications often struggle to efficiently utilize the abundant cores due to the saturation of resources such as memory. Co-allocating multiple applications to share compute nodes can mitigate these issues and increase system throughput. However, co-allocation may harm the performance of individual applications due to resource contention. Past research suggests that topology-aware mappings can improve the performance of parallel applications that do not share resources. In this work, we implement application-oblivious, topology-aware process-to-core mappings via different core enumerations that support the co-allocation of parallel applications. We show that these mappings have a significant impact on the available memory bandwidth. We explore how these process-to-core mappings can affect the individual application duration as well as the makespan of job schedules when they are combined with co-allocation. Our main objective is to assess whether co-allocation with a topology-aware mapping can be a viable alternative to the exclusive node allocation policies that are currently common in HPC clusters.
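The "different core enumerations" mentioned above can be sketched with two hypothetical mappings: a compact enumeration that fills one socket before moving to the next, and a scatter enumeration that round-robins processes across sockets to spread memory-bandwidth pressure. These are illustrative stand-ins, not the specific mappings evaluated in the paper.

```python
def compact(n_procs, sockets, cores_per_socket):
    # Cores are numbered socket-major: fill socket 0 fully, then socket 1, ...
    return list(range(n_procs))

def scatter(n_procs, sockets, cores_per_socket):
    # Round-robin processes over sockets so co-allocated applications
    # contend less for any single socket's memory bandwidth.
    return [(p % sockets) * cores_per_socket + p // sockets
            for p in range(n_procs)]

# 4 processes on a 2-socket node with 4 cores per socket:
assert compact(4, 2, 4) == [0, 1, 2, 3]   # all on socket 0
assert scatter(4, 2, 4) == [0, 4, 1, 5]   # two per socket
```

The point of such mappings is exactly the trade-off the abstract describes: compact placement keeps an application local, while scatter placement changes which co-allocated applications share a socket's memory bandwidth.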
ISBN:
(Print) 9781665454445
Distributed locks are used to guarantee distributed client-cache coherence in parallel file systems. However, they lead to poor performance in the case of parallel writes under high-contention workloads. We analyze the distributed lock manager and find that lock conflict resolution is the root cause of the poor performance, which involves frequent lock revocations and slow data flushing from client caches to data servers. We design a distributed lock manager named SeqDLM by exploiting the sequencer mechanism. SeqDLM mitigates the lock conflict resolution overhead using early grant and early revocation while keeping the same semantics as traditional distributed locks. To evaluate SeqDLM, we have implemented a parallel file system called ccPFS using both SeqDLM and traditional distributed locks. Evaluations on 96 nodes show SeqDLM outperforms the traditional distributed locks by up to 10.3x for high-contention parallel writes on a shared file with multiple stripes.
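The core idea of a sequencer can be sketched with a toy in-process model: instead of clients repeatedly revoking each other's locks, a sequencer hands out globally ordered tickets, and each writer applies its update when its ticket comes up. This is a hypothetical single-machine illustration of ticket ordering, not SeqDLM's actual distributed protocol or its early-grant/early-revocation mechanism.

```python
import threading

class Sequencer:
    """Toy sequencer: hands out ordered tickets so concurrent writers to a
    shared region apply in one agreed order, avoiding repeated lock
    revocation under contention."""
    def __init__(self):
        self._lock = threading.Lock()
        self._cv = threading.Condition(self._lock)
        self._next_ticket = 0
        self._next_to_apply = 0

    def acquire_ticket(self):
        with self._lock:
            t = self._next_ticket
            self._next_ticket += 1
            return t

    def apply_in_order(self, ticket, fn):
        with self._cv:
            # Wait until every earlier ticket holder has applied.
            while self._next_to_apply != ticket:
                self._cv.wait()
            fn()
            self._next_to_apply += 1
            self._cv.notify_all()

log = []
seq = Sequencer()

def writer():
    t = seq.acquire_ticket()
    seq.apply_in_order(t, lambda: log.append(t))

threads = [threading.Thread(target=writer) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert log == list(range(8))  # writes applied in ticket order
```

However the threads interleave, updates land in ticket order, which is the coherence guarantee contended lock revocation would otherwise have to enforce.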
Tremendous progress in the Internet of Things (IoT) has fuelled the transmission of data through several devices, including medical equipment. Precise diagnosis of diseases through medical images obtained through CT-...