ISBN: (print) 9798350364613; 9798350364606
This paper introduces Speedcode, an online programming platform that aims to improve the accessibility of software performance-engineering education. At its core, Speedcode lets users gain hands-on experience in software performance engineering and parallel programming by completing short programming exercises. Speedcode challenges users to develop fast multicore solutions for short programming problems and evaluates their code's performance and scalability in a quiesced cloud environment. Speedcode supports parallel programming using OpenCilk, a task-parallel computing platform that is open-source and easy to program, teach, and use for research. Speedcode aims to reduce barriers to learning and teaching software performance engineering. It allows users to run and evaluate their code on modern multicore machines from their own computer without installing any software. This gives users an easy introduction to the topic and enables teachers to more easily incorporate lessons on software performance engineering into their courses without incurring the onerous overhead of setting up computing environments for their students.
ISBN: (print) 9798400701559
Sparse general matrix-matrix multiplication (SpGEMM) is challenging, especially on graphics accelerators. Existing solutions do not fully utilize the shared memory of the graphics accelerator. Our proposal effectively utilizes the graphics accelerator's on-chip shared memory and dynamically assigns device resources by grouping the rows based on a hybrid strategy for load balancing. Experiments show that our proposal achieves speedups of up to 7.43x in double precision compared to existing SpGEMM libraries. Our implementation is fully general, and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
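The row-wise processing and row grouping mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: the dictionary-of-rows storage, the function names `spgemm_rowwise` and `group_rows_by_work`, and the threshold values are all assumptions chosen for illustration. The per-row accumulator stands in for the kind of table a GPU kernel might keep in on-chip shared memory, and the grouping uses a simple upper bound on intermediate products per row.

```python
from collections import defaultdict


def spgemm_rowwise(A, B):
    """Multiply sparse matrices stored as {row: {col: value}} dictionaries.

    Each output row is built independently (Gustavson-style), which is
    what makes a row-wise assignment of work to compute resources possible.
    """
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)  # per-row accumulator (illustrative stand-in)
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C


def group_rows_by_work(A, B, thresholds=(32, 512)):
    """Bin rows of A by an upper bound on their intermediate products.

    The thresholds are hypothetical; a real implementation would tune
    them to the accelerator's shared-memory capacity.
    """
    groups = [[] for _ in range(len(thresholds) + 1)]
    for i, a_row in A.items():
        work = sum(len(B.get(k, {})) for k in a_row)
        bin_index = sum(work > t for t in thresholds)
        groups[bin_index].append(i)
    return groups
```

Rows in the same group have similar work estimates, so each group could in principle be handed to a kernel configuration sized for that amount of work.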
The representation of the base station radiation prediction range based on the three-dimensional triangulation grid can more comprehensively and quickly reflect the distribution details of base station radiation predi...
ISBN: (print) 9798400701559
Gzip is a ubiquitous file compression format. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet pugz cannot decompress arbitrary gzip files: it requires the decompressed stream to contain only byte values 9-126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements on the file contents posed by pugz can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle faulty decompression results, which can appear when threads use trial and error to start decompressing in the middle of a gzip file. Using 128 cores, our implementation reaches 8.7 GB/s decompression bandwidth for gzip-compressed base64-encoded data, a speedup of 55 over the single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip.
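The content restriction that the abstract attributes to pugz (decompressed bytes limited to the values 9-126) is easy to check for a given file. The sketch below uses only Python's standard `gzip` module and does not involve pugz or the presented tool; the function names are hypothetical, and the check decompresses the whole input in memory, so it suits only small files.

```python
import gzip


def bytes_within_range(data: bytes, low: int = 9, high: int = 126) -> bool:
    """Return True if every byte value lies in [low, high] inclusive."""
    return all(low <= b <= high for b in data)


def gzip_payload_within_range(compressed: bytes) -> bool:
    """Decompress a gzip stream and test the byte-range restriction.

    The default bounds 9 and 126 are the limits stated in the abstract.
    """
    return bytes_within_range(gzip.decompress(compressed))
```

Base64-encoded text passes this check, which fits its use as a benchmark input in the abstract, while arbitrary binary data generally does not.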
ISBN: (print) 9780998133164
The increasing penetration of distributed energy resources (DERs) at the power distribution level augments the complexity of optimally operating the grid edge assets, primarily because of the nonlinearity and scale of the system. An alternative is to solve the relaxed convex or linear-approximated problem, but these methods lead to sub-optimal or power-flow infeasible solutions. This paper proposes a scalable and fast approach to solve the large nonlinear optimal power flow (OPF) problem using a developed distributed method. The full network-level OPF problem is decomposed into multiple smaller sub-problems that are easy to solve; the distributed method attains network-level optimal solutions upon consensus. This effective decomposition technique reduces the number of iterations required for consensus by an order of magnitude compared to traditional distributed algorithms. We demonstrate the proposed approach by solving different nonlinear OPF problems (different problem objectives) for a distribution system with more than fifty thousand (50,000) problem variables.
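To make the idea of sub-problems that agree "upon consensus" concrete, here is a toy example of generic consensus ADMM, one of the traditional distributed algorithms the abstract compares against. It is not the paper's decomposition method and contains no power-flow physics: each hypothetical agent simply holds a quadratic cost 0.5 * (x - a_i)^2, and the function name, the penalty parameter `rho`, and the iteration count are assumptions for illustration.

```python
def consensus_admm(targets, rho=1.0, iters=100):
    """Toy scaled-form consensus ADMM on scalar quadratic costs.

    Agent i minimizes 0.5 * (x_i - targets[i])**2 subject to all x_i
    agreeing on a shared value z. The consensus optimum is the mean.
    """
    n = len(targets)
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0        # shared consensus variable
    for _ in range(iters):
        # Local step: closed-form minimizer of each agent's sub-problem.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Coordination step: average the local estimates plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each agent's disagreement with z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z
```

Each local step touches only one agent's data, so the sub-problems could be solved in parallel; the number of coordination rounds needed for agreement is the quantity the abstract reports reducing.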
The development of cloud computing has led to the explosion of network traffic. The switches of cloud computing make it hard to process large-scale network traffic. Prior approaches proposed flow rules compression met...
ISBN: (print) 9781450394451
Maintaining electric power system stability is paramount, especially in extreme contingencies involving unexpected outages of multiple generators or transmission lines that are typical during severe weather events. Such outages often lead to large supply-demand mismatches followed by system frequency deviations from the nominal value. The extent of frequency deviations is an important metric of system resilience, and their timely mitigation is a central goal of power system operation and control. This paper develops a novel nonlinear model predictive control (NMPC) method to minimize frequency deviations when the grid is affected by an unforeseen loss of multiple components. Our method is based on a novel multi-period alternating current optimal power flow (ACOPF) formulation that accurately models both nonlinear electric power flow physics and the primary and secondary frequency response of generator control mechanisms. We develop a distributed parallel Julia package for solving the large-scale nonlinear optimization problems that result from our NMPC method and thereby address realistic test instances on existing high-performance computing architectures. Our method demonstrates superior performance in terms of frequency recovery over existing industry practices, where generator levels are set based on the solution of single-period classical ACOPF models.
This poster investigates the challenges of dynamic memory allocation in a hierarchical parallel context for the GYSELA code, a gyrokinetic simulation tool for studying plasma turbulence. Using the SYCL 2020 programmin...
ISBN: (print) 9798400701559
Providing efficient Functions as a Service (FaaS) is challenging due to the serverless programming model and highly heterogeneous and dynamic workloads. Great strides have been made in optimizing FaaS performance through scheduling, caching, virtualization, and other resource management techniques. The combination of these advances and growing FaaS workloads has pushed the performance bottleneck into the control plane itself. Current FaaS control planes like OpenWhisk introduce hundreds of milliseconds of latency overhead and are becoming unsuitable for high-performance FaaS research and deployments. We present the design and implementation of Iluvatar, a fast, modular, extensible FaaS control plane that reduces the latency overhead by more than two orders of magnitude. Iluvatar has a worker-centric architecture and introduces a new function queue technique for managing function scheduling and overcommitment. Iluvatar is implemented in Rust in about 13,000 lines of code and introduces only 3 ms of latency overhead under a wide range of loads, which is more than two orders of magnitude lower than OpenWhisk.