ISBN: 9798350355543 (print)
Recent trends in HPC systems increasingly emphasize accelerators, particularly GPUs, as autonomous execution units, shifting control of entire program execution to GPUs. Communication libraries enable devices to move data independently among one another, bringing forth latency improvements, and first-party GPU runtimes expose APIs for kernels to organize their execution. Despite the trends and advancements, current high-level frameworks and compilers lack support for constructs enabling this autonomous execution. In this work, we aim to bridge this gap with a compiler and provide a productive method for writing efficient GPU-first code. We design and develop a code generator that efficiently fuses and schedules persistent kernels, provides high-level abstractions over device resources, and enables GPU-initiated communication within Python code using NVSHMEM to realize autonomous multi-GPU execution. We compare our implementation to other accelerated Python compilers including CuPy, DaCe, and cuNumeric on 22 NPBench kernels. We additionally perform a scaling study of distributed 2D/3D Jacobi and observe a speedup of 6.1x and 30.8x over DaCe and cuNumeric, respectively, on 8 GPUs for the 3D case with a scaling efficiency of 98%.
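The distributed Jacobi benchmark mentioned above splits a grid across devices, and each device needs one halo row from each neighbor before every sweep. The NumPy sketch below only illustrates that decomposition. It is not the paper's code generator and not its NVSHMEM-based implementation: the halo exchange is simulated on one host by slicing, and the function names `jacobi_step` and `distributed_jacobi_step` are hypothetical. It checks that a row-block sweep of the 2D Laplace equation gives the same result as a single-domain sweep.

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep of the 2D Laplace equation; boundary values stay fixed."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return new

def distributed_jacobi_step(u, num_parts):
    """Same sweep computed block by block over interior rows (hypothetical simulation).

    Each block reads one extra row above and below it: either a neighbor's halo
    row (delivered by communication on a real multi-GPU system) or the fixed boundary.
    """
    n = u.shape[0]
    blocks = np.array_split(np.arange(1, n - 1), num_parts)
    new = u.copy()
    for rows in blocks:
        if len(rows) == 0:
            continue  # more parts than interior rows
        lo, hi = rows[0], rows[-1]
        local = u[lo - 1 : hi + 2, :]  # owned rows plus one halo row on each side
        new[lo : hi + 1, 1:-1] = 0.25 * (
            local[:-2, 1:-1] + local[2:, 1:-1] + local[1:-1, :-2] + local[1:-1, 2:]
        )
    return new

grid = np.zeros((6, 6))
grid[0, :] = 1.0  # hot top boundary, cold elsewhere
single = distributed = grid
for _ in range(10):
    single = jacobi_step(single)
    distributed = distributed_jacobi_step(distributed, num_parts=3)
print(np.allclose(single, distributed))  # True
```

Because every block evaluates the same stencil on the same values, the block-wise result matches the global one; on real hardware the cost lies in moving the halo rows, which is the communication the abstract describes as GPU-initiated.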
In this paper, we demonstrate how to conduct OD analysis based on big taxi trajectory data with XStar in an efficient manner. XStar, originally developed by the first author, is a standalone software system dedicated ...
ISBN: 9781450383943 (print)
Non-volatile storage (NVM) technologies provide faster data access than traditional hard disk drives and can benefit applications executing on accelerators such as general-purpose graphics processing units (GPGPUs). Many contemporary GPU-friendly applications process huge volumes of data residing in secondary storage. Several research works propose techniques to reduce data transfer overheads between devices connected to the same bus, e.g., peer-to-peer data transfer between an NVMe-SSD and a GPU connected to a PCI bus. The applicability of these techniques, the extent of their benefits, and their associated costs in virtualized systems are the scope of this paper. We present a comprehensive empirical analysis of different combinations of NVMe-SSD virtualization techniques and data transfer mechanisms between NVMe-SSDs and GPUs. Further, we present the impact of different data transfer parameters and a root-cause analysis of the resulting performance, in terms of data transfer throughput and CPU utilization, for different combinations of techniques. Based on the empirical analysis, we provide insights to address several bottlenecks related to different GPU data transfer techniques in different virtualization setups and motivate an alternate design that extends the VirtIO framework for efficient peer-to-peer data transfer.
ISBN: 9781450394338 (print)
ADM systems can be used for tasks ranging from ones as inconsequential as recommending a song on Spotify to decisions instrumental to someone's life, such as determining their candidacy for college. If an algorithm is trained on biased data, it can propagate prejudice. Thus, it is pertinent to find methods to decrease ADM bias. This paper presents a way to potentially mitigate ADM bias by teaching high school students an intersectional data analysis activity that incorporates critical consciousness, the second pillar of the liberatory computing framework. The activity is designed to help high school students understand the bias and history behind the college admission process, which allows them to develop a critical consciousness. Establishing a critical consciousness will diversify both the computing field and the data incorporated into ADM systems by encouraging minoritized high school students to pursue a degree in computer science. The National Institute of Standards and Technology (NIST) suggests that diversifying the computing field has the potential to reduce bias in ADM systems. Thus, the activity focuses on students developing a critical consciousness. This paper discusses preliminary findings from teaching a two-day computing activity to high school students.
ISBN: 9781665443111 (print)
Estimating safe upper bounds on task execution times is required in the design of predictable real-time systems. When multi-core processors, instruction pipelines, branch prediction, or cache memories are in place, static timing analysis faces considerable complexity, so measurement-based timing analysis (MBTA) is a more tractable option. MBTA estimates upper bounds on execution times using data measured while executing representative scenarios. In this context, it is paramount to understand not only how the task execution time is affected during execution but also what kinds of interference the task is sensitive to. Events such as cache misses or pipeline stalls, for example, may lead to large variability in task execution times. Since current platforms offer Performance Monitoring Units (PMUs) capable of counting hardware-level event occurrences, in this paper we focus on the problem of selecting the events that have the most impact on task execution, with the goal of enriching the collected information to better support MBTA. Unfortunately, PMUs usually have a limited number of monitoring registers, which prevents them from monitoring all events at once. Our approach describes how to carry out event selection even under this limitation. Results from our experiments, considering 15 different programs running on a Raspberry Pi, indicate that five selected events can explain the execution behavior of the programs with reasonable accuracy.
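The abstract does not spell out the selection procedure, so the following is only a hedged illustration of the general idea. Suppose per-run event counts are available (with few PMU registers they would have to be gathered over several runs) together with measured execution times. One generic way to pick a small set of explanatory events is greedy forward selection by the R² of a linear model. This is a swapped-in technique, not necessarily the authors' method, and `greedy_select_events`, `r_squared`, and the toy data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of an ordinary least-squares fit y ~ X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

def greedy_select_events(counts, times, k):
    """Greedily pick k event columns of counts (runs x events) that best explain times."""
    selected = []
    remaining = list(range(counts.shape[1]))
    for _ in range(min(k, counts.shape[1])):
        # Add the event whose inclusion gives the highest R^2 together with those already chosen.
        best = max(remaining, key=lambda j: r_squared(counts[:, selected + [j]], times))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical measurements: 6 runs, 3 events (think cache misses, branch misses, stalls).
counts = np.array(
    [[1, 1, 3], [2, 0, 1], [3, 1, 4], [4, 0, 1], [5, 1, 5], [6, 0, 9]], dtype=float
)
times = 10 * counts[:, 0] + counts[:, 2]  # toy execution times driven by events 0 and 2
print(greedy_select_events(counts, times, k=2))  # [0, 2]
```

A greedy search keeps the cost linear in the number of candidate events per step, which matters when the full event list of a PMU is long; it can miss combinations that are only informative jointly.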
ISBN: 9798350355543 (print)
On High Performance Computing (HPC) systems, where multiple concurrent workloads may read and write vast amounts of data stored on storage servers through a shared network, competition for I/O resources between workloads is inevitable. Previous work has thoroughly recognized the impact of the resulting resource contention, highlighting its potential to significantly degrade the performance of individual applications. However, no prior work has investigated the quantitative impact of inter-application I/O contention on individual applications, which impedes more efficient resource provisioning strategies. In this work, we first illustrate how the dynamics of I/O interference relate to I/O patterns and system status. We then propose a framework that collects fine-grained I/O traces from applications together with concurrent server-side metrics, and we train a machine learning model to accurately predict both the existence of I/O interference and its quantitative impact. Our results show that it is feasible to learn the complex factors and relationships that cause applications to underperform in the presence of I/O interference. Additionally, we show that a trained model can accurately predict the impact of I/O interference on HPC applications, with F1 scores exceeding 90% for both synthetic benchmarks and real-world applications.
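For reference, the F1 score quoted above is the harmonic mean of precision and recall. A minimal sketch for the binary case ("interference present" or not), using made-up labels; this is not the paper's evaluation code, and `f1_score` here is a hand-written helper rather than any particular library's function.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed interference
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = run slowed by I/O interference, 0 = unaffected.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.8
```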
ISBN: 9781450391573 (print)
Expertise-centric citizen science games (ECCSGs) can be powerful tools for crowdsourcing scientific knowledge production. However, to be effective, these games must train their players to become experts, which is difficult in practice. In this study, we investigated the path to expertise and the barriers involved by interviewing players of three ECCSGs: Foldit, Eterna, and Eyewire. We then applied reflexive thematic analysis to generate themes from their experiences and produce a model of expertise and its barriers. We found that expertise is constructed through a cycle of exploratory and social learning but is impeded by instructional design issues. Moreover, exploration is slowed by a lack of polish in the game artifact, and social learning is disrupted by a lack of clear communication. Based on our analysis, we make several recommendations for CSG developers, including: collaborating with professionals who have the required skill sets; providing social features and feedback systems; and improving scientific communication.
A new mobile computing paradigm, dubbed mini-app, has been growing rapidly over the past few years since being introduced by WeChat in 2017. In this paradigm, a host app allows its end-users to install and run mini-ap...
Scenario: Particulate matter (PM) comprises all particles (solid and liquid) suspended in the air. Depending on their kind and size, particles pose different kinds of risks to human health. The emergence of tiny, available, and accessible Internet of Things (IoT) devices has allowed the implementation of different monitoring strategies. Objective: To identify and characterize the IoT-based real-time monitoring strategies that have implemented a measurement process to study the effect of PM on human health. Methodology: A broad analysis based on a systematic mapping study was performed on September 4, 2020. The Association for Computing Machinery (ACM), IEEE, ScienceDirect, SpringerLink, Scopus, and Wiley databases were considered in the exploration. Results: 48 articles addressing IoT-based PM measurement were obtained, published between 2010 and 2020, with growing interest over time. The main use of this technology is to increase the coverage and density of environmental monitoring stations, given the impact of PM on human health. Approaches to monitoring air quality and its potential effects on people's health conditions are also described. Conclusions: Collaborative, people-aware, global proposals are attracting increasing interest. Only six (12.5%) articles incorporated some recommendation system based on PM measures. Accuracy and precision are the main concerns around low-cost sensors for measuring PM; thus, the calibration process is highlighted in 64.44% of articles. The main challenges reside in a combination of uncertainties in PM measurement, health impacts, data quality, and the influence of environmental variables on all of them.
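Since calibration is the recurring concern for low-cost PM sensors, here is a minimal sketch of one common approach: fitting a linear correction of raw sensor readings against co-located reference measurements by least squares. The readings are invented, the function names are hypothetical, and the mapping study does not prescribe this particular method. Real deployments often add terms for environmental variables such as humidity, which this sketch omits.

```python
import numpy as np

def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference ≈ slope * raw + intercept."""
    slope, intercept = np.polyfit(np.asarray(raw, dtype=float), np.asarray(reference, dtype=float), 1)
    return float(slope), float(intercept)

def apply_calibration(raw, slope, intercept):
    """Correct raw sensor readings with a previously fitted linear model."""
    return slope * np.asarray(raw, dtype=float) + intercept

raw_pm25 = [10, 20, 30, 40]        # hypothetical low-cost sensor readings (µg/m³)
reference_pm25 = [10, 18, 26, 34]  # hypothetical reference-station readings (µg/m³)
slope, intercept = fit_linear_calibration(raw_pm25, reference_pm25)
print(slope, intercept, apply_calibration([50], slope, intercept))
```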
The article presents theoretical and methodological approaches to the formation of an ensemble of intelligent measurements for the tasks of metrological analysis and synthesis using virtual measuring circuits. A compa...