ISBN (print): 9798350355543
We investigated the computational capabilities of FABRIC, a nationwide research infrastructure with nearly 40 sites, for scaling neuroscience simulations. From the hardware standpoint, single-site characterization showed that FABRIC is a promising alternative to conventional neuroscience setups, particularly due to the availability of powerful graphics processing units (GPUs). While multi-site simulations are affected by network latency, this latency becomes less critical for larger networks. From the software perspective, we found that in the popular CoreNEURON library, the cell distribution strategy used for parallel execution does not affect simulation time for biologically realistic networks, while the remaining cases can be addressed with a minimum k-cut graph partitioning algorithm. Overall, scalability experiments revealed that FABRIC can be used to simulate networks of up to twenty-five thousand cells, with GPU memory being the limiting factor.
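For illustration, a minimal sketch of the kind of k-way cell-to-rank assignment such partitioning targets: cells are graph nodes, connections are edges, and the goal is balanced parts with few cut edges. This is a toy greedy heuristic written for this listing, not the authors' minimum k-cut algorithm or CoreNEURON's actual distribution code.

    // Illustrative greedy k-way partitioning of a cell connectivity graph.
    // Assumption: cells are nodes, connections are edges; we want k balanced
    // parts with few cut edges. Toy heuristic, not the paper's algorithm.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    std::vector<int> greedy_partition(const std::vector<std::vector<int>>& adj, int k) {
        const std::size_t n = adj.size();
        std::vector<int> part(n, -1);
        std::vector<std::size_t> load(k, 0);      // cells per partition
        const std::size_t cap = (n + k - 1) / k;  // balance constraint

        for (std::size_t v = 0; v < n; ++v) {
            // Count how many already-placed neighbours each partition holds.
            std::vector<int> affinity(k, 0);
            for (int u : adj[v])
                if (part[u] != -1) ++affinity[part[u]];

            // Pick the non-full partition with the most neighbours (fewest new
            // cut edges), breaking ties by the lightest load.
            int best = -1;
            for (int p = 0; p < k; ++p) {
                if (load[p] >= cap) continue;
                if (best == -1 || affinity[p] > affinity[best] ||
                    (affinity[p] == affinity[best] && load[p] < load[best]))
                    best = p;
            }
            part[v] = best;
            ++load[best];
        }
        return part;
    }

    int main() {
        // 6 cells forming two triangles joined by one edge: expect a clean 2-cut.
        std::vector<std::vector<int>> adj = {
            {1, 2}, {0, 2}, {0, 1, 3}, {2, 4, 5}, {3, 5}, {3, 4}};
        for (int p : greedy_partition(adj, 2)) std::cout << p << ' ';
        std::cout << '\n';
    }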
ISBN (print): 9798350355543
Determinacy races are concurrent-programming hazards that occur when two accesses to the same memory address are unordered and at least one of them is a write. Their presence hints at a correctness error, particularly under asynchronous task-based parallel programming models. This paper introduces Taskgrind: a Valgrind tool for memory access analysis of parallel programming models such as Cilk or OpenMP. We illustrate the tool's capabilities with a determinacy-race analysis and compare it against state-of-the-art tools. Results show fewer false negatives and lower memory overheads on a set of microbenchmarks and LULESH, with meaningful error reports that assist programmers in parallelizing programs.
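As background, a minimal example of the hazard being detected, written here with OpenMP tasks (one of the models the tool targets); this snippet is illustrative and not taken from the paper or its benchmarks.

    // Minimal determinacy race: two unordered tasks touch the same variable
    // and at least one writes. Compile with e.g. g++ -fopenmp race.cpp
    #include <cstdio>

    int main() {
        int x = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task shared(x)
            x = 42;                       // write in one task
            #pragma omp task shared(x)
            std::printf("x = %d\n", x);   // unordered read: determinacy race
            #pragma omp taskwait
        }
        return 0;
    }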
ISBN (digital): 9798331596651
ISBN (print): 9798331596668
Cluster Computing Systems (CCS) are a technology that not only improves computing power but also consumes less energy by exploiting parallel programming while processing and reading massive amounts of data. Multiple Central Processing Units (CPUs) and storage devices (disks) can be combined so that data of massive size can be processed. However, cluster computing also comes with its own set of challenges: a node may stop operating, nodes may stop communicating with each other, or data transfer may stall over a poor network, any of which can become a bottleneck when processing massive amounts of data. To overcome these issues, Google introduced MapReduce, a framework designed for Big Data that handles the processing of large amounts of data across many servers. In this paper, we outline how CCS work and the challenges they face today in the age of massive data. We also introduce some well-received Big Data solutions and show how they can address the issues encountered in CCS. The primary goal of this paper is to examine the problems that can arise in CCS and the most efficient ways to resolve them.
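To make the programming model concrete, here is a single-process sketch of MapReduce's map, shuffle, and reduce phases for word counting; a real deployment distributes these phases across cluster nodes and adds fault tolerance and data locality, which this toy version omits.

    // Single-process sketch of MapReduce's three phases for word count.
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // Map phase: emit (word, 1) for every word in one input split.
    std::vector<std::pair<std::string, int>> map_split(const std::string& split) {
        std::vector<std::pair<std::string, int>> out;
        std::istringstream in(split);
        std::string word;
        while (in >> word) out.emplace_back(word, 1);
        return out;
    }

    int main() {
        std::vector<std::string> splits = {"big data on big clusters",
                                           "clusters process big data"};
        // Shuffle phase: group intermediate pairs by key.
        std::map<std::string, std::vector<int>> grouped;
        for (const auto& split : splits)
            for (const auto& [word, count] : map_split(split))
                grouped[word].push_back(count);

        // Reduce phase: sum the counts for each word.
        for (const auto& [word, counts] : grouped) {
            int total = 0;
            for (int c : counts) total += c;
            std::cout << word << ": " << total << '\n';
        }
    }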
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Celestial objects are known to change in brightness over time, driven by a diverse combination of physical processes whose time scales range from sub-milliseconds to billions of years. Stingray is an open-source Python package that brings advanced time series analysis techniques to the astronomical community, with a focus on high-energy astrophysics, but built on top of general-purpose classes and methods that are designed to be easily adapted and extended to other use cases. We describe the work being done to adapt Stingray to the analysis of large data archives. In particular, we measure the performance and scalability of Stingray and use parallel computing to speed up selected parts of the code.
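As a rough illustration of the kind of parallelization mentioned (Stingray itself is a Python package, so this C++ sketch only mirrors the pattern): power spectra of independent light-curve segments can be computed in parallel and then averaged. The naive DFT and the segment layout here are assumptions made for brevity, not the package's implementation.

    // Embarrassingly parallel step in timing analysis: one power spectrum per
    // light-curve segment, computed in parallel, then averaged.
    #include <algorithm>
    #include <cmath>
    #include <execution>
    #include <iostream>
    #include <vector>

    // Naive DFT power spectrum of one evenly sampled segment (O(n^2), for brevity).
    std::vector<double> power_spectrum(const std::vector<double>& seg) {
        const double pi = std::acos(-1.0);
        const std::size_t n = seg.size();
        std::vector<double> power(n / 2);
        for (std::size_t k = 0; k < n / 2; ++k) {
            double re = 0.0, im = 0.0;
            for (std::size_t j = 0; j < n; ++j) {
                const double phase = 2.0 * pi * k * j / n;
                re += seg[j] * std::cos(phase);
                im -= seg[j] * std::sin(phase);
            }
            power[k] = (re * re + im * im) / n;
        }
        return power;
    }

    int main() {
        // Fake light curve chopped into 64 segments of 256 bins each.
        std::vector<std::vector<double>> segments(64, std::vector<double>(256, 1.0));
        std::vector<std::vector<double>> spectra(segments.size());

        // Each segment is independent, so the transform can run in parallel.
        std::transform(std::execution::par, segments.begin(), segments.end(),
                       spectra.begin(), power_spectrum);

        // Average the per-segment spectra to reduce noise.
        std::vector<double> avg(spectra.front().size(), 0.0);
        for (const auto& ps : spectra)
            for (std::size_t k = 0; k < avg.size(); ++k) avg[k] += ps[k];
        for (double& v : avg) v /= spectra.size();
        std::cout << "first averaged power bin: " << avg[0] << '\n';
    }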
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
As the importance of concurrent and multithreaded programming continues to grow, many universities have incorporated these concepts into their introductory courses. Sonic Pi, a programming language designed for music creation, provides valuable support for exploring concurrency due to its simplified multithreading abstractions and its domain-specific nature. In this paper, we outline several teaching experiments aimed at undergraduate computer science students, using an interdisciplinary pedagogical approach that introduces concurrency early through Sonic Pi. The activities consist of code comprehension and code composition tasks in a collaborative learning environment. Our primary research goal is to explore and discuss students' misconceptions about concurrency, and then draw preliminary conclusions and connections to analogous misconceptions in traditional concurrent programming languages.
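One example of the kind of misconception that carries over to traditional languages (an assumption of ours, not a finding reported in the abstract) is expecting threads to run in the order they are started; the analogous C++ snippet below makes the nondeterminism visible.

    // A classic beginner misconception: threads run "in the order I started them".
    // The scheduler may interleave them arbitrarily; run this a few times and
    // the output order changes. Illustrative analogue in C++, not Sonic Pi.
    #include <iostream>
    #include <thread>

    int main() {
        std::thread a([] { std::cout << "thread A\n"; });
        std::thread b([] { std::cout << "thread B\n"; });
        // Without explicit synchronization, "A before B" is not guaranteed.
        a.join();
        b.join();
    }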
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
The rise of cloud computing has transformed application development and deployment, with Kubernetes emerging as a key platform for managing containerized applications. This paper explores the use of Graph Neural Networks (GNNs) to detect anomalies in Kubernetes clusters and proposes GNN-based Anomaly Detection in Kubernetes (GATAKU), which is instrumental in maintaining security and performance. Our work involves integrating Cilium for detailed network monitoring and data collection, setting up a Kubernetes cluster with k3s and Traefik, and simulating attack scenarios to generate realistic data. Data preprocessing and feature engineering prepare this data for GNN training. We present the GATAKU model's performance, highlighting metrics such as accuracy, precision, recall, and F1-score, and compare it to baseline ML models, namely Support Vector Machine (SVM) and Random Forest (RF). Moreover, we discuss these findings and emerging challenges, including handling high-dimensional data, and explore practical implications for cybersecurity.
ISBN (print): 9798350355543
The Barnes-Hut approximation for N-body simulations reduces the time complexity of the naive all-pairs approach from O(N²) to O(N log N) by hierarchically aggregating nearby particles into single entities using a tree data structure. This inherently irregular algorithm poses substantial challenges for performance-portable implementations on multi-core CPUs and GPUs. We introduce two portable, fully parallel Barnes-Hut implementation strategies that trade off different levels of GPU support for performance: an unbalanced concurrent octree, and a balanced bounding volume hierarchy sorted by a Hilbert space-filling curve. We implement these algorithms in portable ISO C++ using parallel algorithms and concurrency primitives like atomics. The results demonstrate competitive performance on a range of CPUs and GPUs. Additionally, they highlight the effectiveness of the par execution policy for highly concurrent irregular algorithms, outperforming par_unseq on CPUs and GPUs with Independent Thread Scheduling.
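A sketch of the space-filling-curve step described above, with one simplification: it uses Morton codes rather than the Hilbert curve from the paper, and plain std::sort with the par execution policy. The bit-spreading trick is the widely used 30-bit 3D Morton encoding; everything else is illustrative rather than the authors' implementation.

    // Compute a 3D Morton code per body and sort bodies along the curve with an
    // ISO C++ parallel algorithm, so nearby bodies end up contiguous in memory
    // before the hierarchy is built.
    #include <algorithm>
    #include <cstdint>
    #include <execution>
    #include <vector>

    struct Body { float x, y, z; };

    // Spread the lower 10 bits of v so there are two zero bits between each bit.
    std::uint32_t expand_bits(std::uint32_t v) {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // 30-bit Morton code for a point with coordinates already scaled to [0, 1).
    std::uint32_t morton3d(float x, float y, float z) {
        auto quantize = [](float c) {
            return static_cast<std::uint32_t>(
                std::min(std::max(c * 1024.0f, 0.0f), 1023.0f));
        };
        return (expand_bits(quantize(x)) << 2) |
               (expand_bits(quantize(y)) << 1) |
                expand_bits(quantize(z));
    }

    void sort_along_curve(std::vector<Body>& bodies) {
        std::sort(std::execution::par, bodies.begin(), bodies.end(),
                  [](const Body& a, const Body& b) {
                      return morton3d(a.x, a.y, a.z) < morton3d(b.x, b.y, b.z);
                  });
    }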
ISBN (print): 9798350355543
This paper describes how we use n-body simulations as an interesting and visually compelling way to teach efficient, parallel, and distributed programming. Our first course focuses on bachelor students, introducing them to algorithmic complexities and their implications in real-world problems, as well as state-of-the-art tools like Git, remote development, and C++, by simulating the collision of two galaxies. This project teaches the mapping of mathematical functions into code, efficient implementations, and the pitfalls around complexity and scaling. Our second course targets master students, introducing them to intra- and inter-node parallelization. Here, the students simulate our solar system using OpenMP and MPI. The master students further deepen their knowledge of parallelization and scientific computing by choosing custom projects like simulating water molecules or implementing an interactive live visualization using ***. Both courses are structured such that they can easily be adapted by other instructors. All our material is publicly available at https://***/orgs/SCTeaching-NBody.
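A toy version of the intra-node exercise such a course might start from: the O(N²) gravitational kernel parallelized with an OpenMP for loop. The softening parameter and data layout are assumptions made here; the actual course material and its MPI decomposition are not reproduced.

    // Direct-summation gravitational accelerations, parallelized with OpenMP.
    // Compile with e.g. g++ -O3 -fopenmp nbody.cpp
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Particle { double x, y, z, mass, ax, ay, az; };

    void compute_accelerations(std::vector<Particle>& p, double G, double eps) {
        #pragma omp parallel for schedule(static)
        for (std::size_t i = 0; i < p.size(); ++i) {
            double ax = 0.0, ay = 0.0, az = 0.0;
            for (std::size_t j = 0; j < p.size(); ++j) {
                if (i == j) continue;
                const double dx = p[j].x - p[i].x;
                const double dy = p[j].y - p[i].y;
                const double dz = p[j].z - p[i].z;
                // Plummer softening eps avoids the singularity at zero distance.
                const double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
                const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
                ax += G * p[j].mass * dx * inv_r3;
                ay += G * p[j].mass * dy * inv_r3;
                az += G * p[j].mass * dz * inv_r3;
            }
            p[i].ax = ax; p[i].ay = ay; p[i].az = az;
        }
    }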
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
The RISC-V instruction set architecture has become increasingly popular due to its open-source and extensible design, making it a competitive choice in high-performance computing and embedded systems. The RISC-V Vector extension (RVV) empowers RISC-V processors with length-agnostic vectorization capabilities, a critical feature for efficiently handling parallel processing demands across different hardware. Compiler support for autovectorization allows vector instructions to be generated automatically without requiring any effort from programmers. Given the limited yet evolving compiler support for RVV, this paper offers an in-depth examination of the autovectorization capabilities of GCC and LLVM for RVV versions 0.7 and 1.0. We evaluated the autovectorization performance of the LLVM, LLVM-EPI, and GCC compilers across 151 loops from the Test Suite for Vectorizing Compilers (TSVC) and seven real-world applications on the AllWinner D1 and BananaPi-F3 boards, which represent RISC-V vector hardware. Our study focuses on quantifying and comparing the level of vectorization each compiler achieves across a diverse range of vectorization patterns and workloads, providing insight into their strengths and limitations with respect to RISC-V RVV. Our findings highlight that the LLVM-19 compiler outperforms GCC-14 in 76 out of 151 loops, and that its performance is more sensitive to the selection of vector length. Additionally, tuning the vector length multiplier (LMUL) parameter can lead to performance improvements of up to 3x, and leveraging knowledge of the vector length can further enhance LMUL optimization in compilers.
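For context, a TSVC-style streaming loop that autovectorizers typically handle, with example compiler invocations in the comments; the flags shown are common choices rather than the paper's exact configurations, and whether a given GCC or LLVM version emits RVV code depends on the version and target.

    // Simple streaming kernel: y[i] = a * x[i] + y[i]. With the vector
    // extension enabled, recent GCC and LLVM releases can autovectorize this
    // loop into length-agnostic RVV code (strip-mined with vsetvli rather than
    // a fixed vector width).
    //
    // Example invocations (illustrative, not the paper's configurations):
    //   g++     -O3 -march=rv64gcv kernel.cpp
    //   clang++ -O3 -march=rv64gcv kernel.cpp
    #include <cstddef>

    void saxpy(std::size_t n, float a,
               const float* __restrict x, float* __restrict y) {
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }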
In many fields, such as healthcare, finance, and scientific research, data sharing and collaboration are critical to achieving better outcomes. However, the sharing of personal data often involves privacy risks, so pr...