ISBN: (digital) 9798331524937
ISBN: (print) 9798331524944
Celestial objects are known to change in brightness over time, driven by a diverse combination of physical processes whose time scales range from sub-milliseconds to billions of years. Stingray is an open-source Python package that brings advanced time series analysis techniques to the astronomical community. It focuses on high-energy astrophysics but is built on general-purpose classes and methods designed to be easily adapted and extended to other use cases. We describe the work being done to adapt Stingray to the analysis of large data archives. In particular, we measure the performance and scalability of Stingray and use parallel computing to speed up selected parts of the code.
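As a rough illustration of the kind of time series computation that parallelizes well, the sketch below averages periodograms over independent segments of a light curve. It is not Stingray's API: the function names `periodogram` and `averaged_periodogram` and the `map_fn` parameter are hypothetical, and the naive DFT uses only the standard library for the sake of a self-contained example.

```python
import cmath
import math


def periodogram(segment):
    """Power at each positive Fourier frequency of one segment (naive DFT)."""
    n = len(segment)
    mean = sum(segment) / n
    powers = []
    for k in range(1, n // 2 + 1):
        # Fourier coefficient of the mean-subtracted segment at frequency k/n
        coeff = sum(
            (x - mean) * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(segment)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers


def averaged_periodogram(flux, segment_length, map_fn=map):
    """Average the periodograms of consecutive, non-overlapping segments.

    Segments are independent, so `map_fn` can be swapped for a parallel map
    (an assumption of this sketch, not a statement about Stingray).
    """
    n_segments = len(flux) // segment_length
    segments = [
        flux[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]
    spectra = list(map_fn(periodogram, segments))
    # Average across segments, frequency bin by frequency bin
    return [sum(column) / n_segments for column in zip(*spectra)]
```

Because each segment is processed independently, the default `map` can be replaced by the `map` method of a `concurrent.futures.ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard) to spread segments across CPU cores.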
Proof of Data Possession is a technique for ensuring the integrity of data stored in cloud storage. However, most audit schemes assume only one role for data owners, which is not suitable for complex Smart Healthcare ...
Parallel distributed applications running on large-scale high-performance computing systems depend on effective point-to-point and collective communication to meet performance goals. Beginning with version 4.0, the Message Passing Interface (MPI) introduced the partitioned communication API, providing tools for addressing communication bottlenecks raised by hybrid communication models. This API allows individual actors (CPU threads, GPU threads, etc.) to initiate communication on portions of complete buffers, enabling additional communication/computation overlap. Intuitively, partitioned communication could benefit from network-level support: if there are multiple paths between endpoints, an MPI-aware network could disperse partitions across these paths, avoiding the data serialization entailed by a dependency on a single path. The Cerio Rockport Ethernet Fabric has the ability to expose this capability to communication middleware. In this work we develop this capability to allow for user-level path selection for MPI partitioned communication and explore how it impacts point-to-point performance, collective design, and Allreduce efficiency in a Large Language Model task.
Deep learning applications have become crucially important for the analysis and prediction of massive volumes of data. However, these applications impose substantial input/output (I/O) loads on computing systems. Spec...
In the Cloud-Edge Continuum, dynamic infrastructure change and variable workloads complicate efficient resource management. Centralized methods can struggle to adapt, whilst purely decentralized policies lack global oversight. This paper proposes a hybrid framework using Graph Neural Network (GNN) embeddings and collaborative multi-agent reinforcement learning (MARL). Local agents handle neighbourhood-level decisions, and a global orchestrator coordinates system-wide. This work contributes to decentralized application placement strategies with centralized oversight, GNN integration and collaborative MARL for efficient, adaptive and scalable resource management.
Alphanumeric sign language recognition plays a vital role in enabling effective communication between individuals who are deaf and the general population. This research focuses on classifying alphanumeric hand gesture...
Radiomics encompasses a variety of image attributes, including intensity, texture, shape, and spatial relationships among pixels. In this study, we applied four feature selection techniques: Principal Component Analys...
Early anomaly detection in automotive systems is crucial for enhancing user safety and enabling timely corrective actions, thereby minimizing the risks associated with system malfunctions. This paper presents an approach for implementing Artificial Intelligence (AI)-based algorithms for anomaly detection in the automotive domain, leveraging the RISC-V architecture in conjunction with Domain-Specific Accelerators (DSAs). By exploiting the efficiency of DSAs, the proposed system aims to achieve faster anomaly detection than traditional processing methods. A detailed comparison is conducted between executing the AI-based anomaly detection algorithm on the RISC-V core and offloading it to an optimized hardware accelerator tailored to the specific AI model. The goal of this work is to provide valuable insights into the potential of RISC-V and DSAs to enhance AI-driven safety mechanisms, contributing to the development of more reliable automotive systems.
Network-based applications rely on the underlying network infrastructure to reliably forward packets between nodes. The way packets are forwarded has a significant impact on service quality. Therefore, it is important to gain a better understanding of data packet routes. Obtaining detailed information about network paths requires continuous, long-term packet analysis. To achieve this, we present our open-source framework HiPerConTracer 3.0 for large-scale IP trace analysis. It performs Ping and Traceroute measurements to provide detailed insights into packet routes and packet timing by tracing routes between senders and receivers in public and private networks. In particular, it runs its own measurements, without needing data or cooperation from the underlying network service providers or remote server owners. Our tool supports large-scale data collection, storage, and post-processing stages. It supports easy-to-understand route visualization, round-trip time measurements, and hop counts. A proof-of-concept analysis revealed that packet route lengths can change drastically when packets travel through unexpected countries, regions, and network operators.
Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck with GPU-accelerated applications is the overhead from launching several fine-grained kernels. CUDA Graph addresses these performance challenges by enabling a graph-based execution model that captures operations as nodes and dependencies as edges in a static graph, thereby consolidating several kernel launches into one graph launch. We propose a performance optimization strategy for iteratively launched kernels. By grouping kernel launches into iteration batches and then unrolling these batches into a CUDA Graph, iterative applications can benefit from CUDA Graph for performance boosting. We analyze the performance gain and overhead from this approach by designing a skeleton application. The skeleton application also serves as a generalized example of converting an iterative solver to CUDA Graph, and as a basis for deriving a performance model. Using the skeleton application, we show that when unrolling iteration batches for a given platform, there is an optimal size of the iteration batch, which is independent of workload, balancing the extra overhead from graph creation with the performance gain of the graph execution. Depending on workload, we show that the optimal iteration batch size gives more than $1.4 \times$ speed-up in the skeleton application. Furthermore, we show that similar speed-up can be gained in Hotspot and Hotspot3D from the Rodinia benchmark suite and a Finite-Difference Time-Domain (FDTD) Maxwell solver.
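The trade-off between graph creation overhead and launch savings can be illustrated with a toy cost model. This is not the performance model derived in the paper: the linear cost terms, the parameter names, and the assumption that one graph of `batch` unrolled iterations is built once and then relaunched are all hypothetical, chosen only to show why an optimal batch size can exist and why it need not depend on the per-kernel workload.

```python
import math


def total_time(n_iters, batch, kernel_time, launch_cost, node_create_cost):
    """Toy estimate of run time when `batch` iterations are unrolled into one graph.

    Assumptions (hypothetical): creating the graph costs `node_create_cost`
    per unrolled iteration and happens once; every graph launch costs
    `launch_cost`; every iteration does `kernel_time` of useful work.
    """
    creation = node_create_cost * batch
    launches = launch_cost * math.ceil(n_iters / batch)
    compute = kernel_time * n_iters
    return creation + launches + compute


def best_batch(n_iters, kernel_time, launch_cost, node_create_cost):
    """Exhaustively search for the batch size with the lowest modeled time."""
    return min(
        range(1, n_iters + 1),
        key=lambda b: total_time(
            n_iters, b, kernel_time, launch_cost, node_create_cost
        ),
    )
```

In this model the compute term is the same for every batch size, so the minimizer depends only on the ratio of launch cost to creation cost and on the iteration count, which mirrors the workload independence reported above. Real platforms would need measured costs in place of these placeholders.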